All recent builds failing with "signal: illegal instruction"

Sometime between roughly a month ago (~mid Sept 2017) and this past week (~mid Oct 2017), my CircleCI 2.0 builds started failing with a new error:

signal: illegal instruction

(sample build: https://circleci.com/gh/drausin/libri/723). The error is consistent. I believe I have narrowed the issue down to the statically-built RocksDB lib that I’m using via cgo in my Go application. Build 723 linked above has exactly the same git commit as build 719 (https://circleci.com/gh/drausin/libri/719), which ran about a month ago and passed.

I can’t replicate the error running the same tests within the build container defined in my .circleci/config.yml, but I can replicate it by SSH’ing into a CircleCI container and running the tests manually.

I know you guys are migrating some of the underlying machine types/architectures (hence the advice to add {{ arch }} to our cache keys, which I also tried, to no avail), so I’m wondering if the problem I’m seeing could be related to some sort of leak b/t the underlying CI machine and the Docker build container my CI runs in.

FWIW, I managed to hack/“fix” this issue temporarily by just re-building the RocksDB lib in my CI script (see https://circleci.com/gh/drausin/libri/736). I’ve confirmed that this binary (/usr/local/lib/librocksdb.a) is different if I’m building from within the build container running on my local (OSX) machine vs. building from within the same build container running in CircleCI. This fact violates one of the fundamental assumptions I’ve had about Docker containers, but maybe I’m missing something obvious.

Thanks for sharing this info. Since Docker containers run directly on the underlying host’s kernel and CPU (nothing is emulated), compiling on machines with different CPU capabilities can result in different binaries.

As noted in this post: https://discuss.circleci.com/t/use-the-arch-cache-template-key-if-you-rely-on-cached-compiled-binary-dependencies/16129/2

It’s restricted to specific compiled assets, namely ones that are compiled to use Intel Xeon E5-2666 v3 (Haswell) specific CPU instruction sets that aren’t available in Xeon E5-2680 v2 (Ivy Bridge).

The ones we noticed in testing are security-related libraries used for hashing, whose optimization level made them use the newer CPU hashing/crypto instructions. Libraries such as nokogiri did not present a problem.

It looks like RocksDB also takes advantage of new instruction sets.
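To see which kind of machine a given container landed on, one option is to check the CPU feature flags the Linux kernel reports. Below is a rough Go sketch that reads /proc/cpuinfo and lists which of a few Haswell-era extensions (avx2, bmi1, bmi2, fma, movbe) are absent. The function names and the choice of flags are my own assumptions for illustration; I have not confirmed which instruction RocksDB actually trips on.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cpuFlags parses /proc/cpuinfo-style text and returns the set of
// feature flags listed on the first "flags" line (x86 Linux format).
func cpuFlags(cpuinfo string) map[string]bool {
	flags := make(map[string]bool)
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok || strings.TrimSpace(key) != "flags" {
			continue
		}
		for _, f := range strings.Fields(val) {
			flags[f] = true
		}
		break // only the first "flags" line is inspected
	}
	return flags
}

// missing returns the entries of want that are not present in flags,
// preserving the order of want.
func missing(flags map[string]bool, want []string) []string {
	var out []string
	for _, f := range want {
		if !flags[f] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Assumption: extensions added in Haswell that Ivy Bridge lacks.
	haswell := []string{"avx2", "bmi1", "bmi2", "fma", "movbe"}
	if m := missing(cpuFlags(string(data)), haswell); len(m) > 0 {
		fmt.Println("missing Haswell-era flags:", strings.Join(m, " "))
	} else {
		fmt.Println("all checked Haswell-era flags are present")
	}
}
```

If flags are reported missing on the machine that runs the tests but not on the one that built the library, a cached binary built with host-specific optimizations is a likely culprit.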


Ah, good to know! I hadn’t thought about the different kernels making a difference, but that makes sense.