CircleCI 2.0 Illegal Instruction errors (AVX2/BMI2 Instruction Sets Introduced in 2013-)


#1

We are in the process of setting up our large project workflow in CircleCI 2.0, and we ran across an intermittent issue where some of our build binaries return an “Illegal Instruction” error.

Logging into the containers and debugging reveals that the errors are due to the underlying CircleCI 2.0 server hardware not supporting the AVX2 and BMI2 instruction sets, which Intel introduced in Q2 2013 as part of its Haswell microarchitecture and which have been present (and expanded) in all Intel server CPUs since then.

The errors have nothing to do with caching or workspace persistence. Rather, the CircleCI 2.0 server chosen to build/execute our code does not support these instruction sets from 2013.

Here are the errors seen:

  1. Support for MULX missing (part of the BMI2 instruction set introduced in Haswell in 2013):
  • thread #1, name = ‘run_indexer_tes’, stop reason = signal SIGILL: illegal instruction operand
    frame #0: 0x0000000001589873 run_indexer_testsrocksdb::HistogramBucketMapper::HistogramBucketMapper() + 355 run_indexer_testsrocksdb::HistogramBucketMapper::HistogramBucketMapper:
    -> 0x1589873 <+355>: mulxq %rdi, %rsi, %rax
    0x1589878 <+360>: shrq $0x3, %rax
    0x158987c <+364>: addq %rcx, %rcx
    0x158987f <+367>: leaq (%rcx,%rcx,4), %rcx
  2. Support for VINSERTI128 missing (introduced as part of the AVX2 instruction set with Haswell in 2013):
  • thread #1, name = ‘run_indexlet_te’, stop reason = signal SIGILL: illegal instruction operand
    frame #0: 0x00000000009aea00 run_indexlet_tests__fastpackwithoutmask9(unsigned int const*, unsigned int*) + 304 run_indexlet_tests__fastpackwithoutmask9:
    -> 0x9aea00 <+304>: vinserti128 $0x1, %xmm4, %ymm5, %ymm4
    0x9aea06 <+310>: vinserti128 $0x1, %xmm2, %ymm3, %ymm2
    0x9aea0c <+316>: vpsllvd -0x6b3b55(%rip), %ymm4, %ymm3
    0x9aea15 <+325>: vpor %ymm2, %ymm3, %ymm2

  • thread #1, name = ‘inferencenode_t’, stop reason = signal SIGILL: illegal instruction operand
    frame #0: 0x0000000000974c18 inferencenode_test__fastpackwithoutmask14(unsigned int const*, unsigned int*) + 136 inferencenode_test__fastpackwithoutmask14:
    -> 0x974c18 <+136>: vinserti128 $0x1, %xmm2, %ymm3, %ymm2
    0x974c1e <+142>: vmovd (%rdi), %xmm3 ; xmm3 = mem[0],zero,zero,zero
    0x974c22 <+146>: vpinsrd $0x1, %ecx, %xmm3, %xmm3
    0x974c28 <+152>: vpinsrd $0x2, %eax, %xmm3, %xmm3

We have tried running our binaries on CircleCI all within the same job (a workflow with a single job/container), and they still fail randomly with the illegal instruction error (I guess depending on whether the underlying server is pre-2014 or not).
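To make this intermittent failure easier to spot in build logs, a test binary can be wrapped so that a death by SIGILL is reported explicitly. This is only a minimal sketch, assuming a POSIX host and Python 3; `run_and_classify` is a hypothetical helper name, not something CircleCI provides.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd (a list of arguments) and classify how it ended.

    On POSIX, subprocess reports a child killed by a signal as a negative
    return code equal to minus the signal number.
    """
    result = subprocess.run(cmd)
    if result.returncode == -signal.SIGILL:
        # The CPU rejected an instruction, e.g. mulx or vinserti128.
        print("killed by SIGILL: " + " ".join(cmd), file=sys.stderr)
        return "sigill"
    if result.returncode == 0:
        return "ok"
    return "failed"
```

For example, `run_and_classify(["./run_indexer_tests"])` would return `"sigill"` on a host that lacks the required instructions, which separates hardware mismatches from ordinary test failures.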

Ideally, the ability to select an {{ arch }} for the entire workflow would solve the incompatibility issue, as long as the {{ arch }} options include the architectures from at least the past 5 years (Haswell, Broadwell, Skylake, Kaby Lake, Coffee Lake? :wink:)

This is impacting our ability to test our product in CircleCI 2.0. Any assistance would be greatly appreciated!


#2

I had exactly the same error with AVX2 and noticed that it was failing even when the avx2 processor flag is present in /proc/cpuinfo.


#3

Hi Giacomo,

It is possible that your compiler needs to be upgraded. For example, GCC only started supporting AVX2 in v4.7. Here’s a reference link: https://gcc.gnu.org/gcc-4.7/changes.html

If you are using LLVM, support was added in LLVM 3.1 with additional optimizations in LLVM 3.2 (reference link: https://releases.llvm.org/3.1/docs/ReleaseNotes.html)

In my case, /proc/cpuinfo showed that CircleCI 2.0’s build server had neither the AVX2 nor the BMI2 instruction set. Correlating the CPU type (Xeon E5-2680) with AWS instance types suggests the CircleCI servers were most likely c3 generation servers.
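The /proc/cpuinfo check can be scripted so a job fails early with a readable message instead of a SIGILL. The sketch below assumes an x86 Linux host where the feature list sits on a line starting with `flags`; the helper names are hypothetical, and only the first processor entry is read. Note that, as reported elsewhere in this thread, a present flag is necessary but may not always be sufficient.

```python
REQUIRED_FLAGS = {"avx2", "bmi2"}  # the two feature sets behind the SIGILLs in this thread


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found (for example, a non-x86 host)


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that the CPU does not advertise, sorted."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; the default path assumes Linux."""
    with open(path) as f:
        return f.read()
```

A job could call `missing_flags(read_cpuinfo())` as its first step and abort if the list is non-empty.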


#4

Thank you,

What I mean, though, is that I build a binary with AVX2 support, then move the binary onto the machine and get an illegal instruction error even though the machine supports AVX2.


#5

I assume you’re on the Docker executor, @rcfaria01. I think Machine is closer to bare metal, and I wonder if that would fix this? It’s a free option, for now at least. AFAICR, Circle is based on AWS, and their CPUs would be bang up to date.


#6

Hi Jon. Indeed, I have not tried that option, and it is certainly a good suggestion. Thanks!

What still concerns me, though, is that the AWS servers that get allocated by CircleCI for my builds are always c3 servers (3rd generation servers, based on the CPU type in /proc/cpuinfo). However, AWS only introduced AVX2 support with instances from c4 onwards (and AVX512 with c5, last year).

AVX2 support per server type is provided here: https://aws.amazon.com/ec2/instance-types/#instance-type-matrix

My concern is that even if I have docker machine, or even bare-metal server access, if the CPU (Xeon 2650) doesn’t support AVX2, it will not properly execute the test binaries.


#7

Righto. I wonder if CircleCI would need some sort of allocation system to pick server types from the farm based on CPU/hardware requirements. I don’t know what types they have in their farm at present. Could that perhaps be logged as an idea?


#8

Hi @giacomodabisias, I think one possibility for your particular scenario is that the CircleCI AMI on which your binary is placed is running in paravirtual mode (pure PV or hybrid PV on HVM). If this is the case, AVX2 will not work properly.

Also, you can try checking which instruction sets the server you are running the binary on supports by executing this command:
gcc -O2 -march=native -E -v - </dev/null 2>&1 | grep cc1

You should get an output similar to this:
/usr/lib/gcc/x86_64-linux-gnu/5/cc1 -E -quiet -v -imultiarch x86_64-linux-gnu - -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=6144 -mtune=haswell -O2 -fstack-protector-strong -Wformat -Wformat-security

If you see something like -mno-avx2 (the key is the "no" part), the server does not support that instruction set and a binary that uses it will not run.
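Scanning that long line by eye is error-prone, so here is a rough sketch that splits it into enabled and disabled features. It assumes the `-m<feature>` / `-mno-<feature>` token scheme seen in the sample output above; other GCC versions may print things differently, and `split_march_flags` is a hypothetical helper name.

```python
def split_march_flags(cc1_line):
    """Split a cc1 command line into (enabled, disabled) feature name sets."""
    enabled, disabled = set(), set()
    for token in cc1_line.split():
        if token.startswith("-mno-"):
            disabled.add(token[len("-mno-"):])
        elif token.startswith("-m") and "=" not in token:
            # Skip -march=... and -mtune=..., which name a CPU, not a feature.
            enabled.add(token[2:])
    return enabled, disabled
```

Given the cc1 line as a string, `"avx2" in split_march_flags(line)[1]` would indicate that the host compiler sees no AVX2 support on that server.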

