Use the `arch` cache template key if you rely on cached compiled binary dependencies


#1

In preparation for supporting more architecture types, we’ve introduced a new cache template key {{ arch }}. This is documented here: https://circleci.com/docs/2.0/caching/#using-keys-and-templates

This is useful if you rely on cached binary dependencies that you compile as part of the build process. Such dependencies can carry optimizations specific to the CPU architecture they were compiled on, and restoring them on a different architecture can cause your job to fail.

To prevent your builds from breaking if you meet the criteria described above, please add the {{ arch }} template key to your cache keys. This will cause the cache to be invalidated whenever we detect a different architecture.

.circleci/config.yml example:

    steps:
      - checkout
      - restore_cache:
          keys:
            - gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
      - save_cache:
          key: gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
          paths:
            - vendor/bundle

#2

Some more information on this:

We expect the impact of this change to be limited. It's restricted to specific compiled assets, namely those compiled to use Intel Xeon E5-2666 v3 (Haswell) CPU instructions that aren't available on the Xeon E5-2680 v2 (Ivy Bridge).

The ones we noticed in testing were security-related libraries used for hashing, whose optimization level made them use the newer CPU hashing/crypto instructions. Libraries such as nokogiri did not present a problem.

Impacted users will see an "Illegal instruction" error and, typically, an application panic/crash:

    /home/circleci/org/vendor/bundle/ruby/2.3.0/gems/cityhash-0.8.1/lib/cityhash.rb:17: [BUG] Illegal instruction at 0x007f98aa38e6bb
    ruby 2.3.3p222 (2016-11-21 revision 56859) [x86_64-linux]

    -- Control frame information -----------------------------------------------
    c:0156 p:---- s:0633 e:000632 CFUNC  :hash64
    c:0155 p:0040 s:0629 e:000628 METHOD /home/circleci/org/vendor/bundle/ruby/2.3.0/gems/cityhash-0.8.1/lib/cityhash.rb:17
    c:0154 p:0016 s:0623 e:000622 METHOD /home/circleci/org/vendor/bundle/ruby/2.3.0/gems/identity_cache-0.4.1/lib/identity_cache/cache_hash.rb:26
    c:0153 p:0062 s:0619 e:000618 METHOD /home/circleci/org/vendor/bundle/ruby/2.3.0/gems/identity_cache-0.4.1/lib/identity_cache/cache_key_generation.rb:23

Of the Ruby projects we tested, cityhash was the primary library that came up.
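
If you want to confirm which CPU model a given job ran on, a quick sanity check (plain Linux, nothing CircleCI-specific) is to print it from /proc/cpuinfo in a run step, for example:

    steps:
      - run:
          name: Show CPU model
          command: grep -m1 "model name" /proc/cpuinfo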


#3

We're transitioning the 2.0 fleet over to different machines. During this time, our fleet has had a mix of two CPU types (different family/model, but the same architecture). At least one user ran into an issue where a workflow's dependencies job ran on a different CPU type than a downstream job. Because the jobs did properly use the {{ arch }} key for their caches, the downstream job found no cache for its CPU type and failed due to missing dependencies. This is a temporary problem and won't be an issue once we've finished running jobs on the old CPU type and removed those machines from the 2.0 fleet.

After chatting with some of our engineers, I think the most resilient workflow configuration checks dependencies in each job downstream of its dependency job(s). This check should use each tool's own checker, like `yarn check` or `bundle check`, rather than simply testing for the existence of ~/node_modules or vendor/bundle.
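
For a Ruby project, a downstream job's steps might look something like the sketch below. It mirrors the cache key from the example in the first post and assumes vendor/bundle is your bundle path; `bundle check` exits non-zero if anything in the Gemfile.lock isn't installed, which triggers a fresh install.

    steps:
      - checkout
      - restore_cache:
          keys:
            - gem-cache-{{ arch }}-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
      - run:
          name: Verify dependencies
          command: bundle check --path vendor/bundle || bundle install --path vendor/bundle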

I can think of three scenarios for a downstream job:

  1. Cache hit. This is the ideal case. The job doesn’t need to fetch dependencies.
  2. Partial cache hit. The job has some dependencies met, but not all.
  3. Cache miss. The job can’t find dependencies that an upstream job should have fetched. (I’ve seen this happen through user error, where a workflow’s dependencies job wasn’t required by a “downstream” job, so the “downstream” job started first; see the workflow sketch further down.)

A simple check for the existence of ~/node_modules would be fast and would satisfy situations 1 and 3, but it fails in situation 2. To write jobs/workflows that are resilient to architecture changes, each job has to verify that all of its dependencies are satisfied. Our engineers don’t plan on making these sorts of architecture changes often; currently there isn’t another one planned. Still, this approach to writing your config is the most resilient.
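
As for the user-error case in scenario 3, the fix is making sure every “downstream” job lists the dependencies job under requires: in the workflow. A minimal sketch (the job names install_dependencies and test are placeholders):

    workflows:
      version: 2
      build_and_test:
        jobs:
          - install_dependencies
          - test:
              requires:
                - install_dependencies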

Since dependency checks can take some time to run, your team may prefer to skip them in downstream jobs. I’m presenting this information so you can make an informed decision: skipping the checks saves time in the typical case, but it comes at the cost of debugging time in situations like these.


#4