Checksums don't match

We have a problem with one of our workflows.

The first step checks out the code and caches it with the key src-v1-{{ .Environment.CIRCLE_SHA1 }}. This works fine: every build step after it is able to use this cache.

But we also bundle our dependencies with yarn and store the node_modules directory with the key yarn-v1-{{ checksum "yarn.lock" }}.

This is where the freaky stuff starts. After the build step we branch into two different jobs, lint and test. Both of these jobs try to restore the caches for src-v1-somechecksum and yarn-v1-someotherchecksum. But the cache key they compute for yarn is not the same as the one we stored previously.

If we store the yarn cache with checksum abd132, the jobs after it try to read it from def456.

And even when I SSH into the server, run sha1sum yarn.lock, and compare the results between our different build steps, they don’t match.

I just diffed the two yarn.lock files:

diff bundle_dependencies.yarn.lock lint.yarn.lock
                                                                                                                 
5903,5906d5902
< mime-db@~1.30.0:
<   version "1.30.0"
<   resolved "https://registry.yarnpkg.com/mime-db/-/mime-db-1.30.0.tgz#74c643da2dd9d6a45399963465b26d5ca7d71f01"
<
5911c5907
< mime-types@^2.0.7, mime-types@~2.1.15, mime-types@~2.1.18:
---
> mime-types@^2.0.7, mime-types@~2.1.18:

I’m trying to understand the significance of the diff in the context of your original question. There’s a difference between two files, so they will produce a different checksum, right?

Are you asking why the files bundle_dependencies.yarn.lock and lint.yarn.lock are different?

No, I wonder why step 1 stores a cache with key abc123 and the following step reads (what is supposed to be the same cache) from def123.

In this case, it will only read the same cache if the hash of the file is the same, which it isn’t.

OK, I wasn’t clear in my initial post, so here’s a breakdown.

Our flow has 3 steps

  1. Checkout code
  2. Bundle dependencies
    3.a. yarn test
    3.b. yarn lint

This is how it all works and which cache keys we store/restore:

  • The first step will simply git checkout and store that as cache-key src-v1-{{ .Environment.CIRCLE_SHA1 }}
  • The second step will restore the cache key src-v1-{{ .Environment.CIRCLE_SHA1 }}, run yarn install, and then store the node_modules path as yarn-v1-{{ checksum "yarn.lock" }}

Then we run step 3.a and step 3.b in parallel. And this is where it gets weird, so here’s a “log” of the flow.

  • Stored Cache to src-v1-517e613b63367b6aecb0235e5627e3f22d2100ba
  • Found a cache from build 294 at src-v1-517e613b63367b6aecb0235e5627e3f22d2100ba
  • yarn install
  • Stored Cache to yarn-v1-EaB9BI2GC9JS3N+eq+63xs+GPgjWn4XdunuQizTEMuE=

3.a

  • Found a cache from build 294 at src-v1-517e613b63367b6aecb0235e5627e3f22d2100ba
  • No cache is found for key: yarn-v1-bgEWCf8ao+UPVVqDaz4I0OvwtXiqbGOLVAhvzgkWlB0=
  • error An unexpected error occurred: "Command failed.

3.b

  • Found a cache from build 294 at src-v1-517e613b63367b6aecb0235e5627e3f22d2100ba
  • No cache is found for key: yarn-v1-bgEWCf8ao+UPVVqDaz4I0OvwtXiqbGOLVAhvzgkWlB0=
  • error An unexpected error occurred: "Command failed.

As you can see, it stores yarn-v1 as one checksum but tries to restore it from another one.
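To check by hand which version of yarn.lock produces each key in the log above, the key can be recomputed inside each job over SSH. This is only a sketch: it assumes that {{ checksum "yarn.lock" }} is the base64-encoded SHA-256 of the file contents (the 44-character keys ending in "=" are consistent with that), and cache_key is a hypothetical helper name, not something CircleCI provides.

```shell
#!/bin/sh
# Hypothetical helper: print a key shaped like yarn-v1-{{ checksum "yarn.lock" }}.
# Assumption: the checksum is base64(SHA-256(file contents)).
cache_key() {
  printf 'yarn-v1-%s\n' "$(openssl dgst -sha256 -binary "$1" | openssl base64)"
}

# Print the key for the lockfile in the current directory, if there is one.
if [ -f yarn.lock ]; then
  cache_key yarn.lock
fi
```

Running this in the bundle job after yarn install, and again in lint or test, should show whether the two jobs are hashing different file contents.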

I believe that since yarn install modifies yarn.lock you are seeing the expected behaviour.

In the initial job:

  1. checkout from git
  2. sha1sum yarn.lock = aaaa (example)
  3. yarn install
  4. sha1sum yarn.lock = bbbb (because yarn install modified yarn.lock)
  5. create cache with hash bbbb

And in later jobs:

  1. checkout from git
  2. sha1sum yarn.lock = aaaa (because this yarn.lock hasn’t been modified)
  3. restore cache with key aaaa = error, because the cache was saved under the hash of the post-yarn-install yarn.lock (bbbb)
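The sequence above can be reproduced without yarn at all. The sketch below uses a throwaway lockfile and appends a line as a stand-in for yarn install rewriting yarn.lock; the file contents and variable names are made up for illustration. It also keeps a pristine copy, which is one way to detect the rewrite in a real job.

```shell
#!/bin/sh
# Simulate the hash drift with a throwaway lockfile (no real yarn involved).
workdir=$(mktemp -d)
lock="$workdir/yarn.lock"

printf 'mime-types@~2.1.18:\n  version "2.1.18"\n' > "$lock"

# Hash of the file as checked out: this is what a later job computes.
before=$(sha1sum "$lock" | cut -d' ' -f1)

# Keep a pristine copy so a rewrite can be detected afterwards.
cp "$lock" "$lock.orig"

# Stand-in for `yarn install` modifying the lockfile.
printf 'mime-db@~1.30.0:\n  version "1.30.0"\n' >> "$lock"

# Hash after the install: this is what the save step would use.
after=$(sha1sum "$lock" | cut -d' ' -f1)

# cmp -s is silent and exits non-zero when the files differ.
if cmp -s "$lock.orig" "$lock"; then
  result="unchanged"
else
  result="modified"
fi

echo "before=$before after=$after lockfile=$result"
rm -rf "$workdir"
```

If the guard reports "modified" in the real job, committing the updated yarn.lock should make the keys line up again. With Yarn 1, yarn install --frozen-lockfile is another option: it fails the install instead of rewriting the lockfile, so the mismatch surfaces in the bundle job rather than in lint and test.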

Damn!

Didn’t even think of that as a possibility. You are most probably correct, I’ll try to verify it asap.
