I’m looking into using CircleCI as our company-wide CI solution. Because the cache is immutable, I haven’t run a build yet (I’m still in the process of writing my config.yml), but something is really bugging me.
I think I’ve actually figured this out on my own, but I’ve left this wall of text for context, as I feel it’s important and that the docs should be updated to make this clear. You can skip to the end for my conclusion.
This is something of a necro of https://discuss.circleci.com/t/circle-2-0-caching-is-too-limited-to-be-very-useful/11694/10?u=choliver
Everything talks about the cache being immutable, which is OK. The comment linked above explains how restoring a cache looks up cache keys w/ partial string matches, instead of exact string matches.
That too seems alright, and makes sense. However, this example is then given:
- restore_cache:
    keys:
      # Find a cache corresponding to this specific package.json checksum
      # when this file is changed, this key will fail
      - projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}
      # Find a cache corresponding to any build in this branch, regardless of package.json
      # checksum. The most recent one will be used.
      - projectname-npm-deps-{{ .Branch }}
      # Find the most recent cache used from any branch
      - projectname-npm-deps-
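A minimal Python model of this lookup, assuming the behaviour described in CircleCI’s caching docs: the keys are tried in order, each key is matched as a prefix of the saved keys, and if one key matches several saved caches, the most recently saved of them is restored. SavedCache, restore_cache and the saved_at counter are hypothetical names for illustration, not CircleCI’s actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedCache:
    key: str       # the full, already-rendered key the cache was saved under
    saved_at: int  # hypothetical save counter: larger means saved more recently
    contents: str  # stand-in for the cached files


def restore_cache(keys: list[str], caches: list[SavedCache]) -> Optional[SavedCache]:
    """Simplified model: try each key in order, treating it as a prefix.

    The first key that matches at least one saved cache wins; if it matches
    several, the most recently saved of those is returned.
    """
    for prefix in keys:
        matches = [c for c in caches if c.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda c: c.saved_at)
    return None  # cache miss: nothing is restored


caches = [
    SavedCache("projectname-npm-deps-main-aaa", 1, "main deps"),
    SavedCache("projectname-npm-deps-feature1-123", 2, "feature1, first checksum"),
    SavedCache("projectname-npm-deps-feature1-456", 3, "feature1, second checksum"),
]
keys = [
    "projectname-npm-deps-feature1-789",  # exact checksum key: nothing saved yet
    "projectname-npm-deps-feature1",      # any cache from this branch
    "projectname-npm-deps-",              # any cache from any branch
]
print(restore_cache(keys, caches).key)  # projectname-npm-deps-feature1-456
```

The keys are written already rendered (feature1 in place of {{ .Branch }}), since template rendering isn’t modelled here.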
The problem I have w/ this is that, in my mind, the use of “the most recent” is contradictory and implies mutability.
My understanding from the post & the docs is that since the cache is immutable, once it’s been written, that’s it. It’ll never change. You can make as many caches as you like, but once a key has a cache, that’s it.
In my mind, “the most recent cache” implies that the cache changes, but we’ve been told it doesn’t. So “the most recent cache” would just be the first write to that cache, which is pointless.
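One way to square “the most recent” with immutability, sketched in Python under the assumption that saving behaves as the docs describe: a save under a key that already exists is skipped, while a save under a new key adds a separate cache. Every cache stays frozen, and “most recent” chooses between several distinct caches whose keys share a prefix. CacheStore and its methods are hypothetical names for illustration.

```python
from typing import Optional


class CacheStore:
    """Simplified model of immutable caches (an assumption, not CircleCI's code).

    Each key can be written only once, but any number of differently keyed
    caches can exist side by side, each remembering when it was saved.
    """

    def __init__(self) -> None:
        self._caches: dict[str, tuple[int, str]] = {}  # key -> (saved_at, contents)
        self._clock = 0

    def save(self, key: str, contents: str) -> bool:
        if key in self._caches:
            return False  # key already exists: the save is skipped, nothing changes
        self._clock += 1
        self._caches[key] = (self._clock, contents)
        return True

    def most_recent(self, prefix: str) -> Optional[str]:
        """Key of the newest cache whose key starts with prefix, or None."""
        matches = [(t, k) for k, (t, _) in self._caches.items() if k.startswith(prefix)]
        return max(matches)[1] if matches else None


store = CacheStore()
store.save("projectname-npm-deps-main-aaa", "first")
store.save("projectname-npm-deps-main-aaa", "second")  # ignored: key is taken
store.save("projectname-npm-deps-main-bbb", "third")   # new key, new cache
print(store.most_recent("projectname-npm-deps-main"))  # projectname-npm-deps-main-bbb
```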
Now, we have some control over the cache, since we can use keys like projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}, but to me that’s very limiting.
In the linked comment, Eric gives the above example and also says:
- restore_cache:
    keys:
      - projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}
      - projectname-npm-deps-{{ .Branch }}
      - projectname-npm-deps-
It’s because projectname-npm-deps-{{ .Branch }} will match projectname-npm-deps-feature1-123, projectname-npm-deps-feature1-456, and projectname-npm-deps-feature1-789.
I’m sure that’s true, but to me it doesn’t solve the original problem.
As an example, say I’m doing package updates. I create a new branch, update-packages.
So I update a package or two, making one commit per package. Every time I update a package, package.json & package-lock.json change, so any caches keyed on those won’t be matched, and we can just totally ignore projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}.
Now, I push to my branch. Thus, a new cache is created w/ the key projectname-npm-deps-update-packages, and that is immutable.
So, now I update some more packages. As before, I do a commit per package, which means projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}-style caches will never be hit, and thus when CI runs, projectname-npm-deps-update-packages is used.
This means that every commit I make on a branch will be using the cache from the first commit I made to that branch, which is inefficient.
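The update-packages scenario can be simulated with the same simplified rules (keys matched as prefixes, newest match wins, existing keys never overwritten). The checksums c1, c2 and c3 are hypothetical stand-ins for package.json changing on every commit, and run_branch is a hypothetical helper. The sketch compares saving under the branch-only key with saving under a key that also includes the checksum.

```python
from typing import Optional


def run_branch(save_key_template: str, checksums: list[str]) -> list[Optional[str]]:
    """Simulate one build per commit on the update-packages branch.

    Each build restores using the three fallback keys from the example, then
    saves under save_key_template (formatted with that commit's checksum).
    Returns, per build, which saved key was restored (None = cache miss).
    Simplified model: saving under an existing key is a no-op, and a prefix
    that matches several caches restores the newest one.
    """
    saved: list[str] = []  # keys in save order, oldest first
    restored: list[Optional[str]] = []
    for checksum in checksums:
        restore_keys = [
            f"projectname-npm-deps-update-packages-{checksum}",
            "projectname-npm-deps-update-packages",
            "projectname-npm-deps-",
        ]
        hit = None
        for prefix in restore_keys:
            matches = [k for k in saved if k.startswith(prefix)]
            if matches:
                hit = matches[-1]  # newest match, since `saved` is in save order
                break
        restored.append(hit)
        key = save_key_template.format(checksum=checksum)
        if key not in saved:  # immutable: an existing key is never overwritten
            saved.append(key)
    return restored


commits = ["c1", "c2", "c3"]  # hypothetical package.json checksums, one per commit
print(run_branch("projectname-npm-deps-update-packages", commits))
print(run_branch("projectname-npm-deps-update-packages-{checksum}", commits))
```

Under these assumptions, the branch-only save key keeps restoring the first cache ever written on the branch, while the checksum-suffixed save key restores the cache saved by the previous build, via the projectname-npm-deps-update-packages fallback.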
If this is all correct, what I need is the ability to cache based off the last commit. A comment further down that thread suggested using git log --pretty=format:'%H' -n 1 -- app/assets > assets_checksum, and that’s what I’ll be trying, but I don’t think it’s too much to ask for a variable that already holds that, literally just something like {{ .PreviousBranchCommit }}.
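The git log trick works because a cache key only changes when its input file changes. A minimal Python sketch, assuming {{ checksum "assets_checksum" }} behaves like a base64-encoded SHA256 of the file’s contents (as CircleCI’s docs describe the checksum template); checksum_like_circleci, cache_key, the projectname-assets- prefix and the commit hashes are all hypothetical.

```python
import base64
import hashlib
import tempfile
from pathlib import Path


def checksum_like_circleci(path: Path) -> str:
    """Approximation of {{ checksum "file" }}: base64 of the SHA256 of the file's bytes.

    Assumption: the exact base64 variant CircleCI uses is not modelled here.
    """
    digest = hashlib.sha256(path.read_bytes()).digest()
    return base64.b64encode(digest).decode("ascii")


def cache_key(last_commit: str, workdir: Path) -> str:
    """Mimic the git log step followed by a hypothetical key template
    projectname-assets-{{ checksum "assets_checksum" }}."""
    f = workdir / "assets_checksum"
    f.write_text(last_commit)  # the git log step writes the commit hash into this file
    return "projectname-assets-" + checksum_like_circleci(f)


with tempfile.TemporaryDirectory() as d:
    work = Path(d)
    k1 = cache_key("1111aaaa", work)        # hypothetical hashes of the last commit
    k1_again = cache_key("1111aaaa", work)  # touching app/assets
    k2 = cache_key("2222bbbb", work)
print(k1 == k1_again, k1 == k2)  # True False
```

So builds whose latest commit under app/assets is the same produce the same key (and the save is skipped), while a new commit under app/assets produces a new key and therefore a new cache.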
To me, this shouldn’t cause any major problems, b/c the save_cache step happens after a task, and you can’t say “do only if no cache”. Hence the cache is only written if the previous step passes, preventing a bad cache from being written.
For example, if I cache the result of npm ci, then as long as npm ci is successful, the result can be cached, even if my lint, tsc and jest steps afterwards fail.
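The ordering argument can be modelled as a job whose steps run in sequence, where a failed step causes every later step to be skipped. This assumes each step uses CircleCI’s default when: on_success condition; run_job and the lambdas standing in for npm ci, lint, tsc and jest are hypothetical.

```python
from typing import Callable


def run_job(steps: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run steps in order; after the first failure, later steps are skipped.

    Models a job where every step keeps the default when: on_success
    (an assumption; a step could opt into when: always instead).
    Returns one log line per step.
    """
    log: list[str] = []
    failed = False
    for name, step in steps:
        if failed:
            log.append(f"{name}: skipped")
            continue
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        failed = not ok
    return log


log = run_job([
    ("npm ci", lambda: True),
    ("save_cache", lambda: True),  # placed straight after npm ci
    ("lint", lambda: True),
    ("tsc", lambda: True),
    ("jest", lambda: False),       # pretend the tests fail
])
print(log)
# ['npm ci: ok', 'save_cache: ok', 'lint: ok', 'tsc: ok', 'jest: failed']
```

If npm ci itself failed, save_cache would be skipped in this model, so no half-installed dependencies would be cached.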
Overall, this has me treading very lightly, as I feel I can’t play around w/ the cache to test this stuff due to its immutable nature.
One possible way caching could make sense in this situation is if the partial matching happened against the whole key, which I hope is the case, but it’s not what I got from Eric’s phrasing:
It’s because projectname-npm-deps-{{ .Branch }} will match projectname-npm-deps-feature1-123, projectname-npm-deps-feature1-456, and projectname-npm-deps-feature1-789.
If he actually meant projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}, that would make a lot more sense to me, as would “the most recent”…
After all of this, I think I’ve worked out how caching works:
The cache is immutable, but a single “cache” is actually the whole “restore_cache” block, NOT each key in “keys”.
If that’s true, then the whole thing would make a lot more sense, as that’s how you can have “the most recent” cache:
- restore_cache:
    keys:
      - projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}
      - projectname-npm-deps-{{ .Branch }}
      - projectname-npm-deps-
That represents a single immutable cache, i.e., if any of the “keys” are matched, the same cache is returned. Originally, my thinking was that each “key” in “keys” named one cache, and so the matched key would return “its” cache.
This also means that when you’re saving the cache, you should provide the most specific key possible.
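A saved cache can only be found by a restore key that is a prefix of its key, which is why the most specific key pays off when saving. This toy check uses a hypothetical render helper standing in for CircleCI’s template expansion, plus a made-up branch and checksum, and compares saving under the full key with saving under the branch-only key against the three restore keys from the example.

```python
def render(template: str, branch: str, checksum: str) -> str:
    """Toy stand-in for CircleCI's template rendering (hypothetical helper)."""
    return (template
            .replace("{{ .Branch }}", branch)
            .replace('{{ checksum "package.json" }}', checksum))


SAVE_KEY = 'projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}'
RESTORE_KEYS = [
    'projectname-npm-deps-{{ .Branch }}-{{ checksum "package.json" }}',
    "projectname-npm-deps-{{ .Branch }}",
    "projectname-npm-deps-",
]

# Hypothetical branch name and checksum value.
specific = render(SAVE_KEY, "feature1", "abc123")
generic = render("projectname-npm-deps-{{ .Branch }}", "feature1", "abc123")
prefixes = [render(t, "feature1", "abc123") for t in RESTORE_KEYS]

# Which restore keys could find each saved cache (prefix match)?
print([specific.startswith(p) for p in prefixes])  # [True, True, True]
print([generic.startswith(p) for p in prefixes])   # [False, True, True]
```

The full key is reachable from every fallback, whereas the branch-only key can never satisfy the first, checksum-specific restore key.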
This might seem like just a massive ramble, and maybe it’s b/c I’m just “special”, but it really confused me and took a lot of reading up before this clicked (if I’ve got it right - otherwise, my original question stands).
The more I think about it, the more this all makes sense w/ the above logic, but it was a long road to get here - and it’s something that I feel could be explained very easily by just giving a basic example of the step-by-step process CircleCI uses internally, or a nice diagram.
I would still love it if someone from CircleCI could confirm if my thinking is correct, cause it’s doing my head in XD