Add Mechanism to Update Existing Cache Key


#1

Hey,

My pip dependencies can change over time because not all of them have a version spec, but my cache folder does not update. Instead I see this:
Skipping cache generation, cache already exists for key

If I give a dynamic key, it won’t be loaded in future builds, because the only dependency change comes from a package being updated remotely, and pip will take the latest version.

The current CircleCI does apply cache changes, but 2.0 does not. What can I do?


#2

Could you share how you are setting your key in circle.yml?


#3

Sure,
Since the cache won’t change unless I change the pip requirements file, I have a static key like “app-4”, and I update it when I change that file. I could use a checksum like the examples do, but that won’t solve my initial problem, where some dependency specs have no version.


#4

In fact, I cache the entire virtualenv dir. This is the same approach the old Python image from CircleCI 1.0 uses.


#5

@yardensachs I encourage you to pin specific versions of your dependencies for reliability and repeatability. Otherwise, a solution could be to zip your dependencies and check the hash of the archive to determine whether your cache needs to be updated. To take it one step further, you could install your dependencies in your Docker image.
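
A minimal sketch of the hash check, under stated assumptions: instead of zipping the installed files, it hashes a list of "name==version" lines such as the output of pip freeze, which is a swapped-in simplification of the suggestion above. The function names and the sample pins are hypothetical and are not part of CircleCI or pip.

```python
import hashlib
from typing import Iterable


def dependency_fingerprint(pins: Iterable[str]) -> str:
    """Return a SHA-256 hex digest for a set of 'name==version' lines."""
    # Strip whitespace, drop blank lines, lowercase, and sort so that
    # the order of the lines does not affect the digest.
    normalised = sorted(line.strip().lower() for line in pins if line.strip())
    return hashlib.sha256("\n".join(normalised).encode("utf-8")).hexdigest()


def cache_needs_update(previous: str, pins: Iterable[str]) -> bool:
    """True when the installed set differs from the fingerprint stored earlier."""
    return dependency_fingerprint(pins) != previous


if __name__ == "__main__":
    # Hypothetical pins, for illustration only.
    old = dependency_fingerprint(["requests==2.12.4", "six==1.10.0"])
    # Same packages in a different order.
    print(cache_needs_update(old, ["six==1.10.0", "requests==2.12.4"]))
    # One package was updated remotely.
    print(cache_needs_update(old, ["requests==2.13.0", "six==1.10.0"]))
```

The fingerprint would have to be stored next to the cached directory so that a later build can compare against it. It is only known after installation, so on its own it cannot be used as the restore key.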


#6

I’m running into problems here, too. We could really use the ability to write to the same key multiple times…

I’m working on a rails app, and I have a set of steps where I run rake assets:precompile and then cache the results. The result of that rake task would give me a manifest file that I could hash as part of the cache key, BUT that file wouldn’t be available at the beginning to generate the cache key for the restore step.

Beyond that, I would like to actually use the following logic:

  1. Restore from “assets-precompile-master” key
  2. Restore from “assets-precompile-{{ .Branch }}” key
  3. Run rake assets:precompile step
  4. Store to “assets-precompile-{{ .Branch }}” key

That way, when I start a new branch, I’ve got the cache from master priming things. I don’t want to have to spend 3 minutes on assets:precompile just because I started a new branch.

And then, as I make changes, I want to be able to leverage the cache. I want every build on master to store the latest version to “assets-precompile-master”. Etc.


#7

Maybe another way to put it is: there are a lot of cases where a slightly-out-of-date cache is a lot better than no cache at all. Both rake assets:precompile and bundle install can leverage some slightly-out-of-date data to drastically speed up what they’re doing. If I’ve just changed one asset file or added one gem, there’s no need to re-run the entire operation.


#8

That is awesome feedback. Thank you! I definitely understand your point about having a primed cache instead of all or nothing. Outside of using one cache for all your branches, I do not see a way to do that right now. I will open a feature request for you.


#9

@benhutton you can perform more than one cache save and cache restore in a build.

You may be able to do this now. You will have one save step, keyed to {{ .Branch }}, and two restore steps, keyed to -master and -{{ .Branch }}. On master these resolve to the same key, so it’s possible you’ll do a double restore there. If that doesn’t work as expected, you might want to use 2 saves to different keys, such as {{ .Branch }} and {{ .Branch }}-{{ checksum "app/assets/application.css" }}.


#10

@Eric yeah, that’s what I’m doing already. And cache-restore even takes a keys array, I found out: https://github.com/circleci/circleci-2.0-docs/blob/master/configuration.md#cache-restore

So here is the actual circle.yml snippet I am wanting/actively trying to use:

      - type: cache-restore
        keys:
          - desiring-god-new-assets-{{ .Branch }}
          - desiring-god-new-assets-master

      - type: shell
        name: "Assets Precompile"
        command: bundle exec rake assets:precompile assets:clean

      - type: cache-save
        key: desiring-god-new-assets-{{ .Branch }}
        paths:
          - /home/ubuntu/desiring-god-new/public/assets
          - /home/ubuntu/desiring-god-new/tmp/cache/assets/sprockets

The problem is that things indeed don’t “work as expected” (using your terms), but they do work as documented here: https://github.com/circleci/circleci-2.0-docs/blob/master/configuration.md#cache-save. “if key is already present in the cache, it will not be recreated.”

Specifically, here is what happens right now on every master branch build:

That build ran on 2017-01-06, and I imagine, per your documentation, that every single build that follows will use that same 2017-01-04 cache value (unless the cache gets purged for some reason).

I could try to key off of {{ checksum "app/assets/application.css" }}, but that’s barely going to do anything for us. That file merely links into a lot of other files that contain our actual css. It gets changed very rarely compared to the other files.

The file we actually want to checksum off of is the manifest.yml file, but that is only available after the asset compilation task, and so cannot be used to properly restore a cache off of.

THUS, the only way I see to actually do this “right” is to make it so that keys can get overwritten. Maybe you could add another knob to the cache-save step that configures the behavior of the overwriting? Default it to the current behavior but then let us change it if we want the cache-save step to always overwrite the key.
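
A toy model of the request, to make the difference concrete. It is a sketch only: the class, the overwrite parameter, and the key and file names are hypothetical and do not correspond to any existing CircleCI option. With overwrite left at its default, save behaves like the documented cache-save and skips a key that already exists.

```python
from typing import Dict, List


class WriteOnceCache:
    """Toy model of cache-save: an existing key is not recreated."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}

    def save(self, key: str, files: List[str], overwrite: bool = False) -> bool:
        # `overwrite` is the hypothetical knob requested above.
        # False mirrors the documented behaviour: keep the first value.
        if key in self._store and not overwrite:
            return False  # corresponds to "Skipping cache generation"
        self._store[key] = list(files)
        return True

    def restore(self, key: str) -> List[str]:
        # Exact-key lookup; returns an empty list on a miss.
        return list(self._store.get(key, []))


if __name__ == "__main__":
    cache = WriteOnceCache()
    cache.save("assets-master", ["app-v1.css"])
    cache.save("assets-master", ["app-v2.css"])  # skipped, key exists
    print(cache.restore("assets-master"))  # ['app-v1.css']
    cache.save("assets-master", ["app-v2.css"], overwrite=True)
    print(cache.restore("assets-master"))  # ['app-v2.css']
```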


#11

Okay, I see the need and the problem. I’ve turned this into a feature request internally. Thank you for the feedback!


#12

Agreed – this is the use case of recreating how circle v1 works. It cached certain paths every build, and you could bust that cache as needed. There’s no way to replicate v1 today.

My use case is git submodules. I’d much prefer checkout to have an option to automatically init and sync submodules, and therefore cache them, but I have to do this in another step. This is a common git pattern.


#13

EDIT. New style config for this is now documented:


So it turns out we overlooked this behavior when documenting the epoch cache key. I’ve opened a pull request to get this into our docs, but that might require some review and back and forth. Since you all have been so helpful in sharing what you need, I’ll share it here as well so you can get it sooner.

All cache restores look up a key as a prefix, not an exact match. In my sandbox repo, I have:

      - type: cache-restore
        key: support-sandbox-{{ .Branch }}
      - type: shell
        command: |
          ls /
          touch /foo
      - type: cache-save
        key: support-sandbox-{{ .Branch }}-{{ epoch }}
        paths:
          - /foo
      - type: cache-save
        key: support-sandbox-{{ .Branch }}
        paths:
          - /foo

Either of those cache-save steps will trigger that cache-restore. Which one gets chosen does not depend on how closely the key name matches: the most recent match is chosen.

Since this match is prefix based, {{ epoch }} must be the last part of your cache-save key.

You may want to rebuild, possibly a few times, to check that a rebuild is using the cache you expect. Check for this under “Restoring cache”:

Found a cache from build 138 at support-sandbox-master
Size: 91 B
Cached paths:
  * /foo

As I rebuild, the latest cache will be used, even though support-sandbox-master is the closest match in name.
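
The lookup described above can be modelled in a few lines. This is a sketch under assumptions, not CircleCI's implementation: the class name, the build numbers, and the epoch values are hypothetical. Saves are write-once per key, and a restore returns the most recently saved entry whose key starts with the requested prefix.

```python
from typing import List, Optional, Tuple


class PrefixCache:
    """Toy model: write-once saves, prefix-matched restores."""

    def __init__(self) -> None:
        # Entries in the order they were saved: (key, build number).
        self._entries: List[Tuple[str, int]] = []

    def save(self, key: str, build: int) -> bool:
        # An existing key is not recreated.
        if any(existing == key for existing, _ in self._entries):
            return False
        self._entries.append((key, build))
        return True

    def restore(self, prefix: str) -> Optional[Tuple[str, int]]:
        # Any key starting with the prefix matches; the newest match wins.
        matches = [entry for entry in self._entries if entry[0].startswith(prefix)]
        return matches[-1] if matches else None


if __name__ == "__main__":
    cache = PrefixCache()
    # Build 1 saves both keys (epoch values are made up).
    cache.save("support-sandbox-master-1483660800", build=1)
    cache.save("support-sandbox-master", build=1)
    # Build 2: the epoch key is new, the plain key already exists.
    cache.save("support-sandbox-master-1483747200", build=2)
    cache.save("support-sandbox-master", build=2)
    print(cache.restore("support-sandbox-master"))
```

In this model, the restore in a third build picks the epoch key written by build 2, because it is the newest entry that starts with the prefix. A key with the epoch in the middle would only match if the restore prefix ended before it, which is why the epoch goes last.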


#14