Caching installed Python packages


#1

Hi,

We’re able to cache the packages pip downloads by caching ~/.cache/pip. However, when pip install is re-run, it uses the cached downloads and doesn’t have to fetch them from PyPI again, but it still re-installs every single package. In the old CircleCI 1.0, pip would install only the new packages and wouldn’t re-install the rest. What are we doing wrong here?


#2

I think you have to specify the cache-save/cache-restore steps manually in your circle.yml. If you can share the corresponding bits of your config, it might be easier to answer the question.

We use tox in our projects, and I simply call cache-restore (as early as possible) and cache-save as the last step:

stages:
  build:
    workDir: ~/console-api
    ...
      - type: checkout

      - type: cache-restore
        key: tox-env-{{ checksum "requirements.txt" }}-{{ checksum "requirements-testing.txt" }}

      ...

      - type: cache-save
        key: tox-env-{{ checksum "requirements.txt" }}-{{ checksum "requirements-testing.txt" }}
        paths: .tox

This way tox reuses the env as long as requirements.txt and requirements-testing.txt are unchanged.


#3

We currently have this:

      - type: cache-save
        key: tcj-{{ .Branch }}-{{ checksum "requirements.txt" }}
        paths:
          - "~/.cache/pip"

#4

It looks like this will make the cache reusable only within a single branch. Also, I think you can try setting paths to ../.cache (relative to your work dir). Maybe it can’t expand the ~; just a guess.


#5

It does actually cache the downloaded packages; the issue is that it doesn’t cache the installation.


#6

This is because pip installs packages into a different folder from the one it downloads them to. It’s possibly /usr/local/lib, but you can run a Python script that calls site.getsitepackages() to be sure.

At the end of the day, caching can be thought of as a tar/gzip and copy to and from external storage.

@alexander for your cache-restore step, I suggest dropping the checksum portions, so that it looks like this:

      - type: cache-restore
        key: tox-env-

It will still match the keys you specified, because cache-restore uses a key-prefix match, not a full string match. From the matching caches, CircleCI selects the newest one. The benefit of doing this is that you’ll get a cache restore when you get old
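
The prefix matching described above can be sketched in a few lines of Python. This only illustrates the selection rule as stated here; the function name `pick_cache`, the dictionary of saved keys, and the timestamps are hypothetical.

```python
def pick_cache(restore_key, saved_caches):
    """Return the newest saved key that starts with `restore_key`, or None.

    `saved_caches` maps each saved cache key to the time it was saved.
    """
    matches = [key for key in saved_caches if key.startswith(restore_key)]
    if not matches:
        return None
    # Among all prefix matches, the most recently saved cache wins.
    return max(matches, key=lambda key: saved_caches[key])


saved = {
    "tox-env-aaa-bbb": 1,   # older tox cache
    "tox-env-ccc-ddd": 2,   # newer tox cache
    "tcj-master-eee": 3,    # different prefix, never matches "tox-env-"
}
print(pick_cache("tox-env-", saved))  # tox-env-ccc-ddd
```

A restore key that includes the full checksums only matches when nothing has changed, whereas the bare `tox-env-` prefix matches every cache saved under that naming scheme.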

Edit: the code snippet was for a different user; tagged him.


#7

Eric’s suggestion works great for locating the site packages. You can add:

      - run:
          # this can be removed
          name: Locate site Packages
          command: python -c "import site; print(site.getsitepackages())"

to config.yml and then use the resulting output to specify a cache. On my python:3.6.0 image it was: /usr/local/lib/python3.6/site-packages

It’s possible to cache root owned directories on CircleCI 2.0.


#8

Is there documentation somewhere on config.yml? Haven’t seen a reference to that file before, nor the run command. Also, how do I use output from a command in the cache key name?


#9

New style config is documented here: https://circleci.com/docs/2.0/configuration-reference/

For a guided walkthrough using the new config there is: https://circleci.com/docs/2.0/project-walkthrough/

Going forward, all early Beta users are encouraged to update their config to the new style in a .circleci directory using config.yml


#10