Can you pass a docker and virtual environment from one job to the next?

Here is my config.yml file, which is very basic because I’ve just started:

# Python CircleCI 2.0 configuration file
version: 2
jobs:
  build:
    docker:
      - image: circleci/python:3.8

    working_directory: ~/repo

    steps:
      # Step 1: obtain repo from GitHub
      - checkout
      # Step 2: create virtual env and install dependencies
      - run:
          name: create venv and install dependencies
          command: |
            python3 -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt
  test:
    docker:
      - image: circleci/python:3.8
    steps:
      - checkout
      # Step 3: run linter and tests
      - run:
          name: run tests
          command: |
            python3 -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt
            pytest -v --cov

workflows:
  version: 2
  build_and_test:
    jobs:
      - build
      - test:
          requires:
            - build

Okay so my question is this:

Since I have a build and a test job that require the same dependencies, is there a way to cache the steps:

python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt

Which occur in the build job, so I don’t have to run them again inside the test job?

I know I could put the build and test into a single job, but is there any way to do it without merging the two jobs? I just figured out how to unmerge them after a whole day of troubleshooting.

You can certainly cache the pip install step. There are some pretty good docs about it if you haven’t checked those out already.
https://circleci.com/docs/2.0/caching/#basic-example-of-dependency-caching
There is also some Python-specific info, but it uses pipenv, which is a bit different from what you are running.
https://circleci.com/docs/2.0/language-python/#cache-dependencies

Basically, you’ll want to do something similar to this:

version: 2.1

jobs:
  tests:
    docker:
      - image: circleci/python:3.7
    steps:
      - checkout
      - restore_cache:
          key: v1-python-cache-{{ checksum "requirements.txt" }}
      - run:
          name: Install Dependencies and Run Tests
          command: |
            python3 -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt
            pytest
      - save_cache:
          paths:
            - venv
          key: v1-python-cache-{{ checksum "requirements.txt" }}

workflows:
  example:
    jobs:
      - tests

There is also a Python Orb which abstracts away a lot of the caching details. You might want to take a look at that as well.

Hi @mike, thanks for your response!

I just want to make sure I understand your answer. It’s the restore_cache: step that pulls the dependencies from the build job, right?

I’m looking through the docs you linked now! It looks like I need a corresponding save_cache step in the ‘build’ job. Would that look something like this:

jobs:

  # Job 1: Successfully install dependencies and create venv
  build:
    docker:
      - image: circleci/python:3.8

    working_directory: ~/repo

    steps:

      # Step 1: obtain repo from GitHub
      - checkout

      # Step 2: create virtual env and install dependencies
      - run:
          name: create venv and install dependencies
          command: |
            python3 -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt

      # Step 3: save dependencies for use in 'tests' job.
      - save_cache:
          key: dep-cache
          paths:
            - path/to/dependencies

  # Job 2: Run unit tests:
  tests:
    docker:
      - image: circleci/python:3.8
    steps:
      - checkout
      - restore_cache:
          key: dep-cache
      - run:
          name: setup venv and run tests
          command: |
            python3 -m venv venv
            # CAN SKIP INSTALLING DEPENDENCIES BECAUSE OF CACHE?
            pytest -v --cov

I have to say that I understand checksums only at a very basic conceptual level (in general or here at CircleCI). I’m not sure what the line key: v1-python-cache-{{ checksum "requirements.txt" }} does, or if I need it to save my dependencies?

I see these checksums in the docs, explained as a way to verify that CircleCI has the latest version? It’s a little vague.

Thanks for your help!

Wes

You can actually have multiple save_cache sections. If the cache already exists (based on the cache key), the save step is simply skipped.

The “checksum” really just creates a hash of the file: a big long string like tZSIvBYwnefhyHSa1LxnSBBGVujwgt4cZ7wQfqO0r5k=. It will always be the same string for the same file contents, but a different string if the contents change. So if that specific file changes, the key changes and the cache will miss.

requirements.txt typically changes when you add new dependencies via pip and run pip freeze > requirements.txt. So you can key the cache on that file: when the dependencies are updated, requirements.txt changes and so does the key.
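
As a rough illustration of the difference, the fragment below contrasts a fixed key with a checksum-based one. The key names are the ones already used in this thread. The comments reflect my understanding that a CircleCI cache is not overwritten once it has been saved under a given key, so treat this as a sketch and not a definitive config.

```yaml
# Two alternative save_cache steps from a job's `steps:` list
# (a fragment, not a complete config).

# Alternative A, fixed key: saved once, then skipped on later runs
# because the key already exists. Changes to requirements.txt would
# not be reflected in the venv that gets restored.
- save_cache:
    key: dep-cache
    paths:
      - venv

# Alternative B, checksum key: the template is replaced by a hash of
# requirements.txt, so editing that file yields a new key, the next
# restore misses, and a fresh venv is saved under the new key.
- save_cache:
    key: v1-python-cache-{{ checksum "requirements.txt" }}
    paths:
      - venv
```

The v1 prefix is just a convention: bumping it to v2 is a simple way to force a fresh cache without touching requirements.txt.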

One quick note about your last step:

                command: |
                    python3 -m venv venv
                    # CAN SKIP INSTALLING DEPENDENCIES BECAUSE OF CACHE?
                    pytest -v --cov

Make sure to also do a . venv/bin/activate before pytest. Each run step in CircleCI starts a fresh shell, so you’ll need to activate the venv in every run step that calls a Python command like pytest.
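
Putting the pieces together, here is a rough sketch of how the two jobs might look. It assumes both jobs use the same working_directory (the scripts inside a venv contain absolute paths, so the restored directory should sit where it was created) and it reuses the checksum key from the earlier example. This is untested, so treat it as a starting point.

```yaml
version: 2.1

jobs:
  build:
    docker:
      - image: circleci/python:3.8
    working_directory: ~/repo
    steps:
      - checkout
      # Reuse a previously saved venv if requirements.txt is unchanged
      - restore_cache:
          key: v1-python-cache-{{ checksum "requirements.txt" }}
      - run:
          name: create venv and install dependencies
          command: |
            python3 -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt
      # Skipped automatically if this key already exists
      - save_cache:
          key: v1-python-cache-{{ checksum "requirements.txt" }}
          paths:
            - venv

  test:
    docker:
      - image: circleci/python:3.8
    # Assumption: same directory as the build job, so the relative
    # path `venv` points at the same location in both jobs
    working_directory: ~/repo
    steps:
      - checkout
      - restore_cache:
          key: v1-python-cache-{{ checksum "requirements.txt" }}
      - run:
          name: activate cached venv and run tests
          command: |
            # These three lines are quick when the cache was restored
            # and act as a fallback when it was not
            python3 -m venv venv
            . venv/bin/activate
            pip install -r requirements.txt
            pytest -v --cov

workflows:
  build_and_test:
    jobs:
      - build
      - test:
          requires:
            - build
```

Keeping the install line in the test job costs very little on a cache hit, since pip finds everything already installed, and it keeps the job working if the cache is ever missing.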

Right, okay!

And when I use

python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt

followed by

- save_cache:
    key: dep-cache
    paths:
        - venv/

Does that mean when I later restore the cache that I don’t have to create a new virtual environment with python -m venv venv?

I can just do this in my test job:

run: 
    name: activate cached venv, run pytest
    command: |
        . venv/bin/activate
        pytest -v --cov

Sorry if these questions are annoying. I’ve read through the dependency caching docs two full times now, and I’m still getting errors saying that the venv/bin/activate can’t be found in the test job, after restoring the cache.

Well, I think I’ve tried everything. As far as I can tell, caching my virtual environment directory from the “Build” job does nothing for the “Test” job. I have decided to just install dependencies twice. This will probably annoy me after a while and I will merge building and testing back into one job. It’s a shame this doesn’t work intuitively. My saving grace is that I’m an amateur and my project is a small one, so waiting a few extra seconds doesn’t matter.