Use the same cache key for jobs running in parallel?

Hi,

I have a project in which I want to run 3 jobs in parallel, but I want to install npm dependencies only once and cache the resulting node_modules:

  • test-unit - run unit tests
  • test-integration - run integration tests
  • build-sam - build the project with AWS SAM (Serverless Application Model)

Installing the npm dependencies and caching them is encapsulated in the install-deps-with-cache command, which I use in all 3 jobs above.

My intuition is that if the cache key already exists (i.e. neither package.json nor package-lock.json changed since a previous commit), then all 3 parallel jobs will be able to use it.

However, what happens if the cache key doesn’t exist (because package.json or package-lock.json changed)?
Will each of the 3 parallel jobs install the npm dependencies, which seems a bit wasteful?
Or does CircleCI somehow manage to reuse the cache even though the jobs run in parallel?

commands:
  install-deps-with-cache:
    description: Install dependencies, using the cache if it exists
    steps:
      - restore_cache:
          key: v1-{{ checksum "package.json" }}-{{ checksum "package-lock.json" }}
          working_directory: ~/project

      - run:
          name: Install dependencies
          command: |
            if [ -d 'node_modules' ]
            then 
              echo "restored node_modules from cache!"
            else
              npm ci
            fi
          working_directory: ~/project

      - save_cache:
          key: v1-{{ checksum "package.json" }}-{{ checksum "package-lock.json" }}
          paths:
            - node_modules
          working_directory: ~/project
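
One caveat about the command above (my observation, not from the CircleCI docs): because the run step skips `npm ci` whenever `node_modules` exists, adding a prefix-fallback entry to `restore_cache` would be unsafe here, since a stale partial match would silently skip the install. If partial reuse across lockfile changes is wanted, one pattern is to always run `npm ci` and cache npm's download directory (`~/.npm`) instead. Sketch only, untested:

```yaml
commands:
  install-deps-with-npm-cache:
    description: Always run npm ci, but reuse npm's download cache across lockfile changes
    steps:
      - restore_cache:
          keys:
            # exact match first, then fall back to the newest cache with this prefix
            - v1-npm-{{ checksum "package-lock.json" }}
            - v1-npm-
      - run:
          name: Install dependencies
          # npm ci always wipes node_modules, but package downloads hit ~/.npm
          command: npm ci
          working_directory: ~/project
      - save_cache:
          key: v1-npm-{{ checksum "package-lock.json" }}
          paths:
            - ~/.npm
```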

jobs:
  test-unit:
    docker:
      - image: cimg/node:18.16.1
    steps:
      - checkout
      - install-deps-with-cache
      - run:
          name: Test Unit
          command: npm run test:unit
          working_directory: ~/project

  test-integration:
    docker:
      - image: cimg/node:18.16.1
    steps:
      - checkout
      - install-deps-with-cache
      - run:
          name: Test Integration
          command: npm run test:integration
          working_directory: ~/project
  
  build-sam:
    docker:
      - image: 563186419109.dkr.ecr.us-east-1.amazonaws.com/build-images:sam-node-18    
    steps:
      - checkout
      - install-deps-with-cache
      - run:
          name: Build template
          command: sam build
          working_directory: ~/project

workflows:
  version: 2
  package:
    jobs:
      - test-unit:
          context: all
      - test-integration:
          context: all
      - build-sam:
          context: all
      - deploy:
          name: deploy-staging
          context: all
          deploy-env: staging
          notify-slack: false
          requires:
            - build-sam

There is no documented ‘advanced’ cache management. All indications are that the cache steps run within each job’s own environment, with no independent coordinating process. So in your example, if no cache object exists when the 3 jobs start, each one would install dependencies and save its own copy.

You could modify your process so that before executing the parallel tasks you run a task that makes sure that the cached environment exists and is up to date.

@rit1010 thanks for your reply.

You could modify your process so that before executing the parallel tasks you run a task that makes sure that the cached environment exists and is up to date.

When you write “task”, do you mean a job?
Basically, something like this?

jobs:
  install-deps-job:
    docker:
      - image: cimg/node:18.16.1

    steps:
      - npmregistry
      - checkout
      - install-deps-with-cache

workflows:
  version: 2
  package:
    jobs:
      - install-deps-job:
          context: all

      - test-unit:
          context: all
          requires:
            - install-deps-job

      - test-integration:
          context: all
          requires:
            - install-deps-job

      - build-sam:
          context: all
          requires:
            - install-deps-job

That would probably work; the downside is that it performs a git checkout of the project just for the sake of installing dependencies and populating the cache. But I guess a git checkout is much cheaper than installing dependencies, so it’s probably worth it.

Are there any recommended approaches for things like this?
For example, is it considered a best practice to have a job that just installs dependencies and populates the cache?
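
For what it’s worth, a variant I’m considering (sketch only, untested) is to have the install job persist node_modules to a workspace, so the downstream jobs attach it instead of each restoring the cache separately:

```yaml
jobs:
  install-deps-job:
    docker:
      - image: cimg/node:18.16.1
    steps:
      - checkout
      - install-deps-with-cache
      # share the installed modules with downstream jobs in this workflow
      - persist_to_workspace:
          root: ~/project
          paths:
            - node_modules

  test-unit:
    docker:
      - image: cimg/node:18.16.1
    steps:
      - checkout
      # restores the node_modules persisted by install-deps-job
      - attach_workspace:
          at: ~/project
      - run:
          name: Test Unit
          command: npm run test:unit
          working_directory: ~/project
```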

Yes, that is the type of thing I was proposing.

As for recommendations, I’ve not come across any documents or past forum posts that cover your use case, so there is no best practice I can point to.