`persist_to_workspace` vs `save_cache` performance

I’m trying to optimize our build. Up until recently, we had

  1. A dependencies task that checked out from git, then ran bundle install, yarn install, and yarn compile, with caching around each of the four steps
  2. A dependencies job that just ran that task
  3. Three jobs that ran in parallel after the first job. Each ran the dependencies task before running tests. Because of the workflow graph, they would (almost) always get cache hits.
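
For anyone following along, the cache-based setup described above might look roughly like the sketch below. This is a reconstruction, not our real config: the image tag, the `install-dependencies` command name, the cache key names, and the `vendor/bundle` path are all assumptions, and the caching around the git checkout and `yarn compile` steps is left out to keep it short.

```yaml
# Hypothetical sketch of the cache-based setup; names, image, and paths are assumptions.
version: 2.1

commands:
  install-dependencies:
    steps:
      - checkout
      # Gems: restore, install, then save under a key derived from Gemfile.lock
      - restore_cache:
          keys:
            - gems-v1-{{ checksum "Gemfile.lock" }}
      - run: bundle config set --local path vendor/bundle && bundle install
      - save_cache:
          key: gems-v1-{{ checksum "Gemfile.lock" }}
          paths:
            - vendor/bundle
      # Node modules: same pattern, keyed on yarn.lock
      - restore_cache:
          keys:
            - yarn-v1-{{ checksum "yarn.lock" }}
      - run: yarn install
      - save_cache:
          key: yarn-v1-{{ checksum "yarn.lock" }}
          paths:
            - node_modules
      - run: yarn compile

jobs:
  dependencies:
    docker:
      - image: cimg/ruby:3.2-node
    steps:
      - install-dependencies
  test_unit:
    docker:
      - image: cimg/ruby:3.2-node
    steps:
      # Re-running the command is cheap here because the caches are already warm
      - install-dependencies
      - run: echo "run tests here"  # placeholder for the real test command

workflows:
  build:
    jobs:
      - dependencies
      - test_unit:
          requires:
            - dependencies
```

The other two test jobs would follow the same shape as `test_unit`.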

Based on this article, we recently changed that to use the workspace for persistence:

  1. A dependencies job that pulls from git, runs bundle install and yarn install, then yarn compile, and pushes the whole working directory into the workspace. This job uses caches for all the dependencies.
  2. Three jobs that run in parallel after the first job. Each pulls the working directory from the workspace.
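
A minimal sketch of the workspace-based version, under the same assumptions as before (hypothetical image and job names, and an absolute working directory chosen only for illustration). Note that in this shape the persisted directory still contains `.git` and `node_modules`, which may well account for a lot of the time spent in the persist step.

```yaml
# Hypothetical sketch of the workspace-based setup; names, image, and paths are assumptions.
version: 2.1

jobs:
  dependencies:
    docker:
      - image: cimg/ruby:3.2-node
    working_directory: /home/circleci/build-dir
    steps:
      - checkout
      # ... restore caches, bundle install, yarn install, yarn compile, save caches ...
      # Persist the whole working directory, addressed relative to its parent
      - persist_to_workspace:
          root: /home/circleci
          paths:
            - build-dir
  test_unit:
    docker:
      - image: cimg/ruby:3.2-node
    working_directory: /home/circleci/build-dir
    steps:
      # Unpack the persisted directory at the same location it was saved from
      - attach_workspace:
          at: /home/circleci
      - run: echo "run tests here"  # placeholder for the real test command

workflows:
  build:
    jobs:
      - dependencies
      - test_unit:
          requires:
            - dependencies
```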

It turns out that’s slower. Unfortunately, the persist_to_workspace step doesn’t offer any transparency into what’s taking the time, but persisting to the workspace definitely takes longer than writing to the cache.

Additionally, this post suggests that persist_to_workspace can accept glob patterns, but I don’t see that information in the docs.

So my questions:

  1. Why is persist_to_workspace slower than save_cache?
  2. Should we use cache or workspaces for git? For ruby gems? For node modules? For compiled assets?
  3. Does persist_to_workspace accept a glob? Does it accept a negative glob so I can ignore ./node_modules/ and ./.git/ but persist everything else?
  4. Can you update the docs with general guidelines about when to use each tool?

Things I use caches for:

  • git directories (for the speed issues you’ve identified)
  • node_modules folders (canonical use case: a cache of node_modules keyed on a hash of yarn.lock is the official recommendation)

Things I use workspaces for:

  • handing folders off to orbs that expect them (check out wealthforge/cypress@1.0.0, it’s just like regular cypress but with more features)
  • storing mutable variables within a build
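
To illustrate the "mutable variables" case, here is a small sketch of one job writing a value to a file and a later job reading it back through the workspace. The job names, the `BUILD_VERSION` file, and the value written to it are made up for the example.

```yaml
# Hypothetical sketch: passing a value between jobs via the workspace.
version: 2.1

jobs:
  produce:
    docker:
      - image: cimg/base:stable
    steps:
      # Write the value to a file under the directory that will be persisted
      - run: mkdir -p /tmp/dir && echo "1.2.3" > /tmp/dir/BUILD_VERSION
      - persist_to_workspace:
          root: /tmp/dir
          paths:
            - BUILD_VERSION
  consume:
    docker:
      - image: cimg/base:stable
    steps:
      # The file reappears at the same path in the downstream job
      - attach_workspace:
          at: /tmp/dir
      - run: echo "version is $(cat /tmp/dir/BUILD_VERSION)"

workflows:
  build:
    jobs:
      - produce
      - consume:
          requires:
            - produce
```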

Your dependencies job sounds an awful lot like our code_switch one:

code_switch:
    executor: default
    working_directory: /home/circleci/build-dir
    steps:
      - checkout:
          path: /home/circleci/build-dir
      # populate-deploy-branch and checkout-diffs are commands for handling building from a mono-repo
      - populate-deploy-branch
      - checkout-diffs
      # get rid of that big excess folder
      - run: rm -rf .git
      - save_cache:
          key: git-sha-{{ .Revision }}
          paths:
            - /home/circleci/build-dir

On the receiving side, one of the mono-repo projects will do something like:

  ui_ruby:
    executor: default
    # you can work from any arbitrary file path, and caches will populate appropriately
    working_directory: /home/circleci/build-dir/clients/ui-ruby
    steps:
      # bail-if-current gracefully ends the job if there are no changes under detect-path
      - bail-if-current:
          detect-path: "clients/ui-ruby"
      # restore the build directory cached by code_switch
      - restore_cache:
          key: git-sha-{{ .Revision }}
      # another command that will build and push a docker image.
      - docker-build-deploy:
          repository-image: "ui-ruby"
          docker-layer-caching: true
          # dlc is a life saver on ruby gems

Hopefully this is helpful.


Here is persist_to_workspace accepting two files. It’s not a literal glob, but it gets the same thing done.

  - persist_to_workspace:
      root: /tmp/dir
      paths:
        - FILE_TO_STORE
        - ALT_FILE
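
On the glob question: as far as I can tell from the CircleCI configuration reference, each entry under `paths` is either a directory path or a glob in Go's `filepath.Match` syntax, interpreted relative to `root`. That syntax has no negation, so I don't believe there is a way to say "everything except node_modules and .git". The workaround would be to list what you do want, or to delete the unwanted folders before persisting (like the `rm -rf .git` in the code_switch job earlier in the thread). A sketch, with made-up file and directory names:

```yaml
# Hypothetical sketch: glob and directory entries under paths (names are assumptions).
- persist_to_workspace:
    root: /tmp/dir
    paths:
      # Quoted so YAML does not read the leading * as an alias
      - "*.json"
      # A plain directory path is persisted with everything under it
      - reports
```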