Sbt recompiles everything between workflow steps

Cross-posted on Stack Overflow:

I can’t seem to find a way to compile sources using scala/sbt in one workflow step and avoid full project recompilation on the next step.

I’ve looked at posts like How to cache SBT incremental compilation, but they either aren’t relevant to my setup or don’t solve the problem.

My approach is basically this:

  1. attach workspace /home/circleci/myorg
  2. checkout code (to /home/circleci/myorg/myproj)
  3. compile the project (all compilation artifacts should reside at or below the git/checkout directory)
  4. persist myorg/myproj, ~/.sbt, ~/.ivy2/cache to the workspace

In the next workflow step (job):

  1. Restore workspace
  2. Move .sbt and .ivy2/cache back to the /home/circleci dir from the workspace
  3. run sbt test

However, sbt test recompiles the full project every time, and I am unable to determine why. All of the source code and the compiled .class files should still exist in the workspace, so from sbt’s point of view nothing should appear to have changed.
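One way to test the "nothing has changed" assumption is to snapshot the files at the end of the compile job and again at the start of the test job, then compare. The following is a minimal sketch, not a diagnosis: the helper names `snapshot` and `changed_files` are hypothetical, and whether sbt's incremental compiler actually reacts to modification times, content, or absolute paths in this setup is something the comparison can only hint at.

```shell
#!/bin/sh
# Sketch: record a checksum and modification time for every file under a
# directory so that two jobs can be compared. Helper names are hypothetical.

# snapshot DIR OUTFILE
# Writes one "checksum mtime path" line per file. Keep OUTFILE outside DIR,
# otherwise the snapshot file itself shows up in the listing.
snapshot() {
  snap_dir=$1
  snap_out=$2
  (
    cd "$snap_dir" || exit 1
    find . -type f | LC_ALL=C sort | while IFS= read -r f; do
      # cksum is a weak CRC, but enough to notice content changes
      sum=$(cksum < "$f" | cut -d ' ' -f 1)
      # GNU stat first, BSD stat as a fallback
      mtime=$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f")
      printf '%s %s %s\n' "$sum" "$mtime" "$f"
    done
  ) > "$snap_out"
}

# changed_files BEFORE AFTER
# Prints every line that appears in only one of the two snapshots, i.e. files
# that were added, removed, modified, or had their mtime changed.
changed_files() {
  cat "$1" "$2" | LC_ALL=C sort | uniq -u
}
```

A possible use: call `snapshot /home/circleci/myteam/myproj /tmp/before.txt` as the last step of the compile job, persist the snapshot file to the workspace, call `snapshot` again right before `sbt test`, and inspect the output of `changed_files`. The same can be done for ~/.ivy2/cache and ~/.sbt. An empty result would suggest the trigger lies somewhere other than file contents or timestamps.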

Relevant circleci config:

version: 2

jobs:
  # compile and cache compilation
  test-compile:
    working_directory: /home/circleci/myteam/myproj
    docker:
      - image: myorg/teika-myproj-base:sbt-1.2.8
    steps:
      # the directory to be persisted (cached/restored) to the next step
      - attach_workspace:
          at: /home/circleci/myteam
      # git pull to /home/circleci/myteam/myproj
      - checkout
      - restore_cache:
          # look for a pre-existing set of ~/.ivy2/cache, ~/.sbt dirs
          # from a prior build
          keys:
            - sbt-artifacts-{{ checksum "project/"}}-{{ checksum "build.sbt" }}-{{ checksum "project/Dependencies.scala" }}-{{ checksum "project/plugins.sbt" }}-{{ .Branch }}
      - restore_cache:
          # look for pre-existing set of 'target' dirs from a prior build
          keys:
            - build-{{ checksum "project/"}}-{{ checksum "build.sbt" }}-{{ checksum "project/Dependencies.scala" }}-{{ checksum "project/plugins.sbt" }}-{{ .Branch }}
      - run:
          # the compile step
          working_directory: /home/circleci/myteam/myproj
          command: sbt test:compile
      # per:
      # Cleanup the cached directories to avoid unnecessary cache updates
      - run:
          working_directory: /home/circleci
          command: |
            rm -rf /home/circleci/.ivy2/.sbt.ivy.lock
            find /home/circleci/.ivy2/cache -name "ivydata-*.properties" -print -delete
            find /home/circleci/.sbt -name "*.lock" -print -delete
      - save_cache:
          # cache ~/.ivy2/cache and ~/.sbt for subsequent builds
          key: sbt-artifacts-{{ checksum "project/"}}-{{ checksum "build.sbt" }}-{{ checksum "project/Dependencies.scala" }}-{{ checksum "project/plugins.sbt" }}-{{ .Branch }}-{{ .Revision }}
          paths:
            - /home/circleci/.ivy2/cache
            - /home/circleci/.sbt
      - save_cache:
          # cache the `target` dirs for subsequent builds
          key: build-{{ checksum "project/"}}-{{ checksum "build.sbt" }}-{{ checksum "project/Dependencies.scala" }}-{{ checksum "project/plugins.sbt" }}-{{ .Branch }}-{{ .Revision }}
          paths:
            - /home/circleci/myteam/myproj/target
            - /home/circleci/myteam/myproj/project/target
            - /home/circleci/myteam/myproj/project/project/target
      # in circle, a 'workflow' undergoes several jobs, this first one 
      # is 'compile', the next will run the tests (see next 'job' section
      # 'test-run' below). 
      # 'persist to workspace' takes any files from this job and ensures 
      # they 'come with' the workspace to the next job in the workflow
      - persist_to_workspace:
          root: /home/circleci/myteam
          # bring the git checkout, including all target dirs
          paths:
            - myproj
      - persist_to_workspace:
          root: /home/circleci
          # bring the big stuff
          paths:
            - .ivy2/cache
            - .sbt

  # actually runs the tests compiled in the previous job
  test-run:
    environment:
      SBT_OPTS: -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions  -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -Duser.timezone=Etc/UTC -Duser.language=en
    docker:
      # run tests in the same image as before, but technically
      # a different instance
      - image: myorg/teika-myproj-base:sbt-1.2.8
    steps:
      # bring over all files 'persist_to_workspace' in the last job
      - attach_workspace:
          at: /home/circleci/myteam
      # restore ~/.sbt and ~/.ivy2/cache via `mv` from the workspace 
      # back to the home dir
      - run:
          working_directory: /home/circleci/myteam
          command: |
            [[ ! -d /home/circleci/.ivy2 ]] && mkdir /home/circleci/.ivy2

            for d in .ivy2/cache .sbt; do
              [[ -d "/home/circleci/$d" ]] && rm -rf "/home/circleci/$d"
              if [ -d "$d"  ]; then
                mv -v "$d" "/home/circleci/$d"
              else
                echo "$d does not exist" >&2
                ls -la . >&2
                exit 1
              fi
            done
      - run:
          # run the tests, already compiled
          # note: recompiles everything every time!
          working_directory: /home/circleci/myteam/myproj
          command: sbt test
          no_output_timeout: 3900s

workflows:
  version: 2
    jobs:
      - test-compile
      - test-run:
          requires:
            - test-compile

I’m not an sbt wizard, so hoping someone else can help here.

From the other thread:

Is this something that would help, or was that project specific to them? I’m assuming sbt has to cache the compiled state somewhere that you aren’t adding to the workspace, but I have no clue where that would be.

I don’t see how it’s applicable here. I’m persisting the entirety of my checkout to the workspace, and that would bring lib_managed along with it, wouldn’t it?

Another issue is that “lib_managed” only applies if you set retrieveManaged := true in build.sbt, which I understand to mean jars are downloaded there instead of (or in addition to) ~/.ivy2/cache. I don’t have this setting enabled, so there is no lib_managed, and I should still have these artifacts in the ~/.ivy2/cache that is persisted to the workspace.

True, I missed that and thought you were persisting only a single sub-directory.

We’ll have to wait for someone who knows sbt better than I do.

I haven’t solved the problem, so I’m replying rather than updating the original post, but I do have a workaround for my immediate problem that I might as well share.

The gist is that I’m trying to separate ‘test compile’ from ‘test run’ so that I can customize JVM properties appropriately and spin up dependencies at different times to lower total machine memory pressure.

What I’ve done, in a nutshell, is run scalatest from scala -cp ... rather than via sbt test, which avoids any attempt at recompilation. The runner can operate against a directory of .class files.

The short version is this:

  1. docker container: augmented to include a scala CLI install instead of using the one sbt pulls down (unfortunate, as I now need to keep these versions in sync)
  2. build phase: sbt test:compile 'inspect run' 'export test:fullClasspath' | tee >(grep -F '.jar' > test-classpath.txt)
    • compiles, and also records a copy-pasteable classpath string suitable for passing to scala -cp VALUE_HERE to run the tests
  3. test phase: scala -cp "$(cat test-classpath.txt)" org.scalatest.tools.Runner -R target/scala-2.12/test-classes/ -u target/test-reports -oD
    • runs scalatest via the runner, using the compiled .class files in target/scala-2.12/test-classes and the classpath recorded in the compile phase, and printing to stdout as well as to a reports directory
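The two phases above can be sketched as a pair of small shell helpers. This is only an illustration under stated assumptions: the function names `extract_classpath` and `runner_command` are hypothetical, the classpath is assumed to be the last line of sbt output that mentions a .jar, and the runner class is ScalaTest's `org.scalatest.tools.Runner`. The second helper only prints the command so it can be inspected before being run.

```shell
#!/bin/sh
# Sketch of the compile-then-run workaround. Helper names are hypothetical.

# extract_classpath
# Reads sbt output on stdin and keeps the last line that mentions a .jar,
# assumed here to be the line printed by `export test:fullClasspath`.
extract_classpath() {
  grep -F '.jar' | tail -n 1
}

# runner_command CLASSPATH_FILE TEST_CLASSES_DIR REPORTS_DIR
# Prints (does not execute) the scala invocation for the ScalaTest runner:
#   -R  runpath containing the compiled test classes
#   -u  directory for JUnit-style XML reports
#   -oD stdout reporter that also shows durations
runner_command() {
  printf 'scala -cp "%s" org.scalatest.tools.Runner -R %s -u %s -oD\n' \
    "$(cat "$1")" "$2" "$3"
}
```

In the build phase this would look like `sbt test:compile 'export test:fullClasspath' | extract_classpath > test-classpath.txt`, and in the test phase `eval "$(runner_command test-classpath.txt target/scala-2.12/test-classes/ target/test-reports)"`. The exported classpath contains absolute paths, so the test job needs the checkout and ~/.ivy2/cache at the same locations as the compile job. Any other log line containing ".jar" after the export output would break the extraction, which is one of the fragile parts of this approach.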

I don’t love this and it has some problems, but figured I’d share this workaround.
