Workspaces and shared artifacts limitations

memory-usage
circle.yml

#1

Hi,

Is there a size limit for shared artifacts across workspaces? Would speeds start degrading if the workspaces contain tens of thousands of files?

I’m thinking about setting up a “dependency” job that does the following:

  • check out the codebase (the .git folder and our codebase are quite large)
  • restore caches
  • install dependencies (node_modules in our case, which is tens of thousands of files)
  • compile vendor DLLs
  • update caches
  • attach workspace

Do you recommend this approach?


#2

I’ve had a chance to try out the new workflows and thought I’d share some thoughts.

Shared artifacts:

  • uses the same mechanism as artifact caching, so the same limitations apply. To give you an idea, it took ~20s to store the artifacts and ~5s to restore them (~300 MB in size)
  • it’ll take some trial and error to figure out whether checking out the code from GitHub is faster than restoring from S3
  • once a workflow is successful, the artifacts are removed, so retrying a build with a shared workspace will fail

UI

  • the branch name doesn’t show in the side panel under “My branches”. You have to look for it in the list of builds or the list of workflows for the project
  • each step in a workflow is a separate build process which means a workflow with multiple steps will make the list of builds grow really quickly
  • in the build list view, the only way to tell which build belongs to which step is by opening it (which is inconvenient especially when you have lots of parallel builds)

Until the UI catches up, it might be a little too soon to use workflows, especially if you have a big team generating builds constantly. I’m really excited about the possibilities this brings. We can run testing and building in parallel, which will cut overall build times in half. I’ll probably check out the source in each job and use workspace sharing the same way we cache (carefully).


#3

Hello,

Thanks for sharing with us!

Could you please explain to us how you managed to cache the workspace?


#4

Hi @Zephir77167,

I basically exported the entire working directory.

Here’s the entire config to give you all the details:

version: 2

defaults: &defaults
  working_directory: /home/circleci/rosetta
  docker:
    - image: circleci/node:6-browsers

jobs:

  setup:
    <<: *defaults
    parallelism: 1
    steps:
      - checkout
      - add_ssh_keys
      - run:
          name: Add known hosts for git operations
          command: .circleci/configure_ssh

      - restore_cache:
          name: Restoring node_modules
          key: v1-node_modules-{{ checksum "yarn.lock" }}
      - restore_cache:
          name: Restoring vendor
          key: v1-vendor_dev-{{ checksum "yarn.lock" }}-{{ checksum "lib/webpack/vendor.js" }}

      - run:
          name: Install dependencies
          command: yarn
      - run:
          name: Install dependencies for static_html
          command: |
            cd tools/static_html
            yarn
      - run: bin/vendor

      # note: cache node_modules immediately after install so pkg artifacts aren't captured
      - save_cache:
          name: Caching node_modules
          key: v1-node_modules-{{ checksum "yarn.lock" }}
          paths:
            - node_modules
      - save_cache:
          name: Caching vendor.js
          key: v1-vendor_dev-{{ checksum "yarn.lock" }}-{{ checksum "lib/webpack/vendor.js" }}
          paths:
            - tmp/vendor.js
            - tmp/vendor_dev.dll.json

      - persist_to_workspace: 
          root: /home/circleci
          paths:
            - rosetta

  tests:
    <<: *defaults
    parallelism: 8
    steps:
      - attach_workspace:
          at: /home/circleci
      - run:
          name: Dependency tree
          command: bin/deps --max-cpus=12
      - run:
          name: Lint
          command: |
            FILES=$(circleci tests glob "*.js" "bin/*" "lib/**/*.js" "src/**/*.{js,jsx,ts,tsx}" "tests/**/*.js" | circleci tests split)
            echo $FILES
            bin/lint --quiet --max-cpus=12 $FILES
      - run:
          name: Test
          command: |
            FILES=$(circleci tests glob "src/**/*.{js,jsx,ts,tsx}" "tests/spec/**/*.js" | circleci tests split)
            echo $FILES
            bin/test --allow-src --lcov --junit --reports-dir=tmp/reports $FILES
      - run:
          name: Upload code coverage
          command: node_modules/.bin/codecov --file=tmp/reports/lcov.info
      - store_test_results:
          path: tmp/reports

  build:
    <<: *defaults
    parallelism: 5
    steps:
      - attach_workspace:
          at: /home/circleci
      - run:
          name: Build
          command: |
            FILES=$(circleci tests glob "lib/webpack/bundles/*.js" | circleci tests split)
            echo $FILES
            bin/build $FILES
      - run:
          name: Upload
          command: bin/upload --version=$CIRCLE_BUILD_NUM
      - run:
          name: Verify
          command: |
            FILES=$(circleci tests glob "lib/webpack/bundles/*.js" | circleci tests split)
            echo $FILES
            bin/verify --bundles $FILES
      - store_artifacts:
          destination: rosetta
          path: tmp/

  deploy:
    <<: *defaults
    parallelism: 1
    steps:
      - attach_workspace:
          at: /home/circleci
      - run:
          name: Add known hosts for git operations
          command: .circleci/configure_ssh
      - deploy:
          name: Build and upload manifest
          command: bin/manifest --version=$CIRCLE_BUILD_NUM
      - deploy:
          name: Deploy master
          command: .circleci/deploy_master
      - store_artifacts:
          destination: rosetta
          path: tmp/

workflows:
  version: 2
  
  rosetta:
    jobs:
      - setup
      - tests:
          requires:
            - setup
      - build:
          requires:
            - setup
      - deploy:
          requires:
            - tests
            - build

A couple of notes:

  • I tried not repeating the attach_workspace step but couldn’t find a way to do it
  • the following shorthand gave me an error: persist_to_workspace: /home/circleci/rosetta
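
For reference, here is a minimal sketch of the long form that the step appears to require, with both a root and a list of paths. Narrowing the paths to specific subdirectories is an assumption for illustration (the config above persists the whole rosetta directory); it may reduce the amount of data uploaded, but that has not been measured here.

```yaml
# Sketch only: persisting a subset of the working directory.
# The two paths below are illustrative; the config above persists all of "rosetta".
- persist_to_workspace:
    root: /home/circleci        # paths are resolved relative to this directory
    paths:
      - rosetta/node_modules
      - rosetta/tmp
```

Jobs that attach this workspace would then need to get the rest of the source some other way, for example with their own checkout step.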

#5

I had similar findings to you, @gmathieu. I ended up basing my workspace config on CircleCI’s own frontend repo. They use a YAML anchor to reuse the attach_workspace step:

workspace_root: &workspace_root
    /tmp/workspace
    
attach_workspace: &attach_workspace
  attach_workspace:
    at: *workspace_root
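
As a rough sketch of how the anchor can then be referenced from a job, assuming the two definitions above sit at the top level of the same config file. The job name `test`, the image, and the `ls` command are placeholders, not taken from the frontend repo.

```yaml
# Sketch only: reusing the anchored step inside a job.
workspace_root: &workspace_root
  /tmp/workspace

attach_workspace: &attach_workspace
  attach_workspace:
    at: *workspace_root

jobs:
  test:                                # hypothetical job name
    docker:
      - image: circleci/node:6-browsers
    steps:
      - *attach_workspace              # alias expands to the attach_workspace step above
      - run: ls /tmp/workspace         # placeholder command to inspect the attached files
```

The alias still has to appear once per job, but the mount path is defined in a single place.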

#6

Thank you, you helped me a lot! I personally found that using the cache instead of workspace persistence for node modules was quicker. Any reason why you chose the second approach?


#7

That is exactly what you should do :slight_smile: :100:


#8

@Zephir77167 I was hoping workspaces used Docker’s data volumes, which keep files on the parent machine (i.e. locally), instead of sharing them via S3. It would have been a lot faster.

Using the cache makes sense and we will continue to use it extensively. It also requires checking out the source, which takes about 30s for a large codebase. This config was an experiment to see if workspace artifact sharing was faster.
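
For comparison, a cache-based version of a downstream job might look roughly like the sketch below. It assumes the `defaults` anchor and the cache key from the config in #4, and it replaces attach_workspace with a per-job checkout and cache restore. Whether this is faster than attaching the workspace depends on the repository and has to be measured.

```yaml
# Sketch only: a job that checks out and restores the cache instead of attaching a workspace.
tests:
  <<: *defaults
  steps:
    - checkout                         # ~30s for a large codebase, per the post above
    - restore_cache:
        name: Restoring node_modules
        key: v1-node_modules-{{ checksum "yarn.lock" }}
    - run:
        name: Install dependencies
        command: yarn                  # should have little to do when the cache was restored
```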


#9

Yes! Please!

