2.0 persist_to_workspace does not preserve symlinks

nodejs
workflow
2.0

#1

We’re running npm install in a build container, then calling persist_to_workspace to share node_modules with all subsequent jobs in our workflow. It seems persist_to_workspace copies the contents of each symlink’s target rather than the symlink itself. As a result, relative require() calls in the node_modules/.bin scripts fail, because they now resolve relative to .bin instead of the package’s own directory.

I would expect CircleCI to preserve the symlinks. I’m not sure how to work around this in the short term, other than writing a script to recreate the symlinks manually. Running rm -rf node_modules/.bin && npm install in the subsequent jobs does not seem to recreate the .bin directory.

Here is a listing of node_modules/.bin after running npm install in the build container:

$ ls -alF ./node_modules/.bin
total 40
drwxr-xr-x   2 circleci circleci  4096 Jul 14 14:35 ./
drwxr-xr-x 917 circleci circleci 36864 Jul 14 14:37 ../
lrwxrwxrwx   1 circleci circleci    18 Jul 14 14:35 acorn -> ../acorn/bin/acorn*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 babel -> ../babel-cli/bin/babel.js*
lrwxrwxrwx   1 circleci circleci    32 Jul 14 14:35 babel-doctor -> ../babel-cli/bin/babel-doctor.js*
lrwxrwxrwx   1 circleci circleci    42 Jul 14 14:35 babel-external-helpers -> ../babel-cli/bin/babel-external-helpers.js*
lrwxrwxrwx   1 circleci circleci    30 Jul 14 14:35 babel-node -> ../babel-cli/bin/babel-node.js*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 babylon -> ../babylon/bin/babylon.js*
lrwxrwxrwx   1 circleci circleci    22 Jul 14 14:35 browserslist -> ../browserslist/cli.js*
lrwxrwxrwx   1 circleci circleci    22 Jul 14 14:35 codecov -> ../codecov/bin/codecov*
lrwxrwxrwx   1 circleci circleci    20 Jul 14 14:35 cssesc -> ../cssesc/bin/cssesc*
lrwxrwxrwx   1 circleci circleci    16 Jul 14 14:35 csso -> ../csso/bin/csso*
lrwxrwxrwx   1 circleci circleci    24 Jul 14 14:35 dateformat -> ../dateformat/bin/cli.js*
lrwxrwxrwx   1 circleci circleci    15 Jul 14 14:35 errno -> ../errno/cli.js*
lrwxrwxrwx   1 circleci circleci    29 Jul 14 14:35 escodegen -> ../escodegen/bin/escodegen.js*
lrwxrwxrwx   1 circleci circleci    30 Jul 14 14:35 esgenerate -> ../escodegen/bin/esgenerate.js*
lrwxrwxrwx   1 circleci circleci    23 Jul 14 14:35 eslint -> ../eslint/bin/eslint.js*
lrwxrwxrwx   1 circleci circleci    27 Jul 14 14:35 eslint_d -> ../eslint_d/bin/eslint_d.js*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 esparse -> ../esprima/bin/esparse.js*
lrwxrwxrwx   1 circleci circleci    28 Jul 14 14:35 esvalidate -> ../esprima/bin/esvalidate.js*
lrwxrwxrwx   1 circleci circleci    21 Jul 14 14:35 extract-zip -> ../extract-zip/cli.js*
lrwxrwxrwx   1 circleci circleci    28 Jul 14 14:35 handlebars -> ../handlebars/bin/handlebars*
lrwxrwxrwx   1 circleci circleci    34 Jul 14 14:35 har-validator -> ../har-validator/bin/har-validator*
lrwxrwxrwx   1 circleci circleci    27 Jul 14 14:35 in-install -> ../in-publish/in-install.js*
lrwxrwxrwx   1 circleci circleci    27 Jul 14 14:35 in-publish -> ../in-publish/in-publish.js*
lrwxrwxrwx   1 circleci circleci    22 Jul 14 14:35 istanbul -> ../istanbul/lib/cli.js*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 jasmine -> ../jasmine/bin/jasmine.js*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 js-yaml -> ../js-yaml/bin/js-yaml.js*
lrwxrwxrwx   1 circleci circleci    18 Jul 14 14:35 jsesc -> ../jsesc/bin/jsesc*
lrwxrwxrwx   1 circleci circleci    19 Jul 14 14:35 json5 -> ../json5/lib/cli.js*
lrwxrwxrwx   1 circleci circleci    18 Jul 14 14:35 karma -> ../karma/bin/karma*
lrwxrwxrwx   1 circleci circleci    22 Jul 14 14:35 loose-envify -> ../loose-envify/cli.js*
lrwxrwxrwx   1 circleci circleci    32 Jul 14 14:35 miller-rabin -> ../miller-rabin/bin/miller-rabin*
lrwxrwxrwx   1 circleci circleci    14 Jul 14 14:35 mime -> ../mime/cli.js*
lrwxrwxrwx   1 circleci circleci    20 Jul 14 14:35 mkdirp -> ../mkdirp/bin/cmd.js*
lrwxrwxrwx   1 circleci circleci    27 Jul 14 14:35 node-gyp -> ../node-gyp/bin/node-gyp.js*
lrwxrwxrwx   1 circleci circleci    26 Jul 14 14:35 node-sass -> ../node-sass/bin/node-sass*
lrwxrwxrwx   1 circleci circleci    19 Jul 14 14:35 nopt -> ../nopt/bin/nopt.js*
lrwxrwxrwx   1 circleci circleci    31 Jul 14 14:35 not-in-install -> ../in-publish/not-in-install.js*
lrwxrwxrwx   1 circleci circleci    31 Jul 14 14:35 not-in-publish -> ../in-publish/not-in-publish.js*
lrwxrwxrwx   1 circleci circleci    35 Jul 14 14:35 phantomjs -> ../phantomjs-prebuilt/bin/phantomjs*
lrwxrwxrwx   1 circleci circleci    27 Jul 14 14:35 prettier -> ../prettier/bin/prettier.js*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 regjsparser -> ../regjsparser/bin/parser*
lrwxrwxrwx   1 circleci circleci    31 Jul 14 14:35 remarkable -> ../remarkable/bin/remarkable.js*
lrwxrwxrwx   1 circleci circleci    16 Jul 14 14:35 rimraf -> ../rimraf/bin.js*
lrwxrwxrwx   1 circleci circleci    27 Jul 14 14:35 sassgraph -> ../sass-graph/bin/sassgraph*
lrwxrwxrwx   1 circleci circleci    20 Jul 14 14:35 semver -> ../semver/bin/semver*
lrwxrwxrwx   1 circleci circleci    16 Jul 14 14:35 sha.js -> ../sha.js/bin.js*
lrwxrwxrwx   1 circleci circleci    19 Jul 14 14:35 shjs -> ../shelljs/bin/shjs*
lrwxrwxrwx   1 circleci circleci    23 Jul 14 14:35 sshpk-conv -> ../sshpk/bin/sshpk-conv*
lrwxrwxrwx   1 circleci circleci    23 Jul 14 14:35 sshpk-sign -> ../sshpk/bin/sshpk-sign*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 sshpk-verify -> ../sshpk/bin/sshpk-verify*
lrwxrwxrwx   1 circleci circleci    22 Jul 14 14:35 strip-indent -> ../strip-indent/cli.js*
lrwxrwxrwx   1 circleci circleci    16 Jul 14 14:35 svgo -> ../svgo/bin/svgo*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 uglifyjs -> ../uglify-js/bin/uglifyjs*
lrwxrwxrwx   1 circleci circleci    19 Jul 14 14:35 user-home -> ../user-home/cli.js*
lrwxrwxrwx   1 circleci circleci    16 Jul 14 14:35 uuid -> ../uuid/bin/uuid*
lrwxrwxrwx   1 circleci circleci    25 Jul 14 14:35 webpack -> ../webpack/bin/webpack.js*
lrwxrwxrwx   1 circleci circleci    47 Jul 14 14:35 webpack-dev-server -> ../webpack-dev-server/bin/webpack-dev-server.js*
lrwxrwxrwx   1 circleci circleci    18 Jul 14 14:35 which -> ../which/bin/which*

Here is a listing of the same directory after attaching the workspace to a subsequent job:

$ ls -alF ./node_modules/.bin
total 480
drwxr-xr-x   2 circleci circleci   4096 Jul 14 15:33 ./
drwxr-xr-x 917 circleci circleci  36864 Jul 14 15:34 ../
-rwxr-xr-x   2 circleci circleci   2156 Jul  6 07:36 acorn*
-rwxr-xr-x   2 circleci circleci     46 Oct 17  2016 babel*
-rwxr-xr-x   2 circleci circleci     72 Apr  1 14:50 babel-doctor*
-rwxr-xr-x   2 circleci circleci     63 Oct 17  2016 babel-external-helpers*
-rwxr-xr-x   2 circleci circleci     51 Oct 17  2016 babel-node*
-rwxr-xr-x   2 circleci circleci    341 Jun 11 20:51 babylon*
-rwxr-xr-x   2 circleci circleci   2827 Feb 22 11:23 browserslist*
-rwxr-xr-x   2 circleci circleci   2143 May 10 01:17 codecov*
-rwxr-xr-x   2 circleci circleci   3232 Aug  9  2013 cssesc*
-rwxr-xr-x   2 circleci circleci    292 Mar 10 21:36 csso*
-rwxr-xr-x   2 circleci circleci   2121 Nov 27  2014 dateformat*
-rwxr-xr-x   2 circleci circleci    424 Sep  8  2012 errno*
-rwxr-xr-x   2 circleci circleci   2710 Apr 28  2015 escodegen*
-rwxr-xr-x   2 circleci circleci   2415 Apr 28  2015 esgenerate*
-rwxr-xr-x   2 circleci circleci   2285 Mar 31 19:56 eslint*
-rwxr-xr-x   2 circleci circleci   1281 Nov 16  2016 eslint_d*
-rwxr-xr-x   2 circleci circleci   4568 Aug 16  2016 esparse*
-rwxr-xr-x   2 circleci circleci   6775 Aug 16  2016 esvalidate*
-rwxr-xr-x   2 circleci circleci    399 Nov  9  2015 extract-zip*
-rwxr-xr-x   2 circleci circleci   3363 May 15 21:49 handlebars*
-rwxr-xr-x   2 circleci circleci   1578 Nov 24  2015 har-validator*
-rwxr-xr-x   2 circleci circleci    115 Jul  7  2015 in-install*
-rwxr-xr-x   2 circleci circleci    115 Jul  7  2015 in-publish*
-rwxr-xr-x   2 circleci circleci   2506 Jan 11  2016 istanbul*
-rwxr-xr-x   2 circleci circleci    435 Feb 22  2016 jasmine*
-rwxr-xr-x   2 circleci circleci   2727 May 11  2016 js-yaml*
-rwxr-xr-x   2 circleci circleci   3833 May 20  2016 jsesc*
-rwxr-xr-x   2 circleci circleci   1159 Sep 28  2016 json5*
-rwxr-xr-x   2 circleci circleci     50 Jan 14 19:32 karma*
-rwxr-xr-x   2 circleci circleci    356 Nov  4  2016 loose-envify*
-rwxr-xr-x   2 circleci circleci    599 Oct 28  2015 miller-rabin*
-rwxr-xr-x   2 circleci circleci    149 Feb  5  2015 mime*
-rwxr-xr-x   2 circleci circleci    731 Dec 26  2014 mkdirp*
-rwxr-xr-x   2 circleci circleci   3596 Jan 10  2017 node-gyp*
-rwxr-xr-x   2 circleci circleci  11557 Feb  1 03:45 node-sass*
-rwxr-xr-x   2 circleci circleci   1549 Nov 12  2015 nopt*
-rwxr-xr-x   2 circleci circleci    115 Jul  7  2015 not-in-install*
-rwxr-xr-x   2 circleci circleci    115 Jul  7  2015 not-in-publish*
-rwxr-xr-x   2 circleci circleci   1050 Jul 26  2016 phantomjs*
-rwxr-xr-x   2 circleci circleci 158874 Jun 28 03:32 prettier*
-rwxr-xr-x   2 circleci circleci   1377 Feb 27  2015 regjsparser*
-rwxr-xr-x   2 circleci circleci   1566 Oct  3  2016 remarkable*
-rwxr-xr-x   2 circleci circleci   1196 Dec 15  2016 rimraf*
-rwxr-xr-x   2 circleci circleci   2701 Apr 29 07:56 sassgraph*
-rwxr-xr-x   2 circleci circleci   4092 Jun 28  2016 semver*
-rwxr-xr-x   2 circleci circleci    993 Nov 10  2016 sha.js*
-rwxr-xr-x   2 circleci circleci    995 Aug  7  2016 shjs*
-rwxr-xr-x   2 circleci circleci   4704 Mar  2 02:21 sshpk-conv*
-rwxr-xr-x   2 circleci circleci   4011 Apr 22  2016 sshpk-sign*
-rwxr-xr-x   2 circleci circleci   3507 Jan 12  2016 sshpk-verify*
-rwxr-xr-x   2 circleci circleci    823 Aug 13  2014 strip-indent*
-rwxr-xr-x   2 circleci circleci     55 Aug 20  2016 svgo*
-rwxr-xr-x   2 circleci circleci  21486 Apr  8 19:21 uglifyjs*
-rwxr-xr-x   2 circleci circleci    422 Jan 13  2015 user-home*
-rwxr-xr-x   2 circleci circleci   1143 Jun 16 17:53 uuid*
-rwxr-xr-x   2 circleci circleci   9482 Apr  4 19:28 webpack*
-rwxr-xr-x   2 circleci circleci  11289 Apr 22 10:30 webpack-dev-server*
-rwxr-xr-x   2 circleci circleci    985 May  5  2016 which*
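To check quickly whether a job is affected, you can count how many entries in node_modules/.bin are symlinks versus regular files. This is a minimal shell sketch; the function name count_bin_links is hypothetical, and it assumes a find that supports -mindepth/-maxdepth (GNU and BSD find both do). Note that find does not follow symlinks by default, so a symlink is only counted under -type l.

```shell
#!/bin/sh
# Hypothetical helper: report how many entries in a .bin directory are
# symlinks and how many are regular files (e.g. dereferenced copies).
count_bin_links() {
  dir=${1:-node_modules/.bin}
  # tr strips the padding some wc implementations add
  links=$(find "$dir" -mindepth 1 -maxdepth 1 -type l | wc -l | tr -d ' ')
  files=$(find "$dir" -mindepth 1 -maxdepth 1 -type f | wc -l | tr -d ' ')
  echo "symlinks=$links regular=$files"
}
```

Run right after npm install, it should report only symlinks; in a downstream job affected by this issue, the same entries show up as regular files instead.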

#2

Thanks for raising this @BRMatt.

I am having the same issue. In my Laravel repository, a number of files in the vendor/bin directory are symlinked. Since persist_to_workspace does not preserve symlinks, subsequent jobs fail. My build environment is Ubuntu 14.04. Would love to see a resolution.


#3

I’m running into this issue as well, also with Node repositories. One possible workaround is to tar the files yourself (preserving the symlinks), persist the tarball, and then untar it in your other jobs. Example:

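      # In the job that runs npm install (assumes working_directory is ~/my-workdir)
      - run:
          name: tar workdir
          command: |
            cd ..
            # tar stores symlinks as symlinks unless -h/--dereference is given
            tar -czf my-workdir.tar.gz my-workdir
      - persist_to_workspace:
          root: "~"
          paths:
            - my-workdir.tar.gz

...

      # In each downstream job
      - attach_workspace:
          at: "~"
      - run:
          name: untar workdir
          command: |
            cd ..
            tar -xzf my-workdir.tar.gz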

By default, tar stores symlinks as symlinks; it’s the tar -h (--dereference) flag that makes it follow them when creating archives, so leave that flag off. CircleCI team: could we have a follow_symlinks option on persist_to_workspace, or, more generally, a field for passing arbitrary flags to tar?
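The claim that tar keeps symlinks unless -h is passed is easy to check locally. The sketch below uses made-up directory names: it archives a tree containing a relative symlink, with and without -h, extracts it elsewhere, and reports whether the entry came back as a symlink or as a plain copy.

```shell
#!/bin/sh
# Archive a tree containing a relative symlink, extract it elsewhere, and
# report whether the entry is still a symlink ("link") or a copy ("file").
roundtrip() {
  flags=$1   # extra tar flags for archive creation, e.g. "" or "-h"
  work=$(mktemp -d)
  mkdir -p "$work/src/node_modules/pkg/bin" "$work/src/node_modules/.bin" "$work/out"
  printf '#!/bin/sh\necho hi\n' > "$work/src/node_modules/pkg/bin/tool"
  ln -s ../pkg/bin/tool "$work/src/node_modules/.bin/tool"
  # $flags is intentionally unquoted so an empty value adds no argument
  tar $flags -czf "$work/src.tar.gz" -C "$work" src
  tar -xzf "$work/src.tar.gz" -C "$work/out"
  if [ -L "$work/out/src/node_modules/.bin/tool" ]; then echo link; else echo file; fi
  rm -rf "$work"
}

echo "default: $(roundtrip "")"   # expected: default: link
echo "with -h: $(roundtrip -h)"   # expected: with -h: file
```

The "file" result with -h is the same symptom shown in the second listing: each .bin entry becomes a standalone copy of its target.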


#4

Hello,

For this use case we recommend using the cache instead; workspaces are intended to persist specific files between jobs.

Please let us know if you have any questions or concerns.


#5

@zzak, could you please point me to the documentation on cache usage? By the way, my actual requirement is to persist files between jobs; otherwise I will have to install the packages again in every job, which will increase the build time.


#6

@zzak

Thanks for clarifying. In that case, it may be worth mentioning caching in https://circleci.com/docs/2.0/workflows/#using-workspaces-to-share-data-among-jobs, especially the trade-offs and when to use one instead of the other.

In particular, both caches and workspaces accept individual files as well as directories. A cache also seems able to use a unique key per run via {{ epoch }}, so why would one use a workspace instead of a cache?


#7

@BRMatt @tibinvpaul @jimmy We’ve deployed a fix for the symlink issue; could you try rebuilding?

As for the caching strategy, we recommend the cache for dependencies made up of many files whose number can’t be known in advance, whereas a workspace is designed to persist a finite number of files. In any case, the documentation is here: https://circleci.com/docs/2.0/caching

For this strategy we suggest having a “dependencies” job that your downstream jobs require. Once it finishes installing and caching your dependencies, your downstream jobs only need to restore them from the cache.


#8

Thanks for the reply @zzak. I started using the cache stanza after your reply and it seems to work, but I have the following concerns with the approach:

Jobs that import the build job’s dependencies fail if the job is retried “without cache”

I suppose the only way around this is to have each subsequent job import the cache and then run npm install, but that is exactly what I want to avoid: I want each job to use the exact dependencies from the build job. If someone wants to bust the cache, I’d want them to re-run the whole workflow with a fresh cache.

Choosing a cache key that imports the exact dependencies created by the build job isn’t obvious

My first instinct was to create a cache key that uses {{ .BuildNum }}, because I thought it referred to an identifier shared by all jobs in a single run of a workflow. It actually seems to be the number of a specific job in the workflow, which means it is only useful for per-job caches.

{{ .Revision }} seems to be the most applicable variable, but that means rebuilds of the same revision will share the cache. I’m not sure whether that would be problematic, but it’s definitely not what I’d expect.


#9

An alternative approach might be to include both package.json and package-lock.json in the cache key’s checksum, so that you depend on the actual inputs to npm install and only restore the cache when a fresh install would have produced the same result.
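One way to fold both files into a single value is to hash them together in a step that runs before restore_cache. This is a minimal sketch, assuming GNU coreutils sha256sum is available; the function name deps_key is hypothetical. Each file is hashed separately and that listing is hashed again, so a change to either manifest changes the result.

```shell
#!/bin/sh
# Hypothetical helper: print one SHA-256 covering both npm manifests.
# sha256sum prints "<hash>  <name>" per file; hashing that listing again
# yields a single value that changes whenever either file changes.
deps_key() {
  dir=${1:-.}
  (cd "$dir" && sha256sum package.json package-lock.json) | sha256sum | cut -d' ' -f1
}
```

For example, writing deps_key > deps.sha in a run step would let a key such as deps-{{ checksum "deps.sha" }} pick up changes to either file (the deps.sha name is illustrative). As an alternative, a key template can reference both files directly with two {{ checksum }} expressions.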


#10

Hey @zzak, I can confirm that the symlink issue with persist_to_workspace is fixed. I will definitely try the cache instead, since that is what you guys are recommending. Thanks for the quick turnaround.


#11

Yep, our package.json already explicitly locks down dependencies, but I don’t think we should have to do that. We’ve had issues in the past where “something was wrong” with the contents of node_modules and we had to run rm -rf node_modules && npm install.

If part of a workflow fails because of the deps, I want to re-run the whole thing with fresh deps. It shouldn’t be possible to bust the cache for part of the workflow and end up with a run that used different deps in different steps. That’s what I liked about the immutable nature of persist_to_workspace.


#12

@BRMatt OK, please let us know if you run into any issues. You can also reach out to support@circleci.com.


#13