Uploading artifacts recursively is exceptionally slow

artifacts

#1

When CircleCI is told to upload a directory artifact, it is uploaded one file at a time, which ends up being extremely slow. Here is a build with two modes of artifact uploading (with this configuration):

  • various intermediate build steps that we generally don’t need to check often, so it’s uploaded as a tarball; and,
  • documentation that we generally do want to check (if documentation was changed in the PR), so it’s preferable to upload individually for easy online browsing and it’s uploaded as a directory, recursively.

The intermediate steps consist of 10693 files in 107 directories totalling 90M. The compressed tarball produced by CircleCI is 53.7 MB. From the build log, it took 0s to produce the tarball and 1s to upload it as an artifact.

On a local build (I did not want to try to download everything individually), the documentation consists of 7285 files in 112 directories totalling 132M. From the build log, it took 5m53s to upload.

In summary, the recursive upload was of 2.45 the size as the tarball, but it took 353 times longer to upload. That is extremely slow. I would guess it’s creating a new connection for each file, or trying to record in an artifact database the link to each file. It would be so much better for build times (both ours and just for general efficiency’s sake) if this could be batched somehow, e.g., tar|upload|untar or similar.


#2

I’m also seeing this . We generate a 16MB tarball that takes 1 second to upload this to s3. We upload both this tarball and the 830 files in it to the artifacts and the entire process takes 30 seconds.


#3

Same here, 4 mins to upload coverage reports is quite unacceptable.

Would you have ways to go around this ? Building an archive with the files isn’t good enough: it renders the files too cumbersome to access.

Thanks.


#4

I agree, the “Collecting artifacts” step is taking 2.5 minutes out of our 18-minute build (almost 15% of the total time!) and it’s a significant factor that slows down our development cycle.

“Collecting 5,788 build artifacts” (coverage reports) appears to result in 5788 calls to curl and 5788 separate connections.

There’s also a lot of spam in the logs for this step:
/usr/bin/curl: /usr/local/lib/libcurl.so.4: no version information available (required by /usr/bin/curl)


#5