Docker images not cached between workflow jobs

docker

#1

Contrary to the problem described in “Docker image cache - Not pulling changes”, I’m not seeing any caching benefit on pulls from Docker Hub. All layers are pulled fresh each time. Since the Workflow consists of several stages, this adds up to several minutes per deploy.

The image reference is a standard name:version style reference.

Is there a way to optimise this?


#2

I don’t know what speed-up one should get from Docker Hub - I expect you want layers to be re-used if they have previously been requested.

Out of interest, how many images are you pulling, and what is their total size? One of my CI projects pulls around 950M of images (uncompressed) and it does so in ~50 seconds. Either your total image size is several GB, or pulling from Docker Hub is slow for you.
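
As a rough back-of-the-envelope check of those numbers, the sketch below computes the throughput implied by ~950M in ~50 seconds and extrapolates to a larger image. The 3000 MB image size and the function names are assumptions chosen only for illustration, and the linear estimate ignores compression and per-layer overhead.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Effective pull throughput implied by one measured pull."""
    return size_mb / seconds


def estimated_pull_seconds(size_mb: float, mb_per_s: float) -> float:
    """Naive linear estimate; ignores compression and per-layer overhead."""
    return size_mb / mb_per_s


# Figures quoted in the post above: ~950 MB (uncompressed) in ~50 s.
rate = throughput_mb_per_s(950, 50)

# Hypothetical 3000 MB image, used only to show how the estimate scales.
estimate = estimated_pull_seconds(3000, rate)

print(f"{rate:.0f} MB/s, roughly {estimate:.0f} s for a 3000 MB image")
```

If a workflow repeats such a pull in every job, the cost scales with the number of jobs, which is where "several minutes per deploy" would come from.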


#3

What’s the image? Is it public?

Not knowing too much about your setup, I have two suggestions.

  1. Try to use smaller images. The smaller the image, the faster it is to pull and use, regardless of caching. Docker images can be several GBs, but many are under a few hundred MBs.
  2. If you’re not already using it, taking advantage of CircleCI’s Docker Layer Caching might give you a healthy speed boost for larger images.

#4

Thanks for the heads up!

  1. The image is public on Docker Hub, yes.
  2. AFAIU, DLC only helps for image builds? We are not building Docker images on CircleCI.

Why not have something like docker_layer_caching that also works for image pulls?


#5

@halfer since we’re running a 2.0 Workflow with several steps, the Docker image is pulled each time. Those pulls do not use the cache from the previous pull, and that is something I’d see as a point for improvement.

Otherwise, a single pull does indeed take ~50s, so I can confirm your experience.


#6

I also realised CircleCI’s own images are actually cached:

Starting container circleci/mysql:5.6
  image is cached as circleci/mysql:5.6, but refreshing...

So all that’d be needed is a workflow-level cache for the Docker images. That alone would speed things up a lot.


#7

We switched to CircleCI 2.0 workflows a few months ago and unfortunately I see little benefit over CircleCI 1.0. The main problem is that caching almost never works for us. We see either

image cache not found on this host, downloading

or

image is cached as foo/bar:latest, but refreshing...

Because we define a few workflows and each workflow pulls a new Docker image, our build can easily run for 30 minutes (versus 10-15 minutes on CircleCI 1.0). Because of this I am seriously considering switching to sequential builds since it’ll be roughly the same as CircleCI 1.0.

Maybe there are better caching strategies available? For what it’s worth, we use private Docker images, and one of them almost never changes, yet we re-download it almost every time.

With my current understanding I struggle to see any benefits of CircleCI 2.0: the config got bigger and much more complex and everything runs slower now.


#8

I recommend extending one of CircleCI’s convenience images. Even though your image is private, extending our image will greatly increase the chance that many of your image layers are already cached. Docker images consist of one layer per directive in a given Dockerfile; having some of the layers cached helps a lot.
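
To illustrate why a shared base helps, here is a minimal sketch of the layer reuse idea: only the layers missing from a host need to be downloaded. The layer names are made up for illustration, and this is a simplified model rather than a description of how Docker or CircleCI implements caching.

```python
def layers_to_pull(image_layers, cached_layers):
    """Return the layers of an image that the host does not already have."""
    cached = set(cached_layers)
    return [layer for layer in image_layers if layer not in cached]


# Hypothetical layer names, for illustration only.
convenience_image = ["base-os", "common-tools", "language-runtime"]

# A private image that extends the convenience image adds layers on top.
private_image = convenience_image + ["app-deps", "app-config"]

# A private image built from an unrelated base shares nothing with it.
unrelated_image = ["other-os", "other-tools", "other-deps", "other-config"]

# Assume the host has already pulled the convenience image for another job.
host_cache = convenience_image

print(layers_to_pull(private_image, host_cache))
print(layers_to_pull(unrelated_image, host_cache))
```

In this model the extended image only needs its two top layers, while the unrelated image needs all four.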

The reason you’re seeing that is that we use Nomad to delegate jobs. Nomad has no concept of which hosts have your image cached. The more jobs you run, the more likely you are to land on a host that already has the image cached.
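
A toy model can make that last point concrete. It assumes each job lands on one of `hosts` machines uniformly at random and that a host keeps an image indefinitely once it has pulled it. Both are simplifying assumptions for illustration, not a description of how Nomad actually places jobs, and the host count of 50 is made up.

```python
def cache_hit_probability(hosts: int, previous_jobs: int) -> float:
    """Chance that the next job lands on a host that already has the image.

    Assumes uniform random placement and no cache eviction (toy model).
    """
    if hosts < 1 or previous_jobs < 0:
        raise ValueError("hosts must be >= 1 and previous_jobs >= 0")
    # Probability that none of the previous jobs ran on the chosen host.
    miss = (1.0 - 1.0 / hosts) ** previous_jobs
    return 1.0 - miss


# Hypothetical fleet of 50 hosts.
for jobs in (1, 10, 100):
    print(jobs, round(cache_hit_probability(50, jobs), 3))
```

Under these assumptions the hit rate starts low and only climbs as more jobs using the same image have run, which matches the mix of "not found" and "is cached" messages reported above.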


#9

This topic was automatically closed 41 days after the last reply. New replies are no longer allowed.