Caching docker images is not working

We run a Rails app using Circle. On Circle 1.0 our builds used to take 13-15 minutes. When we switched to Circle 2.0, the time dropped to about 10 minutes. Sounds nice, right? Read on.

To reduce the build time even more, we decided to use a private Docker image with all the dependencies we need (previously we were downloading PhantomJS and other stuff every time we ran a build).

Unfortunately, the end result turned out to be a disaster. Now our builds average around 28 minutes. The bottleneck is the time that we spend on pulling our image from our private Docker registry. Because we use workflows, every job (we have 10 jobs so far) pulls the same Docker image, adding roughly 2 minutes per job to the total build time.

According to this forum, the more you run your builds, the more Circle hosts will have your image cached. I noticed this is not the case for us. Every single job/build pulls from our registry, and we’ve been running it for a month. We haven’t updated our private Docker image at all, yet Circle redownloads it every single time:

image cache not found on this host, downloading <image_name>

Sometimes I can see this odd message:

image is cached as <image_name>, but refreshing...

Nothing asks it to refresh the image, yet it does that.

I’ve asked a similar question in the past. I was recommended to inherit a CircleCI image. I don’t find this answer applicable to us. We don’t want to inherit from your image, we want our image to be cached.

Question: could someone explain how docker image caching works and why the cache doesn’t work for our project? I’m happy to share configs or anything that you need to help.

What is the size of this image, in MB? I wonder if you could get a quick and definite win by trimming it. I’ve seen some folks here running ludicrously large (7G) images; mine average 150M, with some as small as 30M. (Admittedly I am running Docker in Docker, so these don’t have any test tools onboard, but nevertheless, it shows what one can do with Busybox-style images.)

The next step I would try is to manually cache the Docker image. As I understand it, you presently have no control over caching, since it is done at the CircleCI/host layer. What I would consider doing (and what you will have to take a view on depending on your opinion of its complexity) is to go to Docker-in-Docker and run your own images in your own Docker.

You will need to do pulling and running manually (potential disadvantages) but then you will be able to use Workspaces, which allow you to cache image files between Jobs in a Workflow (you pull once per Workflow and then cache for all jobs within).
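
As a rough illustration of the pull-once idea, here is a minimal shell sketch. It assumes the job has access to a Docker daemon (for example via setup_remote_docker or the machine executor); the function names, the image name, and the archive path are placeholders of mine, not anything CircleCI provides.

```shell
#!/bin/sh
# Sketch only: function names and paths are hypothetical.

# First job in the workflow: pull the image once and write it
# to a tarball inside the directory you persist as a workspace.
save_image() {
    image="$1"
    archive="$2"
    mkdir -p "$(dirname "$archive")"
    docker pull "$image" && docker save -o "$archive" "$image"
}

# Later jobs: after attaching the workspace, load the image
# from the tarball instead of pulling it from the registry.
load_image() {
    archive="$1"
    docker load -i "$archive"
}
```

The first job would call something like `save_image registry.example.com/foo/bar:latest workspace/images/bar.tar` before its persist_to_workspace step, and each later job would call `load_image workspace/images/bar.tar` after attach_workspace. Whether this beats a registry pull depends on how quickly the workspace itself is stored and restored, so it is worth timing both.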

Our image is 1.3GB. In the meantime we are indeed working on reducing its size (we’re betting on Alpine; currently we use Ubuntu). So yes, you are right, that would indeed be a win.

Correct, we don’t seem to have control over caching. I haven’t heard of Docker-in-Docker, so this is something to investigate for me.

We already cache files such as Bundler and NPM deps, and it’s helpful.

What I do wonder about, though, is whether all this trouble is worth the time. I’ve already spent quite some time familiarising myself with the Circle 2.0 config syntax and the workflow stuff. I’ve made the transition for our app but troubles still haunt us. Sure, previously, it wasn’t as fast as it can be now. However, I have to read more about caching and other stuff to get things right. It’s nowhere near trivial, and it makes me think that sequential builds could be a simple answer here. We’re talking about saving 3-4 minutes here. It’s a good win, but spending this much time to reach it is not.

Thanks for the reply!

I sense that it is something of a non-standard way of using CircleCI, but I use it because it means I can run my integration tests in Docker Compose on my local machine, and similarly on CircleCI, and the environments are nigh-on identical. The secondary container system in CircleCI is a nice convenience, but it cannot be replicated locally, as far as I know.

I hear you. I’m just a satisfied user, so I have no axe to grind over who you choose as your CI provider, but my analysis is that no hosted system is perfect for everyone. For any non-trivial enterprise, everyone has to work around something. Thus, you could hop to Travis, GitLab, or someone else, and they might be better for you, or you may find new problems that your particular use-case forces you to work around.

I’ve tried inheriting one of CircleCI’s official images. My only addition to their image was installing one package and downloading some sh script. Initially, it helped immensely, since I only needed to pull one extra layer, so it was very fast. However, it somehow broke over time:

Build-agent version 0.1.301-997b7e08 (2018-08-15T16:32:23+0000)
Starting container registry.example.com/foo/bar:latest
  image cache not found on this host, downloading registry.example.com/foo/bar:latest
latest: Pulling from foo/bar
55cbf04beb70: Already exists
1607093a898c: Already exists
9a8ea045c926: Already exists
d4eee24d4dac: Already exists
b59856e9f0ab: Already exists
c8e2cea26463: Already exists
ed5731a996e1: Already exists
2c5d8cd9d0d3: Already exists
360f82d38e60: Pulling fs layer
a94255323ca7: Pulling fs layer
cad0f6a2f36f: Pulling fs layer
7797e1428e09: Pulling fs layer
3d69a76d197c: Pulling fs layer
b1e168cda94d: Pulling fs layer
ee45d91b1270: Pulling fs layer
1564a8d50af4: Pulling fs layer
be4dbfd0592f: Pulling fs layer
ff73c2751cf2: Pulling fs layer
388a7daadd9e: Pulling fs layer
6a511ab65ad0: Pulling fs layer
4c12b651a9bf: Pulling fs layer
b1e168cda94d: Waiting
1564a8d50af4: Waiting
ee45d91b1270: Waiting
3d69a76d197c: Waiting
e227b1df46d5: Pulling fs layer
6a511ab65ad0: Waiting
be4dbfd0592f: Waiting
4c12b651a9bf: Waiting
ff73c2751cf2: Waiting
326f7f68dba3: Pulling fs layer
f874baf1402a: Pulling fs layer
e227b1df46d5: Waiting
58c5e04cd23a: Pulling fs layer
a104d29a4036: Pulling fs layer
5edad5a36652: Pulling fs layer
b2d6f6de8405: Pulling fs layer
f874baf1402a: Waiting
326f7f68dba3: Waiting
388a7daadd9e: Waiting
cad0f6a2f36f: Download complete
360f82d38e60: Verifying Checksum
360f82d38e60: Download complete
360f82d38e60: Pull complete
7797e1428e09: Download complete
3d69a76d197c: Verifying Checksum
3d69a76d197c: Download complete
ee45d91b1270: Verifying Checksum
ee45d91b1270: Download complete
1564a8d50af4: Verifying Checksum
1564a8d50af4: Download complete
a94255323ca7: Verifying Checksum
a94255323ca7: Download complete
ff73c2751cf2: Download complete
be4dbfd0592f: Download complete
6a511ab65ad0: Verifying Checksum
6a511ab65ad0: Download complete
b1e168cda94d: Verifying Checksum
b1e168cda94d: Download complete
a94255323ca7: Pull complete
cad0f6a2f36f: Pull complete
7797e1428e09: Pull complete
388a7daadd9e: Verifying Checksum
388a7daadd9e: Download complete
3d69a76d197c: Pull complete
b1e168cda94d: Pull complete
ee45d91b1270: Pull complete
1564a8d50af4: Pull complete
be4dbfd0592f: Pull complete
ff73c2751cf2: Pull complete
388a7daadd9e: Pull complete
6a511ab65ad0: Pull complete
326f7f68dba3: Verifying Checksum
326f7f68dba3: Download complete
e227b1df46d5: Verifying Checksum
e227b1df46d5: Download complete
4c12b651a9bf: Verifying Checksum
4c12b651a9bf: Download complete
a104d29a4036: Verifying Checksum
a104d29a4036: Download complete
5edad5a36652: Verifying Checksum
5edad5a36652: Download complete
b2d6f6de8405: Verifying Checksum
b2d6f6de8405: Download complete
4c12b651a9bf: Pull complete
f874baf1402a: Verifying Checksum
f874baf1402a: Download complete
e227b1df46d5: Pull complete
326f7f68dba3: Pull complete
f874baf1402a: Pull complete
58c5e04cd23a: Retrying in 5 seconds
58c5e04cd23a: Retrying in 4 seconds
58c5e04cd23a: Retrying in 3 seconds
58c5e04cd23a: Retrying in 2 seconds
58c5e04cd23a: Retrying in 1 second
58c5e04cd23a: Verifying Checksum
58c5e04cd23a: Download complete
58c5e04cd23a: Pull complete
a104d29a4036: Pull complete
5edad5a36652: Pull complete
b2d6f6de8405: Pull complete
Digest: sha256:4fdf4ba444cc41aa97736a4948c66b8691f1d24ff8fac1a49bc3da8ea33778c6
Status: Downloaded newer image for registry.example.com/foo/bar:latest
  using image registry.example.com/foo/bar@sha256:4fdf4ba444cc41aa97736a4948c66b8691f1d24ff8fac1a49bc3da8ea33778c6

I’ve also tried to squash the image. No dice.
I am extremely disappointed with CircleCI 2.0. I’m not a Docker pro and I am not supposed to be one.

It’s hard to answer, since you’ve obfuscated what image you’re actually pulling. Nevertheless, if that’s the 1.3G image, yes, that’s a lot of data to shift. I am not much of an expert on layers, but yes, I have experienced the problem of an unexpected number of layers apparently being changed on something I think I have customised only a little bit! I get that with Docker in general, and in my local environment.

  1. What base image are you using?

  2. What features of that base image are you using? I assume NPM here. You could use a lightweight image and just install NPM on it manually, and bring the image size down to, say, 200-300M.

My advice with CI is that the demeanour you exhibit is critical to the success of your implementation. In other words, if you’re positive that you can get it to work, that positivity makes it more likely that you will persist to get a solution you will be happy with. I think it is important to understand that CI is hard however it is sliced, and having patience with yourself will reap rewards in the end. I would urge you to persist because negativity will hurt you just the same on any platform.

In my view, learning some basics will really help you, and not just with CI. You don’t need to be a pro. When I was learning Docker, I got something working from the manual in ten minutes, because the quick-start demo on the official site was just really easy to follow.

It used to be 1.3G but I mentioned that the new image inherits a CircleCI-provided image now. In fact, I can share the Dockerfile:

FROM circleci/ruby:2.3-node-browsers

RUN sudo apt-get install postgresql-client

RUN mkdir -p ~/coverage/ && \
  curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ~/coverage/cc-test-reporter && \
  chmod +x ~/coverage/cc-test-reporter

The output from my previous post (the foo/bar image) corresponds to this Dockerfile. It should be blazing fast and it used to be (all but one or two last layers were cached). However, something has changed and a lot of layers are not cached anymore.
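
To put a number on how much of a pull was served from the host cache, the job output can be saved to a file and the layer statuses counted. This is a small sketch: the function name and the log file name (pull.log) are hypothetical, and it relies only on the "Already exists" and "Pull complete" status lines that Docker prints during a pull.

```shell
#!/bin/sh
# Summarise a saved "docker pull" log: layers reused from the host
# versus layers that had to be downloaded and extracted.
# Usage: layer_summary pull.log   (pull.log is a hypothetical file name)
layer_summary() {
    log="$1"
    # grep -c prints the number of matching lines (0 if none match).
    cached=$(grep -c ': Already exists' "$log")
    pulled=$(grep -c ': Pull complete' "$log")
    echo "cached=$cached pulled=$pulled"
}
```

Run against the foo/bar output quoted earlier in this thread, it should report 8 reused layers and 20 downloaded ones, which matches the impression that only the first few base layers were found on the host.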

I have spent a lot of time tinkering with CircleCI already (trying various approaches), so I have been persistent enough, I suppose. I do appreciate your help and motivational words, though :+1:

What I meant is that I know some basic Docker stuff, but it seems that to solve the issue I am dealing with I need to do one of these:

  • dig into how Docker implements layering (which I consider an advanced topic)
  • dig into how CircleCI implements caching (this info might not be available online). I know they use Nomad, though
  • hope that CircleCI fixes their caching

Yes, I agree with that - caching is something that should work. The practical steps you can take are:

  1. Log a support call to see if there is a problem. The “Waiting” lines are a bit odd there. You’ll need to specify how long this takes and what the size of the resulting image is. Your base image is 797 MB according to DockerHub, so I assume with your additions it’d be ~900 MB.

  2. Try moving your base images to another registry, in case DockerHub is a network bottleneck. GitLab is free for this, if you don’t have a private registry yet. Using a custom registry is documented here.

  3. Or, get a Docker Compose version running. That would consist of:

    • using a lightweight container with Docker already built in, and installing Docker Compose (DC) manually
    • pulling the images you need in the first job and storing them in a workspace
    • restoring the images you need in subsequent jobs from the workspace
    • writing a docker-compose.yml file to specify how to start up your containers
    • starting DC in each job

    I don’t know if that would solve your slowness issue, but it’s why I say some familiarity with Docker is useful - it will aid building a proof of concept.
