Allow pre-loading of remote docker base images

docker
caching
circle.yml

#1

When building Docker containers after a successful set of tests, you always have to pull the base image first, whether it’s ubuntu:14.04, debian:wheezy, or some custom base image in your own remotely accessible repository (e.g., Google Container Registry). There is no way to maintain a Docker cache within your CircleCI account: you always have to pull a fresh image, or go with a hacky solution like keeping a tar file of the cached image in your repository. You could store that tar file somewhere outside of your repository (such as Amazon S3 or Google Cloud Storage), but then how much are you really gaining over pulling the image directly from the registry?

Maybe there is a better way! The existing workaround uses Docker’s save/load functionality to restore the cache, but that requires a tar file you have already created and stored somewhere. I believe another solution could leverage features that already exist within CircleCI: parallelism and modifiers.
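For reference, here is a minimal sketch of that workaround as it might look in a circle.yml today, assuming a cached ~/docker directory and redis as the base image (both placeholders):

dependencies:
    cache_directories:
        - "~/docker"
    override:
        # restore the image from the cached tarball, if a previous build saved one
        - if [[ -e ~/docker/redis.tar ]]; then docker load -i ~/docker/redis.tar; fi
        # pull only fetches layers that changed since the cached save
        - docker pull redis
        # re-save so the cache stays current for the next build
        - mkdir -p ~/docker && docker save redis > ~/docker/redis.tar

It works, but the save/load round trip is often not much faster than the pull it is meant to avoid, which is why a first-class feature would be nicer.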

Consider the following block of a circle.yml file:

services:
    - docker

This currently works to enable Docker in your builds. However, using some parallelism might allow a second container to download any target base images while testing is going on:

services:
    - docker:
        load_images:
            - redis

That setup uses modifiers (though I’m not sure I’m using them correctly here; I’m no YAML expert either) to indicate that a second container can download the “redis” image from Docker Hub in parallel (if your plan supports it), then transparently perform a docker save, transfer the result to node0 (something like rsync -az /path/to/image.tar node0:/path/on/node0/image.tar), and load it for use with Docker deployment commands (maybe docker load -i /path/on/node0/image.tar).
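Behind the scenes, the sequence on the helper container might look something like this sketch (node0 and the paths are placeholders; the real mechanism would be up to CircleCI):

# on the parallel helper container: fetch and serialize the image
docker pull redis
docker save redis > /tmp/redis.tar

# ship the tarball to the primary build container
rsync -az /tmp/redis.tar node0:/tmp/redis.tar

# on node0: restore the image into the local Docker daemon
ssh node0 'docker load -i /tmp/redis.tar'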

Additionally, if you have a private repository (like a private Docker Hub repository, Google Container Registry, etc.), you could specify preparation commands to run before downloading the images:

services:
    - docker:
        pre:
            - ./ensure-gcloud-installed.sh
            - ./gcloud-auth.sh
        load_images:
            - us.gcr.io/my-project/my-base:${CIRCLE_BRANCH}

In this example, you authenticate to Google Cloud and then download the “my-base” image with the tag ${CIRCLE_BRANCH}. Maybe there is a way of checking for deployment branch triggers first; otherwise this will likely fail to find the tag if the branch doesn’t have any deployment triggers, though failing silently might be acceptable as well.
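For illustration, the gcloud-auth.sh helper above might look something like this sketch. GCLOUD_SERVICE_KEY is a hypothetical environment variable holding a service-account JSON key; the _json_key login method is what Google Container Registry documents for password-style authentication:

#!/usr/bin/env bash
# Hypothetical helper: authenticate Docker against Google Container Registry.
set -euo pipefail

# Write the service-account key (stored as a CircleCI env var) to disk.
echo "${GCLOUD_SERVICE_KEY}" > /tmp/key.json

# Activate the service account for gcloud-based tooling.
gcloud auth activate-service-account --key-file /tmp/key.json

# Log Docker itself in to the us.gcr.io registry using the JSON key.
docker login -u _json_key -p "$(cat /tmp/key.json)" https://us.gcr.io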

With this implementation, deployment commands performing a docker pull <base_image> would simply compare the pre-loaded image’s hash against the registry’s, see that they match, and move on quickly.
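You can already see this behavior with plain Docker today: when the image is loaded locally, a pull reduces to a layer-digest comparison and completes almost immediately:

# capture the local image ID before pulling
before=$(docker inspect --format '{{.Id}}' redis)

# pull checks layer digests against the registry; nothing new is downloaded
docker pull redis

# confirm the image is unchanged
after=$(docker inspect --format '{{.Id}}' redis)
[ "$before" = "$after" ] && echo "Image already up to date; pull was a near no-op"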

I would also vote that for large images (or short build/testing runs), the deployment (and only the deployment) should block until the parallel download either finishes or exits non-zero, as sketched below. That way, if the download fails, the deployment simply continues and runs as it would have without this feature (likely downloading the base image during the container build anyway). If the download is still in progress and hasn’t failed, the deployment can wait for it and the subsequent docker load to complete, since that should take less time than restarting the download within the node0 container. That’s just my personal opinion; you all might have better ones.
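As a sketch of those blocking semantics, the deployment step could gate on a status file written by the pre-load job (/tmp/preload.exit and the tarball path are hypothetical):

# wait until the parallel pre-load job reports a result
while [ ! -f /tmp/preload.exit ]; do sleep 5; done

# zero exit: the image tarball is ready, so load it
if [ "$(cat /tmp/preload.exit)" -eq 0 ]; then
    docker load -i /tmp/redis.tar
fi
# non-zero exit: fall through; docker pull/build fetches the base image itself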

Of course, there’s room for improvement in the details here, but I’d like to see the general idea implemented somehow!


#2

Thank you very much for going deep into the details of how you would like to see this in action—we’ll see what we can do about this. Cheers!


#3