Caching Docker build images

docker

#1

I think I’ve tried just about every way to cache a Docker build so that I don’t have to rebuild my package. I realize there are some limitations, but I would have hoped that one of these would work. Is this possible to do in CircleCI 2.0?

Some core information:

version: 2

jobs:
  build:
    docker:
      - image: docker:17.03.0-ce-git
        environment:
          CONTAINER_IMAGE: my-image

    steps:
     - setup_remote_docker

Since we’re running Docker 17.03, we should be able to use the --cache-from option:

  - run:
      name: Build image
      command: |
        eval $(aws ecr get-login --region us-east-1)
        if aws ecr describe-images --repository-name=my-repo --image-ids imageTag=${CIRCLE_BRANCH} ; then tag=${CIRCLE_BRANCH} ; else tag=latest ; fi
        docker build \
            --cache-from ${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/${CONTAINER_IMAGE}:${tag} \
            -f docker/Dockerfile.base --rm=false \
            -t ${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/${CONTAINER_IMAGE}:${CIRCLE_BRANCH} \
            -t ${CONTAINER_IMAGE} \
            .

That didn’t work, so let’s try pulling the image first and then building it…

  - run:
      name: Build image
      command: |
        eval $(aws ecr get-login --region us-east-1)
        if aws ecr describe-images --repository-name=my-repo --image-ids imageTag=${CIRCLE_BRANCH} ; then tag=${CIRCLE_BRANCH} ; else tag=latest ; fi
        docker pull ${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/${CONTAINER_IMAGE}:${tag}
        docker build \
            -f docker/Dockerfile.base --rm=false \
            -t ${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/${CONTAINER_IMAGE}:${CIRCLE_BRANCH} \
            -t ${CONTAINER_IMAGE} \
            .
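Each of the two attempts above covers half of the usual recipe: the first names a --cache-from image that was never pulled (--cache-from does not pull anything by itself), and the second pulls the image but doesn’t pass --cache-from. With Docker 1.13 and later, an image pulled from a registry is generally only considered as a layer cache source when it is named in --cache-from, so a minimal sketch combining both steps might look like the following. It reuses the variable names from the question; the REPO shorthand and the "|| true" fallback for a first build with nothing to pull are assumptions.

```yaml
  - run:
      name: Build image (pull, then build with --cache-from)
      command: |
        eval $(aws ecr get-login --region us-east-1)
        # REPO is a hypothetical shorthand for the full ECR image name
        REPO=${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/${CONTAINER_IMAGE}
        # Prefer the branch image as the cache source, otherwise fall back to latest
        if aws ecr describe-images --repository-name=my-repo --image-ids imageTag=${CIRCLE_BRANCH} ; then tag=${CIRCLE_BRANCH} ; else tag=latest ; fi
        # --cache-from only looks at local images, so fetch the image first;
        # tolerate a failed pull on the very first build
        docker pull ${REPO}:${tag} || true
        # A --cache-from image that is not present locally is expected to be
        # skipped with a warning rather than failing the build
        docker build \
            --cache-from ${REPO}:${tag} \
            -f docker/Dockerfile.base --rm=false \
            -t ${REPO}:${CIRCLE_BRANCH} \
            -t ${CONTAINER_IMAGE} \
            .
```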

That didn’t work either, so how about saving and loading the image chain?

  - restore_cache:
      key: app-{{ checksum "package.json" }}

  - run:
      name: Build image
      command: |
        if [ -f ~/docker-cache/image.tar ]; then docker load -q -i ~/docker-cache/image.tar; fi
        docker build \
            -f docker/Dockerfile.base --rm=false \
            -t ${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/${CONTAINER_IMAGE}:${CIRCLE_BRANCH} \
            -t ${CONTAINER_IMAGE} \
            .
        mkdir -p ~/docker-cache ; docker save --output ~/docker-cache/image.tar ${CONTAINER_IMAGE}

  - save_cache:
      key: app-{{ checksum "package.json" }}
      paths:
        - ~/docker-cache/
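The save/load attempt likely runs into the same limitation: docker load restores the image and its layers, but since Docker 1.10 a loaded image is not matched against the build cache unless it is also passed to --cache-from. A sketch of the same steps with that flag added follows. The docker-image-v1- key prefix and the extra Dockerfile checksum in the key are illustrative assumptions; the prefix-only fallback key lets a build restore the most recent tarball even after package.json changes.

```yaml
  - restore_cache:
      keys:
        # hypothetical key: changes when package.json or the Dockerfile changes
        - docker-image-v1-{{ checksum "package.json" }}-{{ checksum "docker/Dockerfile.base" }}
        # otherwise restore the most recent cache saved with this prefix
        - docker-image-v1-

  - run:
      name: Build image
      command: |
        # Load the previously saved image, if the cache provided one
        if [ -f ~/docker-cache/image.tar ]; then docker load -q -i ~/docker-cache/image.tar; fi
        # Name the loaded image explicitly so its layers are used as cache
        docker build \
            --cache-from ${CONTAINER_IMAGE} \
            -f docker/Dockerfile.base --rm=false \
            -t ${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/${CONTAINER_IMAGE}:${CIRCLE_BRANCH} \
            -t ${CONTAINER_IMAGE} \
            .
        mkdir -p ~/docker-cache
        docker save --output ~/docker-cache/image.tar ${CONTAINER_IMAGE}

  - save_cache:
      key: docker-image-v1-{{ checksum "package.json" }}-{{ checksum "docker/Dockerfile.base" }}
      paths:
        - ~/docker-cache/
```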

I’ve tried all of these ways of restoring state to avoid a 5-minute build for the npm packages, and none of them worked. Is there something else that needs to be done?


#2

We support layer caching on the base Docker executor. You need to contact your CSM to gain access to it. It will be a premium feature in the future.
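For reference, in current CircleCI documentation the layer caching mentioned here is requested per job on the setup_remote_docker step with the docker_layer_caching key, as sketched below. This reflects how the feature is exposed today rather than necessarily what was offered when this reply was written, and it only takes effect on plans where layer caching is enabled.

```yaml
    steps:
      - checkout
      - setup_remote_docker:
          # Ask CircleCI to keep Docker layers between jobs; only honoured
          # on plans where layer caching has been enabled
          docker_layer_caching: true
```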


#3

It’s unfortunate that it’ll be a premium feature. Since I’m seeing build times about 25% slower on CircleCI 2.0 than on 1.0, I would have thought that anything that reduces the load on your systems would benefit both parties, rather than becoming a forced upgrade path.

So even though I can kick out a build on my local machine in under a minute with layer caching, I’m looking at 12-minute build times just for my Docker container.


#4

Layer caching actually puts a huge load on our systems.


#5

Wow! Ok…

Just out of curiosity, is it network load or CPU load that’s the problem? My guesstimate is that I’m pulling and pushing about 200MB because I don’t have a cached container.


#6

@rohara I’ve been thinking about this a bit more.

I can totally see how an automated way to preserve Docker image state would be expensive, since it’s hard to figure out which images you need to persist and which to load on startup for each customer. It takes real engineering, support and other costs.

However, after thinking about this overnight, I realize that in the examples I gave I’m explicitly bringing my image state back to the party. This means CircleCI doesn’t have to do any engineering work to support it; the most you have to do is allow me to copy files from my repository back into my build environment, which you already support quite well.

So when I look at the three scenarios:

  1. --cache-from – A feature added this year to support CI build environments
  2. Explicit pull – This is more a theory, but it should bring back my full image stack
  3. Using CircleCI caching to restore my job environment, so no additional traffic across the internet, but increased storage costs for CircleCI.

It would be really great if you didn’t explicitly break the --cache-from build option, so that I can use Docker as intended and speed up my builds without requiring CircleCI to do any engineering or additional support.

So the question back to you is, what’s the complexity of supporting an existing feature?


#7

@rohara Like @koblas, I’m confused about using existing docker features. I am doing this (myimage is in an AWS ECR repo):

docker pull myimage:latest
docker build --cache-from myimage:latest -t myimage:abcd -t myimage:latest ./mydockercontext
docker push myimage:latest

But the docker build command always rebuilds from scratch, even when there have been no changes of any kind - like clicking the CircleCI “Rebuild” button. When I call ‘docker image ls’ I see the image I pulled listed, so I know it’s there with the right tag.

Can you help me understand why that is?

Thanks.


#8

I do not yet fully understand why that doesn’t function as expected.


#9

@jtbennett This happens because the images earlier in the series are not restored; only the top-level image is there, hence it rebuilds. You can take a look here at how to distribute the cache across multiple hosts: http://blog.runnable.com/post/145362675491/distributing-docker-cache-across-hosts https://blog.codeship.com/building-a-remote-caching-system/


#10

Hey @koblas, not sure if this will help you, but I managed to persist my Docker image between jobs (and then push it to our ECR repository) using the following kind of config…

  build_docker_image:
    <<: *deploy_container_config
    steps:
      - *restore_repo
      - setup_remote_docker
      - run:
          name: Do the docker build
          command: |
            docker build -f Dockerfile.static -t my_app:$CIRCLE_BRANCH .
            mkdir -p docker-cache
            docker save -o docker-cache/built-image.tar my_app:$CIRCLE_BRANCH
      - save_cache:
          key: *docker_cache_key
          paths:
            - docker-cache

  push_docker_image:
    <<: *deploy_container_config
    steps:
      - *restore_docker_cache
      - setup_remote_docker
      - run:
          name: Sign into AWS ecr
          command: $(aws ecr get-login --no-include-email --region us-east-1)
      - run:
          name: Push it to ECR
          command: |
            docker load < docker-cache/built-image.tar
            docker tag my_app:$CIRCLE_BRANCH $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my_app:$CIRCLE_BRANCH
            docker push $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my_app:$CIRCLE_BRANCH
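For this hand-off to work, push_docker_image has to run after build_docker_image in the same workflow. A minimal sketch of that wiring follows; the workflow name build_and_push is hypothetical. Because CircleCI cache keys are immutable once saved, the key behind *docker_cache_key should change per commit (for example by including {{ .Revision }}), otherwise the push job may restore an older tarball.

```yaml
workflows:
  version: 2
  build_and_push:            # hypothetical workflow name
    jobs:
      - build_docker_image
      # start the push only after the image has been built and cached
      - push_docker_image:
          requires:
            - build_docker_image
```

CircleCI workspaces (persist_to_workspace and attach_workspace) are an alternative way to pass the tarball between jobs of one workflow without the cache-key concern.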

#11

Hey @edkellena, I’m looking to do a similar thing to what you’ve got in this snippet, but I’m new to CircleCI and the way you configure it. I was wondering if you by any chance had a copy of the config you’ve used that you could share in full?

I don’t understand the * references and was hoping to see a more complete config to get a better idea of what it means and how it all works.


#12

I don’t use the star device myself, but it’s likely to be standard YAML, rather than a Circle-specific thing. I suspect it’s to do with naming a section so that it can be repeated easily - see here.
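Since the post above leaves out the definitions behind the * references, here is a hypothetical set that would make @edkellena’s config parse. The anchor names match the aliases used there, but their contents (image, working directory, cache keys) and the top-level references key used to hold them are assumptions, not the original config. &name labels a node, *name inserts a copy of it, and <<: *name merges a labelled mapping into the current one.

```yaml
version: 2

references:
  # &name labels a node; *name elsewhere inserts a copy of it
  deploy_container_config: &deploy_container_config
    docker:
      - image: circleci/python:3.6   # hypothetical; the push job also needs the AWS CLI
    working_directory: ~/repo

  # hypothetical per-commit key, so each workflow run gets its own tarball
  docker_cache_key: &docker_cache_key
    docker-image-{{ .Revision }}

  # hypothetical: assumes an earlier job saved the checkout under this key
  restore_repo: &restore_repo
    restore_cache:
      keys:
        - repo-{{ .Revision }}

  restore_docker_cache: &restore_docker_cache
    restore_cache:
      keys:
        - *docker_cache_key

jobs:
  build_docker_image:
    # <<: merges the labelled mapping (docker, working_directory) into this job
    <<: *deploy_container_config
    steps:
      - *restore_repo          # expands to the restore_cache step defined above
      - setup_remote_docker
```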


#13

Hey @aranw, yeah, @halfer has it right. The * is standard YAML syntax for inserting a previously defined section (one labelled with &). That just saves on repetition, and allows you to define common things in one place.
I’ve actually moved away from this methodology now anyway, as I found it quicker in practice to simply push up the docker image to our registry (with some intermediate/temporary tag like the SHA1 value) and then allow the subsequent job/s to pull it down again. The CircleCI cache is nice and useful for some stuff, but in this case, going outside was quicker!
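A sketch of that registry-based hand-off, reusing the job names and aliases from the earlier config: the first job pushes the image under a temporary per-commit tag (CircleCI’s CIRCLE_SHA1 variable), and the second pulls it and retags it for the branch. The my_app repository and the exact steps are assumptions, not @edkellena’s actual config.

```yaml
  build_docker_image:
    <<: *deploy_container_config
    steps:
      - *restore_repo
      - setup_remote_docker
      - run:
          name: Build and push with a temporary per-commit tag
          command: |
            $(aws ecr get-login --no-include-email --region us-east-1)
            IMAGE=$AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my_app
            docker build -f Dockerfile.static -t $IMAGE:$CIRCLE_SHA1 .
            docker push $IMAGE:$CIRCLE_SHA1

  push_docker_image:
    <<: *deploy_container_config
    steps:
      - setup_remote_docker
      - run:
          name: Pull the per-commit image and retag it for the branch
          command: |
            $(aws ecr get-login --no-include-email --region us-east-1)
            IMAGE=$AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my_app
            # same commit in the same workflow, so this tag was pushed above
            docker pull $IMAGE:$CIRCLE_SHA1
            docker tag $IMAGE:$CIRCLE_SHA1 $IMAGE:$CIRCLE_BRANCH
            docker push $IMAGE:$CIRCLE_BRANCH
```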


#14