Speed up Docker Spin up Environment?

We are using the Docker executor, and every job starts with the Spin up Environment step. We have our own test image, so its layers are not cached. Spinning up the environment takes at least 1 minute every time, and sometimes as long as 5 minutes. With 2.0, we split our original test suite into 5 different jobs. That was supposed to save time, but each smaller job still has to spin up its own environment. As a result, we went from a fairly consistent 25-minute test suite in 1.0 to anywhere from 18 to 26 minutes in 2.0. If there were a way to cache the spun-up environment, run times would be much more stable, and the fastest runs would likely drop to 12-13 minutes.

I know one option is to use the official CircleCI Docker images to take advantage of cached layers, but that defeats the purpose of building our own Docker image in the first place. Our test image is built to be consistent with our production image. The official CircleCI images ship different libraries, and it doesn't make sense to switch to one of them and risk breaking our code in production.

The other option is docker_layer_caching: pulling a base image and having CircleCI cache its layers. Is this the best option right now?

On another note: if we need to spin up the environment from a Docker base image, and we can already enable docker_layer_caching, shouldn't there simply be an option to cache Docker layers in the Spin up Environment step itself?

Do (or could) all of your five jobs run on the same base image? If so, and if the jobs are in the same workflow, you could use workspaces to share folders of data between them, so that only the first job has to build the environment.

Alternatively, if the image needs a lot of things installed, you could create a separate pipeline that builds your base image and just rebuild it weekly. For this approach, push the image to an external (public or private) registry and then pull it in CircleCI. It is also a good idea to use a lightweight OS (such as Alpine) to reduce the size of your images (100 MB is much better than 1.5 GB!).

Finally, if you are building images and want a variety of layer caching options, consider this post.

Thanks for the post!

All of our jobs run on the same base image, and we do use workspaces to share the node modules, bundles, and built assets between jobs. The first job installs all the dependencies, and the rest of the jobs restore the same workspace.

I think the main difference between our setup and yours is the executor. We are using the docker executor https://circleci.com/docs/2.0/executor-types/#using-docker to spin up the environment. It looks like you are using the machine executor https://circleci.com/docs/2.0/executor-types/#using-machine?

I have to say our base image is getting big, at about 1.5 GB, which is probably why the Docker spin-up takes about 45 seconds. Switching to a lightweight OS is probably the best option for us.

The bundle and node modules are at least 100 MB each, which adds another 15 seconds for restoring the workspace.
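To see how that fixed cost adds up once the suite is split into several jobs, here is a rough back-of-the-envelope sketch in Python. It only uses the approximate figures above (about 45 s to spin up the 1.5 GB image, about 15 s to restore the workspace) and assumes that only the first job, which installs dependencies, skips the workspace restore. The `stages` layout and both function names are hypothetical, just for illustration; real spin-up times vary (we have seen up to 5 minutes).

```python
# Rough per-job overhead, using the approximate figures from this thread (not measurements).
SPIN_UP_S = 45            # pulling and starting the ~1.5 GB test image
WORKSPACE_RESTORE_S = 15  # attaching ~200 MB of bundle + node modules


def critical_path_overhead(stages: list[int]) -> int:
    """Fixed overhead (seconds) added to wall-clock time.

    `stages` is a hypothetical workflow layout: how many jobs run in parallel
    at each sequential step. Parallel jobs overlap, so each stage adds one
    job's overhead to the wall clock regardless of how wide it is.
    """
    if not stages:
        return 0
    # Every stage spins up; every stage after the first also restores the workspace.
    return len(stages) * SPIN_UP_S + (len(stages) - 1) * WORKSPACE_RESTORE_S


def total_container_overhead(stages: list[int]) -> int:
    """Fixed overhead (seconds) summed over every job, i.e. container time spent."""
    if not stages:
        return 0
    jobs = sum(stages)
    restoring_jobs = jobs - stages[0]  # assumes only first-stage jobs skip the restore
    return jobs * SPIN_UP_S + restoring_jobs * WORKSPACE_RESTORE_S


print(critical_path_overhead([1, 4]), total_container_overhead([1, 4]))
```

With one install job followed by four parallel test jobs, `[1, 4]` gives 105 seconds of wall-clock overhead but 285 seconds of container time; running the same five jobs one after another (`[1, 1, 1, 1, 1]`) puts all 285 seconds on the critical path.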

Because the work is now split across more jobs, the cost of restoring the cache is a big drawback of switching to 2.0 for us. Do you feel the same?

I'm not; I'm using Docker.

That will make a big difference.

This is where baking your own image can be helpful (assuming that pulling a larger image is faster than installing those tools, which it usually is).

I didn’t use CircleCI before 2.0, but I am very happy with it.

I think this relates to a previous thread that I would like to see a conclusion to:

Say, @rohara
When delegating jobs to hosts via Nomad, would it be possible to take into account the Docker image/tag of the queued job's executor, and use it to weight the selection among available machines that have recently or previously pulled a similar image/tag?

Alternatively, given that we know the graph layout of the workflow, consecutive jobs that share an image/tag could be prioritized to run on the same machine, so that stretches of a workflow path with the same executor image/tag would be delegated to one machine. I'm not sure whether machines cache job caches as well as Docker images, but if they do, that might also help speed up workspace/cache restores.
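A minimal sketch of the grouping step, assuming the scheduler could see one path through the workflow as an ordered list of (job, image) pairs. The function name, job names, and image tags below are made up for illustration; this says nothing about how CircleCI's Nomad-based scheduler actually works.

```python
from itertools import groupby


def stretches_by_image(path: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Split an ordered workflow path into runs of consecutive jobs sharing an image.

    `path` is a hypothetical list of (job_name, image_tag) pairs along one path
    of the workflow graph. Each returned run is a candidate for being pinned to
    a single machine, so the image is pulled once per run instead of once per job.
    """
    return [
        (image, [job for job, _ in group])
        for image, group in groupby(path, key=lambda pair: pair[1])
    ]


path = [
    ("install", "acme/test:1.4"),
    ("unit", "acme/test:1.4"),
    ("lint", "acme/test:1.4"),
    ("deploy", "acme/deploy:2"),
]
print(stretches_by_image(path))
```

Here the first three jobs form one stretch on `acme/test:1.4` and `deploy` forms its own stretch. `groupby` only merges adjacent items, which matches the "stretches of a path" idea: a job that reuses an earlier image after a different one starts a new stretch.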

Still, I think something similar to Docker Layer Caching could be applied to this problem as well: instead of weighting hosts by the images/tags they've recently pulled or cached, one could go a step further and weight each host by the number or size of cached layers it has in common with the image required by the job's executor. To compute those weights, the scheduler could use the same Docker Registry API that MicroBadger must use to determine the digest and size of each layer.
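A rough sketch of that weighting in Python, assuming the scheduler keeps a record of which layer digests each host already has, and that the job's required layers come as a digest-to-size mapping (a Docker image manifest lists each layer's digest and size). The `Host` class, `rank_hosts`, and the digests and sizes below are hypothetical; the score is simply the fraction of the image, by bytes, that a host would not need to pull.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    # Hypothetical bookkeeping: layer digest -> size in bytes already present on this host.
    cached_layers: dict[str, int] = field(default_factory=dict)


def overlap_bytes(host: Host, required: dict[str, int]) -> int:
    """Bytes of the required image that are already cached on `host`."""
    return sum(size for digest, size in required.items() if digest in host.cached_layers)


def rank_hosts(hosts: list[Host], required: dict[str, int]) -> list[tuple[str, float]]:
    """Rank hosts by the fraction (by size) of the required layers they already hold."""
    total = sum(required.values()) or 1  # avoid dividing by zero for an empty manifest
    scored = [(h.name, overlap_bytes(h, required) / total) for h in hosts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


required = {"sha256:base": 800, "sha256:deps": 150, "sha256:app": 50}
hosts = [
    Host("h3"),
    Host("h2", {"sha256:app": 50}),
    Host("h1", {"sha256:base": 800, "sha256:deps": 150}),
]
print(rank_hosts(hosts, required))
```

Weighting by bytes rather than by layer count favours `h1`, which already holds the large base and dependency layers, even though `h2` also has a matching layer. A real scheduler would also need to weigh host load and cache eviction, which this sketch ignores.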