Why do Docker layer download speeds vary so wildly?

We have custom Docker images that are downloaded for each run. We’ve optimized our images as best we can at this point.

When we’re lucky, and everything is cached, we get super-fast Environment Spin-up times (5s).

But in some cases, a single run among our 12 parallel runs gets stuck downloading layers and spends 2+ minutes in Environment Spin-up, while all 11 other runs in the same commit take 1 minute at most to spin up.
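To put rough numbers on the impact, here is a minimal sketch. It assumes the approximate figures above (11 runs at about 60 s and one outlier at about 120 s); the values and the `spin_up_summary` helper are illustrative, not measured data or anything from CircleCI.

```python
from statistics import median

# Illustrative spin-up times in seconds, based on the rough figures above:
# 11 runs at about 60 s and one outlier at about 120 s (assumed, not measured).
spin_up_seconds = [60] * 11 + [120]


def spin_up_summary(times):
    """Return the median, the slowest run, and the time the slowest run adds."""
    typical = median(times)
    slowest = max(times)
    return {
        "median": typical,
        "slowest": slowest,
        # Parallel runs finish only when the slowest one does, so this
        # difference is what the outlier adds on top of a typical run.
        "outlier_penalty": slowest - typical,
    }


summary = spin_up_summary(spin_up_seconds)
print(summary)
```

Because the parallel runs only finish when the slowest one does, a single slow download sets the spin-up time for the whole set.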

By watching the logs, we can see that an image sometimes takes 2-5x longer to download in one run than in another, within the same commit.

I understand that layers will not always be cached the same way, but why do some downloads vary so wildly from others? (We’re talking about the case where no layers are cached, so everything has to be downloaded.)

We’ve tried switching the container registry (ECR vs. Docker Hub).

The only way we’ve found to be consistently fast is to use CircleCI Community Images, which we’re doing for every service we can.

These outliers are killing our throughput. What can we do?