Increased rate of errors when pulling docker images on machine executor

Hi there,
We run docker-compose on a machine executor. Over the past month or so we have seen an increased rate of failed jobs caused by errors when docker-compose pulls an image.

Here is one example:

Pulling mysql (circleci/mysql:5.7.27-ram)...
5.7.27-ram: Pulling from circleci/mysql

359f1fff: Pulling fs layer
1c853362: Pulling fs layer
54c0af6f: Pulling fs layer
c1a77330: Pulling fs layer
8a88eabc: Pulling fs layer
8658f4dd: Pulling fs layer
f6bff01b: Pulling fs layer
1efb6f83: Pulling fs layer
dbd83183: Pulling fs layer
57d8f022: Pulling fs layer
9495c6e7: Pulling fs layer
96f3c914: Pulling fs layer
545dcb9e: Pull complete
ERROR: error pulling image configuration: Get https://docker-images-prod.s3.dualstack.us-east-1.amazonaws.com/registry-v2/docker/registry/v2/blobs/sha256/03/03a... ...c7cfeecd0824: dial tcp 52.216.161.126:443: i/o timeout

Exited with code exit status 1

It happens with various images, and a rebuild usually succeeds. What is causing the issue, and how can we prevent it?

We are seeing the same issue with an i/o timeout error. Rebuilds usually fix it, but it is happening on roughly every other build.

3.7.6-slim-buster: Pulling from library/python

d04f60ab: Pulling fs layer 
16f83cca: Pulling fs layer 
30ef4680: Pulling fs layer 
c4206257: Pulling fs layer 
error pulling image configuration: Get https://docker-images-prod.s3.dualstack.us-east-1.amazonaws.com/registry-v2/docker/registry/v2/blobs/sha256/84/84de2ffd919d8fc218a243d5eb6fc0f17c7e9ebc196ad10dab4edaa9367ead4f/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIA2KUBRXV6HSE42Q7X%2F20211202%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20211202T022810Z&X-Amz-Expires=1200&X-Amz-Security-Token=FwoGZXIvYXdzEFsaDOKqEUdT9jOCINEdViKFBKAmy4t6Gzta9O2sCktpYCF0rhvazivemB3bSiJwXFZn1XxETZzHlbN4SIZ4ack%2FHgI99ghWdRkO96M2kj0EYE%2BeHW%2BxRi9kS2cpaZ8mjWy0PGFvgDYeSeEScJ6k0DLNgwM7SQGMyelxipBnGJzeYzWVMvvjl6MuaNiJSdADEa6069dBjVP0t7yGYs5mi9loKy4LbA8sdTzOgMk5SQ%2Bhj5gQhQMZZLXGhJJEBDK2iir7ZXBko9k3%2FfQeQZX3BIzkPTUAPKi%2BMRrExtBxsjncSd9uuGKR5etS8QJmm4mA%2F9Z4jk%2BrWit3KNtNml5bVECaS8N9Xlk88rXUBc3OibOssLioz2aYbT4SA3dFEQbG2pf4rkvFDy3u8RaudStQrNGskFMd7%2BCLc1R6BFnKgBtfttTPkurZpL0VrpYBBv4eF0gS7I7AodE7A2G%2B%2BOPRvCJ2Ng%2BKrb4gnW9FyO82BT8eRO0IvfvVp9gT6kln%2FCbKJ2C4Ns%2F%2B5UEpEg%2Fj%2B4rOCaVA4eYQr5Plk0aAb3K1%2BqFl3e22p960AAiGV%2F72Zb6NKDxg6s5uMBp0P85vRFt0f%2Bn7Y%2FxhnT6%2F2RbsoVnU7SID41pUbjCbhnBc0ly2J9YGPGtwvQkKw52C7DA7n95cq5w7WIC5C4J4qYeGK5s%2BDTTsS4JmrJWObUkXxq%2FQzeACH9erHvtWZZoovsagjQYyKofXUxQLF5S57ekk4cB2%2FxvCeJ%2F5s4PHElhcKf%2FhUEEWUtqUQFKd2PIh0A%3D%3D&X-Amz-SignedHeaders=host&X-Amz-Signature=e70d46b7f5620a9c3c35d81072b8644f9b366795266641909f496c65f3f08c4b: dial tcp 52.216.80.238:443: i/o timeout

Exited with code exit status 1

As a workaround, we currently retry the image pull:

- run:
          name: "Pull test docker images"
          # Retry the pull up to 3 more times, 3 seconds apart; fail the step if every attempt fails.
          command: i=0; until docker-compose -f tests/docker/docker-compose.yml pull; do ((i >= 3)) && exit 1; sleep 3; i=$((i+1)); done

It is not ideal, but it has helped, and we no longer have to rerun every other build manually.
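
For anyone adapting the retry workaround, here is a rough sketch of a more general version with a growing delay between attempts. The `retry` function name, its argument order, and the attempt and delay values are my own choices for illustration, not anything provided by CircleCI or Docker. It assumes bash, which is what a machine executor step would normally run.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS
# is reached, doubling the delay after each failed attempt.
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "retry: '$*' failed after ${attempt} attempts" >&2
      return 1   # propagate the failure so the CI step fails
    fi
    echo "retry: attempt ${attempt} failed, sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Example (commented out so the sketch runs without Docker):
# up to 4 attempts, waiting 3s, 6s, then 12s between them.
# retry 4 3 docker-compose -f tests/docker/docker-compose.yml pull
```

Unlike a bare loop that just stops after N tries, this returns a non-zero status when every attempt fails, so the step fails where the pull failed rather than in a later step. It only papers over transient timeouts; it does not address whatever is causing them.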