Including Retry Mechanisms when Using the Machine Executor on CircleCI

Hi! My name is Sebastian Lerner, and I’m a product manager for the team within CircleCI that ensures jobs execute on CircleCI Cloud efficiently and reliably.

I wanted to provide a quick recommendation for customers who take advantage of our Machine executors.

For context, while executing your job, CircleCI relies on connections to one or more upstream third-party services. Those services include, but are not limited to:

  • AWS and GCP network infrastructure
  • Docker Hub, Elastic Container Registry (ECR), Google Container Registry (GCR)
  • AWS S3, Google Cloud Storage (GCS)

Because of this reliance on third-party services, we highly recommend that customers include robust retry mechanisms around any connection made to these third-party services over the network during job execution.


An example of a curl retry would be:

curl https://server/dir/file.tar.gz --retry 10 --retry-max-time 0 -C -

(Persistent retrying resuming downloads with curl)

A similar example for working with wget:

wget --retry-connrefused --waitretry=1 --read-timeout=20 --timeout=15 -t 10 https://server/dir/file.tar.gz

(How to retry connections with wget?)

For services that don’t have a native retry:

while ! ./service; do sleep 1; done

(Stack Overflow)

This code assumes that the service exits with a non-zero status code when the call fails. One caveat with this snippet is that it waits only 1 second between retries, with no exponential backoff, and retries forever.
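A variation that addresses both caveats is sketched below. This is our own example, not CircleCI-provided tooling: the function name `retry_with_backoff`, the attempt limit, and the initial delay are all placeholder choices you would tune for your own job.

```shell
#!/bin/sh
# Sketch: retry a command a bounded number of times, doubling the wait
# between attempts (exponential backoff). Usage:
#   retry_with_backoff <max_attempts> <command> [args...]
retry_with_backoff() {
  max_attempts=$1; shift
  delay=1                          # initial wait in seconds (placeholder)
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "'$*' failed after $max_attempts attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))           # double the wait after each failure
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_with_backoff 5 ./service` retries `./service` up to 5 times, waiting 1, 2, 4, then 8 seconds between attempts, and gives up with a non-zero exit status instead of looping forever.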

CircleCI Attempting Retries & Network Performance Factors

Where possible, CircleCI attempts to retry certain aspects of network communication that are within its control. Additionally, in early March 2022, CircleCI patched the version of Docker it uses to include retries by default. Despite this patch, there will be cases and steps within the customer’s control that CircleCI cannot always retry automatically on the customer’s behalf. Robust retry mechanisms will help alleviate pain when CircleCI cannot attempt a retry and an upstream third-party service that CircleCI relies on does not connect properly during job execution.
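As one illustration of a customer-controlled step that CircleCI cannot retry for you, an image pull inside a run step can be wrapped in its own retry loop. This is a hedged sketch: `pull_image`, the attempt limit, and the image name are placeholder choices, not CircleCI APIs.

```shell
#!/bin/sh
# Sketch: wrap "docker pull" (a step the customer controls) in a bounded
# retry loop with a growing wait between attempts.
pull_image() {
  image=$1
  for attempt in 1 2 3 4 5; do
    if docker pull "$image"; then
      return 0                      # pull succeeded
    fi
    echo "pull of $image failed (attempt $attempt), retrying..." >&2
    sleep $((attempt * 2))          # wait a little longer each time
  done
  return 1                          # all attempts exhausted
}
```

You would call it from a job step as, for example, `pull_image alpine:3.18` (a placeholder image tag), so a transient registry hiccup no longer fails the whole job on the first attempt.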

In addition to successfully connecting to the third parties listed above, certain steps within CircleCI jobs depend on the network performance of those third parties. Each connection made to a third party over the internet may vary in network speed, which is outside CircleCI’s control and dependent on the third party’s network performance.

Please don’t hesitate to reach out to our Support team if, after including a retry mechanism, you are still experiencing network connectivity issues with third-party services, or if you believe you are seeing abnormally large variance in network connection speed to third-party services.