Very long delays in "Spin up Environment" task


#1

We’ve been seeing “Spin up Environment” hang indefinitely, forcing us to cancel and re-run jobs, whereupon the job executes as quickly as we’d expect.

Here’s a sample, which ran for about 30 minutes before we finally cancelled it: https://circleci.com/gh/resistbot/gemini/296

There’s not much to go on. Does it have something to do with the Docker pull?


#2

Another example: https://circleci.com/gh/resistbot/gemini/299


#3

I’m getting a 404 there. If you want volunteers to pitch in, please grab the information in the relevant step (ideally in text format) and paste it here, in a formatted block.

Assuming this is about fetching a build image, are you pulling from Docker Hub (the default) or a custom registry?
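
For context, the registry is visible in the image reference in `.circleci/config.yml` — a bare name pulls from Docker Hub, while a fully qualified hostname points at a custom registry. A minimal illustrative sketch (image names here are placeholders, not taken from your project):

```yaml
version: 2
jobs:
  build:
    docker:
      # Docker Hub (the default): bare image name
      - image: circleci/node:10
      # Custom registry: fully qualified hostname in the image reference
      # - image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/my-image:1.0.0
```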


#4

Hey, thanks for replying. Sorry for the protected links.

Step output:

```
Build-agent version 0.1.1301-dd8e9365 (2019-01-09T17:12:43+0000)
Starting container 421157468025.dkr.ecr.us-west-2.amazonaws.com/kubernetes-build:0.3.0
  image cache not found on this host, downloading 421157468025.dkr.ecr.us-west-2.amazonaws.com/kubernetes-build:0.3.0
0.3.0: Pulling from kubernetes-build
55cbf04beb70: Already exists
8faee70b1dad: Pulling fs layer
a8b11560f118: Pulling fs layer
94658cbca3cc: Pulling fs layer
b59856e9f0ab: Pulling fs layer
b023afffd10b: Pulling fs layer
4d4eb448d315: Pulling fs layer
c4eb58602129: Pulling fs layer
598629fb90fc: Pulling fs layer
e2209011607c: Pulling fs layer
45eb4d9add54: Pulling fs layer
52f53fb163d5: Pulling fs layer
384be3fdd7fb: Pulling fs layer
3c3781845b40: Pulling fs layer
0b2534e82c2e: Pulling fs layer
817dc37e68cd: Pulling fs layer
a3245b5ce790: Pulling fs layer
cc73bd726599: Pulling fs layer
a0439def0fd7: Pulling fs layer
4ed310b0f4ef: Pulling fs layer
cda08fcf08e4: Pulling fs layer
94658cbca3cc: Waiting
b59856e9f0ab: Waiting
384be3fdd7fb: Waiting
3c3781845b40: Waiting
b023afffd10b: Waiting
4d4eb448d315: Waiting
0b2534e82c2e: Waiting
c4eb58602129: Waiting
817dc37e68cd: Waiting
598629fb90fc: Waiting
a3245b5ce790: Waiting
cc73bd726599: Waiting
e2209011607c: Waiting
45eb4d9add54: Waiting
a0439def0fd7: Waiting
4ed310b0f4ef: Waiting
52f53fb163d5: Waiting
cda08fcf08e4: Waiting
a8b11560f118: Verifying Checksum
a8b11560f118: Download complete
8faee70b1dad: Verifying Checksum
8faee70b1dad: Download complete
8faee70b1dad: Pull complete
a8b11560f118: Pull complete
b023afffd10b: Verifying Checksum
b023afffd10b: Download complete
4d4eb448d315: Verifying Checksum
4d4eb448d315: Download complete
c4eb58602129: Download complete
598629fb90fc: Verifying Checksum
598629fb90fc: Download complete
e2209011607c: Verifying Checksum
e2209011607c: Download complete
94658cbca3cc: Verifying Checksum
94658cbca3cc: Download complete
52f53fb163d5: Verifying Checksum
52f53fb163d5: Download complete
94658cbca3cc: Pull complete
384be3fdd7fb: Verifying Checksum
384be3fdd7fb: Download complete

Build was canceled
```

Total run-time of this job was 28:34; queue time was about one second. It looked like a normal Docker image download until the last message, `384be3fdd7fb: Download complete`, after which it stalled and did not continue for the remaining 25 or so minutes of the run.

This is using ECR.
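
For reference, a private ECR image in CircleCI 2.0 is typically pulled with `aws_auth` credentials on the executor. This is a sketch of what our job declaration looks like, with the credential environment variable names as placeholders:

```yaml
version: 2
jobs:
  build:
    docker:
      - image: 421157468025.dkr.ecr.us-west-2.amazonaws.com/kubernetes-build:0.3.0
        aws_auth:
          # Placeholder variable names; set as project environment variables
          aws_access_key_id: $AWS_ACCESS_KEY_ID
          aws_secret_access_key: $AWS_SECRET_ACCESS_KEY
```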