Error on self-hosted runners when they start up

All my runners are showing the following error when they try to start a job:

Agent download unsuccessful.
error: allocation="BB2YWAN6" download failed: error downloading agent version="1.0.125217-db0d4bca" os="linux" arch="amd64": could not write file "/tmp/circleci-launch-agent118322308/circleci-agent/1.0.125217-db0d4bca/linux/amd64/circleci-agent.tmp": context deadline exceeded (Client.Timeout or context cancellation while reading body)

OK, now that I’ve modified my scripts to use CircleCI-based systems, I have some time to provide more detail.

I have 5 runners in a pool, and builds sent to the pool started reporting the above error about 2 hours ago, without any change to the environments at my end. From the message, my best guess is that an update to the local CircleCI agent has been published and something is going wrong in the update process, but the message does not make clear what the actual issue is.

Same here. I have been scrambling to fix my runners, but there appears to be nothing I can do.

error: allocation="MC5J675V" download failed: error downloading agent version="1.0.125706-f3a01134" os="darwin" arch="amd64": could not write file "/var/folders/km/wtrqlb_x6tj5_zhxjzrb_x3h0000gn/T/circleci-launch-agent167940725/circleci-agent/1.0.125706-f3a01134/darwin/amd64/circleci-agent.tmp": context deadline exceeded (Client.Timeout or context cancellation while reading body)

Well, it is good to hear that I’m not alone…

Have you had a chance to raise a support ticket as well?

The error seems to indicate a mistake on their end, as I’m already running version 1.0.35012-2fb4f32, so this failed update seems to be downgrading the version.

Yes, I raised a support ticket. I also have my agent on 1.0.35012-2fb4f32. I was trying to see if I could maybe pull an older version.

Just for fun, I tried running some other runner-based builds: a macOS runner on agent 1.0.30969-f1984fc and a Windows runner on 1.0.35012-2fb4f32. Both fail the same way.

Well, spin-up now reports the following:

Build-agent version 1.0.125706-f3a01134 (2022-05-25T21:25:02+0000)
Launch-agent version 1.0.35012-2fb4f32 (circleci-2)

So something has been fixed. My concern is not that a problem occurred, since things do go wrong, but that there is a lack of transparency around issues like this.
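For anyone comparing the version strings in this thread, here is a rough sketch of a numeric comparison. It assumes the strings have the form `major.minor.build-commit` and that the third component is an increasing build number; neither is confirmed anywhere in this thread. The spin-up output above lists the build-agent and the launch-agent separately, so the two numbers may not be directly comparable in the first place. `parse_version` is a hypothetical helper, not part of any CircleCI tooling.

```python
def parse_version(version):
    """Split 'major.minor.build-commit' into ((major, minor, build), commit).

    Assumption: the part before the first '-' is a dotted list of integers.
    """
    numeric, _, commit = version.partition("-")
    return tuple(int(part) for part in numeric.split(".")), commit


build_agent = "1.0.125706-f3a01134"   # build-agent version from the spin-up output
launch_agent = "1.0.35012-2fb4f32"    # launch-agent version from the spin-up output

# Compared as integers, 125706 is larger than 35012.
numerically_newer = parse_version(build_agent)[0] > parse_version(launch_agent)[0]

# Compared as plain strings, "1.0.1..." sorts before "1.0.3...",
# which is why the larger build number can look like an older version.
sorts_first_as_text = build_agent < launch_agent
```

Both `numerically_newer` and `sorts_first_as_text` are true here, so whether 1.0.125706 looks newer or older than 1.0.35012 depends on how the strings are read.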

Hi @rit1010 and @flowjo-lukej,

I am escalating this internally to get more insight into the issue, and will provide an update via this post in addition to leaving comments on the tickets for our Support team.

One possible source of the timeout error is where the Runner machine is hosted. Our infrastructure is mainly in the AWS region us-east-1, and downloading the agent can time out if the Runner is located in another region of the world (the timeout is 30 seconds).
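To check from a runner host whether a large download can finish inside a 30-second window, something like the sketch below could be used. It is not how the launch agent itself downloads the binary; it only times a transfer against an overall deadline. The function name `download_with_deadline` and its parameters are made up for illustration, and you would need to supply a URL yourself, since the real agent download URL is not given in this thread.

```python
import time
import urllib.request


def download_with_deadline(url, dest_path, deadline_s=30.0, chunk_size=64 * 1024):
    """Download url to dest_path, giving up once the whole transfer
    has taken longer than deadline_s seconds.

    Returns (bytes_written, elapsed_seconds).
    """
    start = time.monotonic()
    written = 0
    # The urlopen timeout applies to each blocking socket operation,
    # so the loop below also checks the total elapsed time between reads.
    with urllib.request.urlopen(url, timeout=deadline_s) as resp, \
            open(dest_path, "wb") as out:
        while True:
            if time.monotonic() - start > deadline_s:
                raise TimeoutError(
                    f"download exceeded {deadline_s}s after {written} bytes"
                )
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written, time.monotonic() - start


# Example call (the URL is a placeholder, not a real CircleCI endpoint):
# size, seconds = download_with_deadline("https://example.com/large-file", "/tmp/probe.bin")
# print(f"{size} bytes in {seconds:.1f}s")
```

If a file of similar size from a us-east-1 host regularly takes close to 30 seconds, hosting location is a plausible cause. If it finishes in a few seconds, the problem is more likely on the serving side.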

I will get back to you once I have more information.

I can only speak for my nodes, which are on a 1 Gbit link at a major EU service provider; they reported no communication issues with any other service they depend on.

The messages at my end also point to a different cause, as the failure occurred when trying to download one specific version:

  error downloading agent version="1.0.125217-db0d4bca"

but the download worked for agent version 1.0.125706-f3a01134.
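If you have several runners and want to see which agent versions the failures are tied to, a small parser over the error lines can help. This is only a sketch based on the `key="value"` layout of the messages quoted in this thread. The helper names `parse_agent_error` and `failed_versions` are hypothetical, and the pattern accepts both straight and curly quotes because the pasted logs here contain both.

```python
import re
from collections import Counter

# Matches key="value" pairs, accepting straight or curly double quotes.
_FIELD = re.compile(r'(\w+)=["“”]([^"“”]*)["“”]')


def parse_agent_error(line):
    """Return the key="value" fields of one launch-agent error line as a dict."""
    return dict(_FIELD.findall(line))


def failed_versions(lines):
    """Count how often each agent version appears across the error lines."""
    counts = Counter()
    for line in lines:
        fields = parse_agent_error(line)
        if "version" in fields:
            counts[fields["version"]] += 1
    return counts


sample = 'error downloading agent version=“1.0.125217-db0d4bca” os=“linux” arch=“amd64”'
fields = parse_agent_error(sample)
# fields == {"version": "1.0.125217-db0d4bca", "os": "linux", "arch": "amd64"}
```

Running `failed_versions` over the collected logs from each runner shows whether every failure names the same version, which is useful detail to attach to a support ticket.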

This issue has now shown up again.

For anyone else seeing the same thing, I have raised a ticket.