Self-hosted runner setup on GCP with spot instances

xakraz · August 9, 2023, 2:46pm

Hi there

In February this year, we set up self-hosted machine runners on GCP to reduce the network-related costs of pushing and pulling Docker images to and from our private Artifact Registry.

Also for cost reasons, we set up Managed Instance Groups with spot/preemptible VMs.

However, due to the nature of spot VMs, Google sometimes stops the VMs, and the jobs running on them never “stop”: they end with the new “infrastructure fail” status.

Some CircleCI support team members told us to set up a “system” in which systemd sends 2 SIGTERM signals to the task-agent to make it “drain” and report the job as canceled to the CircleCI control plane.

We tried to do so, without success.
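
To make the suggestion concrete, here is a minimal sketch of the "send SIGTERM twice" step in Python. It assumes the task-agent's PID is already known; how to find it is not covered here. The function name, the delay, and the injectable `kill` parameter are hypothetical, and whether the task-agent really treats a second SIGTERM as a drain request is not confirmed anywhere in this thread.

```python
import os
import signal
import time


def send_sigterm_twice(pid, delay_seconds=1.0, kill=os.kill):
    """Send up to two SIGTERM signals to `pid` and return how many were sent.

    Assumption: the target treats the second SIGTERM as a "drain" request.
    That behaviour is unconfirmed for the CircleCI task-agent.
    `kill` is injectable only so the function can be exercised without
    signalling a real process.
    """
    sent = 0
    for attempt in range(2):
        try:
            kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            # The process already exited (for example after the first signal).
            break
        sent += 1
        if attempt == 0:
            # Give the process a moment to register the first signal.
            time.sleep(delay_seconds)
    return sent
```

Something like this could be called from a systemd `ExecStop=` command or a VM shutdown script, but that wiring is also an assumption and untested here.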

Has anyone already done such a setup, or is anyone facing the same situation?
If so, could you share your setup with us?

Many thanks to you

If any CircleCI employee finds this post, I think it would be a good example to add to the self-hosted machine runners documentation.

xakraz · August 19, 2023, 8:47pm

No suggestions at all?

rit1010 · August 20, 2023, 2:18pm

I think you will have to talk to the support team again, as the recommended solution of sending 2 SIGTERM signals to the task-agent is at odds with both how SIGTERMs are normally handled and CircleCI’s own docs.

A SIGTERM is a ‘polite’ kill as it is a signal that can be handled by the process, so it is possible that the handler could be coded to accept 2 SIGTERMs as a special case, but it is not documented anywhere that I can find and only the internal teams have access to the current code base.
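
As an illustration of what "accepting 2 SIGTERMs as a special case" could look like, here is a minimal Python sketch of a handler that counts SIGTERMs and only flips a drain flag on the second one. It is purely hypothetical: the class and function names are made up, and it says nothing about how the task-agent is actually written.

```python
import signal


class SigtermCounter:
    """Signal handler that treats the second SIGTERM as a drain request.

    Hypothetical illustration only; not taken from any CircleCI code.
    """

    def __init__(self):
        self.count = 0
        self.draining = False

    def __call__(self, signum, frame):
        self.count += 1
        if self.count >= 2:
            # Second (or later) SIGTERM: stop taking work and wind down.
            self.draining = True


def install(handler):
    """Register the handler for SIGTERM (must run in the main thread)."""
    signal.signal(signal.SIGTERM, handler)
```

A process using this would call `install(SigtermCounter())` at startup and check `draining` in its main loop. The point is only that such behaviour has to be coded deliberately; it is not something SIGTERM does by default.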

The only docs I can find are here

But these cover a different shutdown cycle that first tries a SIGTERM and then a KILL, as the focus is on a clean shutdown over time.
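
For comparison, the generic "SIGTERM first, then KILL" cycle can be sketched in Python as below. This shows the general pattern only, not CircleCI's implementation; the function name and the grace period are assumptions.

```python
import subprocess
import sys


def stop_gracefully(process, grace_seconds=10.0):
    """Ask a child process to exit, then force it if it does not.

    Sends SIGTERM (on POSIX), waits up to `grace_seconds`, and falls back
    to SIGKILL if the process is still running. Returns the exit code.
    """
    process.terminate()  # polite request: the process may handle it
    try:
        return process.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        process.kill()  # cannot be caught or ignored by the process
        return process.wait()


if __name__ == "__main__":
    # Demo with a child that just sleeps; it has no SIGTERM handler,
    # so the first signal is enough to stop it.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("exit code:", stop_gracefully(child, grace_seconds=5.0))
```

In this cycle the second step is a KILL rather than a second SIGTERM, which is why it does not match the advice given by support.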

xakraz · August 21, 2023, 10:10am

Hi @rit1010 ,

Thanks for your message

I will reach out to CircleCI’s support team directly then, as you suggested.

I will post the conclusion to this question if I get anything out of it.
