Hi there,
In February this year, we set up self-hosted machine runners on GCP to reduce the network costs of pushing and pulling Docker images to and from our private Artifact Registry.
Also for cost reasons, we set up Managed Instance Groups with spot/preemptible VMs.
However, due to the nature of spot VMs, Google sometimes stops them, and the jobs running on those VMs never “stop”; they end with the new “infrastructure fail” status.
Some members of the CircleCI support team told us to set up a “system” where systemd would send 2 SIGTERM signals to the tasks-agent to make it “drain” and report the job as canceled to the CircleCI control plane.
We tried to do so, without success.
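To make the idea concrete, here is a minimal sketch of the kind of helper we have in mind. It is not our actual setup and not something CircleCI has confirmed. It assumes the GCE metadata server's `instance/preempted` value is used to detect the preemption, and that two SIGTERM signals really do make the agent drain (that part is only what support told us). The `AGENT_PID` environment variable and the function names are hypothetical.

```python
import os
import signal
import time
import urllib.request

# GCE metadata endpoint; its value becomes "TRUE" once the VM is being preempted.
# wait_for_change=true turns the request into a long poll.
PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
    "?wait_for_change=true"
)


def wait_for_preemption(url=PREEMPTED_URL):
    """Block until the metadata server reports that the VM is preempted."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    while True:
        try:
            with urllib.request.urlopen(request) as response:
                if response.read().decode().strip() == "TRUE":
                    return
        except OSError:
            # Metadata server unreachable or request timed out; retry shortly.
            time.sleep(1)


def send_sigterm_twice(pid, delay_seconds=1.0, kill=os.kill, sleep=time.sleep):
    """Send SIGTERM to `pid` twice, pausing between the two signals."""
    kill(pid, signal.SIGTERM)
    sleep(delay_seconds)
    kill(pid, signal.SIGTERM)


def main():
    # AGENT_PID is a hypothetical variable holding the PID of the runner agent.
    agent_pid = int(os.environ["AGENT_PID"])
    wait_for_preemption()
    send_sigterm_twice(agent_pid)
```

Calling `main()` from a small service started next to the agent would be one way to wire this up. Whether the agent has enough time to report the job as canceled before the VM is actually powered off is exactly what we are unsure about.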
Has anyone already done such a setup, or is anyone facing the same situation?
If so, could you share your setup with us?
Many thanks!
If any CircleCI employee finds this post, I think it would be a good example to add to the self-hosted machine runner documentation.