A more scalable, container-friendly self-hosted runner: Container Agent - now in Open Preview

We’re excited to announce that Container Agent (final name TBD), a more scalable and container-friendly self-hosted runner, is now in Open Preview: Container runner open preview - CircleCI

With container agent, self-hosted runner users will have:

  • The ability to easily define, publish, and use custom Docker images during job execution
  • The ability to easily manage dependencies or libraries through custom Docker images by using the Docker executor in config.yml
  • Seamless orchestration of ephemeral Kubernetes pods for every Docker job on self-hosted compute
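As a minimal sketch of what this looks like in practice (the image and the `namespace/resource-class` pair below are placeholders; substitute your own), a job targets container agent simply by combining the Docker executor with a self-hosted resource class in `.circleci/config.yml`:

```yaml
version: 2.1

jobs:
  build:
    docker:
      - image: cimg/base:2022.09   # any convenience or custom Docker image
    # Placeholder: use the namespace/name of your own container runner resource class
    resource_class: my-org/my-container-runner
    steps:
      - checkout
      - run: echo "This step runs in an ephemeral Kubernetes pod"

workflows:
  main:
    jobs:
      - build
```

Each job run with this configuration is scheduled as its own ephemeral pod in your cluster and torn down when the job completes.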

If you need to run CI/CD jobs on your own infrastructure with Kubernetes, or are using the existing self-hosted runner installation on Kubernetes, visit the container agent docs today to get started.

Container Agent does not replace the existing self-hosted runner; it complements it. The existing self-hosted runner is meant for customers who need the Machine executor, while Container Agent is the equivalent of the Docker executor for self-hosted runners.


New additions in the past week, mainly improvements for scenarios that deviate from the happy path:

  • Previously, when a job using container agent failed, the workflow did not always fail gracefully as well. This has now been fixed
  • When the underlying node for a task pod is removed from the cluster (whether by kubectl delete node, an unexpected shutdown, or a variety of other reasons), the container-agent garbage-collection loop now detects that the node is no longer available and cleans up the pod
  • Because container agent allows you to configure task pods with the full range of Kubernetes settings, pods can be configured in a way that cannot be scheduled due to their constraints. We’ve added a constraint checker which periodically validates each resource class configuration against the current state of the cluster to ensure its pods can be scheduled. This prevents container agent from claiming jobs it cannot schedule, which would then fail
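To make the constraint-checker scenario concrete, here is an illustrative resource-class entry in the chart's `values.yaml` whose task pods could never be scheduled (the names and exact schema here are assumptions for illustration; consult the container agent docs for the current format):

```yaml
agent:
  resourceClasses:
    my-org/gpu-runner:
      token: "<runner resource class token>"
      spec:
        # If no node in the cluster carries this label, task pods for this
        # resource class can never be scheduled; the constraint checker
        # validates this against the live cluster state before claiming jobs.
        nodeSelector:
          node.kubernetes.io/instance-type: gpu-xlarge
        containers:
          - resources:
              requests:
                cpu: "64"   # requesting more CPU than any node offers is also unschedulable
```

Rather than claiming jobs for this resource class and letting them fail, container agent now surfaces the mismatch up front.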

I like the new Container runner!

I am having issues installing the helm chart into multiple namespaces with different resource class names: the ClusterRole and ClusterRoleBinding resources conflict.

Do you have any suggestions?

Hi @yuft, thank you for the feedback! At the moment, one of the limitations is that each container-agent can only be deployed to a single namespace.

We’re looking at how we can change that in the future and will update when we have more!

Will the console interface gain any features to allow more control over a runner? Currently, it is possible to create a resource class and runner in the GUI, but there is no way to delete them.

There is also a lack of reporting in terms of runner usage, but that is a longer-term issue for when more people are using runners.

@rit1010 Yup, management of resource classes & resource class tokens via the UI is something on the near-term roadmap. We hope to have something out in the next ~3 months.

Showing Runner usage is also on the roadmap, but further down the line.

Is there a way to allow us to intercept the ephemeral task pod creation process?
In my case, I’d like to append a label to the ephemeral task pod so that it can claim a Managed Service Identity (MSI) during deployment to Azure.

@yuft Right now the only customization to task pods is through the resource class configuration process. I’m not as familiar with how one goes about appending that label; is that something that can be added to the pod spec?
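If the resource class configuration accepts pod metadata, something like the following might work. This is a sketch under two assumptions: that the chart's task-pod spec supports a `metadata.labels` field (unverified here), and that `aadpodidbinding` is the label AAD Pod Identity uses to match pods to a managed identity:

```yaml
agent:
  resourceClasses:
    my-org/azure-runner:
      token: "<runner resource class token>"
      spec:
        metadata:
          labels:
            # AAD Pod Identity selects pods by this label to grant them the MSI
            aadpodidbinding: my-managed-identity
```

If the spec does not expose pod metadata, a cluster-side admission webhook that injects the label is another option.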

Hey there,

We are testing this out and are getting the following panic regularly after task runs:

15:02:05 20bb0 6453.340ms service-work mode=agent result=success service.name=container-agent service_name=container-agent
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1712d00]

goroutine 66 [running]:
github.com/circleci/container-agent/driver/k8s.(*k8sTask).Cleanup(0xc0007b6480, {0x1f28430, 0xc00089a210})
        /home/circleci/project/driver/k8s/task.go:409 +0x400
github.com/circleci/container-agent/service.cleanup({0x1f28430?, 0xc00089a210?}, {0x1f28628?, 0xc0007b6480?})
        /home/circleci/project/service/task_worker.go:257 +0x3b
github.com/circleci/container-agent/service.(*taskWorker).runTask(0xc000819790, {0x1f28430, 0xc00089a210}, 0x7f77a94f8548?, {0xc0006f92a8, 0x8}, 0xc0003763f0?, {0x1f28628?, 0xc0007b6480})
        /home/circleci/project/service/task_worker.go:147 +0x374
github.com/circleci/container-agent/service.(*taskWorker).serviceWork(0xc000819790, {0x1f28388?, 0xc00004b380?}, {0xc0003fa740?, {0xc0003763f0?, 0x1?}})
        /home/circleci/project/service/task_worker.go:125 +0x6df
github.com/circleci/container-agent/service.(*taskWorker).work(0xc000819790, {0x1f28388, 0xc00004b380})
        /home/circleci/project/service/task_worker.go:74 +0x5e
github.com/circleci/container-agent/service.Add.func2({0x1f28388?, 0xc00004b380?})
        /home/circleci/project/service/task_worker.go:62 +0x30
        /home/circleci/go/pkg/mod/github.com/circleci/ex@v1.0.3650-a1109cf/system/system.go:63 +0x25
        /home/circleci/go/pkg/mod/golang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
        /home/circleci/go/pkg/mod/golang.org/x/sync@v0.0.0-20220819030929-7fc1605a5dde/errgroup/errgroup.go:72 +0xa5

The task is still cleaned up but the container agent restarts after this, has anyone else reported this issue? Apologies if this is the wrong spot to toss this.

We are running the container-agent on GKE.

This is the right spot! Taking a look with the internal engineering team, I’ll report back.

@uplight-james Can you share the version of container agent you’re using? It should be visible in the “task lifecycle” step on the Job Details page for a job that was run. Or, if you go to your inventory screen (“Self-hosted Runners” in the left-hand nav of your UI), it should show the version as well.

@sebastian-lerner thanks for the reply! We are using circleci/container-agent:1.0.8569-ccd6594. Let me know what else I can provide.

We see this directly after a task finishes, with garbage collection on or off. The container spun up for the task DOES get removed from the cluster properly but this error still occurs.

It results in the container-agent exiting with code 2 (according to kubectl describe pod) and restarting; the container-agent does come back up and start working after that.

Thanks, can you try doing a helm update & upgrade to get the latest chart version and let me know if you’re still seeing this issue? CircleCI’s self-hosted runner FAQs - CircleCI
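For reference, upgrading the chart typically looks something like this (the release name, namespace, and repo alias below are placeholders; adjust them to match your install):

```shell
# Refresh the local chart repository index, then upgrade the release in place,
# keeping the values supplied at install time.
helm repo update
helm upgrade container-agent container-agent/container-agent \
  --namespace circleci \
  --reuse-values
```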

Folks, a couple of updates to share in the recent helm chart upgrades:

  • container-agent can now be run on ARM pods, both for the pod that installs container-agent and for the “task pods”. No need to specify this in values.yaml; there’s logic built in to pick up the right architecture and work accordingly
  • We now fall back to a generic shell if bash is not included in the image provided. @jpi I think this should fix the issue you were seeing in this thread.

If you upgrade to the latest helm chart these should be available.

Also coming very soon, some logging improvements to the errors that we output to be more actionable.


We just pushed a fix in the latest version of the helm chart that resolves issues some users were seeing in this thread. It was preventing some images that worked just fine on CCI-hosted compute from being used with container-agent. This limitation should no longer exist. Reach out to me if there are still issues you’re seeing.
