Container Agent fails to start Task Agent - exit code 139 segmentation fault

I’m trying to use the new Container Agent setup on a microk8s (first I appreciate this might be my issue as it likely hasn’t been tested) cluster (k8s v1.22). I can deploy everything successfully with Helm and the container-agent connects to CircleCI successfully and polls for jobs. However, when attempting to run any jobs I get the following error present in the CircleCI UI in a step called ‘Instance Failure’:

could not run task: launch circleci-agent on "container-0" failed: command terminated with exit code 139

So it seems there’s a segmentation fault occurring.

The same error log is reflected in the container-agent pod logs
service-work error=1 error occurred:mode=agent result=error service.name=container-agent service_name=container-agent.

Other notes:

  • If I try and set any resource requests or limits for the task containers (like in the example in the docs) then no task containers ever get launched, the jobs just sit there unprocessed and times out after 10 minutes (no task-agent containers ever get created).
  • Container agent version is: 1.0.7556-dfb352b
  • I can see warnings in the container-agent pod logs: httpclient: container-agent /api/v2/runner/claim app.loop_name=claim ................ warning=no content but I’m unsure what this means.
  • If task-agent pods do get spawned they get into a running state in k8s but just sit there doing nothing with no log output whatsoever.
  • I also get the WARN: Missing API Key message on the first line of the logs even though the agent seems to be communicating with CircleCI successfully.

Thanks

So it seems this segmentation fault happens only when I use certain docker images such as any alpine based image or the ‘hashicorp/terraform’ images but no problems occur when I use the circleci convenience images or something like ‘python:latest’. I’ve used the alpine based images without problem on the circleci cloud based runners so not sure why they’re failing here, the alpine images also work when I deploy them manually to the cluster. I appreciate this is in beta so I’m fine using the CircleCI convenience images for now.

The issue of not being able to declare the resource requests and limits is still present though. Thanks

Sorry I missed this post. The 139 is often seen with an image that is using an entrypoint that is invalid. See the fifth bullet here: Container runner open preview - CircleCI. This can be worked around by explicitly adding an entrypoint in your circleci config when trying to use that image.

For the task pod configuration, are you including the top-level “agent” key in your values.yml file? This was an issue with our docs that has now been fixed. Can you share a build link and also the values.yml snippet where you’re setting up that resource class specification? You can share via support ticket or via direct message if you prefer not to post it here.