Job on local agent executor hanging in “Pending” state, seeing 409 errors in “container-agent” log

I have a particular Job in a Workflow that that hangs, in the CCI UI, I see Queued / Preparing with 0 / 20m 46s and going up.

Under the steps output, I just see Preparing environment with a blue spinner.

It never progresses.

This is running on a local executor.

I have other Jobs in my Workflow which use the same executor and they work fine. It is always that specific Job where the issue happens.

In my pod where we run the container-agent kubernetes I see HTTP SC 409 errors on url https://runner.circleci.com/api/v2/step/end

Here is a slightly redacted version of the log output:

{
    "data": {
        "duration_ms": 100.041198,
        "http.attempt": 1,
        "http.base_url": "https://runner.circleci.com",
        "http.client_name": "runner-agent",
        "http.host": "runner.circleci.com",
        "http.method": "POST",
        "http.request_content_length": 0,
        "http.response_content_length": "0",
        "http.retry": false,
        "http.route": "/api/v2/step/end",
        "http.scheme": "https",
        "http.status_code": 409,
        "http.target": "/api/v2/step/end",
        "http.url": "https://runner.circleci.com/api/v2/step/end",
        "http.user_agent": "",
        "meta.beeline_version": "1.15.0",
        "meta.local_hostname": "circleci-runner-container-agent-123123-vgkkm",
        "meta.span_type": "root",
        "meta.type": "http_client",
        "mode": "agent",
        "name": "httpclient: runner-agent /api/v2/step/end",
        "result": "success",
        "service": "circleci-runner",
        "service.name": "circleci-runner",
        "service_name": "circleci-runner",
        "span.kind": "Client",
        "trace.span_id": "123",
        "trace.trace_id": "123",
        "version": "3.0.24-6207-17b540e",
        "warning": "the response from POST /api/v2/step/end was 409 (Conflict) (1 attempts)"
    },
    "time": "2024-09-26T00:35:12.765978102Z",
    "dataset": "unknown_dataset"
}

It is not just this Job within this Workflow that has the issue. I see the same problem with unrelated Jobs in other Workflows. But the same pattern, that on re-run it is always the same Jobs that get the issue, whilst others run fine. All jobs are using the same executor and none are using any caching.

I have tried rerunning the Workflow in 2 ways:

  1. CCI UI “Rerun from start”
  2. Push a git change and allow it to trigger a Workflow

In both cases the same result, the other Jobs in the Workflow complete and this Job hangs.

We tried restarting the pod which hosts our container-agent it has not helped.

I have tried retriggering jobs and I think that the HTTP SC 409 errors do correlate with Job’s getting stuck in the Preparing environment state.

The issue of Job’s not running was due to them pointing to a self-hosted runner that had been decommissioned. By changing the job to point to a different self hosted runner, the issue was resolved. I don’t know if the HTTP SC 409’s are related.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.