No output from a container's step in a parallel job workflow

Reference: https://circleci.com/gh/OpenNMS/opennms/8182#tests/containers/5

We seem to be experiencing intermittent issues where a step for a particular container produces no output. This causes the container to abort after the preset timeout, yet subsequent steps that save artifacts and analysis show that the job is indeed running. Examining the other containers running in parallel shows no issues.

The script in question, smoke.sh, first runs an if to test a condition, then a binary if that runs an echo command regardless of which branch is taken; this should always produce output at the beginning. Looking at the referenced link above, you can see no output for the "Smoke Tests" step in container 5, while the other containers have valid output.
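For illustration, here is a minimal sketch of the structure described above. The variable and role names are hypothetical, not taken from the real smoke.sh; the point is only that both branches of the second if echo something, so the step should never be silent at startup:

```shell
#!/bin/sh
# Hypothetical sketch of the smoke.sh structure described in this thread.
# CIRCLE_NODE_INDEX is the container index CircleCI sets for parallel jobs;
# the ROLE variable and its values are illustrative assumptions.
if [ "$CIRCLE_NODE_INDEX" = "0" ]; then
  ROLE="coordinator"
else
  ROLE="worker"
fi

# Binary if: both branches run echo, so output always appears at the start.
if [ "$ROLE" = "coordinator" ]; then
  echo "smoke.sh: starting as coordinator on container $CIRCLE_NODE_INDEX"
else
  echo "smoke.sh: starting as worker on container $CIRCLE_NODE_INDEX"
fi
```

With a structure like this, a completely empty step log points at output collection on CircleCI's side rather than at the script itself.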

This has occurred for us across different jobs/steps and cannot seem to be isolated to something controllable on our side of the fence.


I have a ticket in support for this exact issue, and your output is very helpful. You have a 30m timeout, and it ran for 76m! That looks like a red flag to me… :thinking:
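For context, a per-step timeout like the 30m one mentioned here is normally set with `no_output_timeout` on a `run` step in the CircleCI 2.0 config, which is what makes a 76m run without output so surprising. A hedged sketch of what such a config might look like (job name, parallelism, and command are hypothetical):

```yaml
version: 2
jobs:
  smoke:                     # hypothetical job name
    parallelism: 5
    steps:
      - checkout
      - run:
          name: Smoke Tests
          command: ./smoke.sh
          no_output_timeout: 30m   # step is killed if it emits no output for 30 minutes
```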

Perhaps you could flag this to support@circleci.com and mention that it might be the same issue as ticket 54295?


We’ve had this happen on some other jobs too like this one: https://circleci.com/gh/OpenNMS/opennms/8294#tests/containers/6

My suspicion is that if/when the container times out, the logs aren’t properly gathered.

There’s a similar problem that occurs when the container run time exceeds the 5 hour limit. Some of the logs are visible in the step output, but the complete log file cannot be downloaded.


For the past 4-5 days, our jobs have been hanging completely in the container, and the node command takes forever. It just shows a blank command and doesn't print anything after that. Sometimes a job starts and runs fine, but the console log vanishes partway through. Unfortunately, not a single workflow has PASSED in the last 4-5 days for us! Our parallelism varies from 3 to 5 containers, and at least one job hangs in each workflow.

It has affected our team's productivity and we can't move forward. A container hangs until the timeout limit in every job; we are blocked!

CircleCI, will you fix this blocking issue ASAP? I have opened a support ticket but have received no reply from you either.


It may not be a blocker for enough customers yet. The issue I have is intermittent, only affects one repo, and a rebuild always fixes it. I also notice that you do not have a "Too long with no output" message, so I wonder if you have a different problem.

A rebuild doesn't fix the issue for us, and one container out of the 3-4 running in parallel always fails in any case. The issue is common, as described in the original post, and our tests work the same way as far as parallelism and test distribution are concerned. It is a private repo, otherwise I would have shared the URLs. I did get replies from CircleCI, and they are still investigating.

We received a report from CircleCI support that this particular issue has been resolved, and our recent builds no longer seem to be affected by it.