Rspec Tests getting killed exactly at 10 minutes

I have a pretty simple build process for a Ruby on Rails project that’s running rspec (bundle exec rspec --require spec_helper --format RspecJunitFormatter --out /tmp/circle-tests/rspec/rspec.xml --format progress $(circleci tests glob “spec/**/*_spec.rb” | circleci tests split --split-by=timings) and it just started failing last Friday morning with a message that rspec ‘Too long with no output (exceeded 10m0s)’.

I figured someone committed a bad test or something like, but after investigating I found that wasn’t the case and even builds for branches that ran perfectly fine the previous day or two were now failing with this same issue - even after rerunning without cache. I ssh’ed into the machine and tail’ed the test.log and found that up until the exact point when the circle env killed the rspec process our tests were running and not hung on something. I then ran the build with ssh and ran the tests using the same seed and command that the build runs and that completed without issue too and generated all the expected rspec output (.ie a green period is output to the console per completed test case).

So now I’m wondering why all of a sudden our tests are getting killed at exactly the 10 minute mark when they’e still progressing through the test “script” and appear to actually still be generating output (as the console output from the build summary has thousands of periods in it - we have around 10K tests) when prior to Friday everything worked fine. As mentioned above, this happens for every builds, even those that I re-run that previously ran to completion without error.

I’m kind of stumped where to begin here as nothing appears to be wrong, yet the builds are getting killed. I can modify the timeout on the run command as a workaround, but I’d like to eventually set it back if I can figure out the issue.

2 Likes

Steps have an individual timeout, see this post:

I increased the time on the command and our builds are running to completion now…but I’m not real clear why I needed to make that change. The rspec command I’m running is generating output to stdout the whole time it’s running. It prints a period to the console for every single test case completed - and it completed 4,000+ test before being killed by circle’s run command supervisor (I copied the output from the build report in the web app, so it’s definitely picking up the output too). So it’s not as if the job sat idle as the kill command seems to indicate, it’s printing out several periods every second the command is running.

Also, the fact that our build didn’t require the 20m timeout prior to last week is suspect too. Like I indicated in my initial report, if I re-run a build that ran without failure last Wednesday, the test command now gets killed on those builds as well, where it wasn’t killed initially.

I’m ok working with the adjusted 20m time, I’m just not following why it’s needed when the test command is generating output and the command timeout wasn’t an issue prior to last week Friday.

You’re welcome :grinning:

I agree with your analysis. Raise a bug report by sending a message to support@circleci.com, and an engineer will look at it.

I have the same issue. Changing --format progress to --format documentation worked for me.
It looks the behavior of stdout buffer has changed and jobs timeout when there’s no line-breaks for 10 minutes?

I’ve started seeing the same issue but with a minitest test suite. I’ve seen this happen very intermittently in the past but as of today it has happened on 7 out of 14 builds.

@wktk @amleaver Thanks for confirming the issue. I sent a bug report to Circle support referencing this post.

@jhulford You seem to be avoiding acknowledging my assistance. I find that a little unexpected. I am a volunteer here, and although I enjoy helping people, the normal social glue is surely not too much to ask. Have we just had a disagreement on this board, and I have forgotten about it? I would rather fix that, if so.

1 Like

Sorry, it wasn’t my intention to snub you. @halfer I appreciate the assistance.

Thanks @jhulford - appreciated! :+1:

Thanks for raising this - the issue was caused by an edge case we had overlooked when rolling out our new secrets masking implementation.

We’ve toggled it off for now and will get this sorted before continuing our rollout.

2 Likes