Variable parallel container performance

We have a Rails/PostgreSQL application. Running in series on an XL, the CI run can take up to 30 minutes.

So there is a branch that runs the build in parallel (4 x large), splitting tests by file size.

In this branch, CI runtime ranges between 7.5 and 28 minutes (the latter fails due to a timeout). There were no code or config changes between runs, just continual reruns to collect more data points.

How can we make container performance more consistent?

I would suggest splitting on timing rather than file size.

You’ll need to make sure your test results are stored and include filename data.

Use the CircleCI CLI to split tests; the CircleCI docs cover the basic idea.

Roughly, something like:

circleci tests glob "test/**/*_test.rb" | circleci tests run --command="xargs bin/rails test --verbose" --split-by=timings
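To illustrate why timings tend to balance better than file size, here is a rough Python sketch of the general idea: sort files by their recorded duration and hand each one to whichever container currently has the least work. This is only an assumption about the approach, not CircleCI's actual algorithm, and the function name, file names, and timings below are all made up for the example.

```python
import heapq


def split_by_timings(timings, node_total):
    """Greedy longest-first split: give each file to the least-loaded node.

    Illustrative only; not CircleCI's real implementation.
    """
    # One (total_seconds, node_index) entry per node; the lightest node is on top.
    heap = [(0.0, i) for i in range(node_total)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(node_total)]
    # Place the slowest files first so the quick ones can even out the totals.
    for name, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    return buckets


# Hypothetical timings in seconds, invented for illustration.
timings = {
    "test/models/user_test.rb": 40.0,
    "test/system/checkout_test.rb": 300.0,
    "test/models/order_test.rb": 60.0,
    "test/system/signup_test.rb": 200.0,
}
buckets = split_by_timings(timings, 2)
for node, files in enumerate(buckets):
    print(node, sum(timings[f] for f in files), files)
```

A split like this only evens things out when each file's runtime is roughly repeatable from run to run, which is why the stored timing data matters.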

I’ve done that, and it doesn’t make a difference. I’ve flipped between file size and timing splitting and see similarly wild variance either way. This leads to intermittent failures of the overall CI run when one of the containers stalls out.

This feels like a noisy neighbor issue, as if the platform is underprovisioned for the volume of containers being run.

Do different plans offer stronger isolation guarantees?

We saw a whole lot of these (a similar scenario to yours) a few weeks ago, and then it calmed down again.

Today, I also got a job with the ‘infrastructure failure’ banner on it (as described here).

It would be nice if someone from CircleCI had any information.

Not good. We get timeouts all the time. Stalled containers. Frustrating.