Workflow stops/crashes seemingly randomly

I’m experiencing that the automated test runs we run on PRs sometimes fails/crashes.
To me it seems the docker instance just kills itself for no apparent reason, and there’s no consistent reason/“location” in the run where it happens.
Haven’t found any clues in the logs while running in SSH mode, and the resource usage seems to not be pushed at all (image used: ubuntu-1604:202010-01)

Would appreciate any pointers on where to look next, not even sure if I’ve looked in the right logs.

Thanks!

Hi @kskarra,

Welcome to the CircleCI community!

Could you check if the job’s memory usage doesn’t reach the limit for the resource class you’re using?

Hi, and thanks!

I included the memory usage step from the link you provided, and it looks like only a few GB is in use (25,2% of 15GB) at peak right before the run crashes.

I’ve also SSH’d into a run while it was going, and looked at the htop intently, and I couldn’t see any spikes or unwanted behavior while it ran.

I am now able to reproduce this locally, so it’s likely not a issue within CircleCI.

Thanks for the help and feedback!

Thanks for the update, @kskarra!

Once you’ve managed to fix the issue locally, let me know if you have any trouble running the build on CircleCI.

Was able to locate the issue, and now things are looking green again!