The automated test runs we trigger on PRs sometimes fail or crash.
To me it looks like the Docker instance just kills itself for no apparent reason, and there’s no consistent point in the run where it happens.
I haven’t found any clues in the logs while running in SSH mode, and the machine’s resources don’t seem to be pushed at all (image used: ubuntu-1604:202010-01).
I’d appreciate any pointers on where to look next; I’m not even sure I’ve looked in the right logs.
I included the memory usage step from the link you provided, and it looks like only a few GB are in use (25.2% of 15 GB) at peak, right before the run crashes.
I’ve also SSH’d into a run while it was going and watched htop closely, and I couldn’t see any spikes or unwanted behavior while it ran.
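
In case it helps anyone reproduce the memory check, here is a minimal sketch of how memory could be sampled in the background during a run. It is not the exact step from the linked guide. It assumes a Linux executor where /proc/meminfo is readable and reports MemAvailable, and the function names (`parse_meminfo`, `used_percent`, `sample`) are made up for illustration.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def used_percent(info):
    """Percent of RAM in use, computed from MemTotal and MemAvailable."""
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return 100.0 * (total - available) / total


def sample(path="/proc/meminfo", interval=5.0, count=None, out=print):
    """Log usage every `interval` seconds; return the peak percent seen.

    With count=None this loops until the process is killed, which is the
    intended mode for a background step (hypothetical usage).
    """
    peak = 0.0
    n = 0
    while count is None or n < count:
        with open(path) as f:
            pct = used_percent(parse_meminfo(f.read()))
        peak = max(peak, pct)
        out(f"{time.strftime('%H:%M:%S')} used={pct:.1f}% peak={peak:.1f}%")
        n += 1
        if count is None or n < count:
            time.sleep(interval)
    return peak
```

Calling `sample(interval=5)` from a background step would print one line every five seconds, so the last lines before a crash show the usage at that moment. Note that this only sees what the kernel reports for the whole machine; a process killed for another reason would not show up here.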