I have set up a number of scheduled builds for a client, and occasionally they fail. The failing step is always part of a Docker build, and always a networking operation, so I am inclined to think there is a highly intermittent networking issue. A manual rebuild always fixes it.
Example 1:
RUN yum install -y curl git
...
Complete!
Too long with no output (exceeded 10m0s)
Example 2:
RUN composer install --prefer-source
...
Generating autoload files
Too long with no output (exceeded 10m0s)
Example 3:
RUN yum update -y
...
---> Running in 6e325d7e6098
Too long with no output (exceeded 10m0s)
In each case, an operation that should take 1-3 minutes is still stuck after 10 minutes. I logged a ticket about this, and Scott suggested that I increase the no-output timeout. I’ve replied to say that is not an ideal fix: if a step has already overrun its normal duration by 250%, it is probably stuck permanently and just needs to be restarted. (It’s ticket 54295, if any employees want to read it.)
I am pondering whether I could fix this by adding a retry script in the Dockerfile for every network operation. This seems a bit hacky to me, since I’ve never had this issue locally - and I imagine the network links to the build servers have a fair bit of redundancy built-in. Any suggestions how I can tackle this?
Here is one solution, but it is a bit chunky. I would probably add a timeout here too, so that no operation is permitted to get stuck for more than three minutes.
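As a rough illustration of the retry-plus-timeout idea, here is a minimal sketch of a shell helper. The function name `retry`, the file name `retry.sh`, and the `MAX_ATTEMPTS`, `TIMEOUT_SECS` and `RETRY_DELAY_SECS` variables are my own assumptions, not anything provided by the build service. It also assumes the GNU coreutils `timeout` command is available in the base image, and it can only help if the hang is inside the wrapped command itself.

```shell
#!/bin/sh
# retry.sh (hypothetical helper): run a command several times, killing any
# attempt that runs longer than TIMEOUT_SECS.
# Assumes GNU coreutils `timeout` is installed in the image.

retry() {
  max_attempts="${MAX_ATTEMPTS:-3}"    # how many times to try in total
  timeout_secs="${TIMEOUT_SECS:-180}"  # three minutes per attempt by default
  delay_secs="${RETRY_DELAY_SECS:-5}"  # pause between attempts
  attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    # `timeout` executes an external command, so "$@" must be a real
    # program (yum, composer, ...), not a shell function or builtin.
    if timeout "$timeout_secs" "$@"; then
      return 0
    fi
    echo "Attempt $attempt of $max_attempts failed or timed out: $*" >&2
    # Only pause if another attempt is still to come.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_secs"
    fi
    attempt=$((attempt + 1))
  done

  # Every attempt failed, so fail the RUN step as well.
  return 1
}
```

In a Dockerfile this might be used by copying the script in with `COPY retry.sh /usr/local/bin/retry.sh` and then writing, for example, `RUN . /usr/local/bin/retry.sh && retry yum install -y curl git`. A stuck attempt is then killed after three minutes and repeated, rather than waiting for the 10 minute no-output limit to fail the whole build.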