We are experiencing intermittent but disruptive failures in our workflow, all with the following error message:
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
The failing jobs all use the machine executor type, and roughly 2 of the ~50 jobs in each workflow fail. This appears to have begun a few days ago and, anecdotally, has been getting slightly worse over time.
Can you offer any assistance as to what we can do to fix this? I can supply links to failing workflows and jobs upon request.
Our current workaround is to select the Rerun workflow from failed option, but that is obviously not ideal.