I’ve got a test configuration like this in my CircleCI configuration:

```yaml
test:
  override:
    - script1.sh
    - script2.sh:
        parallel: true
```
`script1.sh` writes something to a file (e.g. `/tmp/myfile`). Specifically, it runs `docker create` and `docker start -a` to execute a script that writes to a file inside the Docker container, then uses `docker cp` to copy the file out of the Docker container.
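A minimal sketch of the `script1.sh` flow described above, written as a POSIX shell function. It is not the actual script: the image name (`my-image`), the generator path (`/app/generate.sh`) and the host-side destination (assumed here to be `/tmp/myfile`, since that is the path `script2.sh` later reads from `node0`) are hypothetical placeholders.

```sh
#!/bin/sh
# Hypothetical sketch of script1.sh. IMAGE, GENERATOR and OUT_PATH are
# placeholder values, not details from the real setup.
set -eu

IMAGE="${IMAGE:-my-image}"                 # hypothetical image name
GENERATOR="${GENERATOR:-/app/generate.sh}" # hypothetical script baked into the image
OUT_PATH="${OUT_PATH:-/tmp/myfile}"        # assumed path, both in the container and on the host

produce_file() {
  # Create (but don't start) a container that will run the generator.
  cid=$(docker create "$IMAGE" "$GENERATOR" "$OUT_PATH")
  # Start it attached: this blocks until the generator exits.
  docker start -a "$cid"
  # Copy the generated file out of the stopped container onto the host.
  docker cp "$cid:$OUT_PATH" "$OUT_PATH"
  # Remove the stopped container.
  docker rm "$cid" >/dev/null
}
```

The real `script1.sh` would end with a call to `produce_file`. Because `docker start -a` stays attached until the container stops, `docker cp` only runs after the generator has finished.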
`script2.sh` attempts to “fan out” this file to the other nodes. When `CIRCLE_NODE_INDEX` isn’t `0`, it uses `scp node0:/tmp/myfile /tmp/myfile`.
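A minimal sketch of the `script2.sh` fan-out logic described above, as a POSIX shell function. It assumes `node0` is the SSH host alias for the first build container, as on the linked CircleCI page; `fan_out` and `FILE` are hypothetical names. An unset `CIRCLE_NODE_INDEX` is treated as node 0 so the function can also be run outside CircleCI.

```sh
#!/bin/sh
# Hypothetical sketch of script2.sh's fan-out step.
# CIRCLE_NODE_INDEX is set by CircleCI on each parallel container.
set -eu

FILE="${FILE:-/tmp/myfile}"

fan_out() {
  # Default to 0 when unset (e.g. a local run), which skips the copy.
  if [ "${CIRCLE_NODE_INDEX:-0}" != "0" ]; then
    # Pull the file produced on node 0 onto this node.
    scp "node0:$FILE" "$FILE"
  else
    echo "node 0: file is already local" >&2
  fi
}
```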
But when I tried this out with 4 parallel containers running the tests, the result was:

- `script1.sh` created the file.
- `script2.sh` successfully copied the file.
- `script2.sh` timed out: `ssh: connect to host X port Y: Connection timed out`.
- `script2.sh` seemed to connect, but couldn’t find the file: `scp: /tmp/myfile: No such file or directory`.
When I tried to investigate (by enabling SSH and poking around) after observing the above, I saw that nodes 0 & 1 shared an IP address (using different ports), and nodes 2 & 3 shared an IP address (using different ports). I’m not sure if that’s relevant.
When I try another build, I see a mix of these symptoms, sometimes including `Permission denied (publickey)`.
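One way to narrow this down is to log, on every node, which container it is, where the `node0` alias actually points, and whether `node0` reports having the file. A hedged diagnostic sketch, assuming OpenSSH 6.8 or later (for `ssh -G`, which prints the resolved client configuration and exits without connecting) and the `node0` alias from the linked CircleCI page; `describe_node` and `FILE` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical diagnostic: run on every node to see what "node0" means there.

FILE="${FILE:-/tmp/myfile}"

describe_node() {
  echo "node index: ${CIRCLE_NODE_INDEX:-unset}"
  # Show the HostName/Port ssh would use for node0, without connecting.
  ssh -G node0 | grep -E '^(hostname|port) '
  # Ask node0 whether the file exists; BatchMode avoids any interactive prompt.
  if ssh -o BatchMode=yes node0 test -e "$FILE"; then
    echo "node0 has $FILE"
  else
    echo "node0 does not have $FILE (or ssh failed)"
  fi
}
```

If a failing node reports a `hostname`/`port` pair that doesn’t match node 0, that could suggest the problem is in how `node0` resolves rather than in timing.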
This seems like a timing problem or race condition, but my understanding is that the non-parallel `script1.sh` step should finish executing before any of the `script2.sh` steps are started, so I don’t see how there could be a race condition here.
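To check the race-condition theory directly, `script2.sh` could poll `node0` for the file before copying it. A minimal sketch, relying on the same `node0` alias the `scp` command uses; `wait_for_remote_file`, `MAX_TRIES` and `DELAY` are hypothetical names with arbitrary defaults.

```sh
#!/bin/sh
# Hypothetical polling helper: wait until node0 reports the file exists,
# trying up to MAX_TRIES times with DELAY seconds between attempts.
FILE="${FILE:-/tmp/myfile}"
MAX_TRIES="${MAX_TRIES:-30}" # arbitrary default
DELAY="${DELAY:-2}"          # arbitrary default, in seconds

wait_for_remote_file() {
  i=1
  while [ "$i" -le "$MAX_TRIES" ]; do
    if ssh -o BatchMode=yes node0 test -e "$FILE"; then
      echo "found $FILE on node0 after $i attempt(s)"
      return 0
    fi
    sleep "$DELAY"
    i=$((i + 1))
  done
  echo "gave up waiting for $FILE on node0" >&2
  return 1
}
```

Usage would be along the lines of `wait_for_remote_file && scp "node0:$FILE" "$FILE"`. If the file only appears after a few attempts, that would point to a timing issue; if some nodes never see it, the problem is more likely which host `node0` resolves to on those nodes.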
I believe this is a correct use of `scp`, based on the info at https://circleci.com/docs/ssh-between-build-containers/, unless that page is somehow inaccurate.
Am I doing something wrong? What else could cause this?