I’ve got a test configuration like this in circle.yml:
test:
  override:
    - script1.sh
    - script2.sh:
        parallel: true
script1.sh writes something to a file (e.g. /tmp/myfile). Specifically, it runs docker create and docker start -a to execute a script that writes to a file inside the docker container, then uses docker cp to copy the file from the docker container to /tmp/myfile.
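A minimal sketch of the create / start / copy flow described above, not the actual script1.sh. The image name, the /generate.sh command, and the in-container path /output/myfile are placeholders that do not appear in the post; the DOCKER variable is only there so the flow can be dry-run without Docker.

```shell
#!/bin/sh
# Hypothetical sketch of a script1.sh-style step (all names below are assumptions).
set -eu

DOCKER="${DOCKER:-docker}"                          # overridable for dry runs
IMAGE="${IMAGE:-my-image}"                          # placeholder image name
CONTAINER_FILE="${CONTAINER_FILE:-/output/myfile}"  # placeholder path inside the container
HOST_FILE="${HOST_FILE:-/tmp/myfile}"               # destination named in the post

produce_file() {
  # Create the container without starting it; docker create prints its ID.
  container_id="$("$DOCKER" create "$IMAGE" /generate.sh)"
  # Start it attached, so this call blocks until the script inside exits.
  "$DOCKER" start -a "$container_id"
  # Copy the result out of the stopped container onto the host.
  "$DOCKER" cp "$container_id:$CONTAINER_FILE" "$HOST_FILE"
}
```

Calling produce_file at the end of the script would run the three steps in order; because docker start -a blocks, the copy only happens after the in-container script has exited.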
script2.sh attempts to “fan out” this file to the other nodes. When CIRCLE_NODE_INDEX isn’t 0, it uses
scp node0:/tmp/myfile /tmp/myfile
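A minimal sketch of that fan-out logic, assuming the node0 host alias from the command above and the CIRCLE_NODE_INDEX variable that CircleCI sets on each container. The SCP and SHARED_FILE variables and the fan_out function name are illustrative, not taken from the real script2.sh.

```shell
#!/bin/sh
# Hypothetical sketch of a script2.sh-style fan-out step.
set -eu

SCP="${SCP:-scp}"                          # overridable for dry runs
SHARED_FILE="${SHARED_FILE:-/tmp/myfile}"  # file written by script1.sh on node 0

fan_out() {
  # Node 0 already holds the file, so there is nothing to copy.
  if [ "${CIRCLE_NODE_INDEX:-0}" -eq 0 ]; then
    return 0
  fi
  # Every other node pulls the file from node 0 to the same local path.
  "$SCP" "node0:$SHARED_FILE" "$SHARED_FILE"
}
```

With set -e, a failed scp makes the step fail on that node, which matches the per-node errors listed below.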
But when I tried this out with 4 parallel containers running the tests, the result was:
- Node 0's script1.sh created the file.
- Node 1's script2.sh successfully copied the file.
- Node 2's script2.sh timed out: ssh: connect to host X port Y: Connection timed out.
- Node 3's script2.sh seemed to connect, but couldn't find the file: scp: /tmp/myfile: No such file or directory.
When I tried to investigate (by enabling SSH and poking around) after observing the above, I saw that nodes 0 & 1 shared an IP address (using different ports), and nodes 2 & 3 shared an IP address (using different ports). I’m not sure if that’s relevant.
When I try another build, I see a mix of these symptoms, sometimes including Permission denied (publickey).
This seems like a timing problem / race condition, but my understanding is that the non-parallel script1.sh step should finish executing before any of the script2.sh steps are started, so I don’t see how there could be a race condition here.
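One way to probe the timing hypothesis would be to wrap the copy in a retry loop and log each failed attempt. This is only a diagnostic sketch: the retry helper, the attempt count, and the delay are assumptions, not something the CircleCI docs prescribe.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is reached, logging each failure to stderr.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while :; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n of $attempts failed: $*" >&2
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, retry 5 10 scp node0:/tmp/myfile /tmp/myfile would try up to five times, ten seconds apart. If the copy only succeeds on a later attempt, that would point toward a timing issue; if every attempt fails with the same error, something else is probably going on.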
I believe this should be a correct usage of scp, based on the info at https://circleci.com/docs/ssh-between-build-containers/, unless it’s somehow inaccurate.
Am I doing something wrong? What else could cause this?