I’ve got a test configuration like this in circle.yml:
test:
  override:
    - script1.sh
    - script2.sh:
        parallel: true
script1.sh writes something to a file (e.g. /tmp/myfile). Specifically, it runs docker create and docker start -a to execute a script that writes to a file inside the docker container, then uses docker cp to copy the file from the docker container to /tmp/myfile.
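To make that concrete, here is a minimal sketch of the three docker steps. The function name produce_file, the image name, and the in-container path are hypothetical placeholders; only docker create, docker start -a, docker cp and /tmp/myfile come from the actual setup.

```shell
#!/usr/bin/env bash
# Sketch of what script1.sh does. produce_file and its arguments are
# hypothetical names chosen for illustration.

produce_file() {
  local image="$1"   # hypothetical image name
  local src="$2"     # hypothetical path of the file inside the container
  local dest="$3"    # path on the build node, /tmp/myfile in this post
  local cid

  # Create the container without starting it; docker prints its ID.
  cid="$(docker create "$image")" || return 1

  # Start it attached, so this call blocks until the container's command exits.
  docker start -a "$cid" || return 1

  # Copy the result out of the (now stopped) container onto the build node.
  docker cp "$cid:$src" "$dest"
}
```

It would be called as something like produce_file my-image /out/myfile /tmp/myfile, where my-image and /out/myfile stand in for the real values.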
script2.sh attempts to “fan out” this file to the other nodes. When CIRCLE_NODE_INDEX isn’t 0, it uses:
scp node0:/tmp/myfile /tmp/myfile
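Put together, the logic of script2.sh can be sketched roughly as follows. The function name fan_out is hypothetical, and treating an unset CIRCLE_NODE_INDEX as node 0 is an assumption made only so the sketch can run outside CircleCI.

```shell
#!/usr/bin/env bash
# Sketch of script2.sh. fan_out is a hypothetical name; node0 and
# /tmp/myfile are the values used in this post.

fan_out() {
  local file="$1"

  # CIRCLE_NODE_INDEX is set by CircleCI on each parallel container.
  # Assumption: if it is unset, behave as if this were node 0.
  if [ "${CIRCLE_NODE_INDEX:-0}" -eq 0 ]; then
    # Node 0 already has the file from script1.sh, so there is nothing to copy.
    return 0
  fi

  # Every other node pulls the file from node 0 over SSH.
  scp "node0:$file" "$file"
}
```

In the build this would be invoked as fan_out /tmp/myfile on every container.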
But when I tried this out with 4 parallel containers running the tests, the result was:
- Node 0’s script1.sh created the file.
- Node 1’s script2.sh successfully copied the file.
- Node 2’s script2.sh timed out (ssh: connect to host X port Y: Connection timed out).
- Node 3’s script2.sh seemed to connect, but couldn’t find the file (scp: /tmp/myfile: No such file or directory).
When I tried to investigate (by enabling SSH and poking around) after observing the above, I saw that nodes 0 & 1 shared an IP address (using different ports), and nodes 2 & 3 shared an IP address (using different ports). I’m not sure if that’s relevant.
When I try another build, I see a mix of these symptoms, sometimes including Permission denied (publickey).
This seems like a timing problem / race condition, but my understanding is that the non-parallel script1.sh step should finish executing before any of the script2.sh steps are started, so I don’t see how there could be a race condition here.
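One way to probe the timing hypothesis would be to retry the copy a few times with a delay and see whether later attempts succeed. This is only a sketch of that idea, not something from my current scripts; the retry helper, the attempt count, and the delay are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, sleeping between attempts.

retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    echo "attempt $i of $attempts failed: $*" >&2
    # Sleep only if another attempt is coming.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

For example, retry 5 10 scp node0:/tmp/myfile /tmp/myfile would try the copy up to five times, ten seconds apart. If a later attempt succeeds where the first failed, that would point to timing; if every attempt fails the same way, the cause is probably something else.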
I believe this should be a correct usage of scp, based on the info at https://circleci.com/docs/ssh-between-build-containers/, unless it’s somehow inaccurate.
Am I doing something wrong? What else could cause this?