Trouble with SSH between parallel containers

parallelism
circle.yml

#1

I’ve got a test configuration like this in circle.yml:

test:
    override:
        - script1.sh
        - script2.sh:
            parallel: true

script1.sh writes something to a file (e.g. /tmp/myfile). Specifically, it runs docker create and docker start -a to execute a script that writes a file inside the Docker container, then uses docker cp to copy that file out of the container to /tmp/myfile.
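
For reference, script1.sh is roughly this (a simplified sketch; the image name myimage and the in-container script generate.sh stand in for the real ones):

#!/bin/bash
set -e

# Create a container that runs a script which writes /tmp/myfile inside it
# (image and script names are placeholders)
container=$(docker create myimage ./generate.sh)

# Run it in the foreground so we wait for it to finish
docker start -a "$container"

# Copy the generated file out of the container onto the build node
docker cp "$container:/tmp/myfile" /tmp/myfile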

script2.sh attempts to “fan out” this file to the other nodes. When CIRCLE_NODE_INDEX isn’t 0, it uses

scp node0:/tmp/myfile /tmp/myfile
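
In full, script2.sh is essentially this (a sketch; node0 is the hostname alias CircleCI provides for container 0, per the docs linked below):

#!/bin/bash
set -e

# Node 0 already has the file locally; every other node pulls it over SSH
if [ "$CIRCLE_NODE_INDEX" -ne 0 ]; then
    scp node0:/tmp/myfile /tmp/myfile
fi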

But when I tried this out with 4 parallel containers running the tests, the result was:

  • Node 0’s script1.sh created the file.

  • Node 1’s script2.sh successfully copied the file.

  • Node 2’s script2.sh timed out: ssh: connect to host X port Y: Connection timed out.

  • Node 3’s script2.sh seemed to connect, but couldn’t find the file: scp: /tmp/myfile: No such file or directory.

When I investigated afterwards (by enabling SSH and poking around), I saw that nodes 0 & 1 shared one IP address (on different ports), and nodes 2 & 3 shared another IP address (also on different ports). I’m not sure whether that’s relevant.

When I try another build, I see a mix of these symptoms, sometimes including Permission denied (publickey).

This seems like a timing problem / race condition, but my understanding is that the non-parallel script1.sh step should finish executing before any of the script2.sh steps are started, so I don’t see how there could be a race condition here.

I believe this is a correct use of scp, based on the info at https://circleci.com/docs/ssh-between-build-containers/, unless that page is somehow inaccurate.

Am I doing something wrong? What else could cause this?

