Trouble with SSH between parallel containers

parallelism
circle.yml

#1

I’ve got a test configuration like this in circle.yml:

test:
    override:
        - script1.sh
        - script2.sh:
            parallel: true

script1.sh writes something to a file (e.g. /tmp/myfile). Specifically, it runs docker create and docker start -a to execute a script that writes a file inside the Docker container, then uses docker cp to copy that file out of the container to /tmp/myfile.
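
For reference, script1.sh is roughly this (a simplified sketch; the image name myimage and the in-container script generate.sh stand in for the real ones):

#!/bin/bash
set -e

# Create a container that runs a script which writes /tmp/myfile inside it
# (image and script names are placeholders)
container=$(docker create myimage ./generate.sh)

# Run it in the foreground so we wait for it to finish
docker start -a "$container"

# Copy the generated file out of the container onto the build node
docker cp "$container:/tmp/myfile" /tmp/myfile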

script2.sh attempts to “fan out” this file to the other nodes. When CIRCLE_NODE_INDEX isn’t 0, it uses

scp node0:/tmp/myfile /tmp/myfile
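
In full, script2.sh is essentially this (a sketch; node0 is the hostname alias CircleCI provides for container 0, per the docs linked below):

#!/bin/bash
set -e

# Node 0 already has the file locally; every other node pulls it over SSH
if [ "$CIRCLE_NODE_INDEX" -ne 0 ]; then
    scp node0:/tmp/myfile /tmp/myfile
fi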

But when I tried this out with 4 parallel containers running the tests, the result was:

  • Node 0’s script1.sh created the file.

  • Node 1’s script2.sh successfully copied the file.

  • Node 2’s script2.sh timed out: ssh: connect to host X port Y: Connection timed out.

  • Node 3’s script2.sh seemed to connect, but couldn’t find the file: scp: /tmp/myfile: No such file or directory.

When I investigated afterwards (by enabling SSH and poking around), I saw that nodes 0 & 1 shared one IP address (on different ports), and nodes 2 & 3 shared another IP address (also on different ports). I’m not sure whether that’s relevant.

When I try another build, I see a mix of these symptoms, sometimes including Permission denied (publickey).

This seems like a timing problem / race condition, but my understanding is that the non-parallel script1.sh step should finish executing before any of the script2.sh steps are started, so I don’t see how there could be a race condition here.

I believe this is a correct use of scp, based on the info at https://circleci.com/docs/ssh-between-build-containers/, unless that page is somehow inaccurate.

Am I doing something wrong? What else could cause this?

