Parallelize commands within a container


#1

It would be really really really awesome if you could specify two commands to be able to run in parallel inside the same container.

Example use case: Installing requirements for different runtimes (say, JavaScript and Python). They are completely independent and could run at the same time, so why should they have to wait for one another?


#2

I haven’t tried it on CircleCI, but you can do things like this in a bash script with the wait command.
See http://stackoverflow.com/a/13296959/1114274 for a hint at its usage.

Roughly (untested) like this:

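#!/bin/bash

# Start each command as a background job.
pip install -r stuff.txt &
go get ./... &
yum -y upgrade &

# Block until all three jobs finish. Note: with several job IDs,
# wait returns only the exit status of the last one listed.
wait %1 %2 %3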
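
For anyone who would rather not rely on shell job control, here is a minimal Python sketch of the same idea: start all commands at once, wait for every one, and keep each exit code so a failure in any job can fail the build. The function names and the placeholder commands are made up for illustration; swap in your real install commands.

```python
import subprocess
import sys


def run_parallel(commands):
    """Start every command at once, wait for all of them, return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]  # all start immediately
    return [p.wait() for p in procs]                     # same order as `commands`


def demo():
    # Placeholder jobs standing in for pip / go get / yum; the last one fails on purpose.
    commands = [
        [sys.executable, "-c", "print('job 1 done')"],
        [sys.executable, "-c", "print('job 2 done')"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    codes = run_parallel(commands)
    # A non-zero code from any job should fail the build step.
    return 1 if any(code != 0 for code in codes) else 0
```

Calling sys.exit(demo()) at the end of a script would make the build step fail whenever any of the jobs failed, not just the last one.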

#3

You can also try GNU Parallel.

Unlike a bare bash wait, it handles the exit codes of all the parallel commands.
^^ That’s exactly our case: we started out with bash wait and then migrated to something more robust like parallel in our circle.yml.

Installation:

sudo apt-get install -y parallel

Example:

cat list-of-commands.txt | parallel

How to use it: see the GNU Parallel documentation.


#4

I still wanted to be able to see all commands’ output and fail the build if any of the commands failed, so I ended up implementing a Python-based solution that sends jobs to tmux in the background and waits for them to return before proceeding:

Once I got it to work, I finally realized that in the example I gave, i.e. installing requirements for different runtimes, both tasks are completely I/O-bound (I think I was hoping they were more CPU-bound), so parallelizing them actually made them slower. So I never got around to turning this on for our actual tests.

Maybe it’s still useful to somebody! :slight_smile:


#5

This is great! Thanks @teeberg - this looks useful and I will check out your Python solution. Do you have any further usage instructions on how to get this code working in CircleCI?


#6

For sure! With the circleci_run_bg.py and circleci_join.py scripts from that snippet, run:

circleci_run_bg.py session_name "/long/running/command --with arguments"

This is going to start a tmux session in the background and return immediately.
At some later point, run

circleci_join.py session_name

to wait for that command to finish and print its output.


#7

Hey @teeberg, thanks again. Is this code self-contained, or does it require dependencies to be installed? I’m getting:

NameError: global name 'run_non_interactive' is not defined

Sorry, it’s possible that my limited understanding is the actual root cause. Appreciate your help :slight_smile:


#8

My bad, didn’t realize I was using that here. It’s a tiny wrapper around subprocess:

import subprocess

def run_non_interactive(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE):
    # Run the command, wait for it to exit, and collect its output.
    p = subprocess.Popen(args, stdin=stdin, stdout=stdout, stderr=stderr)
    out, err = p.communicate()
    return p.returncode, out, err

#9

Thanks @teeberg - I got past that, but now I’m hitting this:

python circleci_run_bg.py poster "sleep 20"

Traceback (most recent call last):
  File "circleci_run_bg.py", line 23, in <module>
    tmux.run(args.command)
  File "/Users/kanak/utils/shell.py", line 100, in run
    self.shell_execute(cmd)
  File "/Users/kanak/utils/shell.py", line 53, in shell_execute
    self.send(*args)
  File "/Users/kanak/utils/shell.py", line 46, in send
    return self._execute('send', *args)
  File "/Users/kanak/utils/shell.py", line 26, in _execute
    if not self._started and not self.exists():
  File "/Users/kanak/utils/shell.py", line 42, in exists
    retcode, out, err = self._execute_no_session('has-session', '-t', self._session_name)
  File "/Users/kanak/utils/shell.py", line 15, in _execute_no_session
    return self.run_non_interactive(['tmux'] + list(args))
  File "/Users/kanak/utils/shell.py", line 18, in run_non_interactive
    p = subprocess.Popen(args, stdin=stdin, stdout=stdout, stderr=stderr)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 703, in __init__
    errread, errwrite), to_close = self._get_handles(stdin, stdout, stderr)
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1126, in _get_handles
    p2cread = stdin.fileno()
AttributeError: 'list' object has no attribute 'fileno'

#10

That doesn’t seem right! What’s the value of stdin at that point?
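
A guess based only on the traceback above, not a confirmed diagnosis: the call there is self.run_non_interactive(...), so if the helper was pasted into the class body exactly as posted (with no self parameter), Python would bind the instance to args and the command list to stdin. That would produce exactly this 'list' object has no attribute 'fileno' error. The class names BrokenShell and FixedShell below are made up for illustration.

```python
import subprocess
import sys


def run_non_interactive(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE):
    p = subprocess.Popen(args, stdin=stdin, stdout=stdout, stderr=stderr)
    out, err = p.communicate()
    return p.returncode, out, err


class BrokenShell(object):
    # Plain function as a class attribute: calling it on an instance passes
    # the instance as `args`, so the command list ends up in `stdin`.
    run_non_interactive = run_non_interactive


class FixedShell(object):
    # staticmethod stops the instance from being passed implicitly,
    # so the command list lands in `args` as intended.
    run_non_interactive = staticmethod(run_non_interactive)


# Harmless placeholder command used to exercise both classes.
HELLO = [sys.executable, "-c", "print('hi')"]
```

With these definitions, BrokenShell().run_non_interactive(HELLO) should raise the same AttributeError, while FixedShell().run_non_interactive(HELLO) runs the command normally. Keeping the helper at module level and calling it without self. would work too.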


#11