Install PHP packages, then fan out for parallel steps (caching the entire container?)

We have a long linear job. It’s a single job in a single workflow. Let’s say it’s like this:

  1. checkout code
  2. install PHP extensions
  3. install MySQL
  4. run fast tests (requires a database, frequent truncates and inserts)
  5. run slow tests (also requires a database, frequent truncates and inserts)

I would like to change it so that CircleCI runs 1-3, then runs 4 and 5 in parallel. Because of the frequent database changes, 4 and 5 can’t use the same database at the same time.

I don’t understand how to do this. My first thought was to use caching. But caching just saves files. Because we’re installing PHP extensions, it seems I would need to cache the entire contents of the container, which feels wrong.

I’ve read about the parallelism key, but that doesn’t seem to allow manual control. There are “magic” test splitters in CircleCI, but I don’t see how to manually specify these tests in one process and those in another.

Am I missing something obvious and easy?

From your description, you are building a single image (steps 1-3) against which you need to run 2 independent sets of tests (steps 4 and 5), is that correct?

If so, the following options come to mind:

  • If you are using a container solution (such as Docker), you could create an image in a job that does steps 1-3 and then use it in 2 concurrent jobs that do steps 4 and 5. Before considering this, look over the new I/O pricing structure that is due at the end of the month.

  • Can 2 instances of the application be run side-by-side in the same environment using 2 different databases placed within the MySQL instance?

  • Depending on how long steps 1-3 take, could you just run 2 independent jobs that do steps 1-3+4 and 1-3+5?

Out of these, the last idea may be the quickest to set up, while the first allows for better long-term growth. The real issue with the first idea is the future I/O cost of creating and storing an image every time. One possible answer to this problem is a self-hosted runner but, as I’ve been finding, CircleCI needs to improve the docs and tool set for this feature before most people would want to try it.
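
As a rough illustration of the last idea, here is a minimal config sketch with two independent jobs that each repeat steps 1-3. The script paths and the `cimg/php:8.2` image are placeholders I am assuming for the example, not anything from the original setup, so substitute whatever the current job actually uses.

```yaml
# Sketch only: script paths and the image tag are assumptions, not from this thread.
version: 2.1

commands:
  prepare-environment:
    description: Steps 1-3, repeated in each job
    steps:
      - checkout
      - run:
          name: Install PHP extensions
          command: ./ci/install-php-extensions.sh   # hypothetical script
      - run:
          name: Install MySQL
          command: ./ci/install-mysql.sh            # hypothetical script

jobs:
  fast-tests:
    docker:
      - image: cimg/php:8.2   # assumed image; use whatever the current job runs on
    steps:
      - prepare-environment
      - run:
          name: Run fast tests
          command: ./ci/run-fast-tests.sh           # hypothetical script

  slow-tests:
    docker:
      - image: cimg/php:8.2   # assumed image
    steps:
      - prepare-environment
      - run:
          name: Run slow tests
          command: ./ci/run-slow-tests.sh           # hypothetical script

workflows:
  test:
    jobs:
      # Neither job lists the other under "requires", so they can run concurrently,
      # each in its own container with its own MySQL.
      - fast-tests
      - slow-tests
```

Because each job gets a separate container, the two test suites never touch the same database. The trade-off is that steps 1-3 run (and are billed) twice.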

This is some great commentary. Bullet 1 is what I kept thinking the “right answer” would be, but I wasn’t sure how best to save and reuse the new container. I’m glad to hear it’s not obvious or cheap.

Bullet 2 is also a good suggestion. It doesn’t happen to work (at least not easily) in our case, but it’s good thinking.

Bullet 3 crossed my mind, but I’m not sure if it will cost us more. Let’s say running everything as it is takes 7 minutes, but running in parallel takes 5. Will we “pay” for 10 minutes if we run in parallel, or only 5? In any case, I think #3 is not unreasonable and it’s easy to implement today.

Most of all, I’m glad to hear I wasn’t missing something easy or obvious. This was a helpful response. Thank you very much, @rit1010.

For option 1 you would end up learning Docker and using something like Docker Hub as the storage location. If you do not already use such tools, that is a lot of company time to invest. And, as I noted, without even more work it would incur the egress I/O fees that are due to start at the beginning of next month.
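
For reference, option 1 might look roughly like the sketch below. This is only an outline under several assumptions. It assumes a `Dockerfile` in the repository that performs steps 2-3. It assumes a hypothetical public `example-org/php-test-env` repository on Docker Hub. It assumes `DOCKERHUB_USERNAME` and `DOCKERHUB_PASSWORD` are set as project environment variables. It also assumes the image contains git, so that `checkout` works. None of these come from the original setup.

```yaml
# Sketch only: image name, env var names, and script paths are assumptions.
version: 2.1

jobs:
  build-image:
    docker:
      - image: cimg/base:current
    steps:
      - checkout
      - setup_remote_docker
      - run:
          name: Build and push the test image (steps 2-3 live in the Dockerfile)
          command: |
            echo "$DOCKERHUB_PASSWORD" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
            docker build -t example-org/php-test-env:"$CIRCLE_SHA1" .
            docker push example-org/php-test-env:"$CIRCLE_SHA1"

  fast-tests:
    docker:
      # Pipeline value resolves to the commit being built, matching the tag pushed above.
      - image: example-org/php-test-env:<< pipeline.git.revision >>
    steps:
      - checkout
      - run: ./ci/run-fast-tests.sh   # hypothetical script; must start MySQL if the image does not

  slow-tests:
    docker:
      - image: example-org/php-test-env:<< pipeline.git.revision >>
    steps:
      - checkout
      - run: ./ci/run-slow-tests.sh   # hypothetical script

workflows:
  test:
    jobs:
      - build-image
      - fast-tests:
          requires: [build-image]
      - slow-tests:
          requires: [build-image]
```

Every run pushes one image and pulls it twice, which is where the storage and egress costs mentioned above come from. A private image would also need an `auth` entry under each `docker` image.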

Yes, 2 parallel jobs of 5 minutes each will cost you 10 minutes of billable time, compared with 7 for the single job. But unless you are using larger runners or repeating this set of tasks often, I would guess it is the cheaper option.