How to set up CI following best practices with Docker and AWS

Hi there,

I would like some advice from the community on best practices for creating a fully automated Continuous Integration system using Docker, AWS and CircleCI.

To explain: our current system consists (or used to consist) of a large pipeline with many individual steps. It takes in data, analyses it, creates intermediate files, modifies those further, and finally generates the final data files plus an analysis report. (I could go into more detail, but I don't think it matters at this point.) Each component lives in its own repository, and most of them were released internally after manual testing only.

We are now moving to a new system where CircleCI spins up a blank VM, installs the dependencies and kicks off all our unit tests.

This works for small repositories (they only need an R installation plus a bunch of additional libraries), but even that takes a long time to install from scratch. Large repos (or repos that rely on a lot of other software to run) are a real challenge: it takes a long time to install all the dependencies, only to find out you forgot a semicolon in your test.

A better solution would be to start not from a blank VM but from one with the software preinstalled, THEN deploy our fresh code on top of it and run our unit tests. (I understand it's quite popular at the moment to add CD on top of that, but this is not a priority for us.)

This sounds like a perfect scenario for Docker (though I'm new to Docker). It looks like what I need is a Docker container with all the prerequisite software for the repo under test installed. CircleCI would then deploy that container on the VM, deploy the Git repository on top of it, and kick off the unit-testing frameworks.
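A minimal sketch of that test step, assuming Docker is available on the CI VM, the repository has already been checked out on that VM, and the tests are written with testthat (an assumption; the post only says the code is R). The image name, the /src mount point and the tests/testthat path are hypothetical placeholders. The sketch only builds and prints the docker run command, so it can be tried without Docker installed.

```python
import shlex
from typing import List

# Hypothetical prebuilt image with R and the repo's dependencies already installed.
TEST_IMAGE = "our-registry/r-deps:latest"


def docker_test_command(repo_dir: str, image: str = TEST_IMAGE) -> List[str]:
    """Return a `docker run` invocation that mounts freshly checked-out code
    into a prebuilt image and runs the (assumed) testthat suite."""
    return [
        "docker", "run",
        "--rm",                    # remove the container once the tests finish
        "-v", f"{repo_dir}:/src",  # mount the checked-out repo into the container
        "-w", "/src",              # run from the repository root
        image,
        "Rscript", "-e", 'testthat::test_dir("tests/testthat")',  # assumed test layout
    ]


if __name__ == "__main__":
    # Hypothetical checkout path on the CI VM; pass the list to
    # subprocess.run(...) instead of printing it to actually run the tests.
    print(shlex.join(docker_test_command("/home/circleci/project")))
```

The point of this layout is that the slow dependency installation happens once, when the image is built, rather than on every test run; each run only mounts the new code and executes the tests.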

The part I am confused about is how to build my Docker container(s). I've read that you shouldn't treat containers as VMs (https://blog.docker.com/2016/03/containers-are-not-vms/) and that every piece of software should go in its own container: "Each container should have only one concern" (https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#each-container-should-have-only-one-concern).

I also wonder where to store these built images. Given that we use AWS to run our product, I assume Amazon's EC2 Container Registry (ECR) could be an option. But if the images become very large (multiple GBs), will it be an issue to pull them from AWS into CircleCI on every build, and can that make use of the Docker cache? And if we do need every piece of software in its own container, do I then stitch those together using Docker Compose?

Also, when using a Dockerfile to make the container, do you build the image from the Dockerfile every time, or build it just once, store it somewhere, and then have CircleCI load that prebuilt image?

I'm also wondering whether I need CircleCI 2.0 (the docs advise that new projects use 2.0), but I can't clearly see what it offers over 1.0.

Any best practices on how to set up this system (on Bitbucket) would be most welcome.