Test Splitting on Self-hosted Runners

We’re thrilled to announce a major enhancement to our self-hosted runner offering: official support for CircleCI’s test splitting feature.

Customers who use self-hosted runners can now take advantage of CircleCI’s parallelism and test splitting features. Split your tests intelligently and use parallelism to save time for your developers.

Please keep in mind that to use test splitting on a job with self-hosted runners, your runner resource class must have at least two self-hosted runners associated with it.
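For illustration, here is a minimal config sketch of a job using test splitting on a self-hosted runner resource class. The resource class name, test glob, and test command are placeholders, not values from this announcement:

```yaml
# .circleci/config.yml -- hypothetical job; parallelism: 2 assumes the
# resource class has at least two runners available to accept work
version: 2.1

jobs:
  test:
    machine: true
    resource_class: your-namespace/your-resource-class  # placeholder
    parallelism: 2
    steps:
      - checkout
      - run:
          name: Run split tests
          command: |
            # Split the test files across the two parallel nodes by timing data
            TESTFILES=$(circleci tests glob "tests/**/test_*.py" | circleci tests split --split-by=timings)
            python -m pytest $TESTFILES
```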

Comment below with any questions.

@sebastian-lerner I haven’t found any docs that explain this so far: how many jobs can a self-hosted runner run at one time? Currently, for my workflows, it looks like only one job runs on the self-hosted runner at a time; another parallel job waits even though it should start immediately.

Do we have to run one self-hosted runner for every job we want to execute in parallel? So, 50 self-hosted runners to run 50 jobs in parallel?

If so, how do we autoscale these runners? We have an autoscaling policy on the host that’s running the runner, so if the CPU is stuck at >90% for 5 minutes, it will launch another instance, etc. But depending on the instance size and the job’s processing, a new one might never spawn. On the other hand, if we launched each instance with 5-10 runner processes, we would end up with one instance with 5 jobs running on it (slowly), and eventually a second instance, with more runners. But I can’t see how CircleCI Cloud would know which runner to schedule a job on next.

Any advice on this would be greatly appreciated. Thanks

You define a number of runners with the same resource class name, so that as the workflow runs, jobs are queued against the pool of available runners.

The number of runners you can define is restricted by the type of account you have with CircleCI, starting at 5 runners on the free plan.

I’ve never tried to autoscale the availability of runners, so I cannot comment on how the CircleCI system reacts to runners becoming available while a workflow is already being processed.

Thanks for your reply!

“You define a number of runners with the same resource class name”

I’m not exactly sure what this means; could you elaborate, please? Do you mean you literally execute multiple runner processes? Or do you do something else in the configuration to ‘define a number of runners’?

Yes, you just start up a number of runners that are defined with the same resource class name. The result is a pool of runners across which tasks are distributed.
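To make that concrete, here is a sketch of a machine runner launch-agent config for one such runner. The token, name, and paths are placeholder values; the key detail is that every agent in the pool uses a token generated for the same resource class:

```yaml
# launch-agent-config.yaml for one agent instance (hypothetical values).
# Run one launch agent per concurrent job you want to support; each agent
# needs its own config file and a unique name, but all auth tokens must
# come from the SAME resource class.
api:
  auth_token: "<token generated for your-namespace/your-resource-class>"
runner:
  name: runner-1                      # unique per agent instance
  working_directory: /var/opt/circleci/workdir
  cleanup_working_directory: true
```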

Hey folks, we do call out some specific self-hosted runner details in our docs for test splitting: Test splitting and parallelism - CircleCI

You are correct, though, that one self-hosted runner can run only one job at a time today. So, as was mentioned earlier, you would need to make sure you have n runners installed and accepting work if you want to run a job with parallelism = n.

I will note that we are working on making this significantly easier to accomplish with a more scalable and container-friendly self-hosted runner, now in open preview: Container agent (container runner) open preview - CircleCI. You should be able to install it in a Kubernetes cluster, and it will spin up n ephemeral pods for your jobs that use parallelism = n.

Hi @sebastian-lerner , thanks for the info!

For context for your engineering org: it’s simpler for us to use a regular VM for runners, both because of lower cost and maintenance compared to running K8s, and because we may want to deploy K8s changes from a CircleCI runner, which creates a chicken-and-egg scenario.

For the moment I will probably just use the approach from this article to run multiple runner copies under systemd: Running multiple PgBouncer instances with systemd - 2ndQuadrant | PostgreSQL
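For anyone following along, that approach translates to a systemd template unit along these lines; the binary path, config layout, and user are assumptions about a typical launch-agent install, not anything from the linked article:

```ini
# /etc/systemd/system/circleci-runner@.service (hypothetical sketch)
[Unit]
Description=CircleCI self-hosted runner instance %i
After=network-online.target

[Service]
User=circleci
# Each instance reads its own config (unique runner name, same resource class)
ExecStart=/opt/circleci/circleci-launch-agent --config /etc/circleci-runner/%i.yaml
Restart=always

[Install]
WantedBy=multi-user.target
```

Then something like `systemctl enable --now circleci-runner@1 circleci-runner@2` would start two independent agents on the same VM.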

However, it would be nice if the runner itself executed more runners as needed. For example, if I could start the runner in a ‘manager’ mode that reads some settings from CircleCI, it could then spawn normal ‘agent’ runners, up to some maximum number. The same approach could work both in Kubernetes and on a VM. I realize this is a lot more logic to add (essentially a simplistic scheduler), but it would allow me to configure the behavior from the CircleCI web interface without having to deal with infrastructure.

Thanks

@pwillis-eiq This is something we are considering going forward. One thought would be to let you install one “EC2 runner” on a VM in AWS, which would then spawn ephemeral EC2 instances that execute CircleCI jobs and are automatically terminated afterward. It has the same “orchestration” aspect as the container runner but doesn’t require Kubernetes and lets you take advantage of the full VM. Does that sound similar to what you’re describing?

What is the root of the problem you’re trying to solve? Minimizing infra management burden? Or something else?

cc @nathanwfish

If you are going down that route, please make the solution able to run a locally provided script rather than just your EC2 spawner. That way we could use other solutions to start runners on, say, a VMware environment.

@sebastian-lerner That would be a bit overkill for us. The root of the problem we’re trying to solve is that we want to run more self-hosted CircleCI jobs at once, and we want to do this without a dependency on Kubernetes, because that limits us (as mentioned before: cost, complexity, chicken-and-egg).

  • we want to run CircleCI jobs in a self-hosted runner without K8s
  • we want multiple jobs to run in parallel, versus one at a time

We can already create an autoscaling group in AWS/GCloud/etc that can spawn more instances if a single VM instance gets overloaded by parallelized CircleCI jobs. So we don’t need extra cloud-specific functionality baked in. We just need the existing runner to execute more than one job at a time.

But to be honest, before we even got to creating an autoscaling group, we would probably just create one VM of a given size and set a parallelism limit for the CircleCI runners on that one VM. Wasteful, but simpler. If we want to reduce cost or scale higher, the ASG would probably work fine.

Thank you for this feature, it’s very useful!

Is there an undocumented limit on parallelism, below the 20-job limit of the Performance plan and the limit specified in the Helm chart in Kubernetes? I have a job that never schedules with parallelism set to 16, even when nothing else is running, but schedules and completes just fine with parallelism set to 14.

No.

Can you please open a support ticket? It would be helpful to dig more into your situation; what you’re seeing doesn’t sound like expected behavior.