Docker Executor Infrastructure Upgrade

We’re seeing roughly 2x degraded performance, which has meant doubling our resource class, and we’re still getting killed processes.

This is an urgent priority for us, as it means our CI won’t run at all, on top of doubling our costs.
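
For reference, “doubling our resource class” just means bumping the job’s resource_class in .circleci/config.yml, along these lines (the image and class names here are illustrative, not our exact config):

```yaml
jobs:
  build:
    docker:
      - image: cimg/base:stable
    # previously: resource_class: large (4 vCPU / 8 GB)
    resource_class: xlarge   # 8 vCPU / 16 GB
    steps:
      - checkout
      - run: make test
```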

V2 run with double resource + killed:
bcafd6c0-65ca-4bf0-80a3-32f4a2a912cc

V2 run without doubling resource class:
2bd09640-83c3-4d62-8125-e27370b99d41

V1 - no issues:
fdb9393e-403d-485d-9bfc-07508c833996

We’d appreciate an engineer looking at this urgently, as well as clarification on whether you’ll issue billing corrections for the extra resources and failed runs. This is the kind of thing that erodes trust and costs an enormous amount of staff time while we can’t operate correctly.

Thanks @bpetetot,

Those two should help with our investigations.

Dom

Please opt our org out. Our builds are almost twice as slow now.

https://app.circleci.com/pipelines/github/tankfarm/tankfarm.io/10016/workflows/4edc2cdb-f1d1-43ae-aa0f-4e122a565e1d/jobs/59783

Please opt our organization out as well. We have not had our Cypress tests pass once since the upgrade.

7ff12f72-4604-4914-ae1f-33157fda029d
aefa059e-4a2c-4dd6-9372-215fb1d1e13b

Hi @capnfuzz,

Sorry you’re having issues.

I’ve opted your org out; it should take about 10 minutes to apply. Please let me know if you’re still seeing Jobs running on the V2 runtime after that.

Thanks,

Dom

Hi, we’re currently facing an issue where the first run always fails with signal: killed. After rerunning the CI, it passes. Can I get any advice on solving this?

https://app.circleci.com/pipelines/github/kaiachain/kaia/2330/workflows/3cff1f7c-05ef-4c6f-8b36-a70ec66ca9c5/jobs/12892
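
For what it’s worth, signal: killed usually means the process was OOM-killed when the container hit its memory limit. A minimal sketch of a step we could add to confirm peak usage after the tests (the cgroup path assumes cgroup v1 and may differ on the new runtime):

```yaml
      - run:
          name: Report peak container memory usage
          command: cat /sys/fs/cgroup/memory/memory.max_usage_in_bytes
          when: always
```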

Hi @hyeonLewis,

Sorry you’re having issues.

Thanks for providing that Job link. I have opted your org out of the rollout for now whilst we investigate.

Many thanks,

Dom

Hello, @DomParfitt. Can you please opt our org out as well? We are having multiple builds fail with JVM out-of-memory errors, and our CPU usage stays at 100%. Thank you!
https://app.circleci.com/pipelines/github/SectorLabs/cheetah/56105/workflows/039170d6-d6cb-4847-8f61-b2de34023431/jobs/901254
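
In case it helps the investigation: a common way to keep the JVM heap inside the container’s memory limit is a container-aware max-heap setting, roughly like the sketch below (image, resource class, and percentage are illustrative, not our exact setup):

```yaml
jobs:
  test:
    docker:
      - image: cimg/openjdk:17.0
    resource_class: large
    environment:
      # Cap the heap at 75% of the container's memory limit so the
      # JVM is less likely to be killed by the OOM killer.
      JAVA_TOOL_OPTIONS: -XX:MaxRAMPercentage=75.0
```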

Hi @ValentinCondurache,

Sorry you’re having issues.

I’ve just opted your org out. It may take around 10 minutes to apply but after that you should see your Jobs running on V1 again.

Many thanks,

Dom

Hello!
Our org seems to have been upgraded to the v2 container runtime the other day.
A CI run that used to take 6 or 7 minutes took 3 hours and ultimately failed.
Please opt our org out.

Docker image: cimg/ruby:3.2.2-browsers
Testing library: rspec (parallel_rspec)
Job: https://app.circleci.com/pipelines/github/smartcamp/boxil/38081/workflows/080c6315-e7af-4353-baea-50895b16e911/jobs/313312/parallel-runs/1

In addition, CI was also failing after updating the image to cimg/ruby:3.2-browsers.
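
For clarity, the image change mentioned above is just the executor tag in .circleci/config.yml, roughly:

```yaml
    docker:
      # originally pinned to a patch release:
      # - image: cimg/ruby:3.2.2-browsers
      # later switched to the floating minor tag:
      - image: cimg/ruby:3.2-browsers
```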

Hi @maaaaakoto35,

Sorry to hear that! I’ve opted out the boxil project whilst we investigate.

Dom

Hey @DominicLavery
We are seeing tests failing without any apparent reason (it seems related to a general slowdown in some cases).

https://app.circleci.com/pipelines/github/Consensys/teku/35887/workflows/2c80f3cd-ce08-4d3f-a8fe-cb783899a162/jobs/270349

Any hint?

It has been happening systematically since yesterday.

Hi @tbenr,

It’s a tricky one. It doesn’t line up with when the project was opted in to v2, but looking at the timeline of your issues it could be related to a small bug that was introduced yesterday.

A fix for it has just been rolled out; would you be able to retry your failing job, please?

Thanks
Dom

Okay, retrying.

Still failing.
I suspect it is related to slow CPU: another job that normally takes 13 minutes is now taking 31 minutes.

Can someone please opt our organization out? We are experiencing some issues with Rails and Capybara.

Our org id is: 51d1ac41-f636-4691-a993-7440a6b5b8d7

Sorry @tbenr. I’ve fully reverted the change from yesterday. Please could you give your build one more go?

Hi @tiagobabo.

Sorry to hear that. Could you please provide a link to an affected job and some details about what you are seeing?

Thanks
Dom

Yes, here it is: https://app.circleci.com/pipelines/github/carwow/quotes_site/93051/workflows/41b15f02-70ab-4f46-bece-5b32b1195482/jobs/2032725

We started seeing these failures across our Capybara specs.