Docker Executor Infrastructure Upgrade

Hi,

We’re seeing failures since one of our jobs was moved to cgroupv2 (here for instance).

This happens with pytest, while collecting the tests to run. I couldn’t figure out what the cause was.

Could you either help us solve it, or opt us out (if that’s still possible), as this is blocking a release?

Thanks,
Cyril.

Hi @patatepartie,

Sorry to hear that!

We’ve not seen that particular issue before. It’s hard to say whether it is directly related to v2 just from looking at the history of the project’s jobs, so I’ve applied a temporary opt-out to help narrow that down.

In this case, the failing job is an arm one, which has been handled slightly differently. The opt-out I’ve applied will only affect arm jobs.

Dom

Several of our pipelines are also failing after being migrated to v2. We’d like to request a rollback. We created support tickets for the issue, but we’ve had radio silence for over a week. This is severely impacting our deployments.

Hi @dasph,

I’m sorry to hear this. If you can provide a link to your builds or a ticket number, I can look into this.

Dom

Thanks for the reply. Our ticket number is 161360. I’ve provided some updates since opening the ticket.

Thanks @dasph,

I’ve taken a look at your ticket and the affected jobs. There are some known issues with cgroupv2 in the JVM version the jobs are using. Java 21 is the first version that fully supported it, but it does look like 11.0.16 also got a fix.

To support v2, please could you look at upgrading to one of these versions? I’ve applied a 7-day opt-out to get your jobs running again in the meantime. It can take 10 minutes to take effect.
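
For anyone who wants to check a job image against the two versions mentioned above, here is a minimal sketch. The function names `parse_java_version` and `matches_thread_advice` are made up for illustration, and the check encodes only what is said in this thread (11.0.16 or later on the 11 line, or 21 and newer). Any other release line returns None, meaning "not covered here", not "known broken".

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_java_version(text: str) -> Optional[Version]:
    """Extract (major, minor, patch) from `java -version` style output.

    Returns None if no quoted version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?(?:\.(\d+))?', text)
    if match is None:
        return None
    # Missing components (e.g. a bare "21") are treated as 0.
    major, minor, patch = (int(g) if g else 0 for g in match.groups())
    return (major, minor, patch)


def matches_thread_advice(version: Version) -> Optional[bool]:
    """Compare a version against the two thresholds named in this thread.

    Assumption: only "11.0.16 or later on the 11 line" and "21 or newer"
    are encoded. Everything else returns None (no statement either way).
    """
    major = version[0]
    if major >= 21:
        return True
    if major == 11:
        return version >= (11, 0, 16)
    return None
```

Note that `java -version` writes to stderr rather than stdout, so if you capture it with `subprocess.run(["java", "-version"], capture_output=True, text=True)`, pass the `.stderr` attribute to `parse_java_version`.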

Dom

Thank you, Dominic.
I can confirm that the following jobs worked without issues, so the cause was the move to v2.

One of my colleagues tracked it down to the loading of the spacy library (during pytest collection).
Since it’s likely this will happen again once we’re permanently moved to v2, is there a way for us to test the build on v2 without all arm jobs being moved to it? Maybe just for one branch?

Cheers,
Cyril.

Hey Dominic, thanks for the assistance. The jobs are now working.
We’re going to upgrade to 11.0.16. Could you move one of our pipelines to V1 so that we can test it?
Thanks!

Hi @patatepartie & @dasph,

Thank you both for getting back to me and looking into your respective projects.

We have 2 options for making testing v2 less disruptive at the moment.

  1. We can arrange a date and time to opt your project in and ensure someone is available to provide support.
  2. You can create a new project with a replica of your job which we can ensure is opted in. You can then test before your main project is put back on to v2.

Please let me know which works best for you.

Dom

Thanks, Dom.

I went with option 2 and the new project automatically used V2 (and reproduced the failure).
We’ll be able to investigate with this.
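
In case it helps anyone else confirm from inside a job which cgroup version it is actually running under, here is a rough sketch. It relies on the usual Linux layout: on a cgroup v2 (unified) hierarchy the file `/sys/fs/cgroup/cgroup.controllers` exists, while v1 exposes per-controller directories such as `/sys/fs/cgroup/memory`. The function names are hypothetical, and I'm assuming the container sees its own cgroup mounted at `/sys/fs/cgroup`, which may not hold on every setup.

```python
from pathlib import Path
from typing import Optional


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1" or "unknown" based on the files under `root`."""
    base = Path(root)
    # The unified (v2) hierarchy has cgroup.controllers at its mount point.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # v1 mounts one directory per controller, e.g. "memory".
    if (base / "memory").is_dir():
        return "v1"
    return "unknown"


def read_memory_limit(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the memory limit in bytes, or None if unlimited or unreadable."""
    base = Path(root)
    version = detect_cgroup_version(root)
    if version == "v2":
        path = base / "memory.max"
    elif version == "v1":
        path = base / "memory" / "memory.limit_in_bytes"
    else:
        return None
    try:
        raw = path.read_text().strip()
    except OSError:
        return None
    # v2 writes the literal string "max" when no limit is set.
    return int(raw) if raw.isdigit() else None
```

Printing `detect_cgroup_version()` and `read_memory_limit()` as an early step in both the original project and the replica should show whether the two really differ in cgroup version before digging into the library itself.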

Will let you know when we’ve fixed it.

Cheers,
Cyril.
