Docker Executor Infrastructure Upgrade

Thanks for confirming @Peter-Darton-i2

Are you referring to this thread?

@DominicLavery, the link is below as requested. Let me know if you need any further information.

https://app.circleci.com/pipelines/github/attiam-blueoak/BlueBillingAPI/2379/workflows/b9a4b0c4-2728-4e7b-8b59-225c404cec50/jobs/7135

Thanks @JarmBlueOak,
I’ve applied the temporary opt out whilst we look into it.

Hi @DominicLavery, thank you for actioning so promptly. I can confirm that, after the opt out, the builds no longer have the issue.

Hi @DominicLavery, could we get our org removed from this? All Docker jobs seem to fail when they use the circleci-agent step halt command. I am unsure why this would be related to this change, but it seems to be the only difference in our workflow. We would like to opt out and check whether that fixes the workflows.
Example builds:
https://app.circleci.com/pipelines/github/WhoopInc/iOS?branch=smartling-translation-completed-cfr2saygp7a5-from-dev
https://app.circleci.com/pipelines/github/WhoopInc/iOS?branch=smartling-translation-completed-yxvueww3gt4j-from-dev
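
For readers unfamiliar with the pattern, a job that calls circleci-agent step halt typically looks something like the minimal sketch below. The job name, image, marker file, and condition are hypothetical placeholders and are not taken from the config in question.

```yaml
version: 2.1

jobs:
  translate:                      # hypothetical job name
    docker:
      - image: cimg/base:stable   # placeholder image
    steps:
      - checkout
      - run:
          name: Halt early when there is nothing to do
          command: |
            # Hypothetical condition: stop the job if a marker file is absent
            if [ ! -f .needs-build ]; then
              circleci-agent step halt
            fi
      - run: echo "Only runs when the job was not halted"

workflows:
  main:
    jobs:
      - translate
```

When the halt is triggered, the job ends successfully and the remaining steps are skipped.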

Hi @swarajrao.

Sorry about that. I’ve applied the opt out and we will look into it.

Hi @DominicLavery! Yes of course. In our CCI config, we started with

  playwright:
    docker:
      - image: mcr.microsoft.com/playwright:v1.43.0-focal

and updated this to

  playwright:
    docker:
      - image: mcr.microsoft.com/playwright:v1.50.1-noble

See https://playwright.dev/docs/docker. We also updated the @playwright/test version in our package.json from 1.43.0 to 1.50.1. I hope this helps someone else hitting this issue!

Thanks so much @iwakoscott!

The error was just a timeout: Error: Timed out waiting 60000ms from config.webServer. @DominicLavery

Thank you. If it’s helpful, I can confirm the jobs using circleci-agent step halt work now.

Thanks for confirming @swarajrao.

Could I just double check: did this workflow suffer the same issue that you are reporting? https://app.circleci.com/pipelines/github/WhoopInc/iOS/74349/workflows/5837fd55-5125-4e58-9aa8-bcb53243f6aa

That seems to be on v1, so I’m wondering if there is something else going on as well.

That workflow was a separate issue (using too small an executor) that has since been fixed.

Thanks for the confirmation @LexLuthr

We’ve deployed a fix for the cause of the crash so I’ve opted your project back in.

Sorry for the inconvenience there. Please let us know if you spot anything else.

Dom

Hi @DominicLavery, is there a way we could have this change tested in one project?

Hi,

We at commercetools are also seeing pipelines fail without having made any changes to them. We’ve rerun pipelines that succeeded the day before yesterday, and since yesterday those same pipelines fail persistently. As a result, certain pipelines are impossible to get green. We’ve tried many things by now but can’t resolve the issue on our side.

Can you please have a look at our pipelines (happy to share via DM) and opt us out of the upgrade? With the upgrade in place we’re having issues using CircleCI to deploy.

Best

It appears that this has changed the default TERM. For Gradle users, that changes the default logging style (to rich, which is more appropriate for interactive builds than for CI) and prevents builds that run on v1 and v2 of the container runtime from sharing configuration cache entries.

Calculating task graph as configuration cache cannot be reused because environment variable 'TERM' has changed.
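
One possible workaround, as a minimal sketch rather than a confirmed fix: pin TERM explicitly at the job level so the value no longer depends on which runtime version the job lands on, and request plain console output from Gradle. The job name, the image, and the choice of `dumb` as the value are assumptions for illustration; whether this restores configuration cache reuse depends on the value matching across all builds that share the cache.

```yaml
version: 2.1

jobs:
  build:                           # hypothetical job name
    docker:
      - image: cimg/openjdk:17.0   # placeholder image
    environment:
      # Assumption: a fixed value keeps TERM identical on v1 and v2 runtimes
      TERM: dumb
    steps:
      - checkout
      # --console=plain asks Gradle for non-interactive log output
      - run: ./gradlew build --console=plain

workflows:
  main:
    jobs:
      - build
```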

Hi,

We’re seeing PostgreSQL timeout errors again for etalab/transport-site.

Previous issue: Docker Executor Infrastructure Upgrade - #12 by DominicLavery

Hi @AntoineAugusti,

Apologies for that. I had applied a new opt out to the project; however, that has not taken effect. We are looking into why.

The workaround we had previously applied for your project had to be reverted as it was causing some system issues. That workaround addressed a bug in the version of Erlang/OTP in the container the project is using, which is fixed in newer versions.

To support the v2 runtime, the project will need to use a newer version (25.3.2.12 or later, I believe). We will continue to try to fix the opt out for your project; however, it will cease to apply by April 1st.

Dom

Hi @AntoineAugusti

Your opt out should now be applied.

To let you know what went wrong: The rollout is being applied incrementally within categories of jobs. The project was in a category that is already complete and so wasn’t eligible for an opt out in our system. I’ve updated the service responsible with an exception for the project.

Sorry again for the troubles there.

Please let me know when you’ve had a chance to look at upgrading.

Dom

I retried a build and it failed. Is this expected?