Within the same job, we started getting different arch values for parallel runs

sakif-imtiaz · November 3, 2022, 8:20pm

We use arch as part of the key for caching Ruby gems between jobs/steps.

Within the same job, some of the parallel runs have started getting arch = arch1-linux-amd64-6_106

Until a couple of days ago, we were consistently getting: arch1-linux-amd64-6_85

Is there a reason these would vary between parallel runs in the same job or workflow? Or is this a bug?

gonzaloserrano · January 24, 2023, 11:04am

It happened to me when I reran the job with SSH. Was that your case?

yannCI · January 30, 2023, 6:00pm

Hello @gonzaloserrano,

It doesn’t appear to be related/limited to SSH reruns.

I’m seeing another user reporting this behaviour (Revision inconsistent during workflow).

We’re looking into it.

yannCI · February 1, 2023, 1:05pm

@gonzaloserrano, @sakif-imtiaz,

Please find below the explanation for the discrepancy you’ve observed.

As mentioned in our documentation, the {{ arch }} template captures:

architecture
family
model

The underlying machines on which Docker jobs run are usually EC2 instances with chipsets “Family 6, Model 85”. In some cases, where there isn’t sufficient capacity of these instances, Docker jobs might instead run on EC2 instances with chipsets “Family 6, Model 106”.

We prefer to run a job on instances with a slightly different CPU rather than delay it or potentially not run it at all.

We’re currently assessing whether or not we can/should simplify the {{ arch }} template’s granularity to prevent occurrences of the behaviour you’ve observed across your Docker jobs.

For now, I suggest relying on other templates when using the cache feature in Docker jobs.
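As a rough illustration of why the finer-grained value breaks cache restores, here is a minimal sketch that strips the family and model from an arch string so both machine types map to the same key. It assumes the value has the shape "<prefix>-<family>_<model>", which is inferred only from the two values quoted in this thread and is not a documented format. The names coarse_arch and cache_key, and the "gems-" key layout, are hypothetical.

```python
import re

# Assumption: a trailing "-<family>_<model>" suffix, as in the two values
# quoted in this thread ("...-6_85" and "...-6_106"). Not a documented format.
_CPU_SUFFIX = re.compile(r"-\d+_\d+$")


def coarse_arch(arch: str) -> str:
    """Return the arch string without a trailing family/model suffix."""
    return _CPU_SUFFIX.sub("", arch)


def cache_key(arch: str, lockfile_checksum: str) -> str:
    """Build a hypothetical gem cache key from the coarse arch value."""
    return f"gems-{coarse_arch(arch)}-{lockfile_checksum}"


if __name__ == "__main__":
    # Both machine types from the thread collapse to the same coarse value.
    print(coarse_arch("arch1-linux-amd64-6_85"))   # arch1-linux-amd64
    print(coarse_arch("arch1-linux-amd64-6_106"))  # arch1-linux-amd64
```

With the full arch value in the key, a cache saved on a Model 85 machine is not found on a Model 106 machine; with the coarse value, both produce the same key. How such a value would be fed into a real cache key depends on your config.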

Let me know if you have further questions.

chengguizi · February 9, 2023, 4:15pm

It would be great if you could remove the family and model, as the arch variable is useful for building multi-arch artifacts.

manuel.fittko · February 21, 2023, 5:53pm

Thanks a lot for reporting this. We have these in our cache keys and were therefore having issues with cache restores. The arch value should definitely be stripped of the CPU family and model; it’s about distinguishing build artefacts for the different CPU architectures!

liamsharp · September 25, 2023, 1:09pm

This issue has recently hit us with regard to test splitting. We split our tests across 20 machines. We’ve noticed that sometimes a few machines take around 20-25% longer than the others to run the tests. We’ve tracked it down to this issue: the 106 machines are always fast, and the 85 machines are always slower.

This has a significant impact on test splitting. The whole idea behind the split is that you are doing it across equivalent machines based on timing data from a previous run on the same machines. We aim to get our tests done in <=20 minutes, but, sometimes, just getting dealt a single 85 machine takes us to 25 mins which is super frustrating.

There is also the financial side: we’re charged the same per-minute rate for both machine types, but because the 85 machines take longer, we end up paying more for them.
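To put numbers on this, here is a small sketch of the arithmetic. It assumes a perfectly even split of 20 minutes per node on the fast machines, a 25% slowdown on a slow node (the upper end of what we observed), and billing per node-minute at the same rate. The function names are made up for illustration.

```python
def wall_clock_minutes(per_node_minutes):
    """A parallel job finishes only when its slowest node finishes."""
    return max(per_node_minutes)


def billed_minutes(per_node_minutes):
    """Assumption: every node is billed per minute at the same rate."""
    return sum(per_node_minutes)


NODES = 20        # parallelism from the post above
BASELINE = 20.0   # assumed minutes per node on a fast (106) machine
SLOWDOWN = 1.25   # assumed 25% slowdown on a slow (85) machine

all_fast = [BASELINE] * NODES
one_slow = [BASELINE] * (NODES - 1) + [BASELINE * SLOWDOWN]

if __name__ == "__main__":
    print(wall_clock_minutes(all_fast))  # 20.0
    print(wall_clock_minutes(one_slow))  # 25.0
    # Extra node-minutes billed because of the single slow machine.
    print(billed_minutes(one_slow) - billed_minutes(all_fast))  # 5.0
```

A single slow node sets the wall-clock time for the whole job, so one 85 machine out of 20 is enough to turn a 20-minute run into a 25-minute one.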

Can we please get a configuration option added to say we want to run on identical machines?
