Need help with CircleCI parallelization performance issue in Django with Celery tasks

Our test suite combines regular Django app tests with tests that exercise Celery tasks under task_always_eager=True. To speed up the test run, we use CircleCI’s parallelization feature with a larger resource class. We run the tests on four parallel runners and split them using CircleCI’s timing data.

After reviewing CircleCI’s calculated timings and summing the expected time for each partition ourselves, the split looks correct: each partition is assigned roughly the expected share of the total time.

However, at actual runtime, the duration of each parallel runner varies widely, from 1 to 3 hours. Any thoughts on the possible cause of this issue?
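One way to narrow this down is to compare the time each runner actually recorded with the time the split predicted. The following is a minimal sketch, assuming each runner writes a JUnit-style XML report; the function names, the sample report, and the file paths in it are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def seconds_by_group(junit_xml: str) -> dict:
    """Sum the recorded `time` of every <testcase>, grouped by file.

    Falls back to the `classname` attribute when the report has no
    `file` attribute on its test cases.
    """
    totals = defaultdict(float)
    root = ET.fromstring(junit_xml)
    for case in root.iter("testcase"):
        group = case.get("file") or case.get("classname") or "<unknown>"
        totals[group] += float(case.get("time") or 0.0)
    return dict(totals)


def slowest(totals: dict, n: int = 5) -> list:
    """Return the n groups with the largest total time, slowest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical report from one runner, made up for illustration.
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="runner-0">
    <testcase file="tests/test_views.py" name="test_index" time="0.5"/>
    <testcase file="tests/test_views.py" name="test_detail" time="0.25"/>
    <testcase file="tests/test_tasks.py" name="test_export" time="120.0"/>
    <testcase file="tests/test_tasks.py" name="test_import" time="60.5"/>
  </testsuite>
</testsuites>
"""

if __name__ == "__main__":
    for group, seconds in slowest(seconds_by_group(SAMPLE_REPORT)):
        print(f"{seconds:10.2f}s  {group}")
```

Running this against the report from each runner and comparing the totals with the timings CircleCI used for the split should show which test files take much longer than predicted.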

Setting task_always_eager = True makes Celery execute each task synchronously in the calling process instead of sending it to a worker, and this seems to be working against the test splitting: a test that triggers a task blocks until that task finishes, so the task's full run time is added to the test's run time. Eager mode is useful when you want tasks to run synchronously, but it is not suitable where blocking the caller is a problem, since it removes the benefits of asynchronous task execution.

https://docs.celeryq.dev/en/stable/userguide/configuration.html#task-always-eager