Reuse ECR Login Token and/or retry login

We often encounter build failures due to ECR login throttling:

Build-agent version 0.0.4869-fac853b (2018-04-17T20:59:55+0000)
error authentication with ECR: ThrottlingException: Rate exceeded
status code: 400, request id: c7b48609-475c-11e8-9f28-77bef5fdcc37

This happens during the Spin Up Environment step, so there’s nothing we can do on our side to mitigate this failure.

How many times is that being tried, and in what timeframe?

Is it being tried a lot because it is happening once per job, and you have many jobs in a workflow?

Can you apply to AWS to get the throttle limits changed for your account on ECR?

We have a workflow with 32 jobs, and each job uses 6 Docker images from AWS ECR, which means we are doing 192 logins at the same time for 1 build. Multiply that by N concurrent builds, and it’s easy to see how we’d run into throttling limits.
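To make the arithmetic concrete, here is a rough sketch of the numbers. It assumes every login in every concurrent build lands within the same second (a worst case, not a measurement), and it borrows the 200 TPS burst figure from the AWS documentation quoted later in this thread. The function and constant names are made up for illustration.

```python
JOBS_PER_WORKFLOW = 32   # from the workflow described above
IMAGES_PER_JOB = 6       # ECR images used by each job
BURST_LIMIT = 200        # burst figure quoted from the AWS docs in this thread


def logins_per_build(jobs=JOBS_PER_WORKFLOW, images=IMAGES_PER_JOB):
    """One ECR login per image per job."""
    return jobs * images


def simultaneous_logins(concurrent_builds):
    """Worst case: every login of every build happens in the same second."""
    return logins_per_build() * concurrent_builds


for builds in (1, 2, 3):
    total = simultaneous_logins(builds)
    print(builds, total, total > BURST_LIMIT)
# prints:
# 1 192 False
# 2 384 True
# 3 576 True
```

Under that assumption a single build sits just under the burst figure, and two concurrent builds are already well over it.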

Per the documentation (below):

  • the throttle for the GetAuthorizationToken action is 4 transactions per second (TPS), with up to a 200 TPS burst allowed
  • the throttle on the GetAuthorizationToken operation cannot be increased on a per-account basis
  • To handle throttling errors, implement a retry function with incremental backoff into your code.
  • To avoid needing to retry, the token should be reusable for a certain period of time.

https://docs.aws.amazon.com/AmazonECR/latest/userguide/common-errors.html#error-429-too-many-requests
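For illustration, the last two points (retry with backoff, and reuse the token) could be combined roughly as below. This is a minimal sketch, not how Circle's build agent works: `CachedTokenProvider`, `ThrottledError`, and the `fetch_token` callable are hypothetical names, and `fetch_token` is assumed to return a `(token, ttl_seconds)` pair. In practice it would wrap the GetAuthorizationToken call. The backoff shown is exponential with random jitter, which is one common way to do "incremental" backoff.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for the registry's 'Rate exceeded' error."""


class CachedTokenProvider:
    """Fetches a login token once, reuses it until it expires, and
    retries with backoff if the fetch is throttled."""

    def __init__(self, fetch_token, max_attempts=5, base_delay=0.5,
                 sleep=time.sleep, clock=time.monotonic):
        self._fetch_token = fetch_token    # returns (token, ttl_seconds)
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep                # injectable so tests need not wait
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Reuse the cached token while it is still valid.
        if self._token is not None and self._clock() < self._expires_at:
            return self._token
        token, ttl = self._fetch_with_backoff()
        self._token = token
        self._expires_at = self._clock() + ttl
        return token

    def _fetch_with_backoff(self):
        for attempt in range(self._max_attempts):
            try:
                return self._fetch_token()
            except ThrottledError:
                if attempt == self._max_attempts - 1:
                    raise  # give up after the final attempt
                # Delay doubles on each attempt, plus jitter so that many
                # clients do not all retry at the same instant.
                delay = self._base_delay * (2 ** attempt)
                self._sleep(delay + random.uniform(0, delay))
```

Note that this cache only lives inside one process. To cut down the 192 logins per build, the token would have to be shared across jobs, which is something only the platform starting the containers can do.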

Gotcha. I’ve not used them, but could you persist your images using workspaces?

https://circleci.com/docs/2.0/workflows/#using-workspaces-to-share-data-among-jobs

However, I admit I don’t know how compatible that is with Circle’s native way of instantiating containers, since you may not be able to set anything up prior to Circle doing its thing.

I have greater flexibility with my own (perhaps unusual) configuration, as I do manual pulls of the images I use and then start them all with Docker Compose inside a single Docker image. This means I could, if I wished, park the images in a workspace with docker save and then docker load them when required. I can also put sleeps between pulls if the registry provider's throttling is triggered.
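A rough sketch of that approach, for a setup where the job pulls images itself rather than letting Circle start the containers. The function names, the archive path, and the pause length are all made up for illustration; the only real commands used are docker pull, docker save -o, and docker load -i.

```python
import subprocess
import time


def pull_and_save(images, archive_path, pause_seconds=2.0,
                  run=subprocess.run, sleep=time.sleep):
    """Pull each image with a pause in between, then write them all
    to a single tar archive that can be parked in a workspace."""
    for index, image in enumerate(images):
        if index > 0:
            sleep(pause_seconds)  # space out registry requests
        run(["docker", "pull", image], check=True)
    run(["docker", "save", "-o", archive_path, *images], check=True)


def load_saved(archive_path, run=subprocess.run):
    """Restore the images from the archive in a later job."""
    run(["docker", "load", "-i", archive_path], check=True)
```

The `run` and `sleep` parameters exist only so the functions can be exercised without Docker installed; by default they call the real commands.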

This is certainly an option, but I think failing on throttling is just a bug. I’ve filed a bug report with Circle.

You can’t persist images using workspaces or cache. This is a pretty bad issue, and it’s certainly complicated. I’m very confident you’re the first to hit this issue.

Talking to AWS about your rate-limit could be a faster turnaround. I don’t know what’s going to be involved in fixing this from our end… but it’s a very legitimate (and frustrating!) bug report so we’ll certainly work on addressing it.

The last two points from the AWS documentation (retrying with incremental backoff and reusing the token) are things Circle could handle, which would resolve this problem.
