Reuse ECR Login Token and/or retry login

docker
aws

#1

We often encounter build failures due to ECR login throttling:

Build-agent version 0.0.4869-fac853b (2018-04-17T20:59:55+0000)
error authentication with ECR: ThrottlingException: Rate exceeded
status code: 400, request id: c7b48609-475c-11e8-9f28-77bef5fdcc37

This happens during the Spin Up Environment step, so there’s nothing we can do on our side to mitigate the failure.


#2

How many times is that being tried, and in what timeframe?

Is it being tried a lot because it is happening once per job, and you have many jobs in a workflow?

Can you apply to AWS to get the throttle limits changed for your account on ECR?


#3

We have a workflow with 32 jobs, and each job uses 6 Docker images from AWS ECR, which means a single build performs 192 logins at the same time. Multiply that by N concurrent builds, and it’s easy to see how we’d run into throttling limits.
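
To make that arithmetic concrete, here is a rough Python model. It assumes (our interpretation, not wording AWS uses) that the documented limit of 4 TPS with a 200 TPS burst behaves like a token bucket holding 200 tokens and refilling at 4 per second, and that every image in every job costs one login, as described above. The function names are hypothetical.

```python
# Rough model of the ECR GetAuthorizationToken throttle.
# ASSUMPTION: "4 TPS with up to a 200 TPS burst" is modelled as a token bucket
# that holds 200 tokens and refills at 4 tokens per second.
BURST = 200
REFILL_TPS = 4.0


def logins_per_build(jobs: int, images_per_job: int) -> int:
    """One login per image per job, as described in the thread."""
    return jobs * images_per_job


def throttled(total_requests: int, spread_seconds: float = 0.0) -> int:
    """Count requests rejected when `total_requests` arrive at a steady rate
    over roughly `spread_seconds`, starting from a full bucket."""
    if total_requests <= 0:
        return 0
    interval = spread_seconds / total_requests
    tokens = float(BURST)
    rejected = 0
    for i in range(total_requests):
        if i > 0:
            # Refill for the time elapsed since the previous request, capped at the burst size.
            tokens = min(float(BURST), tokens + REFILL_TPS * interval)
        if tokens >= 1.0:
            tokens -= 1.0   # request admitted
        else:
            rejected += 1   # request throttled ("Rate exceeded")
    return rejected


if __name__ == "__main__":
    per_build = logins_per_build(jobs=32, images_per_job=6)
    print(per_build)                 # 192
    print(throttled(per_build))      # 0: one build fits inside the burst
    print(throttled(2 * per_build))  # 184: two concurrent builds overflow it
```

Under this model, spreading the same 384 logins over 10 seconds (`throttled(384, 10.0)`) still rejects well over a hundred of them, because the 4 TPS refill is tiny compared with the burst being consumed.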

Per the documentation (below):

  • the throttle for the GetAuthorizationToken action is 4 transactions per second (TPS), with up to a 200 TPS burst allowed
  • the throttle on the GetAuthorizationToken operation cannot be increased on a per-account basis
  • To handle throttling errors, implement a retry function with incremental backoff into your code.
  • To avoid needing to retry, the token should be reusable for a certain period of time.

https://docs.aws.amazon.com/AmazonECR/latest/userguide/common-errors.html#error-429-too-many-requests
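
The "retry with incremental backoff" advice can be sketched in Python as below. This is a generic sketch, not Circle’s or AWS’s code: `ThrottlingError` and `retry_with_backoff` are hypothetical names, with `ThrottlingError` standing in for the ThrottlingException in the log above, and the doubling-with-jitter schedule is one common choice of incremental backoff. With boto3, the equivalent would be catching botocore’s `ClientError` and checking for the `ThrottlingException` error code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottlingError(Exception):
    """Hypothetical stand-in for ECR's ThrottlingException ("Rate exceeded")."""


def retry_with_backoff(
    call: Callable[[], T],
    *,
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on ThrottlingError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # The upper bound doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay;
            # waiting a random time below it keeps parallel jobs from retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0.0, ceiling))
    raise AssertionError("unreachable")
```

The jitter matters in this scenario: 192 logins that are throttled at the same instant would otherwise all retry at the same instant as well.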


#4

Gotcha. I’ve not used them, but could you persist your images using workspaces?

https://circleci.com/docs/2.0/workflows/#using-workspaces-to-share-data-among-jobs

However, I admit I don’t know how compatible that is with Circle’s native way of instantiating containers, since you may not be able to set anything up before Circle does its thing.

I have greater flexibility with my own (perhaps unusual) configuration: I pull the images I use manually, then start them all with Docker Compose inside a single Docker image. This means I could, if I wished, park the images in a workspace with docker save and then docker load them when required. I can also put sleeps between pulls if the registry provider’s throttling is triggered.
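
A minimal Python sketch of that manual approach, assuming the docker CLI is on the PATH; the image names and archive path are placeholders, and `pull_and_save` / `load_images` are hypothetical helpers. Pulls are spaced out with a pause, then saved into one archive that a later step could restore with docker load. This only applies to a setup like the one described, where images are pulled by hand; it does not change how Circle itself spins up job containers.

```python
from __future__ import annotations

import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[list[str]], object]


def _run(cmd: list[str]) -> object:
    # Raises CalledProcessError if docker exits with a non-zero status.
    return subprocess.run(cmd, check=True)


def pull_and_save(images: Sequence[str], archive: str, pause_s: float = 2.0,
                  run: Runner = _run,
                  sleep: Callable[[float], None] = time.sleep) -> list[list[str]]:
    """Pull each image with a pause in between, then save them all to one tar archive."""
    issued: list[list[str]] = []
    for i, image in enumerate(images):
        if i > 0:
            sleep(pause_s)  # space out pulls so the registry isn't hit in one burst
        cmd = ["docker", "pull", image]
        run(cmd)
        issued.append(cmd)
    save_cmd = ["docker", "save", "-o", archive, *images]
    run(save_cmd)
    issued.append(save_cmd)
    return issued


def load_images(archive: str, run: Runner = _run) -> list[str]:
    """Restore the saved images in a later step, e.g. after the archive is restored from a workspace."""
    cmd = ["docker", "load", "-i", archive]
    run(cmd)
    return cmd
```

Passing a recording function as `run` makes it easy to check which commands would be issued without touching Docker.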


#5

This is certainly an option, but I think failing on throttling is simply a bug. I’ve filed a bug report with Circle.


#6

You can’t persist images using workspaces or cache. This is a pretty bad issue, and it’s certainly complicated. I’m very confident you’re the first to hit this issue.

Talking to AWS about your rate-limit could be a faster turnaround. I don’t know what’s going to be involved in fixing this from our end… but it’s a very legitimate (and frustrating!) bug report so we’ll certainly work on addressing it.


#7

As the documentation I quoted above says, the throttle on the GetAuthorizationToken operation cannot be increased on a per-account basis. The last two points, retrying with incremental backoff and reusing the token for a period of time, are things Circle could handle, which would resolve this problem.
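
To illustrate the token-reuse point, here is a minimal Python sketch that caches a login token until shortly before it expires. ECR authorization tokens are valid for 12 hours and the GetAuthorizationToken response includes an expiry time, so one fetch could in principle serve many pulls. `Token`, `TokenCache`, and the `fetch` callable are hypothetical; a real `fetch` might wrap boto3’s `get_authorization_token`. The cache lives in one process, so sharing a token across 32 separate job containers would need some shared store on the platform side, which is an assumption about how Circle might implement it.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional

TOKEN_TTL_S = 12 * 3600  # ECR authorization tokens are valid for 12 hours


@dataclass(frozen=True)
class Token:
    value: str
    expires_at: float  # expiry as epoch seconds


class TokenCache:
    """Reuse one token until `refresh_margin_s` before it expires (hypothetical helper)."""

    def __init__(self, fetch: Callable[[], Token], refresh_margin_s: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._margin = refresh_margin_s
        self._clock = clock
        self._token: Optional[Token] = None
        self._lock = threading.Lock()  # concurrent callers trigger at most one fetch

    def get(self) -> str:
        with self._lock:
            now = self._clock()
            if self._token is None or now >= self._token.expires_at - self._margin:
                self._token = self._fetch()  # the only place GetAuthorizationToken would be called
            return self._token.value


if __name__ == "__main__":
    fake_now = [0.0]
    calls = [0]

    def fake_fetch() -> Token:
        calls[0] += 1
        return Token(f"token-{calls[0]}", fake_now[0] + TOKEN_TTL_S)

    cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
    for _ in range(192):  # the 192 logins from one build
        cache.get()
    print(calls[0])  # 1
```

The refresh margin means a token is replaced a few minutes before it expires rather than handed out just as it becomes invalid.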

