Information on recent Workflow delays

Over the past few days, our reliability has not met our standard of excellence. We are writing to apologize for any disruptions this may have caused for you and your work. Resolving these issues is our team’s top priority, and we are doing everything we can to restore the level of service you expect from CircleCI.

What happened:

The majority of recent incidents are related to an unexpectedly large increase in jobs processed on our platform. This new level of requests has caused one of our primary datastores to be much slower than normal and, at times, unresponsive. These datastore issues have resulted in slow or stopped job processing for some customers.

What we are doing to resolve these ongoing issues:

Historically, we have been able to plan and support the ongoing growth of our platform through predictable scaling of this datastore. In this instance, however, we have witnessed 10-100x increases in latency and complete datastore stalls. We are working closely with the datastore technology provider to tune the datastore itself and adapt our own application code to make optimal use of the datastore.

Where to learn more:

We will release a full root cause analysis when we have observed a sufficient period without further datastore incidents. When we do so, you will be able to find it on our Status Page, and we will also link to it on Discuss here.

Thank you for your continued patience and support. We know that your CI/CD pipelines are a mission-critical part of the work your team does. We are committed to being as transparent as we can, and apologize again for any impacts you and your team have felt.

Sincerely,

The CircleCI Team

5 Likes

Thank you for the update. Out of curiosity, what datastore technology do you use?

3 Likes

We use a number of different technologies in our stack. The later write-up will go into more depth, but you can see our stack here: https://stackshare.io/circleci/circleci

1 Like

Time to start taking bets on what part of the stack it is :upside_down_face:

In all seriousness though, I know hitting those cliffs can suck. Best of luck with the issues, looking forward to the full update.

3 Likes

This post makes a lot of sense. It’s a bit at odds with the official messages being posted, which blame your cloud provider and mention waiting on actions from your cloud provider.

Curious how these tie together.

1 Like

Does anyone have an alternative service they prefer? At this point it would be difficult to extricate ourselves from CircleCI, at least seemingly, but it would be nice to know if others have had more reliable success elsewhere.

The last issue we had today was cloud provider related and, afaik, not related to our recent issues. It just happened to occur on a rather bad day.

We’ve been using Buildkite recently for more things, because its model of “bring your own agent” works well with some of our builds that have some specific requirements. It also works well with a monorepo, since we can have dynamic build pipelines throughout the project. Finally, it has the advantage that if there are any issues with the agents, we own them and can troubleshoot and fix them.

That said, increased reliability was explicitly not a goal of using Buildkite. There’s no particular reason for me to believe that any other CI/CD provider is going to have better reliability in either the control plane or the build agents. Right now CircleCI is definitely having issues, but they seem to be of the type that could happen anywhere.

2 Likes

Why does the status page say “All Systems Operational”? That seems to contradict the email I just got.

Tell me if I’m wrong, but the email is talking about issues encountered during the past few days.

At this time, status seems fine: https://status.circleci.com

1 Like

The system overload occurred yesterday, and it looks to be all fixed now.

Many thanks to the CircleCI team for pulling out the stops to get it sorted - your efforts are much appreciated :gift: :tada:

8 Likes

I guess I misinterpreted the email. I was under the impression that there were still some lingering issues.

Either way, yes, thanks so much to the CCI team! :heart: your platform!

4 Likes