Information on recent Workflow delays

Sara · April 10, 2019, 10:39pm

Over the past few days, our reliability has not met our standard of excellence. We are writing to apologize for any disruptions this may have caused for you and your work. Resolving these issues is our team’s top priority, and we are doing everything we can to resume the level of service you expect from CircleCI.

What happened:

The majority of recent incidents are related to an unexpectedly large increase in the number of jobs processed on our platform. This new level of requests has caused one of our primary datastores to be much slower than normal and, at times, unresponsive. These datastore issues have resulted in slow or stopped job processing for some customers.

What we are doing to resolve these ongoing issues:

Historically, we have been able to plan and support the ongoing growth of our platform through predictable scaling of this datastore. In this instance, however, we have witnessed 10-100x increases in latency and complete datastore stalls. We are working closely with the datastore technology provider to tune the datastore itself and adapt our own application code to make optimal use of the datastore.

Where to learn more:

We will release a full root cause analysis when we have observed a sufficient period without further datastore incidents. When we do so, you will be able to find it on our Status Page, and we will also link to it on Discuss here.

Thank you for your continued patience and support. We know that your CI/CD pipelines are a mission-critical part of the work your team does. We are committed to being as transparent as we can, and apologize again for any impacts you and your team have felt.

Sincerely,

The CircleCI Team

pcreux · April 10, 2019, 10:50pm

Thank you for the update. Out of curiosity, what datastore technology do you use?

drazisil · April 10, 2019, 10:59pm

We use a number of different technologies in our stack. The later write-up will go into more depth, but you can see our stack here: https://stackshare.io/circleci/circleci

rohansingh · April 11, 2019, 12:01am

Time to start taking bets on what part of the stack it is

In all seriousness though, I know hitting those cliffs can suck. Best of luck with the issues, looking forward to the full update.

jbreckman · April 11, 2019, 12:09am

This post makes a lot of sense. It’s a bit at odds with the official messages being posted, which blame your cloud provider and say you are waiting on actions from your cloud provider.

Curious how these tie together

marcthayer · April 11, 2019, 12:27am

Does anyone have an alternative service they prefer? At this point it would be difficult to extricate ourselves from CircleCI, at least seemingly, but it would be nice to know if others have had more reliable success elsewhere.

drazisil · April 11, 2019, 12:39am

The last issue we had today was cloud provider related and, afaik, not related to our recent issues. It just happened to occur on a rather bad day.

rohansingh · April 11, 2019, 2:43am

We’ve been using Buildkite recently for more things, because its model of “bring your own agent” works well with some of our builds that have specific requirements. It also works well with a monorepo, since we can have dynamic build pipelines throughout the project. Finally, it has the advantage that if there are any issues with the agents, we own them and can troubleshoot and fix them.

That said, increased reliability was explicitly not a goal of using Buildkite. There’s no particular reason for me to believe that any other CI/CD provider is going to have better reliability in either the control plane or the build agents. Right now CircleCI is definitely having issues, but they seem to be of the type that could happen anywhere.

red2678 · April 11, 2019, 10:32am

Why does the status page say “All Systems Operational”? That seems to contradict the email I just got.

VincentCATILLON · April 11, 2019, 10:42am

Tell me if I’m wrong, but the email is talking about issues encountered during the past few days.

At this time, status seems fine: https://status.circleci.com

halfer · April 11, 2019, 10:48am

The system overload occurred yesterday, and it looks to be all fixed now.

Many thanks to the CircleCI team for pulling out all the stops to get it sorted - your efforts are much appreciated.

red2678 · April 11, 2019, 11:10am

I guess I misinterpreted the email. I was under the impression that there were still some lingering issues.

Either way, yes, thanks so much to the CCI team! your platform!
