Over the past few days, our reliability has not met our standard of excellence. We are writing to apologize for any disruptions this may have caused for you and your work. Resolving these issues is our team’s top priority, and we are doing everything we can to restore the level of service you expect from CircleCI.
What happened:
The majority of recent incidents are related to an unexpectedly large increase in the number of jobs processed on our platform. This new level of requests has caused one of our primary datastores to become much slower than normal and, at times, unresponsive. These datastore issues have resulted in slow or stopped job processing for some customers.
What we are doing to resolve these ongoing issues:
Historically, we have been able to plan and support the ongoing growth of our platform through predictable scaling of this datastore. In this instance, however, we have witnessed 10-100x increases in latency and complete datastore stalls. We are working closely with the datastore technology provider to tune the datastore itself and adapt our own application code to make optimal use of the datastore.
Where to learn more:
We will release a full root cause analysis when we have observed a sufficient period without further datastore incidents. When we do so, you will be able to find it on our Status Page, and we will also link to it on Discuss here.
Thank you for your continued patience and support. We know that your CI/CD pipelines are a mission-critical part of the work your team does. We are committed to being as transparent as we can, and we apologize again for any impact you and your team have felt.
Sincerely,
The CircleCI Team