GitHub outage on 21 October 2018

drazisil · October 22, 2018, 4:11pm

At 22:52 UTC on 21 October (15:52 PDT), GitHub experienced a network partition and subsequent database failure. This has caused intermittent issues with webhook delivery and other events that CircleCI depends on to manage your CircleCI workflows and jobs. The downtime has also prevented us from making API calls to GitHub to check on authorization and project/organization status.

Until GitHub's outage has ended, we will not be able to fully determine what changes or issues it has caused for your projects or jobs in our system. Furthermore, when GitHub does resume delivering webhooks, we expect a surge of jobs to start. We will scale up immediately in response and remain overprovisioned until the surge is complete.

If you have any questions or issues, please reply to this Discuss post, or file a ticket with our support team.

GitHub summary: https://blog.github.com/2018-10-21-october21-incident-report/

chrismo · October 22, 2018, 4:29pm

Is there a way to start a new workflow manually? Or is via webhook the only option?

drazisil · October 22, 2018, 4:33pm

If you have Build Processing enabled, the "trigger a new build" API endpoint may work: https://circleci.com/docs/api/#trigger-a-new-build-by-project-preview
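A minimal sketch of calling that endpoint from bash. It assumes the v1.1 URL layout (`/api/v1.1/project/<provider>/<org>/<project>/build`) and the `circle-token` query parameter used by the script later in this thread, and that Build Processing is enabled for the project. The function names `trigger_url`, `branch_payload` and `trigger_build` are hypothetical.

```shell
#!/bin/bash
# Sketch of a call to the v1.1 "trigger a new build" endpoint.
# Function names are hypothetical; the URL layout follows the script in this thread.

# Print the endpoint URL for a provider (github or bitbucket), org/user and project.
trigger_url() {
  local provider="$1" org="$2" project="$3"
  printf 'https://circleci.com/api/v1.1/project/%s/%s/%s/build\n' "$provider" "$org" "$project"
}

# Print the JSON body naming the branch to build. Names containing a double
# quote or backslash are rejected rather than escaped, to keep the sketch simple.
branch_payload() {
  local branch="$1"
  case "$branch" in
    *'"'* | *'\'*)
      echo "Unsupported branch name: $branch" >&2
      return 1
      ;;
  esac
  printf '{"branch": "%s"}\n' "$branch"
}

# POST the request, passing the token from CIRCLECI_TOKEN as the circle-token query parameter.
trigger_build() {
  local provider="$1" org="$2" project="$3" branch="$4" payload
  if [ -z "${CIRCLECI_TOKEN:-}" ]; then
    echo "Export an API token as CIRCLECI_TOKEN first." >&2
    return 1
  fi
  payload=$(branch_payload "$branch") || return 1
  curl -X POST -H "Content-Type: application/json" \
    -d "$payload" \
    "$(trigger_url "$provider" "$org" "$project")?circle-token=$CIRCLECI_TOKEN"
}

# Usage (placeholder org and project names):
#   trigger_build github my-org my-project main
```

Splitting URL and payload construction into their own functions keeps the parts that can go wrong (quoting, path layout) checkable without making a network request.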

prajnak · October 22, 2018, 4:42pm

That worked! Thanks so much @drazisil

rohara · October 22, 2018, 4:47pm

GitHub just resumed delivery of webhooks. We’re braced for impact.

rohara · October 22, 2018, 5:32pm

We’re seeing very slow webhook delivery at this time.

EDIT: Update from GH:

We have temporarily paused delivery of webhooks while we address an issue. We are working to resume delivery as soon as possible.

deviantintegral · October 22, 2018, 6:08pm

A quick script I put together for our team to trigger builds for now.

As an aside, I’m surprised the API doesn’t return a reference to the newly created workflow.

#!/bin/bash
# Trigger a CircleCI build for one branch via the v1.1 API.
# Requires Build Processing to be enabled in the project settings.

PROVIDER=github # or bitbucket
ORG=my-project-org-or-user
PROJECT=my-project-name

if [ -z "${CIRCLECI_TOKEN:-}" ]
then
  echo "Create a token at https://circleci.com/account/api and export it as CIRCLECI_TOKEN." >&2
  exit 1
fi

if [ -z "${1:-}" ]
then
  echo "Usage: $0 <branch>" >&2
  exit 2
fi

BRANCH="$1"
echo "Triggering build for branch $BRANCH..."

# Quote the JSON body and URL so curl receives them intact.
curl -X POST -H "Content-Type: application/json" \
  -d "{\"branch\": \"$BRANCH\"}" \
  "https://circleci.com/api/v1.1/project/$PROVIDER/$ORG/$PROJECT/build?circle-token=$CIRCLECI_TOKEN"
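Since the endpoint does not hand back a reference to the new workflow, one way to at least confirm each request was accepted is to check the HTTP status code. This is a hypothetical extension of the script above (the `is_accepted` and `trigger_branches` names are made up) that triggers several branches in one go. It assumes any 2xx response means the request was accepted and reuses the same placeholder `PROVIDER`, `ORG` and `PROJECT` values.

```shell
#!/bin/bash
# Sketch: trigger builds for several branches and report each request's HTTP status.
# The placeholders match the original script; override them via the environment.
PROVIDER="${PROVIDER:-github}" # or bitbucket
ORG="${ORG:-my-project-org-or-user}"
PROJECT="${PROJECT:-my-project-name}"

# Succeed (return 0) only for a 2xx HTTP status code.
is_accepted() {
  [[ "$1" =~ ^2[0-9][0-9]$ ]]
}

# POST one trigger request per branch, print "<branch>: accepted|failed (<status>)",
# and return non-zero if any request was not accepted.
trigger_branches() {
  local branch status failed=0
  if [ -z "${CIRCLECI_TOKEN:-}" ]; then
    echo "Export an API token as CIRCLECI_TOKEN first." >&2
    return 1
  fi
  for branch in "$@"; do
    # -w '%{http_code}' prints only the status code; the response body is discarded.
    # If the request never completes, curl reports 000, which counts as a failure.
    status=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
      -H "Content-Type: application/json" \
      -d "{\"branch\": \"$branch\"}" \
      "https://circleci.com/api/v1.1/project/$PROVIDER/$ORG/$PROJECT/build?circle-token=$CIRCLECI_TOKEN")
    if is_accepted "$status"; then
      echo "$branch: accepted ($status)"
    else
      echo "$branch: failed ($status)" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage: trigger_branches main develop
```

A non-zero return lets the function be used in a larger script (for example `trigger_branches main develop || echo "some triggers failed"`) without parsing its output.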

rohara · October 22, 2018, 6:24pm

Webhooks are flowing properly at this time.

pawprintdigital · October 22, 2018, 6:25pm

Top man, thanks!

gruselhaus · October 22, 2018, 6:25pm

Thank you!!!

peterschussheim · October 22, 2018, 6:55pm

Webhooks are still not being triggered.

rohara · October 22, 2018, 7:01pm

Yes, they are, but GitHub is flushing its backlog. A brand-new webhook will probably be very delayed.

rachelslurs · October 22, 2018, 7:35pm

Do I need to be on config version 2.1 to run this script?

halfer · October 22, 2018, 8:29pm

I don’t think so, @rachelslurs; 2.0 should be fine.

worc · October 22, 2018, 10:22pm

Is there any way to see the depth of the queue? The workday is winding down where I’m at, and it might be better to call it early if we’re looking at several hours of backlog still to go.

drazisil · October 23, 2018, 12:22am

You need to make sure you have Build Processing enabled in your project settings. The config version should not matter.

drazisil · October 23, 2018, 12:23am

I believe the queue was/is mostly on GitHub’s side, so we have no visibility into it, sorry. I hope your jobs have run by now.

worc · October 23, 2018, 8:43pm

I was asking about the depth of CircleCI’s queue. GitHub was reporting that they’d finished their backlog while you were reporting that you still had one. It was a noticeably missing feature: Travis CI, for example, responded to the outage by showing the depth of the queue that built up when GitHub reopened the floodgates.

drazisil · October 23, 2018, 9:09pm

I misunderstood. I don’t know that we had a queue, but if we did, I don’t remember seeing it shared in our incident channel. I know we ramped up capacity to prepare for one, but I’m not in a position to say whether we had one or not.

Edit: There were updates I missed after I signed off yesterday. It looks like we had queue spikes, and they were quickly cleared. The notes on status.circleci.com are the most accurate, since they came from our SRE team. I apparently misspoke; sorry.

drazisil · November 22, 2018, 9:09pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
