GitHub outage on 21 October 2018


#1

At 22:52 UTC on 21 October (15:52 PDT), GitHub experienced a network partition and subsequent database failure. This has caused intermittent issues with webhook delivery and other events that CircleCI depends on to manage your CircleCI workflows and jobs. The downtime has also prevented us from making API calls to GitHub to check on authorization and project/organization status.

Until GitHub's outage has ended, we won't know the full extent of the changes or issues it has caused for your projects or jobs in our system. Furthermore, when GitHub starts delivering webhooks again, we expect a surge of jobs to start; we will scale up immediately in response and stay overprovisioned until the surge has passed.

If you have any questions or issues, please reply to this Discuss post, or file a ticket with our support team.

GitHub summary: https://blog.github.com/2018-10-21-october21-incident-report/


#2

Is there a way to start a new workflow manually? Or is via webhook the only option?


#3

If you have Build Processing enabled, the "Trigger a new build" API endpoint may work: https://circleci.com/docs/api/#trigger-a-new-build-by-project-preview


#4

That worked! Thanks so much @drazisil


#5

GitHub just resumed delivery of webhooks. We’re braced for impact.


#7

We’re seeing very slow webhook delivery at this time.

EDIT: Update from GH:

We have temporarily paused delivery of webhooks while we address an issue. We are working to resume delivery as soon as possible.


#8

A quick script I put together for our team to trigger builds for now.

As an aside, I'm surprised the API doesn't return a reference to the newly created workflow.

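#!/bin/bash
# Trigger a CircleCI build for a branch via the v1.1 API.
# Requires Build Processing to be enabled in the project settings.

PROVIDER=github # or bitbucket
ORG=my-project-org-or-user
PROJECT=my-project-name

if [ -z "${CIRCLECI_TOKEN:-}" ]
then
  echo "Create a token at https://circleci.com/account/api and export it as CIRCLECI_TOKEN." >&2
  exit 1
fi

if [ -z "${1:-}" ]
then
  echo "Usage: $0 <branch>" >&2
  exit 2
fi

BRANCH=$1
echo "Triggering build for branch $BRANCH..."

# Quote the payload and URL so the branch name and the "?" reach curl unchanged.
curl -X POST -H "Content-Type: application/json" \
  -d "{\"branch\": \"$BRANCH\"}" \
  "https://circleci.com/api/v1.1/project/$PROVIDER/$ORG/$PROJECT/build?circle-token=$CIRCLECI_TOKEN"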
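Because the trigger call doesn't hand back a reference to the new build, one workaround is to ask the API for the newest build on that branch right afterwards. A minimal sketch, assuming the v1.1 "recent builds for a single project branch" endpoint (`/project/:vcs-type/:username/:project/tree/:branch`) and its `build_url` field; the helper names are hypothetical, and if the triggered build hasn't been created yet you will get the previous one.

```shell
#!/bin/bash
# Hypothetical companion to the trigger script: print the URL of the newest
# build on a branch. Assumes the v1.1 "recent builds for a single project
# branch" endpoint and its "build_url" field.

PROVIDER=github # or bitbucket
ORG=my-project-org-or-user
PROJECT=my-project-name

# Extract the first "build_url" value from the JSON read on stdin.
# grep/sed keep this dependency-free; a real JSON parser (e.g. jq) is sturdier.
first_build_url() {
  grep -oE '"build_url" *: *"[^"]*"' | head -n 1 | sed -E 's/.*"([^"]*)"$/\1/'
}

# Query the newest build for a branch; "/" in branch names is URL-encoded.
latest_build_url() {
  local branch=${1//\//%2F}
  curl -sS -H "Accept: application/json" \
    "https://circleci.com/api/v1.1/project/$PROVIDER/$ORG/$PROJECT/tree/$branch?limit=1&circle-token=$CIRCLECI_TOKEN" |
    first_build_url
}

# Only call the API when a token and a branch are supplied.
if [ -n "${CIRCLECI_TOKEN:-}" ] && [ -n "${1:-}" ]; then
  latest_build_url "$1"
fi
```

Usage: `./latest-build.sh my-branch` a few seconds after running the trigger script.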

#9

Webhooks are flowing properly at this time.


#10

Top man, thanks!


#11

Thank you!!!


#12

Webhooks are still not being triggered.


#13

Yes they are, but GitHub is flushing their backlog. A brand new webhook will probably be very delayed.


#14

Do I need to be on version 2.1 to run this script?


#15

I don’t think so @rachelslurs - 2.0 should be fine.


#16

Is there any way to see the depth of the queue? The workday is winding down where I'm at, and it might be better to call it early if we're looking at several hours of backlog still to go.


#17

You need to make sure you have build processing enabled in your project settings. The config version should not matter.


#18

I believe the queue was/is mostly on GitHub’s side, so we have no visibility into it, sorry. I hope your jobs have run by now.


#19

I was asking about the depth of CircleCI's queue. GitHub was reporting that they'd finished their backlog while you were reporting that you still had one. It was a noticeably missing feature: Travis CI, for example, responded to the outage by showing the depth of the queue generated when GitHub reopened the floodgates.


#20

I misunderstood. I don't know that we had a queue, but if we did, I don't remember seeing it shared in our incident channel. I know we ramped up capacity to prepare for one, but I'm not in a position to say whether we had one or not.

Edit: There were updates I missed after I signed off yesterday. It looks like we had queue spikes, and they were quickly cleared. The notes on status.circleci.com are the most accurate, since they came from our SRE team. I apparently misspoke, sorry.
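This does not show CircleCI's platform-wide queue depth, but you can get a rough view of how many of your own project's recent builds are still waiting to run. A minimal sketch, assuming the v1.1 "recent builds for a project" endpoint and its `lifecycle` field (values such as `queued`, `scheduled`, `not_running`, `running`, `finished`); treating `queued`, `scheduled`, and `not_running` as "waiting" is an assumption, the helper names are hypothetical, and only the 100 most recent builds are inspected.

```shell
#!/bin/bash
# Rough, per-project backlog check: count recent builds whose "lifecycle"
# says they have not started yet. NOT CircleCI's global queue depth.
# Assumes the v1.1 "recent builds for a project" endpoint.

PROVIDER=github # or bitbucket
ORG=my-project-org-or-user
PROJECT=my-project-name

# Count waiting builds in the JSON read on stdin.
# grep keeps this dependency-free; a real JSON parser (e.g. jq) is sturdier.
count_waiting_builds() {
  grep -oE '"lifecycle" *: *"(queued|scheduled|not_running)"' | wc -l | tr -d ' '
}

# Fetch up to 100 recent builds for the project and count the waiting ones.
fetch_waiting_count() {
  curl -sS -H "Accept: application/json" \
    "https://circleci.com/api/v1.1/project/$PROVIDER/$ORG/$PROJECT?limit=100&circle-token=$CIRCLECI_TOKEN" |
    count_waiting_builds
}

# Only call the API when a token is available.
if [ -n "${CIRCLECI_TOKEN:-}" ]; then
  echo "Builds still waiting to run: $(fetch_waiting_count)"
fi
```

Polling this every few minutes gives a rough sense of whether your own backlog is shrinking.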

