Connectivity issues in jobs using IP ranges

Hello,

Since yesterday we have been experiencing random connectivity issues in jobs that have the IP ranges feature enabled. The failures seem to be concentrated in SSL handshakes against third-party services (e.g. Docker Hub, GitHub, etc.).

These happen almost every time we run a workflow in CircleCI, and they occur randomly at different steps. We’ve been retrying the same workflows for the last 24 hours; a few times we got lucky and the workflows went through, but most of the time they fail at some job/step.

Anyone else experiencing this?

Some log examples:

curl: (35) OpenSSL SSL_connect:
Connection reset by peer in connection to github-releases.githubusercontent.com:443
Error while installing datadog/datadog v3.2.0:
Get "https://github-releases.githubusercontent.com/93446053/d75d4ada-0e61-404a-8c49-02ba4862abef?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=*********%2F20210826%2F*********%2Fs3%2Faws4_request&X-Amz-Date=20210826T132714Z&X-Amz-Expires=300&X-Amz-Signature=*********&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=93446053&response-content-disposition=attachment%3B%20filename%3Dterraform-provider-datadog_3.2.0_linux_amd64.zip&response-content-type=application%2Foctet-stream":
read tcp 192.168.160.3:41082->185.199.109.154:443: read: connection reset by peer
Starting container cimg/python:3.6
Warning: No authentication provided, using CircleCI credentials for pulls from Docker Hub.
  image is cached as cimg/python:3.6, but refreshing...

  Error pulling image cimg/python:3.6: Error response from daemon: Head https://registry-1.docker.io/v2/cimg/python/manifests/3.6: unknown: unable to scan account service account row from query: context deadline exceeded... retrying
  image is cached as cimg/python:3.6, but refreshing...

  Error pulling image cimg/python:3.6: Error response from daemon: Head https://registry-1.docker.io/v2/cimg/python/manifests/3.6: received unexpected HTTP status: 502 Bad Gateway... retrying
  image is cached as cimg/python:3.6, but refreshing...

  Error pulling image cimg/python:3.6: Error response from daemon: Head https://registry-1.docker.io/v2/cimg/python/manifests/3.6: received unexpected HTTP status: 502 Bad Gateway... retrying
  image is cached as cimg/python:3.6, but refreshing...

  Error pulling image cimg/python:3.6: Error response from daemon: Head https://registry-1.docker.io/v2/cimg/python/manifests/3.6: received unexpected HTTP status: 503 Service Unavailable... retrying
  image is cached as cimg/python:3.6, but refreshing...

  Error pulling image cimg/python:3.6: Error response from daemon: Head https://registry-1.docker.io/v2/cimg/python/manifests/3.6: received unexpected HTTP status: 503 Service Unavailable... retrying
  image is cached as cimg/python:3.6, but refreshing...

Error response from daemon: Head https://registry-1.docker.io/v2/cimg/python/manifests/3.6: received unexpected HTTP status: 503 Service Unavailable

Another data point that a colleague pointed out is that CircleCI is also having connectivity issues reaching our Kubernetes cluster (although we route the traffic via a DNS record in Cloudflare):

Cannot connect to OpenFaaS on URL: ************************************.
Get "************************************/system/functions":
read tcp 192.168.80.3:43704->104.22.72.68:443: read: connection reset by peer

Hi @jverce! Thank you for detailing this. We’re deploying a fix that should take around 24 hours to roll out. If this is still happening within a day or so, I’d love an update.

Thanks!
So far we haven’t seen any improvements, but I’ll continue to monitor on our side during the day.

Thanks again! 🙂

@thekatertot we’re still having the same issues.
Is there anything we need to change on our side? Should we delete/re-create the branch being built in CircleCI?

Thanks!

We’re investigating now. Of course, it looks fixed on our end, but we’re digging in to see what could be persisting. Deleting/re-creating shouldn’t make a difference. I’ll keep you updated!


Any news? Here’s the error I get:

#!/bin/bash -eo pipefail
curl -sSL https://cli.openfaas.com | sudo -E sh

Finding latest version from GitHub
0.13.13
Downloading package https://github.com/openfaas/faas-cli/releases/download/0.13.13/faas-cli as /tmp/faas-cli
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to github-releases.githubusercontent.com:443

CircleCI received exit code 0

Thank you @mathieuforest for including your error message! I’ve passed it along so we can use it as part of our investigation. I will update when we know more.

Hello all,

is there any news on this bug?
I am using IP ranges as well and am currently facing the same issue when trying to download geckodriver for automated tests. I also noticed that it works from time to time (roughly every third try).
The error message for that is the following:

OpenSSL SSL_connect: Connection reset by peer in connection to github-releases.githubusercontent.com:443

I tried downloading the geckodriver with both curl and wget. Both have the same issue.
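As a stopgap while downloads only succeed roughly every third try, a small retry wrapper can paper over the intermittent resets. This is just a sketch; the `retry` helper and the backoff values are my own, and the geckodriver URL in the comment is illustrative, not copied from the failing job:

```shell
#!/bin/bash
# retry: run a command up to N times with linear backoff.
# Usage: retry <max_attempts> <command...>
retry() {
  local max=$1; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "retry: giving up after $attempt attempts" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed; sleeping $((attempt * 2))s" >&2
    sleep $((attempt * 2))
    attempt=$((attempt + 1))
  done
}

# Example (illustrative URL, not the exact one from the job above):
# retry 5 curl -fsSL -o /tmp/geckodriver.tar.gz \
#   "https://github.com/mozilla/geckodriver/releases/download/v0.29.1/geckodriver-v0.29.1-linux64.tar.gz"
```

Note that curl also has built-in retries (`--retry`), but by default it does not retry exit code 35 (SSL connect error); on curl 7.71+ adding `--retry-all-errors` makes it retry those as well.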

Hello everyone!

Thank you again for your patience on this issue. Today we pushed out what we believe is a viable work-around for this network connectivity issue. It should fix the reliability problems that were causing connections to be reset.

The caveat to the work-around is that if your job enables IP ranges and pushes anything to a destination hosted by the content delivery network (CDN) Fastly, the outgoing job traffic will not be routed through one of the well-defined IP addresses listed above. Instead, the source IP will be one of the addresses AWS uses in the us-east-1 or us-east-2 regions. This is a known issue between AWS and Fastly that CircleCI is working to resolve.
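If you want to confirm which egress IP a job's traffic is actually using, one option is to ask an external echo service from inside the job and compare the result against the published ranges. A minimal sketch, where `ip_has_prefix` is a helper of my own and the prefixes shown are placeholders, not CircleCI's actual published ranges:

```shell
#!/bin/bash
# ip_has_prefix: succeed if the IP starts with one of the given prefixes.
# Usage: ip_has_prefix <ip> <prefix>...
ip_has_prefix() {
  local ip=$1; shift
  local p
  for p in "$@"; do
    case "$ip" in
      "$p"*) return 0 ;;
    esac
  done
  return 1
}

# Usage inside a job (checkip.amazonaws.com simply echoes the caller's IP;
# the 3.228. / 18.246. prefixes below are placeholders):
#   egress_ip=$(curl -s https://checkip.amazonaws.com)
#   if ip_has_prefix "$egress_ip" 3.228. 18.246.; then
#     echo "egress via an expected IP-ranges address"
#   else
#     echo "egress via another path (e.g. the Fastly work-around)"
#   fi
```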

Please reply and let us know if this does not solve the connectivity issue for your team.

cc: @jverce @mathieuforest @tmklch


Hello everyone,

Hope you are all well and healthy.
Just wanted to give you some feedback on this issue.
We have been using IP ranges in our repo again since you fixed the issue, and I have to say it works like a charm now.
Thanks for providing such a quick fix!