1.0 builds are broken on CCIE 2.0


#1

Goal:

A working CCIE 2.0 cluster in AWS to which our CCIE 1.0 work is migrated.

What we have done:

What we found:

  • 2.0 builds work fine
  • 1.0 builds hang in a reported RUNNING state.

Troubleshooting:

  • We spoke with Rose from the CircleCI support team.
  • She requested that we check the logs of the scheduler docker container on the services box.
  • When builds are triggered, we see the scheduler report that the build has been sent to the builder and started.
  • A few seconds later the scheduler logs an exception stating that a connection to the builder's public IP on port 443 has timed out. (Error log provided below.)
  • To test this, I reopened port 443 on the builder boxes' public IPs (I changed the security group for the builders' autoscale group to allow connections
    to and from the public IPs of the builders and the main scheduler/services box), and everything then worked as intended.
  • It appears that the scheduler/services box (the scheduler process in particular) pulls the build status from the builder box over the public internet instead
    of over a private network.
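
For anyone who wants to reproduce the connectivity check from the services box, here is a minimal sketch of a TCP probe against port 443. The function name can_connect and both example addresses are placeholders I made up for illustration, not values from our cluster; substitute a builder's real private and public IPs.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical addresses: replace with a builder's private and public IPs.
    targets = [("private", "10.0.0.12"), ("public", "203.0.113.10")]
    for label, host in targets:
        state = "reachable" if can_connect(host) else "NOT reachable"
        print(f"builder {label} IP {host}:443 is {state}")
```

If the private IP is reachable and the public one is not, that matches what we saw: the network path exists, but the scheduler is choosing the public address.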

Summary:

  • We do not intend to expose our CircleCI traffic to the public internet.
  • We are hoping we missed a configuration setting that forces all scheduler/services->builder communication to use private IPs.
  • If this is not a configuration issue, we are hoping that you have another solution that will keep our CCIE 2.0 traffic private.

The following log entries show the error/problem reported above:

Oct 10 22:20:27.560:+0000 INFO circle.backend.build.run-queue.dispatcher handing build=awesome-org/awesome-project/4 to dispatch thread
Oct 10 22:20:27.581:+0000 INFO circle.backend.build.run-queue.dispatcher dispatcher running: build=awesome-org/awesome-project/4
Oct 10 22:20:27.741:+0000 INFO circle.vcs.status marking build-name=ops/infra/4 for project=ops/infra (commit=f75eefdc8df00a9db2e083e0dff445799d3f7eb2) (build-status=:running)>
Oct 10 22:22:35.015:+0000 INFO org.apache.http.impl.client.DefaultHttpClient I/O exception (java.net.ConnectException) caught when connecting to {s}->https://54.X.X.X:443: Con>
Oct 10 22:22:35.015:+0000 INFO org.apache.http.impl.client.DefaultHttpClient Retrying connect to {s}->54.X.X.X:443

54.X.X.X is the public IP address of one of the builder boxes.
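
To see which addresses the scheduler is trying to reach, the log can be scanned for these connection failures. This is only a rough sketch: the regular expression is derived from the one DefaultHttpClient line quoted above, so it is an assumption that other failures are logged in the same shape, and failed_targets is a name I chose for illustration.

```python
import re
from collections import Counter

# Pattern based on the single "I/O exception ... caught when connecting to" line above.
FAILURE = re.compile(
    r"I/O exception \((?P<exc>[\w.]+)\) caught when connecting to "
    r"\{s\}->https://(?P<host>[^:\s]+):(?P<port>\d+)"
)


def failed_targets(lines):
    """Count failed connection attempts per (host, port) found in the log lines."""
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts[(match.group("host"), int(match.group("port")))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: pipe the scheduler container's log output into this script.
    for (host, port), n in failed_targets(sys.stdin).most_common():
        print(f"{n} failed attempt(s) to {host}:{port}")
```

Any host listed that is a builder's public IP rather than its private IP points at the same problem described above.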

Thank you for your help.


#2