We’ve been running our builds using the `ruby` image with the `-browsers` variant. In the past month or so, we’ve started seeing an increasing amount of flake, failing with “Process unexpectedly closed with status 1”. Digging deeper into the Selenium logs, I saw an error from geckodriver about being unable to connect to the display on `:99`. Deeper still, it appears that Xvfb is not loading properly in these cases. I’ve written a script to detect when the display is booted up by checking the `/tmp/.X11-unix` directory, but on the builds that fail, Xvfb still has not loaded after waiting for 2 minutes.
Prefixing end-to-end test runs with `xvfb-run -a` seems to have resolved the issue for now, but it seems to me like this shouldn’t be necessary.
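For anyone hitting the same thing: the workaround is just a prefix on the test command, since `xvfb-run` starts its own Xvfb and `-a` picks a free display number automatically. A sketch of what the CircleCI step looks like (the rspec invocation is only an example, not our actual command):

```yaml
- run:
    name: Run end-to-end tests under a fresh Xvfb
    # -a: automatically select a free display number instead of :99
    command: xvfb-run -a bundle exec rspec spec/features
```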
This issue is also being seen when running Cypress - see Missing X server or $DISPLAY · Issue #31484 · cypress-io/cypress · GitHub
This is a recent regression in CircleCI and is not being seen on GitHub Actions.
I’ve asked the support team for more information on the issue. They replied that forcing the X server to run in the background is a best practice. I totally disagree, but I don’t have much choice, so I’ve accepted it.
We do it this way:

```yaml
command: |
  set +e
  Xvfb :99 -screen 0 1280x1024x24
  if [ $? -eq 0 ]; then
    echo "Xvfb started successfully"
  else
    echo "Xvfb is already running"
  fi
background: true
```
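One caveat with checking `$?` directly: Xvfb runs in the foreground until it exits, so the success branch is only reached if Xvfb terminates (e.g. because the display is already taken). A variant that backgrounds Xvfb inside the shell and polls for the display socket instead would look something like this (a sketch, not an official recommendation; the 30-second limit is arbitrary):

```yaml
command: |
  # Start Xvfb in the background of this shell
  Xvfb :99 -screen 0 1280x1024x24 &
  # Wait up to 30s for the display socket to appear
  for i in $(seq 1 30); do
    if [ -S /tmp/.X11-unix/X99 ]; then
      echo "Xvfb started successfully"
      break
    fi
    sleep 1
  done
background: true
```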
The issue started 3 weeks ago on a few of our builds, but as we run Cypress intensively, it was quite painful.
This is the exact answer I received:
You’re right that repeatability is a core expectation from CI platforms.
While our convenience images do include Xvfb, we’ve found that explicitly starting services leads to more consistent and predictable behavior across builds. The 95/5% split you’re seeing suggests there could also be timing or resource-related edge cases affecting your builds.
By explicitly starting it, you ensure the service is always running regardless of environment variables or container initialization quirks.
I’ve requested in Feature Request: Update to cimg:base:2024.12 according to policy · Issue #462 · CircleCI-Public/cimg-node · GitHub for the `cimg/node` Docker images to be updated. They’re based on an old Ubuntu release, 22.04.3 LTS, which came out almost two years ago on August 10, 2023.
It’s not possible to say whether the issue is due to the outdated Ubuntu version, but before doing any more detailed troubleshooting it would definitely be helpful to have the convenience images updated. According to the CircleCI convenience images support policy, I would have expected that to have happened already.
Hi @mockdeep
I noticed that the `cimg/ruby` images are also based on the outdated Ubuntu 22.04.3 LTS release (see cimg-ruby/Dockerfile.template at main · CircleCI-Public/cimg-ruby · GitHub).
Do you have any feeling whether the Xvfb problem might be related to the use of the outdated release?
@MikeMcC399 it’s hard to know. We’ve been using the convenience images without issue for several years, including, I assume, the Ubuntu 22-based ones. Maybe there was a new release of the image with some update to Xvfb that broke things, or maybe there was some hardware change that caused the issue? I really don’t know.
Considering that the image hasn’t been updated in quite a while and the issue only started recently, your guess about hardware sounds more likely. A hardware change could have affected timing in some way that makes starting up Xvfb unreliable. I’m making a guess here as well!
There isn’t actually any change to Xvfb even if the underlying Ubuntu image were to be updated from 22.04.3 LTS to 22.04.5 LTS: both show
```
$ apt list xvfb
xvfb/now 2:21.1.4-2ubuntu1.7~22.04.14 amd64 [installed,local]
```
so it’s all a bit of a mystery! The issue seems to have started around April 2025.
`cimg/node:22.17.0-browsers` images have been published based on Ubuntu 24.04.2 LTS and the sporadic issue with Xvfb is still occurring. It’s going to need some more investigation.
EDIT: There’s a limit of 3 posts in a row, so I can’t put this in a new post, unless somebody replies here.
There seems to be a pattern: when Cypress reports “Missing X server or $DISPLAY”, the CircleCI “Spin up environment” step takes significantly longer than when no error occurs.
For instance, in a successful job, “Spin up environment” shows 10s, whereas in unsuccessful jobs it shows 30s or even 2m.
For the longer-running step of 2m, there is often an additional error message:

```
Error getting image metadata for cimg/node:22.17.0-browsers: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.41/images/docker.io/cimg/node:22.17.0-browsers/json": context deadline exceeded
WARNING: docker image cimg/node:22.17.0-browsers does not specify an architecture
```
So it looks like the issue could be caused by a resource constraint during provisioning. This isn’t something that the user has control over and I don’t know how much this can be investigated further on the user side.