Selenium/standalone-chrome crashes without larger shm or shm volume mounted

2.0

#1

I’m experiencing Net::ReadTimeout (Net::ReadTimeout) on some of my cucumber tests, and found this issue explaining that chrome needs /dev/shm to be bigger.

For a normal docker scenario, we can either run the container with -v /dev/shm:/dev/shm or run privileged and set shm size.

As another alternative, the container could be started with the docker run option --shm-size=2g.

How am I to configure this in my 2.0 config.yml?

jobs:
  build:
    docker:
      - image: alienfast/ci-ruby:1.0.4
      - image: library/mysql:5.7.17
      - image: selenium/standalone-chrome:3.1.0

#2

That’s intriguing. I think you’d need to be on the machine executor to get that running properly.


#3

Do you mean I need machine due to the need to have a larger shm or shm volume mounted? Is shm something that circle could increase on the host without needing to mount the volume? I would think that running the selenium images for browser testing is going to be quite common, so it would make sense for CCI 2.0 to take a look at this and accommodate without having to go out of the way and run containers manually with machine…or am I missing something simple here?


#5

You’re not missing anything. I just opened a ticket but I’m certain it’s low priority compared to a lot of other roadmap items.


#7

@rohara I need some guidance here. I’m very frustrated. I understand this is a beta, and I have put significant time into getting to where I am today. I have no complaints about the time spent, and I am very happy with the result, except for this one issue.

I have a fully functional multi-container config.yml with the docker executor for a rails engine, with a dummy react app running rspec and cucumber (including mysql and selenium/chrome).

This doc talks about multi-container execution with the docker executor, and has almost nothing about machine. I have spent a couple of hours looking over the forums etc. and I see various bits of incomplete or old information; perhaps I’m missing a good doc or example project.

I don’t really have days more to spend to start from scratch to get this to work. I hope I’m wrong about most of this including starting from scratch, but I’m not finding the information I need.

  1. What does it mean to have to convert a project from docker executor to machine? What changes in the config.yml approach?
  2. Should I use compose instead (because I will in production)?
  3. Do I lose caching? How would it work with compose?
  4. Do I need to extract my test results for storing?
  5. Is there an example project that does what normally is done in the docker executor but converted for machine?

It is VERY frustrating to have a config that is fully fleshed out only to run into this small shm volume size issue, which seems to mean I need to throw out all the work we did to get this running on CCI 2.0.

I hope I’ve fundamentally mistaken my situation, and will be quite happy to be wrong about it.

Any help here would be greatly appreciated.


#8

  1. Remove the docker section. Install everything you need. Docker is pre-installed but not much else is.
  2. That sounds great to match your production environment.
  3. Docker layer caching is lost here. We’re working on support for it, but it’s not done yet. Regular caching is still fine.
  4. Yes, but it depends on your specific setup. If you just mount the volume, the test results can already be accessible.
  5. Not really. The docker executor allows you to build all your dependencies into the image, but the machine executor forces you to install everything yourself. With that said, you can just pull in Docker images and run them with your dependencies installed; that could save you some build time.

I took a closer look at the status of the shm on the docker executor and we did increase it in the past because of Selenium testing. Maybe it’s not big enough yet, or maybe there’s another problem. We have a ticket open about it internally to further investigate.


#9

My goal is to take advantage of CCI’s features, not roll my own build with a good bit of scripting, experimentation, and pain. The docker executor and my current setup are pretty fantastic in CCI 2.0 and very much satisfy those goals.

machine seems to be much more do-it-yourself. This may be a breaking point for me where the cost/benefit of the very basic Convox build workflow may be more attractive than the machine approach.

Based on your comments, it does seem it will take me days of research and experimentation to convert this and get it right. I’m considering that the effort to switch to machine may not be worth it for me.

I’m not crazy, right? Browser testing should be quite in line with any roadmap for 2.0, so it would seem to be a HIGH priority to solve this.

We are trying to go to staging this week and production in a month, so I’ll need to make a decision on this.

Please let me know if you can simply increase the shm config. I’m happy to work with or try anything you need, just let me know.


#10

I have submitted a support request.

With some clarity, this does indeed seem to be a priority problem for 2.0: I should be able to reliably browser test with a selenium container.


#11

No, definitely not crazy. I spent weeks getting Selenium running with certain configurations. It’s a giant pain in the ass in every possible way.

Do you think the shm needs to be 2GB for your project?

Another option: do you think a larger instance would help? We have some larger instances available if you’re eligible; just contact your CSM about it. It might not help, but I think it’s worth a try. It’s a super easy config change on the docker executor.


#12

I’ll try the failing test locally and try to figure out the shm threshold. It’s a small rails-api + react app, I wouldn’t think I need something extra special or a large instance. We do have extensive tests though.

I’ll get back to you.


#14

Still investigating. I’m able to crash the chrome instance locally on a full run with the shared volume. I’m still looking at tuning parameters to see what is necessary; I cannot expect CCI to succeed if I can’t get through a single-threaded run locally.

The crashes are usually ERR_EMPTY_RESPONSE from the javascript console on a json request or even an initial page request, followed by a succession of continued failures, then it bounces out of it and succeeds again. I’ve created an issue with selenium and hope that I can get a lead on using some different settings.


#15

The SHM volume has been 1 GB for about the last month. I suspect this isn’t due to SHM size, but now that you can reproduce it locally, feel free to experiment with larger SHM sizes to see if that fixes things.

We sometimes use selenium recordings to help debug what’s going on. To do this, make sure you have libav and its dependencies installed. This line in your dockerfile is more than you’d need, but it’ll get the job done:

      RUN apt-get update && \
          apt-get install -y libmysqlclient-dev gnupg graphviz python-dev \
                             libgeos-dev unzip sudo build-essential libssl-dev \
                             libffi-dev libav-tools xvfb

And in your config.yml:

      - run:
          name: Make selenium recordings
          command: |
            mkdir /tmp/selenium-recordings/
            avconv -f x11grab -r 30 -s "1280x1024" -i :99 "-c:v" "libvpx" -qmin 0 -qmax 50 -crf 10 -b:v "2M" -threads 0 /tmp/selenium-recordings/recording-$(date +%s).webm
          background: true
     
      # Run tests between these two steps

      - store_artifacts:
          path: /tmp/selenium-recordings
          destination: selenium-recordings

Check out the recordings and see if they tell you anything helpful. Sometimes you can see tab crashes at a particular action (i.e. a button click) or at a particular event (starting a new test).


#16

Thanks @eric and @rohara. I’m venturing down the path of some stress testing and memory usage analysis. It takes about 46 sign in/out attempts to trigger the failure, both with the shm volume mounted on the docker image and with my local native chrome, which leads me to think this has nothing to do with circleci or the selenium image at all, but with some kind of hard limit chrome is setting on memory use.
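
For anyone wanting to reproduce this kind of count, here is a minimal Ruby sketch of a stress loop. The helper name attempts_until_failure is hypothetical and not part of Capybara or Cucumber; the block is assumed to perform one sign in/out cycle and to raise when the browser or server stops responding.

```ruby
# Hypothetical helper: repeat an action until it raises, and report how
# many attempts succeeded before the first failure.
def attempts_until_failure(max_attempts)
  max_attempts.times do |i|
    begin
      yield i + 1 # pass the 1-based attempt number to the block
    rescue StandardError => e
      # Net::ReadTimeout and most driver errors are StandardError subclasses.
      return { succeeded: i, error: e.class.name }
    end
  end
  { succeeded: max_attempts, error: nil }
end
```

In a real run the block would drive the browser (for example, visit the sign-in page, submit the form, sign out), and the returned hash shows after how many cycles things fell over.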

Once I narrow down the problems (which point to too much memory use/leak), I’ll come back and update with my findings.


#17

I’ve finally narrowed this down, and strangely enough, it was neither a leak nor an shm issue (in local testing).

When running cucumber, I would see Failed to load resource: net::ERR_EMPTY_RESPONSE in the chrome console after some time, or Net::ReadTimeout (Net::ReadTimeout). I seem to have narrowed it down to the Capybara server - it hangs and stops serving. Sometimes it recovers, other times it does not. I added a route for a simple health check using an HTTP GET request, and it proves that the server is failing to respond.
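
A health check like the one described can be sketched roughly as follows. This is not the exact code used here: HEALTH_APP and server_status are hypothetical names, and the endpoint is written as a bare Rack-style lambda so it can be mounted by any Rack server. The probe uses short timeouts so a hung server shows up as :down instead of blocking.

```ruby
require 'net/http'
require 'uri'

# Hypothetical Rack-compatible health endpoint: anything that responds to
# #call(env) and returns [status, headers, body] can be mounted as a route.
HEALTH_APP = lambda do |_env|
  [200, { 'content-type' => 'text/plain' }, ['ok']]
end

# Probe a URL with a plain HTTP GET. Returns :up on a 2xx response and
# :down on timeouts, refused connections, or any non-2xx status.
def server_status(url, timeout: 2)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess) ? :up : :down
rescue Net::OpenTimeout, Net::ReadTimeout, SystemCallError, SocketError, EOFError
  :down
end
```

Calling server_status against the Capybara server's address between scenarios (the URL depends on your setup) shows whether the server itself has stopped answering, independent of the browser.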

To test this theory further, I ran a typical Thin Rails server and set Capybara.run_server = false to test against development. Simply put, I cannot get my stress tests to fail.
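
For reference, pointing Capybara at an already running server looks roughly like this in the test setup file. The URL and port are assumptions for illustration; use whatever address your separately started server listens on.

```ruby
# features/support/env.rb (or spec_helper.rb); a sketch, assuming the
# capybara gem is installed and a Rails server is already running.
require 'capybara'

# Do not let Capybara boot its own in-process server.
Capybara.run_server = false

# Assumed address of the externally started server (for example Thin on port 3000).
Capybara.app_host = 'http://localhost:3000'
```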

I have not yet looked at Capybara to see how it runs a server, but it is clear for my case with rails-api/react app that it cannot keep up.


#18

Some additional notes on this subject:

  1. You can run a puma server with capybara by setting Capybara.server = :puma. This currently is not in their README.
  2. Since Capybara is running on the host, and selenium/standalone-chrome:3.3.0 is technically a different host (by IP), be sure things like rack-attack aren’t throttling you. I stumbled upon this inadvertently when puma started complaining TypeError: no implicit conversion of Symbol into String, which turned out to come from a rack-attack Retry-After header. I have a feeling this has been the problem this entire time, and I never encountered it previously because our rack-attack config overlooks requests from localhost.

#19

Final notes:

  1. This was definitely caused by our rack-attack config throttling chrome since it was no longer on the white-listed localhost.
  2. :puma will be the default in Capybara 3, but if you are using a similar single page app setup, switching to it now seems prudent.
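
For anyone hitting the same thing, one way to keep rack-attack from throttling the browser container is to safelist everything in the test environment. This is a sketch, not our actual initializer: the throttle name, limit, and period are made up for illustration, and it assumes a rack-attack version that provides safelist (older releases call it whitelist).

```ruby
# config/initializers/rack_attack.rb; a sketch assuming the rack-attack gem.
require 'rack/attack'

# Hypothetical throttle, standing in for whatever rule the app already has.
Rack::Attack.throttle('requests by ip', limit: 300, period: 5 * 60) do |req|
  req.ip
end

# In test, the Selenium browser connects from a non-localhost IP, so a
# localhost-only safelist no longer covers it. Safelist every request instead.
if defined?(Rails) && Rails.env.test?
  Rack::Attack.safelist('allow all requests in test') { |_req| true }
end
```

Safelisting only the Selenium container's address would be tighter, but that address depends on the CI network setup.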

#20

Kevin, this is amazing work. Thank you so much for sharing this. I’d like to share this with the community and get this into our docs. Can you verify the below sums everything up?

Symptoms

  • Flaky integration tests

  • When running Cucumber, the Chrome console reports Failed to load resource: net::ERR_EMPTY_RESPONSE after some time

  • When running Cucumber, the console reports Net::ReadTimeout (Net::ReadTimeout)

Problem
Capybara versions before 3 use WEBrick as the default test web server, which appears to leak memory.
https://github.com/teamcapybara/capybara/issues/1855

Solution
For Capybara 2+, use Puma as the test server by adding this to your test setup file (spec_helper.rb, env.rb, etc.):

Capybara.server = :puma


#21