Tests run slower on 2.0


#1

I’ve just completed the migration to 2.0 for our Rails app, and I’m concerned that our tests are actually running slower on 2.0 than on 1.0. I’m not talking about the entire build here, just our test commands.

We have a test suite made up of RSpec and Cucumber tests. On Circle 1.0 the RSpec tests take just over 5 minutes (5:06-5:45) to complete; the Cucumber suite takes slightly longer, usually just under 6 minutes (5:40-5:58). Running the same suite on Circle 2.0 produces slower and more inconsistent run times: RSpec is between 5:45-7:20 and Cucumber is between 10:51-18:41. That is a pretty big slowdown, especially for the Cucumber tests.

I am surprised there is such a difference in test run time, especially as lots of people have reported speed improvements. I have cut setup time for each build by using caching, but that only saves <2 minutes and is eclipsed by the extra time the RSpec and Cucumber tests take to run.

We are using Rails 5.0, a MySQL database, and PhantomJS. We have a custom Docker image based on circleci/ruby:2.3-node that simply installs PhantomJS, and we use circleci/mysql:5.6 for the database.

Here is our circle config file:

version: 2

jobs:
  build:
    working_directory: /home/circleci/ytb

    docker:
      - image: yourtradebase/ruby:2.3-node-phantomjs-0.0.1
        environment:
          RAILS_ENV: test
          RACK_ENV: test
      - image: circleci/mysql:5.6

    steps:
      - checkout
      - run: cp config/{database_circleci,database}.yml

      # Run bundler
      # Load installed gems from cache if possible, bundle install then save cache
      # Multiple caches are used to increase the chance of a cache hit
      - restore_cache:
          keys:
            - gem-cache-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
            - gem-cache-{{ .Branch }}
            - gem-cache
      - run: bundle install --path vendor/bundle
      - save_cache:
          key: gem-cache-{{ .Branch }}-{{ checksum "Gemfile.lock" }}
          paths:
            - vendor/bundle
      - save_cache:
          key: gem-cache-{{ .Branch }}
          paths:
            - vendor/bundle
      - save_cache:
          key: gem-cache
          paths:
            - vendor/bundle

      - run: bundle exec rubocop
      - run: bundle exec rake db:create db:schema:load --trace
      - run: bundle exec rake factory_girl:lint

      # Precompile assets
      # Load assets from cache if possible, precompile assets then save cache
      # Multiple caches are used to increase the chance of a cache hit
      - restore_cache:
          keys:
            - asset-cache-{{ .Branch }}-{{ checksum "VERSION" }}
            - asset-cache-{{ .Branch }}
            - asset-cache
      - run: bundle exec rake assets:precompile
      - save_cache:
          key: asset-cache-{{ .Branch }}-{{ checksum "VERSION" }}
          paths:
            - public/assets
            - tmp/cache/assets/sprockets
      - save_cache:
          key: asset-cache-{{ .Branch }}
          paths:
            - public/assets
            - tmp/cache/assets/sprockets
      - save_cache:
          key: asset-cache
          paths:
            - public/assets
            - tmp/cache/assets/sprockets


      - run: bundle exec rspec
      - run: bundle exec cucumber

And our custom Dockerfile:

FROM circleci/ruby:2.3-node

# Install runtime dependencies
RUN sudo apt-get update \
 && sudo apt-get install -y --no-install-recommends \
        bzip2 \
        libfontconfig \
 && sudo apt-get clean \
 && sudo rm -rf /var/lib/apt/lists/*


RUN set -x  \
    # Install official PhantomJS release
 && mkdir /tmp/phantomjs \
 && curl -L https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2 \
        | tar -xj --strip-components=1 -C /tmp/phantomjs \
 && sudo mv /tmp/phantomjs/bin/phantomjs /usr/local/bin \
    # Clean up
 && sudo apt-get clean \
 && sudo rm -rf /tmp/* /var/lib/apt/lists/*

Any ideas on what could be causing the slower test runs on 2.0?

Thanks


#2

Are your tests database-intensive? It could be that the database is slower and not the tests themselves. I’ve been looking for data points on slower databases.


#3

I have a test suite with a similar database performance issue. The project in question is a Django project on Python 2.7 with the official mysql:5.6 image for our database. We’re using remote Docker so we can reuse our docker-compose dev environment in the build.
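
Roughly, the test step reuses the compose setup like this (the compose service name and test command below are illustrative placeholders, not our exact setup):

steps:
  - checkout
  - setup_remote_docker
  # "web" is a placeholder service from our docker-compose file
  - run: docker-compose run --rm web python manage.py test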

On 1.0 the test suite portion of our build takes about 18 minutes, while on 2.0 it takes about 27 minutes. I saw similar issues about a year ago when experimenting with a docker-compose build on CircleCI 1.0.

At the time I suspected btrfs might be the culprit, as I remember reading in the 1.0 docs and forum posts that the 1.0 Docker environment used btrfs. From my reading, copy-on-write or relatime could be a potential issue if they’re enabled in your btrfs setup.


https://btrfs.wiki.kernel.org/index.php/FAQ#Why_I_experience_poor_performance_during_file_access_on_filesystem.3F


#4

We’re using AUFS on 2.0.


#5

I managed to speed up our build by taking the database config from CircleCI 1.0 and building a custom Docker image. This is the my.cnf we are now using:

pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
datadir = /var/lib/mysql
log-error = /var/log/mysql/error.log
bind-address = 127.0.0.1
innodb_flush_log_at_trx_commit = 2
sync_binlog = 0
innodb_use_native_aio = 0

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
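
If you would rather not maintain a custom image, a similar effect can probably be had by passing the key settings as command-line flags to the stock image in the 2.0 config (innodb_flush_log_at_trx_commit = 2 and sync_binlog = 0 cut down on fsyncs, which is fine for a throwaway test database). This is only an untested sketch:

docker:
  - image: circleci/mysql:5.6
    # mysqld accepts these settings as command-line options, so no my.cnf or custom image is needed
    command: ["--innodb_flush_log_at_trx_commit=2", "--sync_binlog=0", "--innodb_use_native_aio=0"]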


#6

Unfortunately, this doesn’t seem to hold true during peak hours. We’re working on addressing the slow performance across the board.

Thank you to everyone for your patience while we work through this. It’s impacting us, too; you’re certainly not alone.


#7

Is this an AWS issue, where AWS slows down as the US comes online, or is it something different?


#8

Great question. Currently unsure. We’re still working on it.


#9

Just another “Me too” on this topic.

@rohara - if another private build would help with troubleshooting, feel free to DM me and I’ll get you the links.

Our stack: a publicly hosted Docker image + Ruby on Rails and Node/webpack, with RSpec (including Poltergeist/PhantomJS) and Postgres.


#10

@rohara Howdy, I’ve been working recently to convert our app to Circle 2.0, and I have experienced a 10x slowdown in time spent in our database. After SSHing into a 2.0 container and a 1.0 container, here are some MySQL profiling benchmarks:

1.0:

MariaDB [circle_ruby_test]> TRUNCATE TABLE users_tags;
Query OK, 0 rows affected (0.00 sec)

MariaDB [circle_ruby_test]> show profile for query 1;
+------------------------------+----------+
| Status                       | Duration |
+------------------------------+----------+
| starting                     | 0.000024 |
| checking permissions         | 0.000055 |
| Opening tables               | 0.000110 |
| After opening tables         | 0.000006 |
| System lock                  | 0.000003 |
| Table lock                   | 0.000003 |
| After table lock             | 0.001073 |
| Waiting for query cache lock | 0.000005 |
| After table lock             | 0.000005 |
| query end                    | 0.000008 |
| closing tables               | 0.000014 |
| freeing items                | 0.000004 |
| updating status              | 0.000013 |
| cleaning up                  | 0.000004 |
+------------------------------+----------+
14 rows in set (0.00 sec)

2.0:

MariaDB [ebth_test]> TRUNCATE TABLE users_tags;
Query OK, 0 rows affected (0.09 sec)

MariaDB [ebth_test]> show profile for query 1;
+------------------------------+----------+
| Status                       | Duration |
+------------------------------+----------+
| starting                     | 0.000050 |
| checking permissions         | 0.000051 |
| Opening tables               | 0.000306 |
| After opening tables         | 0.000009 |
| System lock                  | 0.000003 |
| Table lock                   | 0.084814 |
| Waiting for query cache lock | 0.000016 |
| query end                    | 0.000018 |
| closing tables               | 0.000006 |
| Unlocking tables             | 0.000029 |
| freeing items                | 0.000008 |
| updating status              | 0.000049 |
| cleaning up                  | 0.000006 |
+------------------------------+----------+
13 rows in set (0.00 sec)

I have read a few things related to AUFS causing slowdowns. Is there anything we can do?


#11

Can you try one of our images? https://circleci.com/docs/2.0/circleci-images/#mysql


#12

@rohara Thanks for the suggestion. I tried one of these containers as a test, but I am still seeing DB operations run 10x slower than on Circle 1.0. In addition, those images are MySQL and we use MariaDB, so there are incompatibilities with our code.


#14

Any thoughts on MariaDB users experiencing this performance hit in Circle 2.0?


#15

@rohara - we’d really like to start using Circle 2.0, and I was reminded of it today when I saw the news that it was released to the public (congratulations!). Do you think this is just the nature of Docker? Is there any more information we can provide to help diagnose?


#16

With Circle 2.0 my build just went from 15 minutes to 50.

My stack is Rails with Yarn, Postgres, and RSpec with Capybara Webkit, so I don’t think it’s a problem specific to one database or RSpec driver.

Images:

  • circleci/ruby:2.4.1-node
  • circleci/postgres:9.5.6-alpine
  • redis:3.2-alpine

For what it’s worth: I had the exact same problem when Travis went over to containerized builds back in the day (sudo: false). The reason was that the DB did not run off RAM but started using the disk, see https://github.com/travis-ci/travis-ci/issues/3049.

Could that be a similar issue here?

UPDATE: I changed the database cleaning method (which runs after every test) from truncation to transaction rollbacks, and now the build on Circle 2 is as fast as on Circle 1. I’m not sure why that is, but that was my culprit. And feature specs no longer fail randomly, as they used to on Circle 1.


#17

We optimized Postgres Docker image performance with:

  pg:
    image: postgres:9.6.2-alpine
    command: -c fsync=off -c full_page_writes=off
    network_mode: host
    ports:
      - "${PG_PORT}:5432"
    volumes:
      - "/dev/shm/pg_${PG_PORT}:/var/lib/postgresql/data"

#18

What worked best for me performance-wise was mounting the DB container’s data volume on a tmpfs, avoiding the disk entirely:

  db:
    image: mysql:5.6
    tmpfs:
      - /var/lib/mysql

#19

@rohara We are using the postgres:9.5-alpine-ram image, yet our Rails spec suite still takes 12:40 on average (3x parallelism) on CircleCI 2.0, while the 1.0 build takes only 09:20. Do you have suggestions on what more could be improved?


#20

@nashbridges I’d try setting working_directory: /dev/shm/project to avoid the I/O on the project files themselves. Just be sure to bundle install into a different directory (like /home/circleci/project/vendor/bundle) and symlink it to where you need it; the gems can’t be run from shared memory.
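
In config terms that looks roughly like this (paths are illustrative, not a tested recipe):

jobs:
  build:
    # The checked-out project lives in shared memory, so test I/O skips the disk
    working_directory: /dev/shm/project
    docker:
      - image: circleci/ruby:2.4.1-node
    steps:
      - checkout
      # Gems can't be run from shared memory, so install them on disk and symlink them in
      - run: bundle install --path /home/circleci/project/vendor/bundle
      - run: mkdir -p vendor && ln -s /home/circleci/project/vendor/bundle vendor/bundle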


#21

Thanks for the response. It turned out the main factor in the slowness was an attempt to also switch from capybara-webkit to standalone Chrome via Selenium. We reverted to capybara-webkit and were able to achieve build timings much the same as on 1.0.