CircleCI 2.0 Builds Timing Out Unexpectedly

Sometimes the test run gets stuck on a random test, and the build then fails with a timeout.

The test cases previously ran in under 10 minutes on CircleCI, and they still do in my local setup.

This behaviour seems to be erratic and unexpected. Please resolve this.

I doubt anyone will be able to help you in response to such a brief post. Would you supply your config.yml and your log output? Please supply both in text format, with block/code formatting applied.

Are you able to do some debugging to find out where your test gets stuck? Is it a browser test? The more information you can share here, the more helpful responses might be.

CircleCI Config:

version: 2
jobs:
  build:
    branches:
      only:
        - master
        - develop
    environment:
      TZ: "/usr/share/zoneinfo/America/Los_Angeles"
    docker:
      - image: circleci/build-image:ubuntu-14.04-XL-922-9410082
      - image: circleci/mysql:5.7.22
        environment:
          MYSQL_ROOT_PASSWORD: root@123
          MYSQL_DATABASE: circle_bigquery
      - image: circleci/mongo:3-ram

    steps:
      - checkout
      - run:
          command: |
            sudo mkdir -p /var/log/adsnative
            sudo chmod -R 777 /var/log/adsnative
            git submodule sync
            git submodule update --init --recursive
            find . -name "*.pyc" -exec rm -rf {} \;
      - restore_cache:
          key: v1-dependencies-{{ checksum "requirements.txt" }}
      - run:
          name: install global dependencies
          command: |
            sudo apt-get update
            sudo apt-get install -y python-virtualenv python-dev redis-server
            sudo pip install -U pip
            sudo pip install -U virtualenv
      - run:
          name: install django dependencies
          command: |
            virtualenv venv
            . venv/bin/activate
            pip install setuptools==20.3
            pip install -r requirements.txt
      - save_cache:
          paths:
            - ./venv
          key: v1-dependencies-{{ checksum "requirements.txt" }}
      - run:
          name: run tests
          environment:
            CIRCLECI: true
            TEST_MONGO_MODE: 1
          command: |
            . venv/bin/activate
            python manage.py test -v 2 --with-nicedots --exclude-dir=common/mediation/management/commands/ --exclude-dir=apps/waffle/tests/ --exclude-dir=common/tests/ --exclude-dir=common/feeplus/tests/

Log Output: Too long with no output (exceeded 10m0s)

No, not a browser test.

The test does not get stuck in my local setup, so there is nothing for me to debug there.

However, I SSH-ed into the CircleCI box and ran the tests manually there, and they worked fine.

Update: Apparently, the latest test run has successfully completed.

Is it possible that your test program could produce no output for 10 minutes and not actually be stuck? I am not familiar with --with-nicedots but I am assuming this will print dots to the screen to show the test system is running and not crashed?

I don’t think that is possible. The test runs in a few seconds in my local setup, and it did the same in the next CircleCI run, which succeeded after this failed one.

The dots are not printed continuously to show whether the run is still alive.
It prints a dot, or a letter (such as E for error, F for fail), once per test.
You can read more about it here.

In my case, it does this:


test_cases.py:TestCase.test_list ... Too long with no output (exceeded 10m0s)

More info: the test run was also failing on CircleCI 1.0, with a different test case. I then migrated to 2.0, which ran smoothly for about a month before the failures started again. Now it is fine again.

It might be that your tests are too sensitive to their environment, and they need to be made more robust. Passing sometimes and failing sometimes is a classic sign of flakey tests (don’t worry about it, everyone gets 'em).

Can you add debug commands to your tests to log how far they get on the occasions it gets stuck? If it stops on the same test every time it fails, that would be an indicator of a problem to dig into.
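One way to log how far the tests get is sketched below. It is a minimal example, not the project's actual test code: `ProgressLoggingMixin` and `ExampleTest` are hypothetical names, and it assumes the tests are `unittest`-style classes (as Django test cases are). Each test prints a flushed START and END line, so the last line in the CircleCI log shows which test was running when output stopped.

```python
import sys
import time
import unittest


class ProgressLoggingMixin(object):
    """Hypothetical mixin: prints a flushed line when each test starts and ends."""

    def setUp(self):
        super(ProgressLoggingMixin, self).setUp()
        self._started_at = time.time()
        self._log("START")

    def tearDown(self):
        self._log("END after %.2fs" % (time.time() - self._started_at))
        super(ProgressLoggingMixin, self).tearDown()

    def _log(self, message):
        # self.id() is the dotted name of the test method currently running.
        line = "[%s] %s %s\n" % (time.strftime("%H:%M:%S"), message, self.id())
        sys.stderr.write(line)
        # Flush so the line reaches the build log even if the next call hangs.
        sys.stderr.flush()


class ExampleTest(ProgressLoggingMixin, unittest.TestCase):
    """Placeholder test, only here to show how the mixin is applied."""

    def test_list(self):
        self.assertEqual([1, 2], [1, 2])
```

A START line with no matching END line would point at the test that hung. If the same test name shows up on every failed run, that is the one to dig into.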

Well, if the test was “flakey”, then it wouldn’t run after I SSH-ed into the CircleCI box as well, right? I managed to run it multiple times in the box and it didn’t get stuck.

EDIT:
This Django Testing Example is very similar to the test that is failing. The linked example makes a POST request, whereas I make a GET request and compare the response with the expected output. That’s it.
You might say that it is getting stuck because of the API call, but the same API call is made in the previous test, and that worked fine.

I think you’re partly right: your tests are brittle only on start-up. Do they use the MySQL server in your tests? If so, I wonder if you need to have a wait command before your tests to ensure MySQL is ready before you try connecting. Of course, in an SSH session, that’s already done, which might be why it works here.

But it didn’t get stuck on the first test case.
I can try the wait command. Can you give me an example of how to do it?
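A wait step of the kind suggested above could look roughly like this. It is a sketch under assumptions: `wait_for_port` is a hypothetical helper, and it only checks that a TCP connection can be opened, which does not guarantee that MySQL is already accepting queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
        except (socket.error, socket.timeout):
            # Nothing is listening yet (or the attempt timed out); retry shortly.
            time.sleep(interval)
        else:
            sock.close()
            return True
    return False
```

Saved in a hypothetical `wait_for_mysql.py` ending with `raise SystemExit(0 if wait_for_port("127.0.0.1", 3306) else 1)`, it could run as an extra step before the tests. The address `127.0.0.1:3306` is an assumption based on the MySQL default port and on secondary containers being reachable on localhost in the Docker executor.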

If you are reliably running tests that use the database, and it gets stuck on something in the middle, then it is not likely to be a database start-up issue, and instead you need to find out where it gets stuck. You will need to add something to log to a file (and export as an artefact) or examine the file in a post-fail SSH session.
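For logging to a file, one option is Python's `faulthandler` module, which can dump the stack of every thread if the process is still running after a given delay. The sketch below assumes Python 3, where `faulthandler` is in the standard library. On Python 2 a backport package of the same name exists on PyPI, but it has not been checked against this project. `enable_hang_dump` and the example path are hypothetical.

```python
import faulthandler


def enable_hang_dump(path, after_seconds=300):
    """Dump all thread stacks to `path` every `after_seconds` while the process runs.

    Returns the open file object; it must stay open for the dump to work.
    """
    log_file = open(path, "w")
    # repeat=True writes a fresh dump each time the delay elapses, so a test
    # that hangs late in the run is still captured.
    faulthandler.dump_traceback_later(after_seconds, repeat=True, file=log_file)
    return log_file
```

Calling something like `enable_hang_dump("/tmp/test-hang.log")` once at the start of the test run, with a delay below the 10 minute no-output limit, should leave a stack trace showing the exact line where the process was waiting. The file can then be read in a post-fail SSH session or exported with a `store_artifacts` step.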