Failing e2e tests with random test failures

Hi all,
In Oppia we use end-to-end (e2e) tests, but we are facing many random failures when running them. Of our set of e2e tests, some run fine while others fail.
The link to the workflow: Circle CI
The tests failing on Circle CI are:

  • additionalEditorAndPlayerFeatures_e2e_tests
  • explorationStatisticsTab_e2e_tests
  • explorationTranslationTab_e2e_tests
  • extensions_e2e_tests
  • learnerDashboard_e2e_tests
  • learner_e2e_tests
  • library_e2e_tests
  • skillEditorPage_e2e_tests

Link to the workflow: Travis
Also, these tests seem to work fine on Travis.
We tried to debug the issue but were not able to find a solution.


This is a classic test debugging scenario. It looks like you’re using a headless browser driver, and it should be noted that this is hard enough to debug locally, never mind remotely! Nevertheless, my advice would be to dig in and debug why they are failing.

In one case, you have an Angular error in your JS app:

TypeError: Cannot read property 'content' of undefined
Undefined states error debug logs:
Requested state name: Introduction
Exploration ID: Y83LB8ECQe3n

I expect that should never happen, so you need to find out why it happens, and then stop it happening.
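One way to make that failure easier to diagnose is to guard the state lookup and throw a descriptive error instead of letting the bare TypeError surface. A minimal sketch, assuming the states are held in a plain object keyed by state name (the function and parameter names here are hypothetical, not from the Oppia codebase):

```javascript
// Hypothetical guard: fail fast with context when a state is missing
// from the exploration's state map, instead of a bare TypeError.
function getStateContent(states, stateName) {
  var state = states[stateName];
  if (!state) {
    throw new Error(
      'Requested state "' + stateName + '" not found. Available states: ' +
      Object.keys(states).join(', '));
  }
  return state.content;
}
```

With a guard like this, the CI log tells you which state was requested and what was actually available, rather than just where the undefined access happened.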

There is also an element not found:

No element found using locator

However, that is probably due to the Angular issue.

I am also seeing this:

Page takes more than 15 secs to load

Again, you need to find out why.

For all of these cases, I would suggest:

  • Create a branch
  • If the tests take a non-trivial time to run through, modify them in the code so they do not run, or just run a single test
  • After a (quick) job run on code push, re-run it with SSH
  • Use your test runner to run flaky/broken tests individually and debug them on the server
  • Use the screenshot feature in your headless browser if needed
  • Use a console editor to add simpler assertions prior to the failing step (e.g. is there any HTML in the DOM at all? Has the UI rendered? etc.)

Okay, thanks. I will try these steps.

Hi @halfer, thank you for the detailed debugging steps. I am a member of the same organization, Oppia, and I am following up on @anubhavsinha98’s behalf.

We’ve already checked all the tests for flakiness. They work perfectly fine on a local machine simulating the production environment, and on Travis CI as well. We use the same Chrome version on both Travis and CircleCI, and the configuration for the two CIs is almost the same (apart from syntactical differences).
Link to the CircleCI config file: .circleci/config.yml
Link to the Travis CI config file: .travis.yml

It’s a bit surprising that the same tests work out-of-the-box on Travis but not on CircleCI.
We have been trying to get these tests running on CircleCI for quite some time now. Any help would be appreciated.


Hi @apb7. Do please let me know how you got on with my suggestions above. I would reiterate the use of logs and screenshots to help you understand where it is failing.

It may be worth turning up the log level too, in case more detail is available that presently you cannot see.
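With Protractor/WebDriver, browser console logs can be requested via the `loggingPrefs` capability (e.g. `loggingPrefs: { browser: 'ALL' }`) and read with `browser.manage().logs().get('browser')`. A sketch of dumping only the severe entries after each test (the filter function and wiring are illustrative, not from the Oppia codebase):

```javascript
// Keep only SEVERE console entries so the CI output stays readable.
function severeEntries(logEntries) {
  return logEntries.filter(function(entry) {
    return entry.level && entry.level.name === 'SEVERE';
  });
}

// Hypothetical wiring in a spec file:
// afterEach(function() {
//   return browser.manage().logs().get('browser').then(function(entries) {
//     severeEntries(entries).forEach(function(e) {
//       console.error(e.level.name + ': ' + e.message);
//     });
//   });
// });
```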

Hi @halfer, I manually walked through each step of the failing e2e tests on a local machine while simulating the production environment. While manually “running” the tests, I checked each element that was reported as not visible, not found, or not clickable.
These tests pass smoothly on Travis and on the local machine. Therefore, I suspect something is going on at the CircleCI end (maybe some extra configuration or setup step is required that isn’t documented).

Nothing comes to mind, sorry. I take the view that CI machines are pretty much Linux machines, and I would guess that you’re bumping into a Linux environmental problem rather than a CircleCI one.

You could try switching to a Machine executor, to see if you are running into Docker or RAM limitations.
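For reference, switching the executor is a small change in `.circleci/config.yml`. A minimal sketch for CircleCI 2.x (the job name and steps here are placeholders for your existing ones):

```yaml
version: 2
jobs:
  e2e_tests:
    # The machine executor runs the job in a full VM instead of a Docker
    # container, sidestepping Docker-layer RAM/shared-memory limits.
    machine: true
    steps:
      - checkout
      # ...your existing install and test steps...
```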