Subject: Tracking down inconsistent failures in Jenkins


Raphael:

Thanks for becoming involved!

It’s super-frustrating that some of the test failures on Jenkins don’t reproduce reliably, even if you “beast” them. Hoss’ reports come from many different environments, from Windows to various Java releases to… So “does it fail locally” is a tricky question. Plus, many of the intermittent failures are timing-related, so the speed of your local machine, the other tasks running on it, etc. can all be factors.

What I do is use Mark Miller’s “beast” script. See: https://gist.github.com/markrmiller/dbdb792216dc98b018ad

Two important parameters to the script above are:
- how many separate tests you want to run in parallel. Running several at once puts load on the machine, which helps when the failures are timing-related.
- how many iterations of the tests you want to run. Each test puts its output in a separate subdirectory, so when a test fails you have the full logs in the corresponding subdirectory.
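
To make those two knobs concrete, here is a rough Python sketch of the same idea. It is not Mark's script: the function name beast, its parameters, and the run-NNNNN/output.log layout are all made up for illustration, and the command you pass in would be whatever runs your single failing test.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def beast(command, iterations, parallel, out_root):
    """Run `command` `iterations` times, at most `parallel` at once.

    Hypothetical helper, not the real beast script. Each run writes its
    combined stdout/stderr to <out_root>/run-NNNNN/output.log. Returns the
    directories of the runs that exited with a non-zero status.
    """
    out_root = Path(out_root)

    def run_once(i):
        # One subdirectory per run, so a failure keeps its own full log.
        run_dir = out_root / f"run-{i:05d}"
        run_dir.mkdir(parents=True, exist_ok=True)
        with open(run_dir / "output.log", "wb") as log:
            result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
        return run_dir, result.returncode

    # Threads are enough here: each worker just waits on a child process.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        results = list(pool.map(run_once, range(iterations)))

    return [run_dir for run_dir, code in results if code != 0]
```

Something like beast(cmd, iterations=1000, parallel=4, out_root="beast-results") would then hand back only the directories worth looking at, where cmd is a placeholder for your own test invocation.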

Then I run the failing test over and over and over. If I can get it to fail (and if you’re seeing a 0.5% failure rate, it’s _really_ hit or miss) then I can dig through the logs in the appropriate directory, possibly add logging, and run it all again.

Unfortunately, for intermittently-failing tests, you never _quite_ know if you’ve fixed the problem because your 10,000 iterations may have just lucked out.
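
To put rough numbers on "lucked out", here is a back-of-the-envelope sketch. It assumes every run fails independently with the same fixed probability, which timing-related failures on a loaded machine won't strictly obey, so treat the results as ballpark figures. The function names are mine.

```python
import math


def prob_all_pass(failure_rate, iterations):
    """Chance that every one of `iterations` independent runs passes."""
    return (1.0 - failure_rate) ** iterations


def iterations_needed(failure_rate, confidence):
    """Smallest run count that shows at least one failure with the given
    confidence, assuming independent runs at a fixed failure rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# A test failing 0.5% of the time needs about 600 runs for a 95% chance
# of seeing it fail at least once.
print(iterations_needed(0.005, 0.95))   # 598

# A test failing once in 10,000 runs still passes 10,000 runs in a row
# roughly 37% of the time.
print(round(prob_all_pass(0.0001, 10_000), 3))   # 0.368
```

So a clean batch is only evidence against failure rates noticeably higher than one over the number of runs; anything rarer can easily slip through.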

Welcome to the joys of distributed computing ;)

Best,
Erick