From what I’ve seen over the years, tests that exercise a system through the graphical user interface are disproportionally likely to fail for reasons other than an actual, genuine product failure. Call them false positives, call them flaky tests, call them whatever you want.
There are good reasons for this: graphical user interfaces are interfaces optimized for consu...