This graph shows the long and short term failure rates for the FIPS 140-2 tests, both individually and as a whole. The short term average tracks a window of the last 1000 tests. A correctly working system should expect the long term rate to converge on just under 0.8 failures per thousand, while the short term average will vary upward from 0, with occasional peaks over 5 as a rough upper bound (rare, but not quite infinitely improbable). A sustained short term rate greater than that would indicate a systemic failure.
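As an illustration of how the two averages relate, here is a minimal sketch of a tracker that keeps a sliding window for the short term rate and running totals for the long term rate. The class and method names are hypothetical and are not taken from the monitoring code itself; only the window size of 1000 comes from the description above.

```python
from collections import deque


class FailureRateTracker:
    """Illustrative tracker for failures per 1000 tests (hypothetical name)."""

    def __init__(self, window=1000):
        # The deque silently drops the oldest result once the window is full.
        self.recent = deque(maxlen=window)
        self.total_tests = 0
        self.total_failures = 0

    def record(self, failed):
        """Record the outcome of one tested block."""
        failed = bool(failed)
        self.recent.append(failed)
        self.total_tests += 1
        self.total_failures += failed

    def short_term_rate(self):
        """Failures per 1000 over the most recent window of tests."""
        if not self.recent:
            return 0.0
        return 1000.0 * sum(self.recent) / len(self.recent)

    def long_term_rate(self):
        """Failures per 1000 over every test recorded so far."""
        if self.total_tests == 0:
            return 0.0
        return 1000.0 * self.total_failures / self.total_tests
```

With a window of 1000, each failure inside the window moves the short term rate by a whole unit, which is why it is so much spikier than the long term rate.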
The theoretically expected failure rates for the individual tests are approximately: Monobit 0.104 per 1000, Poker 0.099 per 1000, Runs 0.328 per 1000, Long run 0.298 per 1000. The Repetition test is expected to fail 'very rarely' (but not never); runs of around 2 to 30 million blocks between failures of that test are not uncommon. In practice it is fairly common for the failure rates of the Monobit and Poker tests to be very similar, and for the Runs and Long run tests to also track each other quite closely in the long term average.
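The individual rates can be combined to get a feel for the overall figure. The sketch below sums them, which gives an upper bound on the rate at which a block fails at least one test, and also computes the rate under the assumption that the tests fail independently. That independence is an assumption made here for illustration only; a block that fails one test may well be more likely to fail another, which would pull the overall rate down further. The function names are hypothetical, and the input values are the approximate ones quoted above.

```python
# Approximate per-test failure rates, in failures per 1000 blocks.
RATES_PER_1000 = {
    "Monobit": 0.104,
    "Poker": 0.099,
    "Runs": 0.328,
    "Long run": 0.298,
}


def union_bound_per_1000(rates):
    """Upper bound on the 'any test failed' rate: the plain sum."""
    return sum(rates.values())


def independent_any_per_1000(rates):
    """'Any test failed' rate if the tests were independent (an assumption)."""
    p_none = 1.0
    for rate in rates.values():
        p_none *= 1.0 - rate / 1000.0
    return 1000.0 * (1.0 - p_none)


if __name__ == "__main__":
    print("sum of rates:", union_bound_per_1000(RATES_PER_1000))
    print("if independent:", independent_any_per_1000(RATES_PER_1000))
```

Both values come out slightly above 0.8 per thousand, so they should be read as a rough cross-check of the overall rate rather than as an exact prediction.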
The two graphs below show the run length between FIPS 140-2 test failures. A correctly working system should expect to see a failure of the FIPS 140-2 tests about once in every 1250 blocks tested on average. Occasional runs much longer than that can reasonably be expected, with a run of 17500 or longer expected about once in 1.2 million runs (about 3.5TB of samples). A sustained lack of failures would indicate a problem that ought to be investigated.
The first graph shows the short and long term average for a single device, while the second is an overview of all devices in the system that shows their current short term average. The short term average tracks a window of the last 10 runs. The long term average is calculated over all runs which have occurred since the monitoring process began.
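The run length figures quoted above can be reproduced with a short calculation. This sketch assumes each block fails independently with probability 1/1250, so that run lengths follow a geometric distribution, and uses the 20000 bit block size of the FIPS 140-2 tests. The function names are hypothetical.

```python
P_FAIL = 1.0 / 1250        # assumed per-block failure probability
BLOCK_BYTES = 20000 // 8   # a FIPS 140-2 test block is 20000 bits


def prob_run_at_least(blocks, p_fail=P_FAIL):
    """Probability that a run lasts at least `blocks` blocks without a failure."""
    return (1.0 - p_fail) ** blocks


def runs_between_occurrences(blocks, p_fail=P_FAIL):
    """Average number of runs observed per run of at least `blocks` blocks."""
    return 1.0 / prob_run_at_least(blocks, p_fail)


def bytes_for_runs(num_runs, p_fail=P_FAIL):
    """Approximate sample volume consumed by `num_runs` runs of average length."""
    average_run_blocks = 1.0 / p_fail
    return num_runs * average_run_blocks * BLOCK_BYTES


if __name__ == "__main__":
    runs = runs_between_occurrences(17500)
    print("runs per occurrence:", runs)
    print("bytes of samples:", bytes_for_runs(runs))
```

Under these assumptions a run of 17500 blocks or more turns up roughly once in 1.2 million runs, and that many runs of average length consume a few terabytes of samples, in line with the figures given above.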