My thoughts are too long for commenting, so I'll wrap them up as an answer...
There are some serious issues with the other two answers, to the extent that some of us have misinterpreted what a randomness test is. As bullet points:-
1.
"distrust any crypto algorithms that come out of NIST". There are no NIST-generated algorithms in the FIPS 140 tests, and certainly nothing of the complexity of Dual_EC_DRBG. The Runs and Poker tests are not proprietary US Department of Commerce (NIST) algorithms. They are mathematical characteristics of a uniformly random distribution. If I posit that the expected proportion of ones should be ~50%, does that make me a subversive? Neither does bracketing that mean of 0.5 with $n$ standard deviations. $\mathcal{N}(\mu, \sigma^2)$ is the standard notation for that distribution, and I wouldn't expect a specification to give anything less complete. Checking for repeated output blocks (the continuous random number generator test) is not subversion, it's common sense.
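To show how little magic is involved, here is a minimal sketch of the Monobit check. It assumes the FIPS 140-2 parameters (20,000-bit blocks, pass if the count of ones lies strictly between 9,725 and 10,275); the function names are my own and this is not the rngtest source.

```python
import math

BLOCK_BITS = 20_000            # FIPS 140-2 block size in bits
LOW, HIGH = 9_725, 10_275      # pass if LOW < number of ones < HIGH


def count_ones(block: bytes) -> int:
    """Count the set bits in a block of bytes."""
    return sum(bin(b).count("1") for b in block)


def monobit_ok(block: bytes) -> bool:
    """Monobit check on exactly one 20,000-bit (2,500-byte) block."""
    if len(block) * 8 != BLOCK_BITS:
        raise ValueError("expected a 2,500-byte block")
    return LOW < count_ones(block) < HIGH


# The pass window is just mean +/- n standard deviations of a fair coin.
mu = BLOCK_BITS / 2                    # expected number of ones
sigma = math.sqrt(BLOCK_BITS) / 2      # binomial sigma for p = 0.5
n_sigma = (HIGH - mu) / sigma          # half-width of the window in sigmas

# Two-sided tail of the normal approximation: chance a good block fails.
p_fail = math.erfc(n_sigma / math.sqrt(2))

print(f"window is +/- {n_sigma:.2f} sigma, p_fail ~ {p_fail:.1e}")
print(monobit_ok(b"\xaa" * 2500))      # exactly 10,000 ones
print(monobit_ok(b"\x00" * 2500))      # no ones at all
```

The window works out to roughly 3.9 standard deviations either side of 10,000, so under the normal approximation a perfectly good generator should trip this test about once in ten thousand blocks. That is all the "algorithm" there is.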
2.
Can I offer this FIPS test as evidence:-
$ cat /dev/urandom | rngtest
rngtest 5
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
rngtest: starting FIPS tests...
rngtest: bits received from input: 8310580032
rngtest: FIPS 140-2 successes: 415198
rngtest: FIPS 140-2 failures: 331
rngtest: FIPS 140-2(2001-10-10) Monobit: 41
rngtest: FIPS 140-2(2001-10-10) Poker: 53
rngtest: FIPS 140-2(2001-10-10) Runs: 123
rngtest: FIPS 140-2(2001-10-10) Long run: 115
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=10.703; avg=1976.720; max=19073.486)Mibits/s
rngtest: FIPS tests speed: (min=75.092; avg=199.723; max=209.599)Mibits/s
rngtest: Program run time: 43724402 microseconds
The failure rate is p=0.0008 (331 failed blocks out of 415,529). That's very comparable to the p=0.001 threshold within the SP800 STS test suite, and to dieharder's own caveat:-
NOTE WELL: The assessment(s) for the rngs may, in fact, be completely
incorrect or misleading. In particular, 'Weak' p values should occur
one test in a hundred, and 'Failed' p values should occur one test in
a thousand -- that's what p MEANS. Use them at your Own Risk! Be Warned!
So apparently not controversial.
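For anyone who wants to check the arithmetic, the failure rate can be re-derived from the counters in the rngtest log above. This is only a sketch; the variable names are mine and the numbers are copied from that run.

```python
# Counters copied from the rngtest run shown above.
bits_received = 8_310_580_032
successes = 415_198
failures = 331

# Per-test failure counts from the same log.
per_test = {
    "Monobit": 41,
    "Poker": 53,
    "Runs": 123,
    "Long run": 115,
    "Continuous run": 0,
}

blocks = successes + failures          # each block is 20,000 bits
rate = failures / blocks               # fraction of blocks that failed

print(blocks, bits_received // 20_000)
print(f"failure rate = {rate:.4f}")
# The per-test counts can add up to more than `failures`,
# because a single block may fail more than one test.
print(sum(per_test.values()))
```

The block count matches the number of whole 20,000-bit blocks in the bits received, and the per-test counts sum to one more than the failed-block total, which is what you would see if one block failed two tests.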
3.
"It's not specified that these tests should be run on the unconditioned entropy source". Of course not. That's correct. No one has statistical characteristics for the distributions of unconditioned entropy sources. They come in all shapes and locations. Some of them do not even have mathematical names (double sample of log normal, bathtub MOD $x$, etc.). We can only run standardised statistical tests on the conditioned final output.
4.
"it's impossible to detect a competently backdoored generator from it's output alone". Again, of course. That's not the intention of, e.g., FIPS startup testing. You need programmers and cryptographers for that. FIPS simply automates the randomness testing and sets out guidelines for basic security programming, like no string literals for control, and relocatable code. All very normal.
Therefore FIPS 140 isn't all that contentious. Saying otherwise is equivalent to saying NIST has backdoored the Normal distribution, or that dieharder is useless. FIPS is just not great at a few things. And testing 20,000-bit blocks fits neatly at the bottom end of the scale for randomness testing, just below ent (500,000 bits).
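To make the 20,000-bit block concrete, here is a minimal sketch of the Poker test over one such block. It assumes the FIPS 140-2 definition (5,000 four-bit nibbles, statistic $X = \frac{16}{5000}\sum f_i^2 - 5000$, pass if $2.16 < X < 46.17$); the function names are mine.

```python
import os
from collections import Counter

BLOCK_BYTES = 2_500                 # 20,000 bits
NIBBLES = BLOCK_BYTES * 2           # 5,000 four-bit values


def poker_statistic(block: bytes) -> float:
    """Chi-square style statistic over the 16 possible nibble values."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("expected a 2,500-byte block")
    counts = Counter()
    for b in block:
        counts[b >> 4] += 1         # high nibble
        counts[b & 0x0F] += 1       # low nibble
    return (16 / NIBBLES) * sum(c * c for c in counts.values()) - NIBBLES


def poker_ok(block: bytes) -> bool:
    """Pass if the statistic is neither too lumpy nor too even."""
    return 2.16 < poker_statistic(block) < 46.17


# A constant pattern is far too lumpy.
print(poker_ok(b"\xaa" * BLOCK_BYTES))

# Cycling through every nibble value in turn is too even, and also fails.
too_even = (bytes.fromhex("0123456789abcdef") * 313)[:BLOCK_BYTES]
print(poker_statistic(too_even), poker_ok(too_even))

# One block from the OS generator; it should pass almost every time.
print(poker_ok(os.urandom(BLOCK_BYTES)))
```

Note the lower bound: a block whose nibbles are spread suspiciously evenly is rejected just like a lumpy one, which is the behaviour you want from a test of uniform randomness.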