Might Most Positive Tests be Wrong?

by David Mackie

A lot of people are bad with numbers, and especially so in the area of probability. Earlier this year (with accidental prescience), in the school where I work, as part of our off-curriculum ‘mind-broadening’ provision for sixth-formers, a few of my colleagues and I presented students with a puzzle involving imperfect methods of testing for rare conditions. Such puzzles can yield startling results – ones which even bright students are often reluctant to accept.

For example, if the incidence of a disease in the population is 0.1% and the test has a false positive rate of 5% (and never misses a genuine case), the probability that a randomly selected individual testing positive actually has the disease is approximately one in fifty: about 2%, or a probability of 0.02.
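For readers who would rather check the arithmetic than take it on trust, here is a minimal sketch in Python. The function name and its parameters are my own labels for illustration, and it assumes that the test produces no false negatives.

```python
def chance_infected_given_positive(incidence, false_positive_rate):
    """Probability that a positive result belongs to someone who has the disease.

    Assumption: the test never misses a real case (no false negatives).
    Both arguments are fractions between 0 and 1.
    """
    true_positives = incidence                                  # share of all those tested
    false_positives = (1 - incidence) * false_positive_rate     # share of all those tested
    return true_positives / (true_positives + false_positives)


# Incidence 0.1%, false positive rate 5%
result = chance_infected_given_positive(0.001, 0.05)
print(f"{result:.1%}")  # prints 2.0%
```

Out of every 5,095 positive results in this scenario, only 100 belong to people who have the disease.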

Though this is easy to demonstrate, it is remarkable how resistant many perfectly intelligent people are to the conclusion, even when shown the proof. “But the test is 95% reliable”, they protest. “How can it be that a person with a positive test has anything less than a 95% chance of having the disease?”

That kind of response merits attention. It does so because it is an example of an important failure to understand relevant data (and/or the terminology used to describe those data); and it is a failure that renders people blind (or, worse, resistant) to legitimate concerns about the significance of the published results of recent mass testing – concerns that are still not receiving the wider public attention that they deserve.

What is ‘95% reliability’?

Consider the claim that ‘the test is 95% reliable’ as a gloss on the observation that the test has a false positive rate of 5% (and, for the sake of simplicity, a 0% false negative rate). That gloss betrays an important misunderstanding of what is meant by the ‘false positive rate’. In fact, the false positive rate denotes the percentage of people without the disease whose tests come back positive. Crucially, this is not the same as either

the percentage of positive tests that are falsely positive; or
the percentage of total tests that are falsely positive.

When ordinary intelligent people describe a test as 95% reliable, they mean that 95% of the results that it yields are correct. The false positive rate (“FPR”) as it is in fact defined, however, is quite different: it is the percentage of individuals who are not infected but who test positive, so it simply does not follow that a 5% FPR translates into 95% reliability for the test, as that is ordinarily understood – even if we assume a nil false negative rate. Rather, the reliability (in the layman’s sense) of the test is a function of two factors: (i) the tendency of the test to return a positive result for people who are not infected (the FPR) and (ii) the overall prevalence of the disease in society.
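The dependence on both factors can be written in one line. As a sketch, write p for the prevalence of the disease and f for the FPR (both symbols are introduced here purely for illustration), and keep the simplifying assumption of no false negatives. Of everyone tested, a share p are true positives and a share f(1 − p) are false positives, so:

```latex
\[
P(\text{infected} \mid \text{positive})
  \;=\; \frac{p}{\,p + f\,(1 - p)\,}
\]
```

When p is large compared with f, the fraction is close to 1; when p is small compared with f, the false positives in the denominator swamp the true ones and the fraction collapses towards zero.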

Examples

This phenomenon is difficult for most people to grasp in the abstract. It is useful, therefore, to consider a range of examples.

Example 1: The Ubiquovirus. So called because almost everyone has it. Its incidence in society is 95%.

Suppose that we have a test with a FPR of 5%, and that we test 100,000 people.

Since the incidence of the virus is 95%, 95,000 of our 100,000 will have the virus. 5,000 are not infected.
Since the FPR is 5%, 5% of the 5,000 people in our sample who are not infected will test falsely positive. That makes 250 false positive tests.
If we assume that the test yields no false negatives, the test of our 100,000 citizens will yield 95,000 true positives, 250 false positives, and 4,750 true negatives.

With the Ubiquovirus, a positive test therefore has a reliability of 95,000 over 95,250. If you receive a positive test, there is a 99.74% likelihood that you have the virus. That’s quite impressively reliable, by most people’s standards.

Example 2: The Aliquovirus. Incidence in society 50%.

Assume, again, that we have a test with a FPR of 5%, and that we test 100,000 people.

Since the incidence of the virus is 50%, 50,000 of our 100,000 have the virus. 50,000 are not infected.
Since the FPR is 5%, 5% of the 50,000 people in our sample who are not infected will test falsely positive. That makes 2,500 false positive tests.
If we assume that the test yields no false negatives, the test of our 100,000 citizens will yield 50,000 true positives, 2,500 false positives, and 47,500 true negatives.

With the Aliquovirus, a positive test therefore has a reliability of 50,000 over 52,500. If you receive a positive test, there is a 95.24% chance that you have the virus. Now, 95.2% is not 99.7%; but most people would still probably reckon that pretty reliable: it’s probably a test worth paying for, if the price isn’t too high and if the outcome matters.

Example 3: The Rarovirus. Incidence in society 5%.

Assume, once again, a test with a FPR of 5%, and that we test 100,000 people.

Since the incidence of the virus is 5%, 5,000 of our 100,000 will have the virus. 95,000 are not infected.
Since the FPR is 5%, 5% of the 95,000 people in our sample who are not infected will test falsely positive. That makes 4,750 false positive tests.
If we assume that the test yields no false negatives, the test of our 100,000 citizens will yield 5,000 true positives, 4,750 false positives, and 90,250 true negatives.

With the Rarovirus, a positive test has a reliability of 5,000/9,750.

If you receive a positive test, it is 51.28% likely that you have the virus. That is pretty unreliable, by anyone’s reckoning. It is hard to imagine ordinary circumstances in which a rational individual would pay any significant sum of money for a test so unreliable as to a binary outcome. (If one were in the position of betting, repeatedly, huge sums of money, for small odds, on the probability that a certain share price would go up or down in the course of a day’s trading, then yes: a test that guaranteed such a slender margin of probability could be beneficial; but that is not the kind of situation with which we are dealing when we test ourselves for COVID-19.)

The point is that, if you are interested in the reliability of a test in the ordinary, layman’s, sense, then merely knowing the test’s FPR won’t tell you the answer. The FPR was the same in all three examples, yet the reliability, in the layman’s sense, of a positive test was quite different in each. To determine the reliability, you need to factor in the incidence of the virus in society as a whole.
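The three examples can be run side by side. The following is only a sketch: the helper name count_outcomes and the use of whole-number percentages (to keep the arithmetic exact) are my own choices, and it assumes, as the examples do, that there are no false negatives.

```python
def count_outcomes(tested, incidence_pct, fpr_pct):
    """Return (true positives, false positives, true negatives).

    Percentages are whole numbers so that integer arithmetic stays exact.
    Assumption: no false negatives, so every infected person tests positive.
    """
    infected = tested * incidence_pct // 100
    uninfected = tested - infected
    false_positives = uninfected * fpr_pct // 100
    true_negatives = uninfected - false_positives
    return infected, false_positives, true_negatives


# The three imaginary viruses, each tested with the same 5% FPR
viruses = {"Ubiquovirus": 95, "Aliquovirus": 50, "Rarovirus": 5}

reliability = {}
for name, incidence_pct in viruses.items():
    tp, fp, tn = count_outcomes(100_000, incidence_pct, 5)
    reliability[name] = tp / (tp + fp)
    print(f"{name}: {tp:,} true positives, {fp:,} false positives, "
          f"{tn:,} true negatives; a positive test is {reliability[name]:.2%} reliable")
```

Only the incidence changes from one line of output to the next; the FPR passed to the function is the same every time.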

It is important to understand that I am not disparaging the ‘layman’s’ sense of ‘reliability’ by labelling it thus. On the contrary, how reliable a test is, in the layman’s sense of the word, is vitally important. It is a test’s reliability in this ordinary sense that determines such matters as:

whether a test, or mass testing, is worth performing in the first place;
what conclusions can be drawn about the potential risk posed to others by an individual who tests positive;
what conclusions may justifiably be drawn about the general incidence of infection in a community; and consequently
what public policy decisions can justifiably be influenced by the number of positive tests in a community.

Application to SARS-CoV-2

All right; but where does SARS-CoV-2 fall on this scale? Is it more like the Ubiquovirus, or more like the Rarovirus? Well, I’m no scientist, but it is clear that SARS-CoV-2 is not much like either; it is, by any estimate, much rarer than the Rarovirus; and since this is so, the effect of any false positive rate on reliability in the layman’s sense is even more striking.

No one knows, for certain, the actual rate of incidence of SARS-CoV-2 in the UK (or in any other country). On 18 October, Government data quoted a figure of 1088.8 per 100,000 for England. I suspect (for reasons given below) that this may well be a massive overestimate; but let us run with that figure for the sake of exegesis.

What is the false positive rate? Again, no one knows. The best estimate we have is from a meta-analysis by Andrew Cohen and Bruce Kessel of external quality assessments of RT-PCR assays of RNA viruses dating between 2004 and 2019. This analysis revealed false positive rates of 0-16.7%, with an interquartile range of 0.8-4.0% and a median of 2.3%.

Suppose, for the sake of illustration again, that we take the median figure of 2.3% for the FPR, and run the same experiment as before: we test 100,000 people randomly picked from our population.

Since the incidence is 1,089 (rounded up) per 100,000, we should expect 1,089 people to be infected. The other 98,911 are not.
Since the false positive rate is 2.3%, 2,274 (rounded down) of those 98,911 uninfected individuals will test positive.
The total positive tests will therefore (assuming no false negatives) be 1,089 + 2,274 = 3,363.

If you have a positive test, therefore, it represents no more than a 1,089 over 3,363 chance – or 32.4% – that you are actually infected.
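The same steps, with the same rounding, can be sketched in a few lines of Python. The variable names are mine, the two input figures are simply those quoted above, and the calculation again assumes no false negatives.

```python
import math

tested = 100_000
incidence_per_100k = 1088.8   # Government figure quoted above (illustrative only)
fpr = 0.023                   # median FPR from the meta-analysis quoted above

# Infected people among those tested, rounded up as in the text
infected = math.ceil(incidence_per_100k * tested / 100_000)
uninfected = tested - infected

# Uninfected people who nevertheless test positive, rounded down as in the text
false_positives = math.floor(uninfected * fpr)

total_positives = infected + false_positives   # assumes no false negatives
chance_infected = infected / total_positives

print(infected, false_positives, total_positives)
print(f"{chance_infected:.1%}")
```

Changing either input shows how sensitive the final percentage is to figures that, as I have said, no one knows for certain.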

The intention of this piece is not, however, to suggest that these are the correct figures. I suspect, in fact, that the Government’s claimed rate of the incidence of infection is hugely exaggerated, in no small measure precisely because Government figures blithely assume that all positive tests represent real infected people, and ignore the huge distortion that even small proportions of false positive tests can make to any realistic estimate of incidence of infection in a community. My aim is only to assist the wider public understanding of just how dramatically the reliability of a positive test can be undermined by low percentages of false positives, especially where real incidence is low.

Furthermore, there is a ratcheting effect: the greater the distortion caused by failing to account for the likelihood of false positives, the further the true incidence is likely to fall below the Government’s figures. And the lower the true incidence is, the less likely (given any FPR) any positive test is to be correct. To take just one hypothetical example, if the real incidence of SARS-CoV-2 were, say, 50 per 100,000, then even assuming a very modest FPR of 1%, the reliability, in commonsense terms, of a positive test would be just 4.8%:

100,000 are tested.
Since the incidence is 50 per 100,000, 50 are actually infected and 99,950 are not.
Since the FPR is 1%, we should expect 999 of those 99,950 to test falsely positive (rounding down).
Chances that a positive test reflects actual infection: 50 over 1,049, or 4.8%.
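To see how quickly reliability falls away as incidence drops, one can repeat the calculation for a range of incidences. This is a sketch only: the function name and the particular incidences in the list are hypothetical choices of mine, not estimates, and the calculation assumes no false negatives.

```python
def reliability_of_positive(incidence_per_100k, fpr_pct, tested=100_000):
    """Share of positive tests that reflect a real infection.

    Uses whole numbers throughout and rounds false positives down.
    Assumption: no false negatives.
    """
    infected = incidence_per_100k * tested // 100_000
    uninfected = tested - infected
    false_positives = uninfected * fpr_pct // 100
    return infected / (infected + false_positives)


# Hypothetical incidences per 100,000, all with a 1% FPR
incidences = [1000, 500, 100, 50, 10]
results = [reliability_of_positive(i, 1) for i in incidences]

for incidence, value in zip(incidences, results):
    print(f"{incidence:>5} per 100,000: {value:.1%}")
```

The row for 50 per 100,000 reproduces the figure worked out above; the rows on either side show that the relationship is steep rather than gradual.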

Conclusion

Many perfectly intelligent members of the public, as I said at the start, are bad with figures. That is one of the problems – and the one that I have done my best to address here, by explaining as carefully as I can how even a low FPR can make positive tests surprisingly unreliable where the actual incidence of infection in a community is low.

A second problem is that the results of considerations of this kind are so startling that even those who can handle the figures may be inclined to doubt the correctness of the definitions on which the reasoning is based. I have been told by a statistician, for example, that my definition of the false positive rate as the proportion of uninfected people who test positive ‘must’ be wrong: the FPR ‘must’, instead, be the percentage of positive tests that turn out to be false. That would make a kind of sense, of course; and it would indeed mean that figures for the FPR would accord better with the ordinary conception of a test’s reliability. But it isn’t true.

As I have said, I’m not a scientist; and I’m not suggesting that the figures in my final example (nor any of the others) are the true figures. My main aim is only to try to explain the reasoning that leads to real and legitimate concerns about current positive test figures. That is worth doing, because experience teaches that the reasoning is obscure to many – so much so that some are inclined to question the sanity of those presenting it.

What is clear is that the public at large is currently blind to the very real possibility that the reliability of a positive test is significantly less than 100%; and that (deplorably) neither Government nor the mainstream media are doing anything to inform the public about such matters, which have an enormous and obvious significance for the moral and practical legitimacy of public policy measures being adopted in response to the testing data.