Is Tom Chivers Right to Say PCR False Positives are "So Rare" they can be Ignored?

Tom Chivers at UnHerd has published an article headlined “PCRs are not as reliable as you might think”, sub-headlined “Government policy on testing is worryingly misleading”. The core argument of the article is that due to high rates of false negatives, a positive lateral flow test followed by a negative ‘confirmation’ PCR should be treated as a positive. I pass no comment on this. However, the article makes a claim that itself needs to be fact-checked. It’s been quite a long time since PCR accuracy last came up as a topic, but this article provides a good opportunity to revisit some (perhaps lesser known) points about what can go wrong with PCR testing.

The claim that I want to quibble with is:

False positives are so rare that we can ignore them.

Claims about the false positive (FP) rate of PCR tests often turn out on close inspection to be based on circular logic or invalid assumptions of some kind. Nonetheless, there are several bits of good news here. Chivers – being a cut above most science journalists – does provide a citation for this claim. The citation is a good one: it’s the official position statement from the U.K. Office for National Statistics. The ONS doesn’t merely make an argument from authority, but directly explains why it believes this to be true using evidence – and multiple arguments for its position are presented. Finally, the arguments are of a high quality and appear convincing, at least initially. This is exactly the sort of behaviour we want from journalists and government agencies, so it’s worth explicitly praising it here, even if we may find reasons to disagree – and disagree I do.

Please note that in what follows I don’t try to re-calculate a corrected FP rate or alternative case numbers, nor do I argue that Covid is a “casedemic”.

Let’s begin.

Lab contamination. The ONS’s first argument goes like this:

We know the specificity of our test must be very close to 100% as the low number of positive tests in our study over the summer of 2020 means that specificity would be very high even if all positives were false. For example, in the six-week period from July 31st to September 10th 2020, 159 of the 208,730 total samples tested positive. Even if all these positives were false, specificity would still be 99.92%.
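The arithmetic in this quote is easy to check. A minimal sketch in Python, using only the two figures the ONS gives (the variable names are mine, not the ONS's):

```python
# Figures quoted by the ONS for July 31st to September 10th 2020.
positive_samples = 159
total_samples = 208_730

# Worst case: assume every single positive was a false positive.
# Specificity is normally measured against truly negative samples only;
# in this worst case every sample is a true negative, so we divide by the total.
worst_case_fp_rate = positive_samples / total_samples
specificity_floor = 1 - worst_case_fp_rate

print(f"Worst-case false positive rate: {worst_case_fp_rate:.3%}")
print(f"Specificity floor for this period: {specificity_floor:.2%}")
```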

This seems quite reasonable at first, so I don’t blame Tom Chivers or the ONS for making this point; indeed, I also accepted it for a while. However, it rests on an assumption they spell out explicitly later in the document:

Assuming that false-positives… occur at a roughly similar rate over time…

Taking the lowest rate of positive reports as the FP rate requires modelling FPs as a form of uniformly random noise, but this isn’t valid.

Contamination is a very serious issue in PCR labs. We know this because in 2018 the WHO published guidance on how to run a PCR testing lab. Recently, a Chinese biotech firm called Bioperfectus published an article with very similar requirements. The advice is extreme and demands conditions comparable to the cleanliness of a semiconductor fab:

Use four physically separate rooms for different stages of the testing process.
Positive/negative air pressures should be maintained between the rooms. The rooms should each have dedicated air ducting such that they exhaust to the outside world separately. Note: this requires specifically constructed buildings and specialized air handling equipment.
Each area requires not only separate air handling but also separate lab coats and gloves. Staff must use a one-way system in which neither people nor equipment ever go “backwards” during a day’s work. If it becomes necessary to do so, decontamination procedures are required.
Equipment must be regularly cleaned with ethanol, sodium hypochlorite and de-ionized water. There must be 10-15 minute gaps between these cleaning stages.

And so on. Contaminants don’t have to come from the environment. Research into how the PCR process itself can create false positives due to “carry over contamination” was being done as recently as January.

Given the major contribution contamination makes to PCR FPs, it is logical that a given lab has no single fixed false positive rate; rather, its false positive rate scales with the true positive rate. With low real prevalence there is hardly any virus to contaminate the lab. With high prevalence, contamination becomes intrinsically more likely, including via routes the WHO advice ignores completely, such as infected lab workers. At the same time the workload increases, and the pressure on the lab to cut corners grows along with it.

Still, the low base rate does indicate that, at least in summer months when labs are not under pressure and contaminants are rare, there is very little “noise” in the test.

The other arguments presented by the ONS likewise rest upon this assumption:

…high rates of false-positives would mean that, the percentage of individuals not reporting symptoms among those testing positive would increase when the true prevalence is declining because the total prevalence is the sum of a constant rate of false-positives (all without symptoms) and a declining rate of true-positives
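To see why this second ONS argument depends on the constant-rate assumption, here is a rough sketch comparing two toy models: one where false positives occur at a fixed rate per test, and one where they scale with the true positive rate, as contamination would suggest. Every number is hypothetical and chosen only to show the shape of each model; the function and constant names are mine. Following the ONS quote, all false positives are treated as asymptomatic.

```python
import math

def asymptomatic_share(true_pos_rate, false_pos_rate, asymptomatic_among_true):
    """Share of all positive results that come from people without symptoms.

    False positives are counted as asymptomatic, as in the ONS quote.
    """
    asymptomatic = asymptomatic_among_true * true_pos_rate + false_pos_rate
    return asymptomatic / (true_pos_rate + false_pos_rate)

# Hypothetical values, for illustration only.
ASYMPTOMATIC_AMONG_TRUE = 0.30  # assumed share of real infections with no symptoms
CONSTANT_FP_RATE = 0.0005       # model A: fixed FP rate per test
FP_PER_TRUE_POSITIVE = 0.10     # model B: FPs proportional to true positives

# A declining true positive rate per test, i.e. falling prevalence.
true_positive_rates = [0.02, 0.005, 0.001]

constant_model = [
    asymptomatic_share(tp, CONSTANT_FP_RATE, ASYMPTOMATIC_AMONG_TRUE)
    for tp in true_positive_rates
]
proportional_model = [
    asymptomatic_share(tp, FP_PER_TRUE_POSITIVE * tp, ASYMPTOMATIC_AMONG_TRUE)
    for tp in true_positive_rates
]

for tp, a, b in zip(true_positive_rates, constant_model, proportional_model):
    print(f"true positive rate {tp:.3f}: constant FPs {a:.1%}, proportional FPs {b:.1%}")
```

Under the constant model the asymptomatic share climbs as prevalence falls, which is the signal the ONS looks for and does not find. Under the proportional model the share stays flat at every prevalence, so the absence of that signal cannot rule out false positives of this second kind.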

External validity. The ONS argument is based on data submitted by the labs themselves. However, this isn’t the FP rate that citizens actually care about. They care about the end-to-end performance of the entire system, and that includes false positives introduced by bureaucratic failures of various kinds – for example, workers deciding to report an inconclusive test result as a positive because ‘better safe than sorry’, results getting sent to the wrong person, or someone being told they tested positive even when they didn’t take a test at all. There have been many reports of problems like this happening. I myself have received a PCR certificate for the wrong person (who shared my last name), and I know how it can happen. After my fiancé’s test results were “lost” I watched as the site workers fought with a badly written web-app that randomly auto-completed other people’s details into the form they were trying to fill out.

To determine the actual false positive rate of Covid testing, an end-to-end study that includes the actual citizen-facing sites and systems would be required. Governments appear never to have done this.

Occasionally, scientists do perform lab challenges, sometimes called “external quality assessments” (EQAs). Artificially created samples, known either to contain certain types of virus or not, are submitted and the lab’s results are checked. Although this doesn’t verify the infrastructure of a mass testing programme, it is nonetheless more informative than just checking the lab’s own testimony. A meta-analysis from 2020 looked at the history of EQAs for RNA viruses post-2004, and then looked at how much those observed FP rates could affect claims about Covid:

Review of external quality assessments revealed false positive rates of 0-16.7%, with an interquartile range of 0.8-4.0%. Such rates would have large impacts on test data when prevalence is low. Inclusion of such rates significantly alters four published analyses of population prevalence and asymptomatic ratio.

The study also concluded that reliability didn’t improve over time.
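The effect of prevalence can be illustrated with a short calculation. The sketch below takes the two ends of the interquartile range quoted above (0.8% and 4.0%) and works out what share of positive results would be false at different prevalences. It assumes those rates mean the fraction of truly negative samples reported as positive; the sensitivity and prevalence values are hypothetical, and the function name is mine.

```python
def false_positive_measures(prevalence, fp_rate, sensitivity):
    """Return two different quantities that both get called 'false positive rate'.

    per_test:           false positives as a fraction of all tests performed
    share_of_positives: false positives as a fraction of all positive results
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * fp_rate
    per_test = false_positives
    share_of_positives = false_positives / (true_positives + false_positives)
    return per_test, share_of_positives

ASSUMED_SENSITIVITY = 0.80  # hypothetical; not a figure from the meta-analysis

for fp_rate in (0.008, 0.040):              # interquartile range quoted above
    for prevalence in (0.001, 0.01, 0.10):  # hypothetical prevalences
        per_test, share = false_positive_measures(
            prevalence, fp_rate, ASSUMED_SENSITIVITY
        )
        print(
            f"FP rate {fp_rate:.1%}, prevalence {prevalence:.1%}: "
            f"{per_test:.2%} of all tests, {share:.1%} of all positives"
        )
```

Under these assumptions, an error rate of under 1% per test can still account for around half of all positive results once prevalence drops to 1%, and the share grows further as prevalence falls.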

Cross-viral confusion. To what extent can PCR tests mix up different pathogens? Scientists make extremely strong assertions that this cannot happen under any circumstances. However, in the past there have been incidents where this did in fact happen.

In 2015 a large scale lab challenge was performed. Labs were sent samples they were told contained MERS-CoV, but some contained other coronaviruses like those that cause common colds. Many yielded no false positives at all, but around 8% of labs incorrectly detected MERS-CoV. NB: 8% of labs reporting false positives is not the same as an 8% false positive rate.

In 2003 an outbreak of a common cold virus (HCoV-OC43) occurred at a Canadian nursing home. Because this occurred during the SARS-1 epidemic in Asia, the samples were subjected to routine PCR and serology testing for SARS. Unexpectedly, both tests indicated a SARS outbreak. Because this was implausible, further PCR testing was done, which confirmed the far more likely cause of an OC43 outbreak.

In 2006 an epidemic of whooping cough was announced to be occurring at a U.S. hospital. Over 1,000 staff were furloughed and quarantined, ICU units were closed and 142 people PCR-tested positive for the disease. On further investigation using a simpler and more trusted but slower type of test (culturing the bacteria), all 142 results were found to be false positives. The event was dubbed a “pseudo-epidemic” and reported on by the New York Times. I wrote about this event in more detail last year. The staff concluded that what was in reality just an outbreak of a normal respiratory virus had been misdiagnosed as whooping cough because they didn’t want to argue with the apparently more ‘scientific’ PCR results, even when their own clinical experience raised doubts.

This highly heterogeneous pattern of false positive rates, in which some labs yield none and others yield a large number, is a common outcome of lab challenges. This isn’t surprising if the primary driver of FPs is contamination, yet it means any attempt to characterize the FP rate of an entire mass testing programme at a single point in time is guaranteed to be incomplete. To detect and fix labs generating high FP rates would require a continuous challenge programme, yet governments – misled by scientists claiming that the technique is inherently immune to FPs – haven’t done this.

Logic errors. Many claims about the accuracy of Covid testing turn out on close inspection to be logically invalid. Trying to calculate a ‘true’ FP rate for Covid testing is extremely difficult due to the ubiquity in public health of circular logic that goes like this:

A PCR test returning positive implies a Covid case.
A Covid case implies a positive PCR test.

This situation means that in many discussions Covid PCR testing cannot have false positives by definition. Too often, claims that the tests are FP free or have nearly no FPs must therefore be discarded, because they are merely measuring tests against their own output – an approach which has no scientific validity.

This problem also affects many arguments based on symptoms. A typical dictionary definition of disease is as follows:

a disorder of structure or function in a human, animal, or plant, especially one that produces specific symptoms or that affects a specific location and is not simply a direct result of physical injury.
Oxford English Dictionary

The symptoms of Covid were in the very beginning quite specific, involving a new and unusual type of pneumonia. Once mass PCR testing began the list of symptoms expanded to include whatever seemed to be wrong with people who tested positive. The current definition of Covid includes every symptom you might find in the general population, including no symptoms at all, which is what you’d expect to see happen in an environment where a test that has false positives is treated as if there are none.

Circular logic has corrupted or destroyed attempts to determine the accuracy of Covid testing on a truly staggering scale. In 2020 a meta-analysis published in the Journal of Infection looked at every paper published up to that point on the accuracy of Covid tests. Of the 43 studies the authors located, none was methodologically valid. They concluded:

Current studies estimating test performance characteristics have imperfect study design and statistical methods for the estimation of test performance characteristics of SARS-CoV-2 tests

…which is quite the understatement:

Critical study details were frequently unreported, including the mechanism for patient/sample selection and researcher blinding to results

In other words, the scientists didn’t explain how they identified people as having Covid, then compared test results to their sample anyway, which makes the results useless. But in most cases the problem was circular logic:

Eight studies… attempted to determine the accuracy of rRT-PCR by comparing the initial rRT-PCR result to the result after multiple repeated samples from the patient… Suo et al. considered a positive result of either repeated measurements of rRT-PCR or serology to indicate a positive test according to the reference standard… Three studies determined the accuracy or agreement of rRT-PCR or automated rRT-PCR platforms/instruments compared to a reference standard based on the results of several tests as a “composite reference standard”.

etc., etc.

Testing a test against itself is not a valid way to measure its accuracy. The results from this sort of approach are actually a measure of test-retest reliability, not sensitivity or specificity. The problem is compounded by incorrect interpretation of test flips. A test that returns both yes and no in quick succession should have its output discarded. Instead, scientists routinely classify these as false negatives (because FPs are “impossible”). They then report that Covid tests have a high rate of false negatives but no false positives, which reassures scientists that FPs are impossible: yet more circular reasoning.
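The circularity can be made concrete with a toy model. The sketch below is a deliberate simplification of the “positive on any repeat” designs quoted above, not a reproduction of any particular study: the reference standard calls a patient infected if any of their PCR results is positive, and the initial test is then scored against that reference. All names are mine.

```python
from itertools import product

def composite_reference(initial, repeats):
    """Reference standard: 'infected' if any PCR result for the patient is positive."""
    return initial or any(repeats)

def count_outcomes(patients):
    """Score each initial result against a reference built from the same test."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for initial, repeats in patients:
        reference = composite_reference(initial, repeats)
        if initial and reference:
            counts["TP"] += 1
        elif initial and not reference:
            counts["FP"] += 1  # unreachable: an initial positive makes the reference positive
        elif not initial and reference:
            counts["FN"] += 1  # every negative-then-positive flip lands here
        else:
            counts["TN"] += 1
    return counts

# Every possible pattern of one initial test followed by two repeats.
patients = [(results[0], results[1:]) for results in product([True, False], repeat=3)]

counts = count_outcomes(patients)
print(counts)
```

Whatever the real accuracy of the test, the false positive count is zero, because the reference is positive whenever the initial result is. Every disagreement between the initial test and its repeats ends up counted either as a true positive or as a false negative, which is the pattern of “many false negatives, no false positives” described above.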

Conclusion. The belief that Covid testing has such low FP rates as to be ignorable is based on a combination of highly heterogeneous lab accuracies, an incorrect assumption that PCR FPs are characterisable as random noise, and rampant circular reasoning within public health research.

This essay has some limitations to bear in mind:

There are other sources of PCR false positives that aren’t mentioned because they’re well discussed elsewhere already, like how PCR tests are routinely interpreted as evidence of infectiousness when that isn’t what they actually test for. The topic of cycle thresholds is related to this point. I make no claim to be comprehensive.
Terminology can prove difficult. “False positive rate” can be used to mean the fraction of all tests performed that are incorrectly positive, or the fraction of all positive results that are false, depending on context and author. These differences can result in wildly different numbers for what is actually the same claim.
Most of the studies relied on were published last year. There may be newer studies with better methodologies I’m not aware of, including studies that contradict the arguments presented here.
To repeat for clarity: these problems do not imply the false positive rate is very high. They imply we do not seem able to characterise Covid test accuracy rigorously, even nearly two years into the pandemic.