The Real Fault with Epidemiological Models

by Hector Drummond

Imperial College’s Professor Neil Ferguson has drawn a lot of criticism recently for the poor state of the code in his COVID-19 model.

This criticism, it should be noted, is not even directed at his original code – which he still refuses to release, so we can guess how bad that is. The criticism concerns the completely rewritten code that has been worked on by teams from Microsoft and GitHub.

However, Simon Anthony, sometime contributor to Hector Drummond Magazine, has recently written in The Critic magazine that the poor quality of Ferguson’s code is beside the point.

I actually agree with this claim. Of course, it is quite right that the poor quality of Ferguson’s code should have been drawn out into the open, and I think his critics should be congratulated for doing this. I also think it is revealing of the poor standards at work in general with Ferguson’s team. But, in the end, Anthony is right that the sort of modelling Ferguson is doing is not discredited by the fact that his own effort was so shonky, because any number of epidemiological modellers could have come up with similar analyses using impeccable code. If we make this the main point of criticism of Ferguson’s predictions then we risk being undone when another group backs his analysis up with a top-notch piece of coding.

Reliability

I want to focus on what I consider to be the real failure not just with Ferguson’s model, but most epidemiological modelling of this sort: the lack of proven reliability of these models. It doesn’t matter whether your code is the equivalent of a brand-new shiny Rolls-Royce or a beat-up old Ford Transit van that’s been up and down the motorway way too often with rock bands. What matters is that your model makes predictions that we can have reason to trust. Can we trust Ferguson’s predictions? I have no reason to think we can.

For one thing, we have heard many reports of his extreme predictions in the past which have failed to come true. To be fair to Ferguson, it may be that he has learned from all these past failures, and has recently perfected his model by testing it repeatedly against reality, and it is now constantly providing accurate predictions. But I have no evidence that this has happened. Months after Ferguson came to public attention – months in which he has received large amounts of criticism for just this point – he has declined to point anyone in the direction of any reliability tests for his models. So I have no reason to set any store by them.

Models are “speeded-up theories”

It is sometimes said that a computer model is just a “speeded-up theory”, and I think this is true. A computer model is not just a neutral bit of maths that we can nod through. Assumptions are made about how this part of reality works, and about how this part of reality can be imitated by a vastly simplifying piece of computer code and some maths. So a computer model embodies a theory about how this part of the world works.

This theory has no special status in the pantheon of theories. It’s like any other theory: it has to be tested against the world to see if it stands up – and not just tested in easy, artificial situations where it’s not hard to produce the right answers. Nor can it just be tested post hoc – that is, tested against scenarios which have already happened. It’s not hard to adjust a model so that it outputs the correct predictions when you already know what those predictions have to be. Post hoc analysis and adjustment is important, of course, but it alone doesn’t count as testing. Just like any other theory, the theory embodied by the model has to be shown to produce accurate results in advance. And in this particular field, you’d want to see it do so in a wide range of real-world situations. This is not the sort of theory where one decisive experiment can settle things one way or the other.
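The difference between post hoc fitting and advance prediction can be made concrete with a toy example. The Python sketch below is purely illustrative: it assumes an invented "reality" (a logistic curve of cumulative cases) and a deliberately over-flexible model, a polynomial forced through every known data point. The names `synthetic_reality` and `interpolating_polynomial` are hypothetical, and nothing here comes from any real epidemiological model or outbreak.

```python
import math

def synthetic_reality(day):
    # Hypothetical "true" epidemic curve: cumulative cases levelling off at 1,000.
    # Invented for illustration; it is not data from any real outbreak.
    return 1000 / (1 + math.exp(-(day - 5)))

def interpolating_polynomial(xs, ys, x):
    # Lagrange form of the unique polynomial passing exactly through every known point
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

known_days = list(range(8))   # days 0 to 7: what has "already happened"
known_values = [synthetic_reality(d) for d in known_days]
future_days = [8, 9, 10]      # days the model has to forecast in advance

# Post hoc fit: how well does the model reproduce the days it was built from?
in_sample_error = max(
    abs(interpolating_polynomial(known_days, known_values, d) - v)
    for d, v in zip(known_days, known_values)
)
# Genuine test: how well does it predict days it has not seen?
out_of_sample_error = max(
    abs(interpolating_polynomial(known_days, known_values, d) - synthetic_reality(d))
    for d in future_days
)
print(f"Worst error on known days:  {in_sample_error:.1f}")
print(f"Worst error on future days: {out_of_sample_error:.1f}")
```

The fit to the known days is perfect by construction, while the forecasts for days 8 to 10 miss by a wide margin. Only the second number says anything about reliability.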

The demand for testing and reliability is not in any way controversial. Having worked in academia for many years, I saw scientist friends in various fields who would work with models, and their concern was always with the reliability of the models. You couldn’t just use any old model that suited you and which gave you the results you wanted. You had to use something with a track record, and which other people in your field trusted because it had independently proven its worth.

Models in other fields

Many models have proven their worth over the years in many fields. Engineers, for example, have long used computer models to design bridges and other structures. These models can be very sophisticated and complex, and usually involve advanced mathematics (the engineering commentators at my magazine and my Twitter page like to boast of their mathematical superiority over the likes of economists and epidemiologists). Despite that, these models generally work. They pass the reliability test. They have to work, of course: you can’t have your engineering firm build a bridge that falls down at the opening ceremony and kills everyone. So reliability testing is incredibly important in fields such as engineering, but even then they don’t always get it right, because these things can get extremely complex and difficult and mistakes happen.

Grant-funding incentives

There isn’t the same imperative to get it right in academic epidemiology. Neil Ferguson can keep making very high overestimates of the number of people who will be killed by the latest disease, and he keeps his jobs and government appointments and still gets massive amounts of funding from bodies like the Bill and Melinda Gates Foundation. He won’t be sued and made bankrupt if he gets it wrong. The government didn’t require him to publish reliability tests for his models in order for him to be part of the SAGE and NERVTAG committees.

Epidemiology seems to be one of those areas, like climate change, where model reliability matters far less than it should. This can happen to areas that become politicised and where the journals are controlled by strong-armed cliques. It can also be a consequence of modern academia, where the emphasis has shifted almost totally to funding success. Funding success in areas like epidemiology can depend on exaggeration to impress people with agendas and money to burn, like Bill Gates. In an objective field you would expect, after all, underestimates to be as prevalent as overestimates. Yet in this field, overestimates are rife. And the reason for this is the same as the reason why alarmism thrives in climate “science”: it’s because all the research money goes to those who sound the alarm bells.

Creutzfeldt–Jakob disease

The case of variant Creutzfeldt–Jakob disease (vCJD, which can be caught from eating meat from animals that had BSE, or “mad cow disease”) provides a telling example. In August 2000, Ferguson’s team predicted that there could be up to 136,000 cases of this disease in the UK (and disturbingly, this article mentions that Ferguson and his team had previously predicted 500,000 cases).

A rival team at the London School of Hygiene and Tropical Medicine developed their own model which predicted there would be up to 10,000 cases, with a “few thousand” being the best case scenario. Ferguson pooh-poohed the work of this rival team, saying it was “unjustifiably optimistic”.

I should note that Ferguson had made some lower predictions as well – in fact he made a wide range of predictions based on whether various factors, such as incubation periods, applied. But the fact that he laid into the rival team in this way tells us that he thought we were looking at the high end of the range.

Seeing as pretty much everyone who gets vCJD dies from it, this was serious.

So how many people died from vCJD in the UK in the two decades since? 178.

My point here isn’t just that Ferguson’s model was stupendously wrong (or, if you want to emphasise the very large range of predictions he made, useless for most practical purposes). The point is that even the team that performed better still greatly overestimated the number of deaths. Their model only looked good compared to Ferguson’s – his model wasn’t even in the right universe – but it was itself highly inaccurate and misleading, and not at all up to the job we required to be done.
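The scale of the miss is easy to check. This short Python sketch simply divides each team's upper estimate, as quoted above, by the 178 deaths recorded; the dictionary labels are mine, and comparing upper bounds with the outcome is the same comparison the text makes.

```python
observed_deaths = 178  # UK vCJD deaths cited in the text

# Upper estimates as quoted in the text
upper_estimates = {
    "Ferguson team (August 2000)": 136_000,
    "LSHTM team": 10_000,
}

# How many times larger each upper estimate was than the observed outcome
factors = {team: predicted / observed_deaths for team, predicted in upper_estimates.items()}
for team, factor in factors.items():
    print(f"{team}: upper estimate of {upper_estimates[team]:,} is about {factor:.0f} times the observed {observed_deaths}")
```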

The other advantages of bridge-building models over epidemiological models

Bridge-building models also have other advantages over epidemiological models. The principles of physics and chemistry that are involved are very well established, and have been worked on for a very long time by very many, and many great, scientists. Also, the basic principles of physics and chemistry they deal with don’t change. Some things in the field do change, of course – for example, new materials are constantly developed, and one must take account of construction techniques varying from place to place, and that mixtures of materials are not always quite right, and so on. But there is a great advantage in the fact that, for example, the laws of gravity don’t change.

Epidemiology, on the other hand, is dealing with things that are, in general, far less locked down, and which can change from decade to decade. Diseases have more-or-less different structures from one another. They don’t all behave alike. Countries vary from one another in various relevant respects (temperature, sanitary conditions, crowd behaviour, and so on). Medicine improves, but it’s not always well known how a modern medicine interacts with a certain disease. There is little that is fixed in a physics-like way with disease, and even for those things that are somewhat fixed our knowledge of some of the important detail is lacking.

The basics behind bridge-building models are not completely set in stone, of course, but they are much more settled than epidemiological models, which are trying to model far messier situations, with many more unknown parameters and influences.

Adding detail to make the models more realistic

The other problem with this messiness is that the models need to be made more and more complex to try to deal with all these extra factors. It appears that Ferguson has attempted to do this to some degree, which explains the length of his code. It also appears that the rewritten version of his model has added in even more of this sort of thing; for example, the GitHub page for it originally noted that it had now added in “population saturation effects”.

While I regard the attempt to capture the complexity of the real world as admirable, there can be no end to it when modelling some of the messier parts of reality, and you can end up needing a model as large as reality itself before your model starts to give you any reliable predictions. Models, remember, are attempts at drastic simplifications of reality, embodying various theories and assumptions, and there is no guarantee that any such simplification will work in any particular area. Sometimes they do, sometimes they don’t. The only criterion for deciding this is reliability testing. It’s not enough just to say, “Well, this should work, we’ve taken everything that seems to be influencing the results into account”. If your model still isn’t reliable, then you haven’t accounted for all the complexity – or else you’ve just gone wrong somewhere. Or both. It can be extremely hard to know where the fault lies.

In fact, given the repeated failure of epidemiological models, it seems most likely that a lot of relevant and important factors are being left out. Threatening diseases often just “burn out” quicker than epidemiological modellers expect, so it’s likely that their models have failed to account for some of these other factors.

For instance, recent research has claimed that many people may have partial or full immunity to COVID-19 due to past encounters with other coronaviruses.

This no doubt applies to a great many new diseases: some proportion of people will have full or partial immunity already. But this is not something that can be easily modelled – at least not without a great deal more information than we currently have.

Epidemiological modellers also try to incorporate knowledge about genetics into their models, but our knowledge of how diseases interact with our differing genetics is still fairly rudimentary. We also know little about why some diseases affect children more than others. Furthermore, our knowledge of how viruses survive in various different environments and temperatures is very incomplete. Without better information about such factors being incorporated into the models, why should we continue to trust them after their many failures?

In the absence of such specific information, one thing modellers could do is to feed the information about their past failures back into their models. If your model consistently overestimates death numbers by, say, 100%, then at the very least you should be changing your model so that it adjusts its predictions 50% downwards every time; but of course it would be extremely embarrassing for any modeller to add in any such “repeated past failure” module into their model, and most likely there is no consistent mathematical pattern to their failures anyway.
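The arithmetic behind such a correction can be sketched in a few lines of Python. This is a hypothetical illustration, not something modellers are known to do: it assumes the bias is a consistent multiplicative factor, estimated as the geometric mean of past predicted/actual ratios, and the track record below is invented. The function names are likewise hypothetical.

```python
import math

def bias_factor(track_record):
    # Geometric mean of predicted/actual ratios: 2.0 means predictions run at double the outcome
    log_ratios = [math.log(predicted / actual) for predicted, actual in track_record]
    return math.exp(sum(log_ratios) / len(log_ratios))

def bias_corrected(prediction, track_record):
    # Scale a new prediction by the inverse of the historical bias factor
    return prediction / bias_factor(track_record)

# Hypothetical track record of (predicted, actual) deaths, each a 100% overestimate
history = [(2_000, 1_000), (600, 300), (50, 25)]
print(f"bias factor: {bias_factor(history):.2f}")
print(f"corrected forecast for a new prediction of 10,000: {bias_corrected(10_000, history):,.0f}")
```

A 100% overestimate means predictions run at twice the outcome, so halving them (a 50% cut) removes the bias; a 50% overestimate would instead call for a cut of one third. If past errors follow no consistent pattern, a single factor like this will not help.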

The Sensitivity to Inputs

Yet another difference between bridge-building and epidemiological models concerns the issue of inputs. With an epidemiological model, small changes in input data can produce output results which are, for our purposes, vastly different, as Simon Anthony demonstrated recently in an article at Hector Drummond Magazine. The difference between a prediction of a disastrous epidemic or a normal winter virus can depend on small differences in the data that goes into the model. (This may recall to mind all those books and articles on chaos theory from the 1980s, where massive differences can result from small changes in the initial circumstances.)
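A deliberately simple illustration of this kind of input sensitivity is the textbook SIR final-size relation, z = 1 - exp(-R0 * z), where z is the fraction of the population eventually infected. The Python sketch below assumes homogeneous mixing and a fully susceptible population, and solves the relation by bisection; it is not Ferguson's model or the one Simon Anthony examined, and `final_attack_rate` is a hypothetical name.

```python
import math

def final_attack_rate(r0, tol=1e-12):
    """Fraction of a fully susceptible population eventually infected in a simple SIR model,
    from the final-size relation z = 1 - exp(-r0 * z)."""
    if r0 <= 1.0:
        return 0.0  # at or below the threshold there is no large outbreak in this idealised model

    def f(z):
        # Equals z - 1 + exp(-r0 * z); expm1 keeps precision for small z
        return z + math.expm1(-r0 * z)

    lo, hi = 1e-12, 1.0  # for r0 > 1, f(lo) < 0 and f(hi) > 0, so the root lies between
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for r0 in (0.95, 1.05, 1.2, 1.5, 2.0):
    print(f"R0 = {r0:.2f}: about {final_attack_rate(r0):.1%} eventually infected")
```

In this toy model, moving R0 from 0.95 to 1.05 takes the outcome from no large outbreak to close to a tenth of the population, and the curve stays steep above 1. This is a threshold effect rather than chaos in the technical sense, but it makes the same practical point: modest uncertainty in an input can mean very different forecasts.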

This issue does not affect bridge-building models to the same extent. Bridges can now be made stable and strong (which is what we want) under a usefully large range of circumstances. This is partly because a bridge is something we are designing and building ourselves to exhibit behaviour we desire and using well-understood materials, whereas an epidemic is a part of nature that we do not have much control over, and also because we still do not understand disease spread that well – other than in the most basic ways.

That is not to deny that there will still be some scenarios – such as those involving extreme weather – where the bridge model will exhibit non-linearities; but in general, the outputs of standard epidemiological models are vastly more sensitive to their inputs than engineering models.

Wrong input data

This brings me to my final point of difference between bridge-building models and our COVID-19 models, and that is the issue of incorrect data being put into the model. This is an enormous problem with epidemiological models, whereas it isn’t a problem to anywhere near the same extent with bridge-building models (it does sometimes happen, but not as a matter of course).

(Strictly speaking, this is not a fault of the model itself, but since it concerns the overall attempt to model the future of diseases such as COVID-19, I am going to include it here.)

It doesn’t matter how good a model is: if you are putting incorrect data into it, you’re going to be producing incorrect results. You can’t just assume that whatever results come out are going to be good enough for now, because as we have seen with epidemiological models, different data can produce vastly different results.

I won’t labour this point – it is one that has been made many times – except to note that despite about six months having passed since the first COVID-19 case was identified in China, there is still great disagreement over both the disease’s infection fatality rate (IFR) and its transmission rate (the reproduction number, R0) in various scenarios, and these are the critical numbers an epidemiological model requires to have any chance of being accurate. Ferguson could have created (despite appearances) an absolutely perfect model for COVID-19, but that will be of no use to anyone if the wrong numbers are going into it, as appears to be the case.
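To see how disagreement over these inputs carries through, consider the rough relation deaths ≈ population × fraction infected × IFR. The Python sketch below uses hypothetical ranges chosen only for illustration (none are taken from the article or from any published estimate), with the fraction infected standing in for what the assumed R0 would determine.

```python
# Hypothetical inputs for illustration only: none of these values come from the article
population = 66_000_000          # assumed population, roughly UK-scale
attack_rates = (0.1, 0.3, 0.6)   # assumed fractions of the population eventually infected
ifrs = (0.001, 0.005, 0.01)      # assumed infection fatality rates: 0.1%, 0.5%, 1%

# Projected deaths for every combination of the assumed inputs
projections = {
    (attack, ifr): population * attack * ifr
    for attack in attack_rates
    for ifr in ifrs
}
low, high = min(projections.values()), max(projections.values())
print(f"Projected deaths range from {low:,.0f} to {high:,.0f}, a factor of {high / low:.0f}")
```

Because the two uncertainties multiply, a sixfold spread in the fraction infected and a tenfold spread in the IFR together give a sixtyfold spread in projected deaths, however well the rest of the model is built.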

In fact, this problem cuts even deeper than many appreciate. With many diseases we never get a proper handle on what the real death rate and the transmission rates are. For instance, we still don’t really know much about the Spanish flu. We don’t know that much about how it spread and how many people were really infected. We don’t even know whether the different waves of it were caused by the same virus. And if you don’t trust my word on this, take Anthony Fauci’s word for it.

Even with modern influenza we have to rely on very crude estimates of how many people died with it. Even when we have the leisure to take a close look at some recent epidemic in order to improve a model, we often can’t put any solid, definitive numbers into it, because they just don’t exist. In fact, some of the time we’re using the models themselves to estimate how widely a disease spread, and what the transmission and fatality rates were. It’s not surprising, then, that it’s very difficult even now to create epidemiological models that work well enough to be trusted in difficult situations like the one we face with COVID-19.

Do all the top experts who you’d think love modelling really love modelling?

I leave you with the words of two of the world’s most high-profile epidemic scaremongers, who have both been at it since the days of the hysteria over AIDS. It seems that these two have finally started, after many decades, to get an inkling that epidemiological models are more dangerous than useful. (I owe these spots to Michael Fumento.)

The first is the Director of the USA’s Centers for Disease Control and Prevention (and Administrator of the Agency for Toxic Substances and Disease Registry) Dr Robert Redfield, who said on April 1st that COVID-19 “is the greatest public health crisis that has hit this nation in more than 100 years”.

A week later, though – as it started to become clear that the models had once again oversold a disease threat – he said, “Models are only as good as their assumptions, obviously there are a lot of unknowns about the virus. A model should never be used to assume that we have a number.”

Consider also Dr Anthony Fauci, the long-term director of the National Institute of Allergy and Infectious Diseases, and the very man who has been telling Donald Trump that the USA has a disaster on its hands. He said, for example, during a hearing of the House Oversight Committee on March 12th, that COVID-19 “is ten times more lethal than the seasonal flu”.

But the fact that Fauci is very worried about COVID-19 does not mean that he thinks the models are gospel. A few weeks later he was reported as saying, “I’ve looked at all the models. I’ve spent a lot of time on the models. They don’t tell you anything. You can’t really rely upon models.”

If Robert Redfield and Anthony Fauci are not great believers in epidemiological models, I don’t see why the rest of us should be either.

Hector Drummond is a novelist and the author of Days of Wine and Cheese, the first novel in his comic campus series The Biscuit Factory. He is a former academic and the editor of Hector Drummond Magazine. He tweets at hector_drummond.