**By Brian Castellani, Professor of Sociology (University of Durham)**

**24th March 2020 (originally published on the Sociology and Complexity Science Blog)**

*BLOG POST 3 of N*

This post is the 3rd of several devoted to addressing the complex challenges of modelling the coronavirus as a public health issue. It is also about clarifying for a wider audience how and why such modelling is important, as well as the value and power of complex systems thinking and computational modelling for public health policy.

QUICK SUMMARY

The focus of the first post was to explain how models influence public health policy and why some models are better at modelling COVID-19 than others, given the challenge of complexity. The post ended asking the question: So, what does an effective model look like? (CLICK HERE for the first post) In response I said I would review two of the models getting the most attention. Before turning to these models, however, the second post reviewed, from a complex systems perspective, what a public health model of infectious disease looks like in the first place. (CLICK HERE for the second post) The current post moves on to review the first of our two models: the simulation model by Ferguson and colleagues at Imperial College London. The fourth post will review the complex network model by Vespignani and colleagues at Northeastern University in the States.

**There is no one way to model infectious disease**

As Matt Keeling and Pejman Rohani explain in *Modeling Infectious Diseases in Humans and Animals*, in the world of public health there are a variety of approaches to mathematically modelling infectious disease.

**In terms of methods**, for example, there are agent-based models, microsimulations, differential equations, statistical methods, Bayesian approaches, stochastic models, network analyses and geospatial modelling, to name a few.

And, in terms of how these models are used, there are also a **variety of theoretical frameworks**; that is, there are a variety of ways to conceptualise how infectious disease spreads through a population. These conceptualisations range from the very simple to the highly complex. For example, one of the simplest models, which is highly useful and still used at the base of most conceptualisations, is the **SIR model**. The model consists of three compartments: **S** for the number of **s**usceptible, **I** for the number of **i**nfectious, and **R** for the number of **r**ecovered (or immune) individuals. For anyone on social media, watching television or reading the newspapers, variants of the **SIR model** have been shown in discussions about 'flattening the curve', as well as explanations about the value of herd immunity.
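To make the three compartments concrete, here is a minimal sketch of a deterministic SIR model in Python. It is not any published model: the function name `simulate_sir`, the transmission rate `beta`, the recovery rate `gamma` and all numeric values are illustrative assumptions, chosen only to show how lowering the transmission rate 'flattens the curve'.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the classic SIR equations with a simple Euler scheme.

    beta  = assumed transmission rate per day (illustrative)
    gamma = assumed recovery rate per day (illustrative)
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infectious
    r = 0.0                                   # recovered (or immune)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Two hypothetical scenarios: same disease, fewer contacts in the second.
baseline = simulate_sir(1_000_000, 10, beta=0.4, gamma=0.2, days=300)
flattened = simulate_sir(1_000_000, 10, beta=0.3, gamma=0.2, days=300)

peak_baseline = max(i for _, _, i, _ in baseline)
peak_flattened = max(i for _, _, i, _ in flattened)
print(f"Peak infectious, baseline:  {peak_baseline:,.0f}")
print(f"Peak infectious, flattened: {peak_flattened:,.0f}")
```

The second scenario peaks lower and later, which is the 'flattened curve' of the public discussion. Everything else about real epidemics (age, space, households, chance) is deliberately left out.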

And, as the global conflict around which model or approach is correct has demonstrated (circa March 2020), there is a trade-off involved in choosing the right method and theoretical framework. For example, different models will yield different results. A network model is very good at showing how diseases spread through a population's network structure; in turn, agent-based models are very good at showing how people interact with and react to the spread of an infection. Meanwhile, differential equation models and stochastic models are good at predicting the prevalence or duration of a disease. Simpler models can be adapted to a variety of different situations, but often lack the specificity needed to make useful predictions or forecasts. In turn, highly complex models are not easily adapted to new and different situations. Also, the more complex a model gets, the more difficult it is to discern what is causing what.

*It is within this modelling milieu that the model from Imperial is situated.*

**So, which modelling approach is Imperial College London using?**

Ferguson and colleagues describe their model as a **microsimulation**. More specifically, their model is an individual-based, stochastic, spatially situated microsimulation for modelling the health outcomes of COVID-19. (We will unpack all of these terms in a minute!) The purpose of their model is to simulate and explore the impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand, given that we are maybe a year away from a vaccine. (Examples include home isolation of suspect cases, home quarantine of those living in the same household as suspect cases, and social distancing of the elderly and others at most risk of severe disease.) Their test cases are the UK and USA.

*To make sense of their model, let's walk through their terms and ideas one at a time.*

**Microsimulations** (MSMs): As Carolyn Rutter and colleagues explain: "MSMs describe events and outcomes at the person-level with an ultimate goal of providing information that can guide policy decisions" (2012, p. 2). To do this, MSMs "simulate individual event histories associated with key components of a disease process; [which are aggregated] to estimate population-level effects of treatment on disease outcomes [e.g., social isolation versus developing herd immunity]" (2012, p. 1).

- Breaking this definition down, an **event history** is simply a longitudinal record of when **events** occurred for an individual or a sample of individuals. For example, collecting information on a sample of people to determine if and when they had COVID-19.
- Components of an infectious disease process include: agent, portals of entry and exit, mode of transmission, immunity.
- In terms of aggregated estimates of the population-level effects of treatment, MSMs predict aggregate trends in disease incidence and mortality under alternative health policy scenarios, or compare the effectiveness and cost-effectiveness of public health strategies, be it health costs or economic wellbeing.

Steps in developing and running an MSM: As Rutter and colleagues explain, “there are three essential steps in developing any MSM: (1) Identifying a fixed number of distinct states and characteristics associated with these states; (2) Specifying stochastic rules for transition through states; and (3) Setting values for model parameters" (2012, p. 2). So, to summarise:

- microsimulation is a modelling technique that operates at the level of individual units such as persons, households, vehicles or firms.
- within the model each unit is represented by a record containing a unique identifier and a set of associated attributes – e.g. a list of persons with known age, sex, marital and employment status; or a list of vehicles with known origins, destinations and operational characteristics.
- a set of rules (transition probabilities) is then applied to these units, leading to simulated changes in state and behaviour.
- these rules may be deterministic (probability = 1), such as changes in susceptibility to catching COVID-19 resulting from changes in social distancing, or stochastic (probability < 1), such as the chance of recovering from COVID-19 within a given time period.
- in either case the result is an estimate of the outcomes of applying these rules, possibly over many time steps, including both total overall aggregate change and (importantly) the way this change is distributed in the population or location that is being modelled.
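The three steps from Rutter and colleagues, and the summary list above, can be sketched in a few lines of Python. This is a toy illustration only, not the Imperial model: the `Person` record, the `PARAMS` values and the well-mixed contact rule are all assumptions made up for the example.

```python
import random
from collections import Counter
from dataclasses import dataclass

# Step 1: a fixed number of distinct states.
STATES = ("susceptible", "infectious", "recovered")

# Step 3: values for model parameters (all hypothetical).
PARAMS = {
    "contacts_per_day": 8,
    "p_transmit_per_contact": 0.05,
    "p_recover_per_day": 0.15,
}


@dataclass
class Person:
    """One unit of the microsimulation: an identifier plus attributes."""
    person_id: int
    age: int
    state: str = "susceptible"


def step(people, params, rng):
    """Step 2: apply stochastic transition rules to every individual."""
    infectious_share = sum(p.state == "infectious" for p in people) / len(people)
    # Chance that a susceptible person is infected by at least one contact today.
    p_infect = 1 - (
        1 - params["p_transmit_per_contact"] * infectious_share
    ) ** params["contacts_per_day"]
    next_states = []
    for p in people:
        if p.state == "susceptible" and rng.random() < p_infect:
            next_states.append("infectious")
        elif p.state == "infectious" and rng.random() < params["p_recover_per_day"]:
            next_states.append("recovered")
        else:
            next_states.append(p.state)
    for p, new_state in zip(people, next_states):
        p.state = new_state


def run(n_people=2000, n_seed=5, days=120, seed=1):
    rng = random.Random(seed)
    people = [Person(i, rng.randint(0, 90)) for i in range(n_people)]
    for p in people[:n_seed]:
        p.state = "infectious"
    daily_counts = []
    for _ in range(days):
        step(people, PARAMS, rng)
        daily_counts.append(Counter(p.state for p in people))
    return people, daily_counts


people, daily_counts = run()
print("Aggregate outcome on the final day:", dict(daily_counts[-1]))
```

Because every individual keeps their own record, the same run can be summarised in aggregate (as printed here) or broken down by any attribute, such as age, which is the point of working at the person level.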

Short video demonstration of pandemic evolving across time/space

**Stochastic**: As our summary above just suggested, there are two basic types of microsimulation models. The first is deterministic. In such a model, everything, including the behaviours of the people in them, is defined and fixed. There is no room for chance. Clocks, for example, are deterministic models of time. As such, given some set of initial conditions, the outcome is basically always the same. It is 7AM every day at basically the same time. Or is it? While clocks are rather good there is still a degree of error and randomness in the model. And that is just a clock. Once we move to the level of such highly complex and dynamic phenomena as the spread of infectious disease and pandemics, often there is no choice but to embrace the randomness or what is also referred to as its *stochasticity*.

In infectious disease modelling, stochasticity is the 'chance' element in disease transmission. As Keeling and Rohani explain, "**Stochastic models** are concerned with approximating or mimicking [the random or probabilistic element of disease transmission]. In general, the role played by chance will be most important whenever the number of infectious individuals is relatively small, which can be when the population size is small, when an infectious disease has just invaded, when control measures are successfully applied, or during the trough phase of an epidemic cycle. In such circumstances, it is especially important that stochasticity is taken into account and incorporated into models" (2011, p. 190). And stochasticity not only emerges in terms of randomness, but also the hard reality that we will never have perfect data.

Thinking of the Imperial College model then, a stochastic individual-based model of COVID-19 seeks to model explicitly random events at the level of the individuals in a population -- in this case the UK and the United States -- based on the best available data (which currently is not great!) and the population's differences in key characteristics, such as where they live, the social distancing practices in which they engage, the types of policies implemented and, over the course of the next year, the availability of vaccinations.

Because of stochasticity, however, a microsimulation is not run just once. Instead, given some set of initial conditions and some set of parameters within which differences are allowed to fall -- say, for example, rates of infection -- "multiple simulations are required to determine the expected range of behavior" (Keeling and Rohani 2011, p. 190).

Relative to this idea of running multiple microsimulations, Keeling and Rohani (2011) make another important point, which needs to be considered when understanding the cautious nature of how the Imperial model reports its results. Keeling and Rohani state, "The most obvious element of any stochastic model is that different simulations give rise to different outcomes. This implies that although the general statistical properties (such as the mean and the variance) may be accurately predicted, it is generally impossible to predetermine the precise disease prevalence at any given point in the future" (2011, p. 191).
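A short sketch can show why one run is never enough. The code below repeats a tiny chain-binomial (Reed-Frost-style) epidemic many times from identical starting conditions and summarises the spread of outcomes. The function `run_once` and every parameter value are hypothetical choices for illustration, not values from Keeling and Rohani or from Imperial.

```python
import random
import statistics


def run_once(rng, population=200, initial_infected=2,
             p_contact=0.002, p_recover=0.2, days=120):
    """One stochastic epidemic; returns how many people were ever infected."""
    s, i = population - initial_infected, initial_infected
    for _ in range(days):
        # Chance a susceptible is infected by at least one infectious person.
        p_infect = 1 - (1 - p_contact) ** i
        new_infections = sum(rng.random() < p_infect for _ in range(s))
        new_recoveries = sum(rng.random() < p_recover for _ in range(i))
        s -= new_infections
        i += new_infections - new_recoveries
    return population - s


rng = random.Random(42)
final_sizes = [run_once(rng) for _ in range(100)]

print("Mean outbreak size:    ", statistics.fmean(final_sizes))
print("Standard deviation:    ", statistics.stdev(final_sizes))
print("Smallest / largest run:", min(final_sizes), "/", max(final_sizes))
```

Identical inputs give different outbreaks: some runs fizzle out after a handful of cases while others sweep through much of the population. The mean and variance across runs are stable and reportable; the result of any single run is not.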

In terms of COVID-19, this is where people mistake the public health community's differences of opinion for some sort of modelling failure. But, it is not! The vagaries of these models are not demonstrations of confusion or perplexity. Instead, they are the hard realities of the uncertainties of forecasting, given the complexity of this pandemic, the availability of good data, the challenges of getting people to practice what the model preaches, and the knock-on effects that are presently unknown. That is why, if the reader recalls from my **2nd blog post**, modelling is best done collectively (we need more than one model) and democratically and self-critically through co-production and participatory research, including not only policy makers but also healthcare providers and the general public.

**Individual-based**: As already hinted at, part of why microsimulations are called *micro* is because they focus on individuals at the microscopic level. That is not to say they do not provide macroscopic insights into how a virus, for example, spreads across a population -- because, in fact, that is exactly what they do. They just do it by examining how the disease spreads from person to person based on some set of key characteristics and, in turn, some key set of interventions.

As Keeling and Rohani explain, "Models with demographic stochasticity force the population to be made up of individuals, but all individuals of the same type are compartmentalized and measured by a single parameter—there is no distinction between individuals in the same class. In contrast, **individual-based models** monitor the state of each individual in the population. For example, in models with demographic stochasticity we know how many individuals are in the infectious class but not *who* they are, whereas in individual-based models we know the state of each individual and hence *which* individuals are infectious. Individual based models are therefore often more computationally intensive (especially in terms of computer memory) because each individual in the population must have its status recorded" (2011, p. 217).

In thinking about COVID-19, this *'need to know'* is why testing is so important: our data for running our models is only as good as our knowledge of who gets infected and how, who they have interacted with, the incubation period of their illness, whether they were asymptomatic or not (and to what extent) and their health outcome, as well as where they live and so forth. Still, even with such an approach, we seldom model the entire population. Which takes us to the next point.

**Spatial**: As you might have already guessed, stochasticity is heavily dependent upon not only time but also space. As such, if one wants to model the unique and nuanced differences in how disease transmission takes place in a particular country or region of the world, the model will need to be spatially grounded. As Keeling and Rohani explain, "Spatial structure, the subdivision of the population due to geographical location, is another situation where stochasticity plays a vital and dominant role. Without underlying heterogeneities in the parameters at different locations, most deterministic models are asymptote to a uniform solution at all locations, thereby negating the necessity of a spatial model. However, in a stochastic setting, different spatial locations experience different random effects, and spatial heterogeneity may therefore be maintained" (2011, p. 221).

Different spatial locations also differ in population density, proximity and so on, as well as in how a policy, strategy or intervention is implemented. In the case of COVID-19, for example, living in an urban versus rural community strongly impacts how well someone can engage in social distancing, access grocery stores, self-isolate, get testing, access critical care, and so forth.

4-minute video summary of spatial microsimulation

**How does their microsimulation work?**

So, now that we have a basic sense of how microsimulations work, we can turn specifically to the model Ferguson and colleagues developed. The easiest way to overview their model is to copy-and-paste the excellent summary they provide at the beginning of **Report 9** (released 16 March 2020) and add comments for clarification (my comments are marked 'Castellani comment'):

THEY STATE: We modified an individual-based simulation model developed to support pandemic influenza planning to explore scenarios for COVID-19 in GB. The basic structure of the model remains as previously published.

Castellani comment: In 2005, Elizabeth Halloran and colleagues (including Ferguson) published 'Modeling targeted layered containment of an influenza pandemic in the United States'. In 2005, Ferguson and colleagues also published 'Strategies for containing an emerging influenza pandemic in Southeast Asia'. And, finally, in 2006, Ferguson and colleagues published 'Strategies for mitigating an influenza pandemic'. The COVID-19 model Ferguson and colleagues used was adapted from these studies. The last two articles, in particular, provide a supplementary paper that outlines in detail the original model, as well as videos on the spread of a pandemic in Thailand, UK and USA. You can download the videos and watch them.

THEY STATE: [In our model] individuals reside in areas defined by high-resolution population density data. Contacts with other individuals in the population are made within the household, at school, in the workplace and in the wider community. Census data were used to define the age and household distribution size. Data on average class sizes and staff-student ratios were used to generate a synthetic population of schools distributed proportional to local population density. Data on the distribution of workplace size was used to generate workplaces with commuting distance data used to locate workplaces appropriately across the population. Individuals are assigned to each of these locations at the start of the simulation.

Castellani comment: the last sentence above is particularly important for a microsimulation -- the idea that, once the setting and all of the basic parameters are established, based on all sorts of real-world data, individuals in the simulated version of the UK and USA are assigned a location on the map and the simulation is started.

THEY STATE: Transmission events occur through contacts made between susceptible and infectious individuals in either the household, workplace, school or randomly in the community, with the latter depending on spatial distance between contacts. Per-capita contacts within schools were assumed to be double those elsewhere in order to reproduce the attack rates in children observed in past influenza pandemics. With the parameterisation above, approximately one third of transmission occurs in the household, one third in schools and workplaces and the remaining third in the community. These contact patterns reproduce those reported in social mixing surveys.

We assumed an incubation period of 5.1 days. Infectiousness is assumed to occur from 12 hours prior to the onset of symptoms for those that are symptomatic and from 4.6 days after infection in those that are asymptomatic with an infectiousness profile over time that results in a 6.5-day mean generation time. Based on fits to the early growth-rate of the epidemic in Wuhan, we make a baseline assumption that R0=2.4 [number of people a sick person will infect], but examine values between 2.0 and 2.6.

Castellani comment: If readers recall from the 1st blog post, R0 is the reproduction number, which is defined as how many people each infected person can infect if the transmission of the virus is not hampered by quarantines, face masks, or other factors.
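One way to get a feel for what these R0 values mean is the textbook herd immunity threshold, 1 - 1/R0, which holds for the simple SIR model with a homogeneously mixing population. The sketch below applies it to the range of values examined in Report 9. It is a back-of-the-envelope illustration under that simplifying assumption, not a calculation from the report itself, and the function name is made up for the example.

```python
def herd_immunity_threshold(r0):
    """Share of the population that must be immune for the epidemic to
    stop growing, under the simple homogeneous-mixing SIR assumption."""
    if r0 <= 1:
        return 0.0  # an epidemic with R0 <= 1 does not grow in the first place
    return 1 - 1 / r0


for r0 in (2.0, 2.4, 2.6):
    print(f"R0 = {r0}: threshold of about {herd_immunity_threshold(r0):.0%}")
```

Small changes in R0 shift the threshold noticeably, which is one reason modellers examine a range of values rather than a single number.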

THEY STATE: We assume that symptomatic individuals are 50% more infectious than asymptomatic individuals. Individual infectiousness is assumed to be variable, described by a gamma distribution with mean 1 and shape parameter alpha=0.25.

Castellani comment: remember our review above about microsimulations being individual-based? Well, here it comes into play insomuch as the model assumes that individual infectiousness varies amongst people based on key social and demographic differences, which is important as the population is not a uniform group of people. This relationship can be graphed using a gamma distribution, which is useful for showing the time until k events (k infections) given the rate of infection and because these distributions are always skewed. Also, the figure of symptomatic individuals being 50% more infectious than asymptomatic ones is an assumption based on what we know and don't know about infectious diseases similar to COVID-19, but it is not by any means exact. In other words, there is a bit of handwaving going on here. But, those are the realities of modelling and why modellers tell each other what they did, so others can try different scenarios.
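To see what a gamma distribution with mean 1 and shape 0.25 implies, the sketch below draws a large sample using Python's standard library. The only values taken from the report are the mean and the shape; the scale (mean divided by shape), the sample size and the summary statistics are choices made here for illustration.

```python
import random
import statistics

SHAPE = 0.25            # alpha, as quoted from Report 9
MEAN = 1.0              # mean infectiousness, as quoted from Report 9
SCALE = MEAN / SHAPE    # gamma mean = shape * scale, so scale = 4.0

rng = random.Random(0)
samples = [rng.gammavariate(SHAPE, SCALE) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_median = statistics.median(samples)
share_below_mean = sum(x < MEAN for x in samples) / len(samples)

top_n = len(samples) // 10
top_decile_share = sum(sorted(samples)[-top_n:]) / sum(samples)

print(f"Sample mean:                         {sample_mean:.3f}")
print(f"Sample median:                       {sample_median:.3f}")
print(f"Share below average infectiousness:  {share_below_mean:.0%}")
print(f"Infectiousness held by the top 10%:  {top_decile_share:.0%}")
```

With so small a shape parameter the distribution is heavily skewed: most simulated individuals are far less infectious than average, while a small minority account for a large share of total infectiousness.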

Example of Gamma Distributions

THEY STATE: On recovery from infection, individuals are assumed to be immune to re-infection in the short term. Evidence from the Flu Watch cohort study suggests that re-infection with the same strain of seasonal circulating coronavirus is highly unlikely in the same or following season (Prof Andrew Hayward, personal communication).

Infection was assumed to be seeded in each country at an exponentially growing rate (with a doubling time of 5 days) from early January 2020, with the rate of seeding being calibrated to give local epidemics which reproduced the observed cumulative number of deaths in GB or the US seen by 14th March 2020.

Castellani comment: I am not sure why, but in my 23 years of being a professor, one of the concepts students regularly struggle to understand is exponential growth. In mathematics and statistics there are lots of different types of exponential functions. The one I find easiest to explain is something that triples at each time step. Consider a disease in a population of N=1,000 where on Day 1, two people have it. If nothing is done to stop the disease, here is how it progresses:

Day 1 = 2 people infected

Day 2 = 2*3 = 6

Day 3 = 6*3 = 18

Day 4 = 18*3 = 54

Day 5 = 54*3 = 162

Day 6 = 162*3 = 486

Day 7 = 486*3 = 1458

By Day 7 the number infected exceeds the population of 1,000; in other words, everyone has the virus.
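The tripling example can be reproduced in a few lines, alongside the daily growth factor implied by the 5-day doubling time the report assumes for seeding. The function names are invented for this sketch.

```python
def tripling_series(initial_cases, days):
    """Cases on each day when the count triples at every time step."""
    return [initial_cases * 3 ** day for day in range(days)]


def daily_growth_factor(doubling_time_days):
    """Per-day multiplier that doubles a quantity every `doubling_time_days`."""
    return 2 ** (1 / doubling_time_days)


cases = tripling_series(2, 7)
for day, count in enumerate(cases, start=1):
    print(f"Day {day}: {count}")

# First day on which the count reaches the whole population of 1,000.
first_day_everyone = next(
    day for day, count in enumerate(cases, start=1) if count >= 1000
)
print("Everyone infected by day", first_day_everyone)

factor = daily_growth_factor(5)
print(f"A 5-day doubling time means about {factor - 1:.1%} growth per day")
```

Tripling every day is far faster than anything assumed in Report 9; the point of the example is only the shape of the curve, where most of the growth arrives in the last few steps.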

**What does the Imperial College London simulation tell us?**

As we stated earlier, the purpose of the model developed by Ferguson and colleagues is to simulate and explore the impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand, given that we are maybe a year or more away from a vaccine.

*So, what did they learn?* Here is the summary they present in their report as of 16 March 2020. As the pandemic progresses, there will obviously be updated modelling and reports. (CLICK HERE for a complete copy of Report 9 and the other reports available at their website).

The global impact of COVID-19 has been profound, and the public health threat it represents is the most serious seen in a respiratory virus since the 1918 H1N1 influenza pandemic. Here we present the results of epidemiological modelling which has informed policymaking in the UK and other countries in recent weeks. In the absence of a COVID-19 vaccine, we assess the potential role of a number of public health measures – so-called non-pharmaceutical interventions (NPIs) – aimed at reducing contact rates in the population and thereby reducing transmission of the virus. In the results presented here, we apply a previously published microsimulation model to two countries: the UK (Great Britain specifically) and the US. We conclude that the effectiveness of any one intervention in isolation is likely to be limited, requiring multiple interventions to be combined to have a substantial impact on transmission.

Two fundamental strategies are possible: (a) mitigation, which focuses on slowing but not necessarily stopping epidemic spread – reducing peak healthcare demand while protecting those most at risk of severe disease from infection, and (b) suppression, which aims to reverse epidemic growth, reducing case numbers to low levels and maintaining that situation indefinitely. Each policy has major challenges.

Figure 2 shows various UK mitigation strategies; Figure 3 shows UK suppression strategies.

We find that optimal mitigation policies (combining home isolation of suspect cases, home quarantine of those living in the same household as suspect cases, and social distancing of the elderly and others at most risk of severe disease) might reduce peak healthcare demand by 2/3 and deaths by half. However, the resulting mitigated epidemic would still likely result in hundreds of thousands of deaths and health systems (most notably intensive care units) being overwhelmed many times over. For countries able to achieve it, this leaves suppression as the preferred policy option.

We show that in the UK and US context, suppression will minimally require a combination of social distancing of the entire population, home isolation of cases and household quarantine of their family members. This may need to be supplemented by school and university closures, though it should be recognised that such closures may have negative impacts on health systems due to increased absenteeism. The major challenge of suppression is that this type of intensive intervention package – or something equivalently effective at reducing transmission – will need to be maintained until a vaccine becomes available (potentially 18 months or more) – given that we predict that transmission will quickly rebound if interventions are relaxed. We show that intermittent social distancing – triggered by trends in disease surveillance – may allow interventions to be relaxed temporarily in relatively short time windows, but measures will need to be reintroduced if or when case numbers rebound. Last, while experience in China and now South Korea show that suppression is possible in the short term, it remains to be seen whether it is possible long-term, and whether the social and economic costs of the interventions adopted thus far can be reduced.
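The idea of intermittent social distancing can be illustrated with a very rough sketch: a simple SIR model in which distancing switches on when the number of infectious people rises above one threshold and off again when it falls below another. This is not the Imperial microsimulation. The thresholds, the 70% contact reduction and the use of the infectious count as a stand-in for a surveillance signal are all hypothetical.

```python
def simulate_triggered_distancing(population=1_000_000, initial_infected=100,
                                  beta=0.4, gamma=0.2, contact_reduction=0.7,
                                  on_threshold=5_000, off_threshold=1_000,
                                  days=730):
    """Daily-step SIR model with an on/off distancing rule (illustrative)."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    distancing = False
    log = []
    for day in range(days):
        # Surveillance rule: switch measures on or off based on current cases.
        if not distancing and i >= on_threshold:
            distancing = True
        elif distancing and i <= off_threshold:
            distancing = False
        effective_beta = beta * (1 - contact_reduction) if distancing else beta
        new_infections = effective_beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        log.append((day, i, distancing))
    return log


log = simulate_triggered_distancing()
peak = max(i for _, i, _ in log)
switch_ons = sum(
    1 for prev, cur in zip(log, log[1:]) if cur[2] and not prev[2]
)
days_distancing = sum(1 for _, _, on in log if on)

print(f"Peak infectious at any one time: {peak:,.0f}")
print(f"Times distancing was switched on: {switch_ons}")
print(f"Days spent distancing out of {len(log)}: {days_distancing}")
```

Even this toy version shows the pattern described in the report: cases rebound each time measures are relaxed, so the measures have to be reintroduced again and again, and a substantial share of the two years is spent under distancing.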

**What is missing from the Imperial model?**

While the COVID-19 microsimulation model designed by Ferguson and colleagues is to be highly commended for all that it does, no single model, no matter how great, is perfect or sufficient. As we discussed in the previous two blog posts, the best approach therefore is to think of these models as learning tools and also to situate any one model within a wider suite or ensemble of models. And the group at Imperial College London would be the first to make this point (as well as the points below)!

So, what needs to be improved? And what else is needed?

**First, we need to remember that this model is best seen as a learning tool.**

- Second, we need to take into account the list of issues I provided at the end of my 2nd post (click here to review).
- We also need to remember that all such forecasting holds only within a certain set of parameters; things may well play out differently, and so the model could be wrong.
- Which is why such models need to be run again.
- It is also why we need to constantly make use of the error in our models, in order to take into account what we do not know, and what we do not know that we do not know!
- Also, the shorter the time-frame being examined, the better the chance of being more accurate. For example, it is easier to suggest what might happen in five weeks than what will happen five months from now.
- Also, relative to these forecasting challenges, we need better data -- which is why testing is so important!
- The underlying mechanisms of COVID-19 (both social and biological) need to be better understood. In other words, we need better theoretical and conceptual models, which we presently lack.

**We also need to augment this microsimulation with other models.**

- We need to explore how complex network models help to illuminate the results of microsimulation, as they do more than just focus on individuals; they also focus on the social networks of everyone and how they link up to one another.
- In this way, complex network models can show us not only how a virus spreads spatially and temporally, but through which particular networks, which gives us very specific insights into which sections of a network are best for us to place an intervention. (We will discuss this in greater detail in the next post.)

- We also need to compare these microsimulations with agent-based models (ABMs). ABMs are very good at showing how an infection spreads through people's interactions with one another as well as how they react to the policies implemented.
- For example, a microsimulation would not be as good at modelling how students on Spring Break in Florida or people at the beaches in Europe spread COVID-19 by not practicing social distancing despite being asked to do so. ABMs would, however, be very good at this.
- In short, microsimulations do not model social interactions well. Which is why ABMs are highly useful, as they allow us to explore how social behaviours and social life impact the spread of a virus. And, they allow us to explore how individuals, groups, communities, complex networks, etc interact with the policies and strategies put into place to mitigate or suppress the disease. As illustration, see this excellent blog post by Wander Jager on ABMs and COVID-19.

**Summary**

Given all of the above limits and the need for more and other modelling, at the end of the day, all model-based policy recommendations need to be taken into account __cautiously__. But, we need to also make use of science. We have never lived in a world where serious science should be ignored, particularly when it comes to highly complex issues such as pandemics. And in a very harsh and upsetting way, COVID-19 is demonstrating that, contrary to the post-fact, anti-expert political and cultural climates that so many people, the world over, have willingly embraced, science and medicine are not simply social constructions; instead, they are often the difference between life and death.