It is mid-July. Most people have been loosely following the news on COVID19 for four months, trying to make sense of it all. After all, it is not every day that the world gets hit by a pandemic, all future plans are suspended until who knows when, and you have to figure out what to do next based on what is happening. But how do you decide what is happening?
Enter data on COVID19. Lots of it.
All Data is Not Made Equal
Data can be overwhelming and confusing. For starters, it is misleading to compare the datasets of one state or country with another without reading the fine print, or to read numbers alone without paying attention to the exact metric being reported.
For example, when a chart reports "number of tests," are they reporting the number of samples tested or the number of individuals tested, or number of positive tests? When some news wire reports the "number of deaths," are they reporting the total number of deaths, deaths per number of cases, or deaths per million?
This is just a tiny preview of the huge nebula of facts and figures, which, instead of being information, adds to the misinformation on the pandemic, causing much anxiety, confusion and stress.
What Should We Measure?
The biggest challenge of this pandemic is that nothing was known about the disease when it broke out. For a few months afterwards, still very little was known, at least not for sure. And almost six months later, there are still very few things that can be said with any certainty.
In this scenario, the best thing that can be done is monitoring what is actually happening (the reality), and trying to map the science and technology to make sense of the reality. This is where data comes in. Data helps us measure the impact of the problem, qualitatively and quantitatively, and helps prioritize what part(s) of the problem need a more urgent or permanent solution, and what parts can be managed with the prevailing uncertainty.
The chart above shows the total confirmed deaths across multiple countries on a logarithmic scale. This is just one example of a metric used to make sense of the severity of the pandemic.
Below is a lowdown and comparative analysis of which metric is useful in which context, and how it should be interpreted. If you understand these metrics, you will not need the media's comments on them to make sense of a given factoid.
Although most news and media outlets report only total deaths (for a country or state), it is important to remember that every state, province and country is different demographically and geographically. They vary not only in size but also in population and population density. So it makes sense to always look at the number of total deaths per million. This gives an indication of the scale we are looking at. For example, you cannot meaningfully compare the total number of deaths in Singapore with the total number of deaths in India. Moreover, in a disease like COVID19, population density also plays a very important role. The higher the population density, the less social distancing there is likely to be, and hence the higher the risk of infection. There is no separate metric for the scale of devastation that factors in population density, but the number of deaths per million, when broken down state-wise or city-wise, does give a reasonable idea of the impact.
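To make the per-million idea concrete, here is a minimal sketch in Python. The region names and figures are made up purely for illustration and are not taken from any dataset; the helper name per_million is likewise just a choice for this example.

```python
def per_million(count: int, population: int) -> float:
    """Scale a raw count to a rate per one million residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1_000_000 / population


# Hypothetical regions with invented numbers (not real data).
regions = {
    "Region A": {"deaths": 500, "population": 5_000_000},
    "Region B": {"deaths": 20_000, "population": 1_000_000_000},
}

deaths_per_million = {
    name: per_million(r["deaths"], r["population"]) for name, r in regions.items()
}

for name, rate in deaths_per_million.items():
    print(f"{name}: {regions[name]['deaths']} deaths, {rate:.1f} per million")
```

In this invented example Region B has forty times as many deaths as Region A, yet its deaths per million are lower, which is why the raw totals alone can mislead.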
It is important to note that the total number of 'confirmed deaths' (which charts report) is not necessarily the actual total number of deaths due to COVID19. This is for multiple reasons. For one, several cases of COVID19 involve co-morbidities, and different hospitals follow different protocols on which deaths can be attributed to COVID19 as the cause. Another reason is that when hospitalization facilities are insufficient, deaths happen at home as well, and there may not be a systematic way to count these. Yet another possible reason is that collecting and compiling this information into the database takes time. Often there is a lag between the actual date of death and the date it is entered in the database. So the number of deaths reported on a given day does not necessarily reflect the deaths on that day. It cannot be used to track a very precise trend, only the overall trend.
Number of Cases
There are various metrics being used for reporting the number of COVID19 cases: number of active cases, cumulative number of cases, number of new cases, number of cases per million, and so on. Most of these are reported as confirmed cases, and can be given as a daily figure or as a 7-day rolling average.
The number of active cases compared with the cumulative number of cases gives a sense of what the recovery rate is like for a country or state. Number of new cases gives a sense of how much or how fast the pandemic is spreading. Number of cases per million is representative of the extent of the spread and infection of the disease in a country, taking its size and population into account.
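As a rough illustration of how daily new cases and a 7-day rolling average relate to the cumulative count, here is a small Python sketch. The cumulative series is invented for the example, and the trailing-window average shown here is one common convention; a given dashboard may compute its average differently.

```python
def daily_new(cumulative):
    """Day-over-day differences of a cumulative series (one value fewer than the input)."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def rolling_mean(values, window=7):
    """Trailing rolling mean; produces one value for each full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical cumulative confirmed cases over nine days (not real data).
cumulative_cases = [100, 110, 125, 145, 150, 150, 190, 210, 235]

new_cases = daily_new(cumulative_cases)      # [10, 15, 20, 5, 0, 40, 20, 25]
smoothed = rolling_mean(new_cases, window=7)  # two full 7-day windows

print("daily new cases:", new_cases)
print("7-day rolling average:", [round(x, 1) for x in smoothed])
```

The daily series jumps from 0 to 40 between two consecutive days, while the rolling average moves much more gently, which is why the averaged figure is usually easier to read as a trend.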
The total number of cases helps estimate the overall impact of the disease. Not everyone who suffers from COVID19 can go back to a normal life after recovery. There are financial and health constraints to consider. Many of these people cannot become a part of the workforce of the economy for several months, and that has economic implications for the country.
Mapping the daily new cases against the total number of cases gives context of how much a country is able to bend the curve. As opposed to what daily news seems to imply, many countries have been able to successfully bend the curve and reduce the infection rates.
Tests ascertain whether a given person is COVID positive at a given point of time. The same person can (and often does) take the test multiple times to check their condition.
Number of positive tests is an indicator of the increase in infection. But it is not accurately representative of the prevalence of the disease in a country or state. It is not the same as the total number of cases either. This distinction needs to be made early on. This is because the tests are performed only on people who want the test and take the test. Tests are not conducted on a random sample of people. If they were random, they could be representative of the population. But in the current pattern, there would be a selection bias in the data.
Moreover, many people who do have COVID often do not take the test, or are in a place where they do not have the proper facility to test. So their numbers never get added to the data on positive tests. Hence, the number of positives is not the same as the number of cases. The accuracy rate of tests is another factor to consider. There is little information on the accuracy of tests, and they are subject to errors. Without knowing the error of a certain test setup, the data on positive tests is only roughly indicative.
Number of tests performed per million people, and number of positives vs total number of tests, are two metrics which give us a sense of how much testing is being done, and what the incidence rate roughly works out to be for a given country. If the share of positive tests is very high, more testing is probably advised, as there may be many more people who are infected but asymptomatic.
Mortality rate is a commonly used term, though the "Case Fatality Rate" (CFR) is the more official term for reporting the number of confirmed deaths divided by the number of confirmed cases.
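The definition above translates directly into a short calculation. This Python sketch uses invented counts for two hypothetical regions; it only illustrates the arithmetic and says nothing about any real country.

```python
def case_fatality_rate(confirmed_deaths: int, confirmed_cases: int) -> float:
    """CFR as a percentage: confirmed deaths divided by confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100 * confirmed_deaths / confirmed_cases


# Hypothetical regions (not real data).
region_a = case_fatality_rate(confirmed_deaths=150, confirmed_cases=5_000)
region_b = case_fatality_rate(confirmed_deaths=150, confirmed_cases=15_000)

print(f"Region A CFR: {region_a:.1f}%")
print(f"Region B CFR: {region_b:.1f}%")
```

Both invented regions report the same number of deaths, but the CFR differs because the number of confirmed cases differs. Since confirmed cases depend on how much testing is done, the CFR is sensitive to testing as well as to the disease itself.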
As will be evident from the figure, the CFR is not the same across different countries (as some people seem to assume). The CFR varies from one region to another based on multiple factors like the inherent immunity of the people, the policies in place (like lockdowns, mandatory masks, social distancing, etc.) and the population density of the place (which decides the level of social distancing actually achievable). So COVID19 has no fixed CFR.
Based on early findings, people who have pre-existing conditions seem to be affected more severely by COVID19 than those who have no health conditions. A study of patients in Italy and China suggested that people in the 60 to 80 years age group have a higher risk and CFR than those in the lower age groups.
So, the CFR varies drastically, and though it can be estimated for a given population set, it should not be taken as fixed.
Is It Okay to Take Decisions Based Only on Mortality Rate?
The goal of ploughing through all this data is to make informed decisions. There are plenty of questions yet to be answered. What metric(s) should be used to decide whether to open up schools or impose lockdown?
While there is no binary answer to these questions, one flawed (but popular) argument is that if the number of deaths per million is low, one can go back to "normal life."
The findings imply otherwise.
Many people who were afflicted with COVID19 in March are yet to get back on their feet. There is mounting reason to believe that COVID19 is far more damaging than "just a flu." It seems to destroy the immunity of the patients and leave them emaciated, so that they are not technically "sick" but are not strong enough to resume normal work. They report symptoms like body pain, chronic fatigue, breathing trouble, and overall lack of energy.
So, if a huge number of people are getting infected, going back to normal is not realistic. The risk of infecting more people, destroying their ability to work, is not worth taking. It might be better to "stay home" and work from home while you still have the ability to work.
COVID19 may be here to stay for longer than anyone would like it to. In such a situation, knowing what you are dealing with, and being realistic about it, gives a better chance of survival, than ignoring the numbers and taking uninformed decisions.
PS: The article may be updated from time to time. In such an event, the updates will be highlighted.
All data and figures courtesy ourworldindata.org
Onder G, Rezza G, Brusaferro S. Case-Fatality Rate and Characteristics of Patients Dying in Relation to COVID-19 in Italy. JAMA. 2020;323(18):1775–1776. doi:10.1001/jama.2020.4683