COVID-19 and Some Thoughts on Data Analysis

I am not going to take a position on COVID-19 severity now, if for no other reason than that I am not an expert, and I think it's fine not to clutter the debates about virus responses too much with non-experts (though it is wrong, as discussed below, to censor experts who have heterodox opinions).  I am convinced COVID-19 is "not just the flu," but when I see the governor of Texas being told that there will be a million deaths in Texas alone if there is not a hard quarantine there -- well, I am skeptical.  As with global warming, the full-denier and total-alarmist positions are likely both wrong -- with a lot of bad data analysis in the media along the way.  I have decided to focus on that data analysis.  So here are a few random thoughts:

  • The data we have sucks, and thus any conclusions we are drawing mostly suck too.  The data is worse than just being incomplete or bad -- if the errors were randomly distributed, we could live with that.  But the lack of test kits, and the way we have deployed the few we have, means that the data is severely biased.  We are only testing people who are strongly symptomatic.  If there is a normal distribution of outcomes from this disease, we are only testing the right side of the distribution.  We have no idea where the median is or how long the tail of asymptomatic outcomes on the left side is.  The only thing we absolutely know about the disease is that it's not as deadly as the media is portraying, because we are missing hundreds of thousands of cases in the denominator of the mortality rates.  The media has also been terrible about reporting the risk factors of those who died.  When a bunch of people died suddenly in Seattle, one had to read five paragraphs into the story to find that they were all over 70 and living in an old-age home.  Or when people in the prime of life die, facts such as their being type 1 diabetics -- a known severe risk factor for this virus (and one that makes it different from the flu) -- are left out.
  • The media is constantly confusing changes in measurement technique and intensity with changes in the underlying progress of the virus itself.  Changes in case numbers have as much to do with testing patterns and availability as they do with the real spread of the disease.
  • While COVID-19 is likely worse than the normal flu, our perceptions of how much worse are strongly affected by observer bias.  Frankly, if every nightly news broadcast spent 15 minutes reciting that day's flu deaths, we would all be hiding in our homes from the flu.  They rightly present the death of a healthy man in his thirties as the tragedy it is, but the spoken or unspoken subtext is, "this is abnormal, so this thing is much worse."  But it seems abnormal only because we do not report on the very real stories of healthy young people who die of the flu.  My nephew, who was 25 years old and totally healthy with no pre-existing conditions, died of the flu last month -- and no one featured this tragedy on the national news.
  • The data we are getting is even worse because the media has decided, as one big group, that for our own good they are going to limit the facts they report about the virus to only the bad ones.  There is a strong sense -- you see it on Twitter, both in Twitter's own policies and in group attacks by its users -- that saying anything that might in any way reduce one's fear of the disease should be banned for our own good.  One of the more prominent examples was Medium removing an article NOT because it was proven wrong but because it took one side of a very open question, and someone obviously decided it was "unsafe" to even allow that side to be aired.

    This strikes me as a terrible precedent, and one on a very slippery slope.  We have had to fight this attitude for years in the climate debate: the bad idea that good science is unacceptable if it gets to the wrong answer.
  • The media is never more dangerous than when it understands a little about a scientific topic.  I have 40 years of engineering experience with feedback phenomena and exponential effects like positive feedback, yet the media suddenly thinks it's the expert and needs to lecture me that I don't really understand the power of exponential spread.  They are right that exponential spread of a highly transmissible virus is dangerous, but their third-grade understanding of the math is so simplistic it makes me scream.  Yes, I understand the growth math, but I also understand that the same growth math says a single bacterial colony should consume the whole Earth within a month of growth, and a single chunk of plutonium that fissions indefinitely could destroy the planet.  Neither happens, because there are brakes on the doubling process in later iterations.  I don't know whether these brakes are strong or weak in the case of COVID-19, but showing me mindless doubling trees is just insulting.
  • Many of the computer model results I am seeing make no sense to me.  I am exhausted with people talking about computer models as if they were facts, rather than really opaque calculations based on some researcher's set of non-transparent hypotheses.  The only way I respect a computer model is if someone presents it this way: "If X, Y, and Z are true, and you assume A and B, then this model shows what the result might be, with some large error ranges."  Add to this the fact that most modelers run a range of models on a range of inputs that yield a range of outputs, and the media then picks the most extreme of all these outcomes and presents it as "the model results of experts" without even showing the range of other outcomes.  Arnold Kling wrote something about COVID-19 and data modeling that had me nodding my head:

Once you build a model that is so complex that it can only be solved by a computer, you lose control over the way that errors in the data can propagate through the model. For me, it is important to look at data from a perspective of “How much can I trust this? What could make it misleadingly high? What could make it misleadingly low?” before you incorporate that data into a complex model with a lot of parameters.

  • It will be interesting to see if anyone goes back to the models making the national news today and reconciles them to actual results.  Certainly no one ever does this in the climate debate, so I am not holding my breath.
  • Frankly, I am done with the Precautionary Principle.  This does not mean I am against taking precautions, even strong and expensive precautions, against bad things.  But I am done with the notion that one should ignore the costs of these precautions and not make sensible tradeoffs.  This is true even when trading off the risk to life on one hand against reduced economic output on the other, in part because reduced economic activity causes real human misery and is directly correlated with lifespan and well-being.
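To make the denominator argument from the first point concrete, here is a toy Python sketch. Every number in it is hypothetical and chosen only for illustration, and the function name `apparent_fatality_rate` is made up for this example. It assumes the simplest possible case: all deaths occur among the tested, strongly symptomatic cases, so the only thing more testing changes is the size of the denominator.

```python
def apparent_fatality_rate(infections, tested_fraction, deaths):
    """Toy model of how testing only the sickest cases skews a fatality rate.

    All arguments are hypothetical. Assumes every death occurs among the
    tested (strongly symptomatic) cases, so testing a larger fraction of
    infections only enlarges the denominator.
    Returns (rate among confirmed cases, rate among all infections).
    """
    confirmed = round(infections * tested_fraction)
    if deaths > confirmed:
        raise ValueError("toy model assumes all deaths are among confirmed cases")
    observed_rate = deaths / confirmed
    true_rate = deaths / infections
    return observed_rate, true_rate


if __name__ == "__main__":
    # Hypothetical population: 100,000 infections and 500 deaths.
    for fraction in (0.05, 0.10, 0.50, 1.00):
        observed, true_rate = apparent_fatality_rate(100_000, fraction, 500)
        print(f"tested {fraction:4.0%}: observed {observed:.1%}, true {true_rate:.1%}")
```

With these made-up inputs the observed rate falls from 10% to 0.5% as testing expands, even though nothing about the disease itself has changed. That is the same confusion of measurement intensity with real spread described in the second point.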
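The point about brakes on doubling can be illustrated with the textbook contrast between pure exponential growth and logistic growth. This is a minimal Python sketch, not an epidemic model. The `capacity` parameter is a hypothetical stand-in for whatever brakes actually exist, and the logistic form is an assumption chosen because it is the simplest curve with a brake, not a claim about how COVID-19 actually spreads or how strong its brakes are.

```python
def exponential_growth(n0, rate, steps):
    """Pure compounding: each step multiplies the count by (1 + rate)."""
    values = [float(n0)]
    for _ in range(steps):
        values.append(values[-1] * (1 + rate))
    return values


def logistic_growth(n0, rate, capacity, steps):
    """Discrete logistic growth: the same compounding, times a brake (1 - n/capacity).

    `capacity` is a hypothetical ceiling. The brake is negligible while n is
    small and approaches zero as n nears capacity. With rate <= 1 and
    n0 < capacity, this discrete form rises toward capacity without overshooting.
    """
    values = [float(n0)]
    for _ in range(steps):
        n = values[-1]
        values.append(n + rate * n * (1 - n / capacity))
    return values


if __name__ == "__main__":
    # rate=1.0 means doubling every step; the capacity is a made-up number.
    exp_curve = exponential_growth(1, 1.0, 30)
    log_curve = logistic_growth(1, 1.0, 1_000_000, 30)
    for step in range(0, 31, 5):
        print(f"step {step:2d}: exponential {exp_curve[step]:>14,.0f}  "
              f"logistic {log_curve[step]:>12,.0f}")
```

The two curves are nearly identical in the early steps, which is why a doubling tree looks convincing at first; they diverge only once the brake term starts to matter. How large that ceiling is, and how soon it bites, is exactly the unknown the doubling trees leave out.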

Update:  This is exactly the kind of thing I would like to see more of.  Kudos to 538.  When people rattle off ridiculous figures, it causes me to tune out.  I take this seriously.