That Data Discontinuity Is Probably Not What You Think
I could easily make reported crime in this country skyrocket tomorrow with one simple change: Imagine Congress passed a law, roughly equivalent to how things like school lunches are funded, under which federal law enforcement dollars would flow to cities in proportion to the number of crimes they experience. Suddenly, at the next reporting period, it would appear that crime has skyrocketed -- without any real change on the ground -- as cities scramble to harvest as much money as possible by reporting as much crime as possible. The cities that choose not to submit data to the various FBI databases today would suddenly be sending in full disclosures. With time, cities might even get creative by tweaking the definition of crime -- maybe assault would be expanded to include killing someone's pet or forcing someone to watch The View.
An observer in 2045 without much detailed knowledge of this dataset would write that there was an explosion in crime in 2025. As they often do, those who are politically active would ascribe the cause to whatever they are already against -- perhaps they might blame it on Trump, or immigrants, or "defund the police", or racism, or whatever. They would argue and argue about the causes of what in truth was just a change in how the data was collected and defined.
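The arithmetic behind this thought experiment is easy to sketch. Every number below is invented for illustration and none of it comes from FBI data: I assume a constant 100,000 actual crimes a year, and a share reaching the national database that jumps from 60% to 95% once the hypothetical funding law takes effect.

```python
# Hypothetical numbers throughout; this is an illustration, not real crime data.
TRUE_CRIMES_PER_YEAR = 100_000  # assumed constant "on the ground"
POLICY_YEAR = 2025              # year the imagined funding law takes effect


def reporting_pct(year):
    """Assumed percentage of actual crimes that reach the national database."""
    return 60 if year < POLICY_YEAR else 95


def reported_crimes(year):
    """Reported count = actual count x share reported (integer arithmetic)."""
    return TRUE_CRIMES_PER_YEAR * reporting_pct(year) // 100


# Build the series an observer in 2045 would see.
series = {year: reported_crimes(year) for year in range(2022, 2028)}

for year, count in series.items():
    print(year, count)

# Year-over-year change at the discontinuity.
jump = series[POLICY_YEAR] / series[POLICY_YEAR - 1] - 1
print(f"Apparent one-year increase: {jump:.0%}")
```

With these made-up inputs, the reported series goes from 60,000 to 95,000 in a single year, an apparent increase of about 58%, while the actual number of crimes never moves.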
I have written about this phenomenon before:
- Critics of the US healthcare system often point out that our infant mortality is much higher than in Europe, but it turns out that the US and Europe use totally different data definitions, so the numbers really are not comparable (TL;DR: the US counts all babies born alive as births, while countries like Norway don't count very low birth weight babies as real births, and most of the mortality is in this category they do not count).
- Some years ago I called BS on a climate report that showed a huge rise in weather-related grid outages as a proxy for increasing severe weather. I hypothesized it was a change in data definition and data gathering rather than an enormous change in the weather in less than 2 years. Contact with the data owner proved me right.
- Speaking of climate, one of the best examples of this is the rise in reported US tornado numbers since 1950, which was initially blamed on climate change (of course) but turns out to be almost entirely an artifact of better tornado detection (e.g. Doppler radar and storm chasers).
- This is a frequent problem in the cancer world, where the effect of better detection is often hard to untangle from changes in the underlying cancer rates.
The latest example involves RFK Jr. and the MAHA/vaccine set. Via Flowing Data, which quotes the NY Times:
Many large studies have come to the same conclusion: Vaccines don’t cause autism. The role, if any, of environmental toxins is still to be determined, but there is no known environmental factor that can explain the sudden jump in diagnoses. The changes we made to the diagnosis in the D.S.M.-IV can.
Why did autism-related diagnoses explode so far beyond what our task force had predicted? Two reasons. First, many school systems provide much more intensive services to children with the diagnosis of autism. While these services are extremely important for many children, whenever having a diagnosis carries a benefit, it will be overused. Second, overdiagnosis can happen whenever there’s a blurry line between normal behavior and disorder, or when symptoms overlap with other conditions. Classic severe autism had so tight a definition it was hard to confuse it with anything else; Asperger’s was easily confused with other mental disorders or with normal social avoidance and eccentricity. (We also, regrettably, named the condition after Hans Asperger, one of the first people to describe it, not realizing until later that he had collaborated with the Nazis.)