Coyote Blog

Archive for the ‘Data Analysis’ Category.

That Data Discontinuity Is Probably Not What You Think

June 25, 2025, 4:16 pm

I could easily make reported crime in this country skyrocket tomorrow with one simple change: Imagine Congress passed a law, roughly equivalent to how things like school lunches are funded, that federal law enforcement dollars would flow to cities in proportion to the number of crimes they experience. Suddenly, at the next reporting period, it would appear that crime has skyrocketed -- without any real change on the ground -- as cities scramble to harvest as much money as possible by reporting as much crime as possible. The cities that choose not to submit data to the various FBI databases today would suddenly be sending in full disclosures. With time, cities might even get creative by tweaking the definition of crime -- maybe assault would be expanded to include killing someone's pet or forcing someone to watch The View.

An observer in 2045 without much detailed knowledge of this dataset would write that there was an explosion in crime in 2025. As they often do, those who are politically active would ascribe the cause to whatever they are already against -- perhaps they might blame it on Trump, or immigrants, or "defund the police", or racism, or whatever. They would argue and argue about the causes of what in truth was just a change in how the data was collected and defined.
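To make the mechanism concrete, here is a minimal sketch in Python. Every number in it is made up for illustration: the constant count of actual crimes and the share of them that reaches the national dataset each year are assumptions, not FBI figures. The only thing that changes in 2025 is the reporting share.

```python
# Hypothetical illustration: actual crime is flat, only reporting changes.
ACTUAL_CRIMES = 1_000_000  # assumed constant every year (made-up number)

# Assumed share of actual crimes that reaches the national dataset.
# The jump in 2025 stands in for the new funding incentive.
reporting_share = {2023: 0.60, 2024: 0.60, 2025: 0.95, 2026: 0.95}


def reported_crimes(actual, share):
    """Crimes that show up in the dataset, rounded to whole incidents."""
    return round(actual * share)


def percent_change(old, new):
    """Year-over-year change, in percent."""
    return (new - old) / old * 100


reported = {year: reported_crimes(ACTUAL_CRIMES, share)
            for year, share in reporting_share.items()}

jump_2025 = percent_change(reported[2024], reported[2025])

for year, count in reported.items():
    print(year, count)
print(f"Apparent change 2024 -> 2025: {jump_2025:.1f}%")
```

Under these assumed inputs the reported series jumps sharply in 2025 even though the underlying count never moves, which is exactly the kind of discontinuity the 2045 observer would misread.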

I have reported this phenomenon before.

Critics of the US healthcare system often point out that our infant mortality is much higher than in Europe, but it turns out that the US and Europe use totally different data definitions, so the numbers really are not comparable (TL;DR: the US counts all babies born alive as births, while countries like Norway don't count very low birth weight babies as real births, and most of the mortality is in this category they do not count).
Some years ago I called BS on a climate report that showed a huge rise in weather-related grid outages as a proxy for increasing severe weather. I hypothesized it was a change in data definition and data gathering rather than an enormous change (in less than 2 years) in the weather. Contact with the data owner proved me right.
Speaking of climate, one of the best examples of this is the rise in reported US tornado numbers since 1950, which was initially blamed on climate change (of course) but turns out to be almost entirely an artifact of better tornado detection (e.g. Doppler radar and storm chasers).
This is a frequent problem in the cancer world, where better detection is often hard to untangle from changes in the underlying cancer rates.

The latest example involves RFK Jr and the MAHA/vaccine set. Via Flowing Data, which quotes the NY Times:

Many large studies have come to the same conclusion: Vaccines don’t cause autism. The role, if any, of environmental toxins is still to be determined, but there is no known environmental factor that can explain the sudden jump in diagnoses. The changes we made to the diagnosis in the D.S.M.-IV can.

Why did autism-related diagnoses explode so far beyond what our task force had predicted? Two reasons. First, many school systems provide much more intensive services to children with the diagnosis of autism. While these services are extremely important for many children, whenever having a diagnosis carries a benefit, it will be overused. Second, overdiagnosis can happen whenever there’s a blurry line between normal behavior and disorder, or when symptoms overlap with other conditions. Classic severe autism had so tight a definition it was hard to confuse it with anything else; Asperger’s was easily confused with other mental disorders or with normal social avoidance and eccentricity. (We also, regrettably, named the condition after Hans Asperger, one of the first people to describe it, not realizing until later that he had collaborated with the Nazis.)

Tags: Autism Diagnosis, Cancer Detection, climate change, Crime Reporting, Data Collection, Data Discontinuity, Healthcare System, Infant mortality, Political Misinterpretation, Tornado Detection
Category: Data Analysis | 3 Comments

Creating Conspiracies By Reading History Backwards

November 12, 2019, 10:57 am

Sorry for the absence. I have taken a bit of vacation and simultaneously been consumed by a deluge of interest in our company's new offerings.

I saw this story a while back, titled "Japan's General Staff Office Knew About Hiroshima and Nagasaki Atomic Bombing in Advance and Did Nothing, According to 2011 NHK Documentary." I am only going by the author's summary because I can't understand the Japanese original, but this fits in with a whole class of revisionist history about which I have written before. A historian digs through piles and piles of intelligence reports and decrypts and finds 2 or 3 that seem to point to some catastrophic event in advance of that event. A classic example was the revisionist claim that FDR knew in advance of Pearl Harbor but willfully ignored the warnings because he wanted a reason to pull an isolationist US into the war with Germany. More recently, whole conspiracy theories rest on similar hints that the GWB White House knew about the 9/11 attacks in advance.

The problem with all these theories is that they are reading history backwards. Intelligence agencies weed through thousands of rumors, decrypts, and hints every day. The historian can wade through this mass and latch onto the couple of correct and prescient rumors because she knows how history turns out. She knows Japan bombed Pearl Harbor, so she knows how to jump right to the needle in the haystack. But officials at the time had no such foreknowledge. Sure, there may have been hints of attacks on 9/11, but there were also likely hints that turned out to be incorrect on scores of other potential plots and attacks, plots that would have (at the time) looked no more or less realistic than a hinted attack on 9/11.

There is a related problem that is a pet peeve of mine related to probability. Let's say I offered you a 50/50 bet that you would win if a 6-sided die came up 1-5 but lose if the die came up 6. Clearly, all day long the right decision is to take the bet. But then imagine you took the bet and the die came up 6. Was this, in retrospect, a bad decision? I would argue absolutely not: you made a great decision that simply did not work out this one time, but over time making similar decisions will be a winner. On the flip side, imagine someone who took the opposite side of the bet, a 50/50 bet that only pays off with a 6. If a 6 comes up, did they make a good decision? Absolutely not. It was a terrible decision that they got bailed out on by luck, but over time they are going to bankrupt themselves.
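The arithmetic behind this can be sketched in a few lines of Python. I am assuming that "50/50 bet" means an even-money payout (win one unit or lose one unit) and that the die is fair; the function name and the one-unit stake are just for illustration.

```python
from fractions import Fraction


def expected_value(win_faces, stake=1, faces=6):
    """Expected profit per even-money bet on a fair die.

    win_faces: number of faces on which the bettor wins `stake`;
    on every other face the bettor loses `stake`.
    """
    p_win = Fraction(win_faces, faces)
    return p_win * stake - (1 - p_win) * stake


ev_good_bet = expected_value(win_faces=5)  # wins on 1-5, loses on 6
ev_bad_bet = expected_value(win_faces=1)   # wins only on 6

print("Bet on 1-5:", ev_good_bet)
print("Bet on 6:  ", ev_bad_bet)
```

Under those assumptions the first bet earns two-thirds of a unit per roll on average and the second loses the same amount. Neither figure changes based on how any single roll happens to come out, which is the whole point: the quality of the decision is fixed before the die is thrown.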

These may seem like contrived examples, but I see exactly this sort of bad analysis all the time of risky decisions taken in an array of fields from sports to business. I am sorry, but a football coach who goes for it on 4th and 8 from his own 30 and makes a first down did NOT make a good decision, despite the fact it worked out okay this one time. But almost everyone in the media brings a retrospective bias to analysis of such decisions, rating them good decisions if they worked out all right and bad decisions if they did not, irrespective of whether the decision, when made, made a lick of sense.

Category: Data Analysis | Comments Off

Being Skeptical of Data, Even When It Supports Your Position - Fire Edition

September 26, 2018, 2:25 pm

This is the, uh, whateverth installment in a series on using your common sense to fact check data, even when the data is tantalizingly useful for the point one is trying to make.

For the last decade or so, global warming activists have used major fires as further "proof" that there is a global warming trend. Often these analyses are flawed, for a variety of reasons that will be familiar to readers, e.g.

A single bad fire is just one data point and does not prove a trend; you need a series of data to prove a trend.
There is no upward trend in US acreage in fires over the last 10 years, but there is in the last 20 years, which gives lots of nice opportunities for cherry-picking on both sides.
Acres burned is a TERRIBLE measure of global warming, because it is trying to draw global trends from a tiny fraction of the world land mass (western US), and because it is dependent on many non-climate variables such as forest management policies and firefighting policy.
The better, more direct metric of possible warming harm is drought, as measured by something like the Palmer Drought Severity Index, which shows no trend.

An even better metric, of course, is that there IS an actual upward trend in temperatures. There is not, however, much of an upward trend in bad weather like drought, hurricanes, or tornadoes. In this context fire is a third order variable (temp--->drought---> fire) which makes it a bad proxy, particularly when the first order variable is telling the tale.

AAAAaaaand then, there is this chart, much loved by skeptics, for long-term US fire history:

I am pretty sure that I have avoided ever using this piece of skeptic catnip (though I could be wrong, I can have moments of weakness). The reason is that nothing about this chart passes the smell test. While it is true that the 1930s were super hot and dry, likely hotter in the US than it has been this decade, there is absolutely no reason to believe that acreage burned over the entire period of 1926-1952 was so much higher than today. Was there a different fire management policy (e.g. did they just let all fires burn themselves out)? Was there a change in how the data was recorded?

Here is my rule of thumb -- when you see a discontinuity like this (e.g. before and after 1955) you had better have a good explanation and understanding of the discontinuity. This is not just to be a good person and be true to good scientific process (though we all should) but also from the practical and selfish desire to avoid having someone come along who DOES know why the discontinuity exists and embarrass you for your naivete.

I have never trusted this chart, because I have not really understood it. This week, the Antiplanner (who before he focused on transit focused most of his writing on the Forest Service and forest policy) has an explanation.

The story begins in 1908, when Congress passed the Forest Fires Emergency Funds Act, authorizing the Forest Service to use whatever funds were available from any part of its budget to put out wildfires, with the promise that Congress would reimburse those funds. As far as I know, this is the only time any democratically elected government has given a blank check to any government agency; even in wartime, the Defense Department has to live within a budget set by Congress.

This law was tested just two years later with the Big Burn of 1910, which killed 87 people as it burned 3 million acres in the northern Rocky Mountains. Congress reimbursed the funds the Forest Service spent trying (with little success) to put out the fires, but — more important — a whole generation of Forest Service leaders learned from this fire that all forest fires were bad....

This led to a conflict over the science of fire that is well documented in a 1962 book titled Fire and Water: Scientific Heresy in the Forest Service. Owners of southern pine forests believed that they needed to burn the underbrush in their forests every few years or the brush would build up, creating the fuels for uncontrollable wildfires. But the mulish Forest Service insisted that all fires were bad, so it refused to fund fire protection districts in any state that allowed prescribed burning.

The Forest Service’s stubborn attitude may have come about because most national forests were in the West, where fuel build-up was slower and in many forests didn’t lead to serious wildfire problems. But it was also a public relations problem: after convincing Congress that fire was so threatening that it deserved a blank check, the Forest Service didn’t want to dilute the message by setting fires itself.

When a state refused to ban prescribed fire, the Forest Service responded by counting all fires in that state, prescribed or wild, as wildfires. Many southern landowners believed they needed to burn their forests every four or five years, so perhaps 20 percent of forests would be burned each year, compared with less than 1 percent of forests burned through actual wildfires. Thus, counting the prescribed fires greatly inflated the total number of acres burned.

The Forest Service reluctantly and with little publicity began to reverse its anti-prescribed-fire policy in the late 1930s. After the war, the agency publicly agreed to provide fire funding to states that allowed prescribed burning. As southern states joined the cooperative program one by one, the Forest Service stopped counting prescribed burns in those states as wildfires. This explains the steady decline in acres burned from about 1946 to 1956.

There were some big fires in the West in the 1930s that were not prescribed fires. I’m pretty sure that if someone made a chart like the one shown above for just the eleven contiguous western states, it would still show a lot more acres burned in real wildfires in the 1930s than any decade since — though not by as big a margin as when southern prescribed fires are counted. The above chart should not be used to show that fires were worse in the 1930s than today, however, because it is based on a lie derived from the Forest Service’s long refusal to accept the science behind prescribed burning.

There you go, the discontinuity seems to be from a change in the way the measurement is calculated.

By the way, I work closely with the Forest Service every day and mostly this partnership is rewarding. But I can tell you that the blank check still exists for fire suppression costs and results in exactly the sort of inefficient spending that you would imagine. Every summer, much Forest Service work comes to a halt as nearly every manager and professional gets temporarily assigned to fire -- something FS employees love because they get out of the grind of their day job and essentially get to go camping.

Tags: Big Burn, budget, Defense Department, Forest Service, FS, global warming, management, Rocky Mountains, TERRIBLE, US
Category: Data Analysis | Comments Off

Here is a Fun Challenge: Be Skeptical of Statistics Even When They Support Your Point of View

August 21, 2018, 9:21 am

I sometimes wonder if the media and the punditocracy have any ability any more to reality-check statistics. Two examples:

One

Trump supporters were running around in circles patting themselves on the back with this story:

African American business owners are on the rise. According to the Minority 2018 Small Business Trends survey, the number of black-owned small businesses in the U.S. increased by a staggering 400% in a year-over-year time period from 2017 to 2018.

I call bullsh*t on this. There is no WAY that the number of black-owned businesses increased by a factor of 5** in just one year. There are millions of black-owned small businesses in this country and there is no way this quintupled** in a year. It does not pass any kind of smell test. It is clearly some sort of measurement error, either a small sample size for a survey or a change in data source and definitions from one year to another. I could go investigate the study and try to figure out the cause, but I do not even need to bother because economic and demographic data simply do not change at this pace in one year.

Two

The other example I have is this absurd figure:

A recent survey conducted by OVW and the Bureau of Justice Statistics found that an average of one in four undergraduate females experience sexual assault by the time they finish college.

Here is the deal with this stat: no one actually really believes it. Why do I say this with confidence? Because parents still send their daughters to college -- in fact they fight and scrap and invest huge amounts of time and money to send their daughter to college. If they really believed their little darling had a 1 in 4 chance of being sexually assaulted, they would never do so.

Here is a point of comparison: The brutal Japanese occupation of Nanjing, China is commonly known as the "Rape of Nanjing." It is called this in part because so many local women were raped. The numbers are fought over by historians, but the best estimate is that 20,000 of the approximately 100,000 women who were in Nanjing at the time were raped by Japanese soldiers, or about one in five. This means that if the one in four number is correct, then colleges are more dangerous for women than being in Nanjing during the Japanese occupation. Now, I would venture to guess that if I tried to stuff your daughter into a time machine and send her back to Nanjing on December 13, 1937, you would probably fight me to the death to prevent it. But parents don't act anything like this vis a vis going to college, ergo no one believes this figure. So why does everyone keep using it like it is accurate?

** I had put quadrupled but my son just called and reminded me that a 400% increase means quintupled. Thanks, Nic. Though I will say there is a good chance the source incorrectly used 400% to mean quadrupled, so I can't rule that out either.
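For anyone else who trips on this, the conversion between a percent increase and a multiple is easy to sketch in Python. The function names here are my own, purely for illustration.

```python
def growth_multiplier(pct_increase):
    """Factor by which a quantity grows after a given percent increase."""
    return 1 + pct_increase / 100


def percent_increase_for(multiplier):
    """Percent increase that corresponds to multiplying by `multiplier`."""
    return (multiplier - 1) * 100


# A 400% increase means the new value is five times the old one.
print(growth_multiplier(400))

# Quadrupling, by contrast, is only a 300% increase.
print(percent_increase_for(4))
```

So if the source really meant "quadrupled," the figure it should have quoted was 300%, not 400%.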

Tags: African American, china, college, media, OVW, running, Small Business Trends
Category: Data Analysis | Comments Off

I Told You We Were Focused on the Wrong Thing

February 11, 2009, 2:01 pm

For years I have complained that the opposition to the GWB administration was focused on the wrong things vis a vis the detention policy at Gitmo. There was too much focus on Gitmo itself as a lightning rod, and too much discussion of whether flushing a Koran down the toilet was torture. My point was that there didn't have to be torture for it to be wrong to hold non-uniformed suspected non-combatants in a non-declared war indefinitely, as if they were captured Nazi U-boat commanders. For example:

I believe strongly that the Bush administration's invented concept of unlimited-length detentions without trial or judicial review is obscene and needed to be halted. But critics of Bush quickly shifted the focus to "torture" at Gitmo, a charge that in light of the facts appears ridiculous to most rational people, including me. As a result, the administration's desire to hold people indefinitely without due process has been aided by Bush's critics, who have shifted the focus to a subject that is much more easily defended on the facts.

Justice Scalia argued that giving habeas corpus rights to enemy combatants during war time was unprecedented, but I responded:

I don't have enough law background to know if this is truly unprecedented in this way, but what if it is? One could easily argue that the nature of the "enemy" here, being that they don't have the courtesy to wear uniforms that indicate their combatant status and which side they are on, is fairly unprecedented as well. As is the President's claim that he has unilateral power to declare that there is a war at all, who this war is against, and who is or is not a combatant. I know from past posts on this topic that many of my readers disagree with me, but I think it is perfectly fine [that] the Supreme Court, encountering this new situation, sides with the individual over the government.

So now, just as I feared, the soil was fertile for a classic political bait and switch. Obama agreed to close Gitmo, the lightning rod of the controversy, thereby inspiring us to believe he is changing policy. When, at its heart, the real problem is still there:

Harvard Law Dean Elena Kagan, President Obama's choice to represent his administration before the Supreme Court, told a key Republican senator Tuesday that she believed the government could hold suspected terrorists without trial as war prisoners.

She echoed comments by Atty. Gen. Eric H. Holder Jr. during his confirmation hearing last month. Both agreed that the United States was at war with Al Qaeda and suggested the law of war allows the government to capture and hold alleged terrorists without charges.

If confirmed as U.S. solicitor general, Kagan, 48, will defend the administration's legal policy in the courts.

I assume she and Holder are toeing the Obama line on this, though they could be the bearers of a trial balloon and it may be Obama has not made up his mind. I hope so. Here is some more.

"Do you believe we are at war?" Graham asked.

"I do, Senator," Kagan replied.

Graham cited the example of someone who is not carrying a gun or fighting on a battlefield. "If our intelligence agencies should capture someone in the Philippines that is suspected of financing Al Qaeda worldwide, would you consider that person part of the battlefield?" he asked. He added that he had asked the same question of Holder, who replied that he agreed that person was on the battlefield.

"Do you agree with that?" the senator said.

"I do," Kagan replied.

Graham said that under the law of war, the government can say, "If you're part of the enemy force, there is no requirement to let them go back to the war and kill our troops. Do you agree that makes sense?"

Kagan replied, "I think it makes sense, and I think you're correct that that is the law."

"So America needs to get ready for this proposition that some people are going to be detained as enemy combatants, not criminals," Graham concluded.

I may have missed it, but did the AUMF or whatever it was that Congress passed before we entered Afghanistan and Iraq actually declare we were at war with the organization named "Al Qaeda"? Or does the president saying the words "war on terror" enough times in 8 years just make it so?

Tags: al queda, detention, gitmo
Category: Capitalism & Libertarian Philosophy, Data Analysis, Individual Rights | 5 Comments

In Medias Res

December 19, 2008, 10:23 am

You certainly don't have to spend very long convincing me that a significant government action can be distortive of markets, so I won't argue too much with Kevin Drum that the capital gains tax changes may have been a contributing factor in the housing bubble (though it is hilarious that the left considers tax reductions as the only distortive government actions).

However, thinking back on events, it's a little hard for me to ascribe the lion's share of the bubble to capital gains tax changes, as opposed to, say, the mortgage interest deduction or Federal Reserve interest rate policies or local zoning controls.

I probably wouldn't have bothered blogging on this, but I found the chart Drum uses from the NY Times to be hilarious:

Do you see the problem? I will help by simplifying the chart:


It's a pretty heroic assumption to say that Event B caused Trend A.

Update: Russell Roberts thinks the Times is right, but that they are using the wrong data to prove it. 1997 looks much more like the critical inflection point if you look at prices rather than sales (chart via Roberts, from a different NY Times article).

Tags: Housing, NY Times
Category: Data Analysis | 14 Comments
