Archive for the ‘Data Analysis’ Category.

Creating Conspiracies By Reading History Backwards

Sorry for the absence, I have taken a bit of vacation and simultaneously been consumed in a deluge of interest for our company's new offerings.

I saw this story a while back, titled "Japan's General Staff Office Knew About Hiroshima and Nagasaki Atomic Bombing in Advance and Did Nothing, According to 2011 NHK Documentary."  I am only going by the author's summary because I can't understand the Japanese original, but this fits in with a whole class of revisionist history about which I have written before.  A historian digs through piles and piles of intelligence reports and decrypts and finds 2 or 3 that seem to point to some catastrophic event in advance of that event.  A classic example was the revisionist claim that FDR knew in advance of Pearl Harbor but willfully ignored the warnings because he wanted a reason to pull an isolationist US into the war with Germany.  More recently, whole conspiracy theories rest on similar hints that the GWB White House knew about the 9/11 attacks in advance.

The problem with all these theories is that they are reading history backwards.  Intelligence agencies weed through thousands of rumors, decrypts, and hints every day.  The historian can wade through this mass and latch onto the couple of correct and prescient such rumors because she knows how history turns out.  She knows Japan bombed Pearl Harbor so she knows how to jump right to the needle in the haystack.  But officials at the time had no such foreknowledge.  Sure there may have been hints of attacks on 9/11 but there were also likely hints that turned out to be incorrect on scores of other potential plots and attacks, plots that would have (at the time) looked no more or less realistic than a hinted attack on 9/11.

There is a related problem that is a pet peeve of mine related to probability.  Let's say I offered you a 50/50 bet (that is, an even-money payout) that you would win if a 6-sided die came up 1-5 but lose if the die came up 6.  Clearly, all day long the right decision is to take the bet.  But then imagine you took the bet and the die came up 6.  Was this, in retrospect, a bad decision?  I would argue absolutely not: you made a great decision that simply did not work out this one time, but over time making similar decisions will be a winner.  On the flip side, imagine someone who took the opposite side of the bet, a 50/50 bet that only pays off with a 6.  If a 6 comes up, did they make a good decision?  Absolutely not.  It was a terrible decision that they got bailed out on by luck, but over time they are going to bankrupt themselves.
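For anyone who wants to see the arithmetic behind the die bet, here is a minimal sketch.  It assumes the "50/50 bet" means an even-money payout on a $1 stake (you win $1 or lose $1); the function name and the stake are my own illustrative choices, not part of any real betting setup.

```python
from fractions import Fraction


def expected_profit(p_win: Fraction, stake: int = 1) -> Fraction:
    """Expected profit per bet at even money.

    Assumption: you gain `stake` with probability p_win and
    lose `stake` with probability (1 - p_win).
    """
    return p_win * stake - (1 - p_win) * stake


# Taking the bet: win on 1-5, i.e. 5 of the 6 faces.
take_the_bet = expected_profit(Fraction(5, 6))

# The opposite side: win only on a 6, i.e. 1 of the 6 faces.
other_side = expected_profit(Fraction(1, 6))

print(take_the_bet)  # 2/3
print(other_side)    # -2/3
```

The quality of the decision lives in those two expected values, which are fixed before the die is ever rolled.  A single roll of 6 changes who got paid that one time, not which side of the bet was the smart one.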

These may seem like contrived examples, but I see exactly this sort of bad analysis all the time of risky decisions taken in an array of fields from sports to business.  I am sorry, but a football coach who goes for it on 4th and 8 from his own 30 and makes a first down did NOT make a good decision, despite the fact it worked out okay this one time.  But almost everyone in the media brings a retrospective bias to analysis of such decisions, rating them good decisions if they worked out all right and bad decisions if they did not, irrespective of whether the decision, when made, made a lick of sense.

Being Skeptical of Data, Even When It Supports Your Position - Fire Edition

This is the, uh, whateverth installment in a series on using your common sense to fact check data, even when the data is tantalizingly useful for the point one is trying to make.

For the last decade or so, global warming activists have used major fires as further "proof" that there is a global warming trend.  Often these analyses are flawed, for a variety of reasons that will be familiar to readers, e.g.

  • A single bad fire is just one data point and does not prove a trend; you need a series of data to prove a trend
  • There is no upward trend in US acreage in fires over the last 10 years, but there is in the last 20 years, which gives lots of nice opportunities for cherry-picking on both sides
  • Acres burned is a TERRIBLE measure of global warming, because it is trying to draw global trends from a tiny fraction of the world land mass (western US); and because it is dependent on many non-climate variables such as forest management policies and firefighting policy.
  • A better, more direct metric of possible warming harm is drought, as measured by something like the Palmer Drought Severity Index, which shows no trend

 

  • An even better metric, of course, is that there IS an actual upward trend in temperatures.  There is not, however, much of an upward trend in bad weather like drought, hurricanes, or tornadoes.  In this context fire is a third order variable (temp ---> drought ---> fire) which makes it a bad proxy, particularly when the first order variable is telling the tale.

AAAAaaaand then, there is this chart, much loved by skeptics, for long-term US fire history:

I am pretty sure that I have avoided ever using this piece of skeptic catnip (though I could be wrong, I can have moments of weakness).  The reason is that nothing about this chart passes the smell test.  While it is true that the 1930s were super hot and dry, likely hotter in the US than it has been this decade, there is absolutely no reason to believe that acreage burned over the entire period of 1926-1952 was so much higher than today.  Was there a different fire management policy (e.g. did they just let all fires burn themselves out)?  Was there a change in how the data was recorded?

Here is my rule of thumb -- when you see a discontinuity like this (e.g. before and after 1955) you better have a good explanation and understanding of the discontinuity.  This is not just to be a good person and be true to good scientific process (though we all should) but also from the practical and selfish desire to avoid having someone come along who DOES know why the discontinuity exists and embarrass you for your naivete.

I have never trusted this chart, because I have not really understood it.  This week, the Antiplanner (who before he focused on transit focused most of his writing on the Forest Service and forest policy) has an explanation.

The story begins in 1908, when Congress passed the Forest Fires Emergency Funds Act, authorizing the Forest Service to use whatever funds were available from any part of its budget to put out wildfires, with the promise that Congress would reimburse those funds. As far as I know, this is the only time any democratically elected government has given a blank check to any government agency; even in wartime, the Defense Department has to live within a budget set by Congress.

This law was tested just two years later with the Big Burn of 1910, which killed 87 people as it burned 3 million acres in the northern Rocky Mountains. Congress reimbursed the funds the Forest Service spent trying (with little success) to put out the fires, but — more important — a whole generation of Forest Service leaders learned from this fire that all forest fires were bad....

This led to a conflict over the science of fire that is well documented in a 1962 book titled Fire and Water: Scientific Heresy in the Forest Service. Owners of southern pine forests believed that they needed to burn the underbrush in their forests every few years or the brush would build up, creating the fuels for uncontrollable wildfires. But the mulish Forest Service insisted that all fires were bad, so it refused to fund fire protection districts in any state that allowed prescribed burning.

The Forest Service’s stubborn attitude may have come about because most national forests were in the West, where fuel build-up was slower and in many forests didn’t lead to serious wildfire problems. But it was also a public relations problem: after convincing Congress that fire was so threatening that it deserved a blank check, the Forest Service didn’t want to dilute the message by setting fires itself.

When a state refused to ban prescribed fire, the Forest Service responded by counting all fires in that state, prescribed or wild, as wildfires. Many southern landowners believed they needed to burn their forests every four or five years, so perhaps 20 percent of forests would be burned each year, compared with less than 1 percent of forests burned through actual wildfires. Thus, counting the prescribed fires greatly inflated the total number of acres burned.

The Forest Service reluctantly and with little publicity began to reverse its anti-prescribed-fire policy in the late 1930s. After the war, the agency publicly agreed to provide fire funding to states that allowed prescribed burning. As southern states joined the cooperative program one by one, the Forest Service stopped counting prescribed burns in those states as wildfires. This explains the steady decline in acres burned from about 1946 to 1956.

There were some big fires in the West in the 1930s that were not prescribed fires. I’m pretty sure that if someone made a chart like the one shown above for just the eleven contiguous western states, it would still show a lot more acres burned in real wildfires in the 1930s than any decade since — though not by as big a margin as when southern prescribed fires are counted. The above chart should not be used to show that fires were worse in the 1930s than today, however, because it is based on a lie derived from the Forest Service’s long refusal to accept the science behind prescribed burning.

There you go, the discontinuity seems to be from a change in the way the measurement is calculated.

By the way, I work closely with the Forest Service every day and mostly this partnership is rewarding.  But I can tell you that the blank check still exists for fire suppression costs and results in exactly the sort of inefficient spending that you would imagine.   Every summer, much Forest Service work comes to a halt as nearly every manager and professional gets temporarily assigned to fire -- something FS employees love because they get out of the grind of their day job and essentially get to go camping.

Here is a Fun Challenge: Be Skeptical of Statistics Even When They Support Your Point of View

I sometimes wonder if the media and the punditocracy have any ability any more to reality-check statistics.  Two examples:

One

Trump supporters were running around in circles patting themselves on the back with this story:

African American business owners are on the rise. According to the Minority 2018 Small Business Trends survey, the number of black-owned small businesses in the U.S. increased by a staggering 400% in a year-over-year time period from 2017 to 2018.

I call bullsh*t on this.  There is no WAY that the number of black-owned businesses increased by a factor of 5** in just one year.  There are millions of black-owned small businesses in this country and there is no way this quintupled** in a year.   It does not pass any kind of smell test.   It is clearly some sort of measurement error, either a small sample size for a survey or a change in data source and definitions from one year to another.  I could go investigate the study and try to figure out the cause but I do not even need to bother because economic and demographic data simply do not change at this pace in one year.

Two

The other example I have is this absurd figure:

A recent survey conducted by OVW and the Bureau of Justice Statistics found that an average of one in four undergraduate females experience sexual assault by the time they finish college.

Here is the deal with this stat:  no one actually really believes it.  Why do I say this with confidence?  Because parents still send their daughters to college -- in fact they fight and scrap and invest huge amounts of time and money to send their daughter to college.  If they really believed their little darling had a 1 in 4 chance of being sexually assaulted, they would never do so.

Here is a point of comparison:  The brutal Japanese occupation of Nanjing, China is commonly known as the "Rape of Nanjing."  It is called this in part because so many local women were raped.  The numbers are fought over by historians, but the best estimate is that 20,000 of the approximately 100,000 women who were in Nanjing at the time were raped by Japanese soldiers, or about one in five.  This means that if the one in four number is correct, then colleges are more dangerous for women than being in Nanjing during the Japanese occupation.  Now, I would venture to guess that if I tried to stuff your daughter into a time machine and send her back to Nanjing on December 13, 1937 you would probably fight me to the death to prevent it.  But parents don't act anything like this vis a vis going to college, ergo no one believes this figure.  So why does everyone keep using it like it is accurate?

** I had put quadrupled but my son just called and reminded me that a 400% increase means quintupled.  Thanks, Nic.  Though I will say there is a good chance the source incorrectly used 400% to mean quadrupled, so I can't rule that out either.
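Since the percent-versus-multiple confusion in that footnote trips up nearly everyone (me included), here is a quick sketch of the conversion.  The function names are purely illustrative, not from any library.

```python
def multiplier_from_percent_increase(pct: float) -> float:
    """An increase of pct percent leaves you with (1 + pct/100) times the original."""
    return 1 + pct / 100


def percent_increase_from_multiplier(mult: float) -> float:
    """Growing to mult times the original is an increase of (mult - 1) * 100 percent."""
    return (mult - 1) * 100


# A 400% increase means the new value is 5x the old one (quintupled).
print(multiplier_from_percent_increase(400))  # 5.0

# Quadrupling (4x) is only a 300% increase.
print(percent_increase_from_multiplier(4))    # 300
```

So if the source really meant "quadrupled," the correct figure would have been a 300% increase.  Either way the claim fails the smell test.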

I Told You We Were Focused on the Wrong Thing

For years I have complained that the opposition to the GWB administration was focused on the wrong things vis a vis the detention policy at Gitmo.  There was too much focus on Gitmo itself as a lightning rod, and too much discussion of whether flushing a Koran down the toilet was torture.  My point was that there didn't have to be torture for it to be wrong to hold non-uniformed suspected non-combatants in a non-declared war indefinitely, as if they were captured Nazi U-boat commanders.   For example:

I believe strongly that the Bush administration's invented concept of unlimited-length detentions without trial or judicial review is obscene and needed to be halted.  But critics of Bush quickly shifted the focus to "torture" at Gitmo, a charge that in light of the facts appears ridiculous to most rational people, including me.  As a result, the administration's desire to hold people indefinitely without due process has been aided by Bush's critics, who have shifted the focus to a subject that is much more easily defended on the facts.

Justice Scalia argued that giving habeas corpus rights to enemy combatants during war time was unprecedented, but I responded:

I don't have enough law background to know if this is truly unprecedented in this way, but what if it is?  One could easily argue that the nature of the "enemy" here, being that they don't have the courtesy to wear uniforms that indicate their combatant status and which side they are on, is fairly unprecedented as well.  As is the President's claim that he has unilateral power to declare that there is a war at all, who this war is against, and who is or is not a combatant.  I know from past posts on this topic that many of my readers disagree with me, but I think it is perfectly fine [that] the Supreme Court, encountering this new situation, sides with the individual over the government.

So now, just as I feared, the soil was fertile for a classic political bait and switch.  Obama agreed to close Gitmo, the lightning rod of the controversy, thereby inspiring us to believe he is changing policy when, at its heart, the real problem is still there:

Harvard Law Dean Elena Kagan, President Obama's choice to represent his administration before the Supreme Court, told a key Republican senator Tuesday that she believed the government could hold suspected terrorists without trial as war prisoners.

She echoed comments by Atty. Gen. Eric H. Holder Jr. during his confirmation hearing last month. Both agreed that the United States was at war with Al Qaeda and suggested the law of war allows the government to capture and hold alleged terrorists without charges.

If confirmed as U.S. solicitor general, Kagan, 48, will defend the administration's legal policy in the courts.

I assume she and Holder are toeing the Obama line on this, though they could be the bearers of a trial balloon and it may be Obama has not made up his mind.  I hope so.  Here is some more.

"Do you believe we are at war?" Graham asked.

"I do, Senator," Kagan replied.

Graham cited the example of someone who is not carrying a gun or fighting on a battlefield. "If our intelligence agencies should capture someone in the Philippines that is suspected of financing Al Qaeda worldwide, would you consider that person part of the battlefield?" he asked. He added that he had asked the same question of Holder, who replied that he agreed that person was on the battlefield.

"Do you agree with that?" the senator said.

"I do," Kagan replied.

Graham said that under the law of war, the government can say, "If you're part of the enemy force, there is no requirement to let them go back to the war and kill our troops. Do you agree that makes sense?"

Kagan replied, "I think it makes sense, and I think you're correct that that is the law."

"So America needs to get ready for this proposition that some people are going to be detained as enemy combatants, not criminals," Graham concluded.

I may have missed it, but did the AUMF or whatever it was that Congress passed before we entered Afghanistan and Iraq actually declare we were at war with the organization named "Al Qaeda"?  Or does the president saying the words "war on terror" enough times in 8 years just make it so?

In Medias Res

You certainly don't have to spend very long convincing me that a significant government action can be distortive of markets, so I won't argue too much with Kevin Drum that the capital gains tax changes may have been a contributing factor to the housing bubble (though it is hilarious that the left considers tax reductions as the only distortive government actions).

However, thinking back on events, it's a little hard for me to ascribe the lion's share of the bubble to capital gains tax changes, as opposed to, say, the mortgage interest deduction or Federal Reserve interest rate policies or local zoning controls.

I probably wouldn't have bothered blogging on this, but I found the chart Drum uses from the NY Times to be hilarious:

[Image: NY Times chart]

Do you see the problem?  I will help by simplifying the chart:

[Image: simplified version of the chart]

It's a pretty heroic assumption to say that Event B caused Trend A.

Update:  Russell Roberts thinks the Times is right, but that they are using the wrong data to prove it.  1997 looks much more like the critical inflection point if you look at prices rather than sales (chart via Roberts, from a different NY Times article).

[Image: house prices chart]