"Abnormal" Events -- Droughts and Perfect Games

Most folks, and I would include myself in this, have terrible intuitions about probabilities, in particular about the frequency and patterns of occurrence in the tails of the normal distribution, what we might call "abnormal" events. This strikes me as a particularly relevant topic because the severity of the current drought and high temperatures in the US is being used as absolute evidence of catastrophic global warming.

I am not going to get into the global warming bits in this post (though a longer post is coming). Suffice it to say that if it is hard to measure shifts in the mean of climate patterns directly and accurately, given all the natural variability and noise in the weather system, it is virtually impossible to infer shifts in the mean from individual occurrences of unusual events. Events in the tails of the normal distribution are infrequent, but they are not impossible, or even unexpected, over enough samples.

What got me to thinking about this was the third perfect game pitched this year in the MLB.  Until this year, only 20 perfect games had been pitched in over 130 years of history, meaning that one is expected every 7 years or so  (we would actually expect them more frequently today given that there are more teams and more games, but even correcting for this we might have an expected value of one every 3-4 years).  Yet three perfect games happened, without any evidence or even any theoretical basis for arguing that the mean is somehow shifting.  In rigorous statistical parlance, sometimes shit happens.  Were baseball more of a political issue, I have no doubt that writers from Paul Krugman on down would be writing about how three perfect games this year is such an unlikely statistical fluke that it can't be natural, and must have been caused by [fill in behavior of which author disapproves].  If only the Republican Congress had passed the second stimulus, we wouldn't be faced with all these perfect games....

Postscript: We like to think that perfect games are the ultimate measure of a great pitcher. This is half right. In fact, we should expect entirely average pitchers to get perfect games every so often. A perfect game is when the pitcher faces 27 hitters and none of them gets on base. So let's take the average hitter facing the average pitcher. The league-average on-base percentage this year is about .320, or 32%. This means that in any given plate appearance, the average pitcher has a 68% chance of keeping the average batter off base. All the average pitcher has to do is roll these dice correctly 27 times in a row.

The odds of that are .68^27, or about one in 33,000. This means that once in every 33,000 pitcher starts (there are two pitcher starts per game played in the MLB), the average pitcher should get a perfect game. Since there are about 4,860 regular-season starts per year (30 teams x 162 games), an average pitcher should get a perfect game every 7 years or so. Through its history, the MLB has had about 364,000 starts, so this would point to about 11 perfect games by average pitchers, about half the actual total.
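The arithmetic above can be checked with a few lines of Python. This is a minimal sketch of the post's simplified model, assuming every one of the 27 batters independently reaches base with probability equal to the league on-base percentage; the constant and function names are illustrative, and the inputs (.320 OBP, 30 x 162 starts per season, about 364,000 historical starts) are the figures quoted in the post.

```python
# Sketch of the post's "roll the dice 27 times" model. Assumption: each
# batter reaches base independently with probability equal to league OBP.
LEAGUE_OBP = 0.320             # league on-base percentage quoted in the post
BATTERS = 27                   # batters faced in a nine-inning perfect game
STARTS_PER_SEASON = 30 * 162   # one starting pitcher per team per game
HISTORICAL_STARTS = 364_000    # approximate all-time MLB starts quoted in the post


def perfect_game_prob(obp: float, batters: int = BATTERS) -> float:
    """Chance that an average pitcher retires every batter in a start."""
    return (1.0 - obp) ** batters


p = perfect_game_prob(LEAGUE_OBP)
starts_per_perfect_game = 1.0 / p                      # the "one in N" figure
seasons_per_perfect_game = starts_per_perfect_game / STARTS_PER_SEASON
expected_all_time = HISTORICAL_STARTS * p              # expected count over history

print(f"one perfect game per {starts_per_perfect_game:,.0f} starts")
print(f"about one every {seasons_per_perfect_game:.1f} seasons")
print(f"about {expected_all_time:.1f} expected in MLB history")
```

Run as-is, it should land on roughly one in 33,000, a wait of about 7 seasons, and about 11 expected perfect games, matching the post to within rounding.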

Now, there is a powerful statistical argument that great pitchers should be overrepresented in perfect-game stats: the probabilities are VERY sensitive to small changes in on-base percentage. Let's assume a really good pitcher has an on-base percentage against him that is 30 points below the league average, and a bad pitcher has one 30 points above it. The better pitcher would then expect a perfect game every 10,000 starts, while the worse pitcher would expect one every 113,000 starts. I can't find the stats on individual pitchers, but my guess is that the spread in on-base percentage against between the best and worst pitchers is more than 60 points, since even the team batting-average-against stats (team averages rather than individual ones, which should be less variable) show a 60-point spread from best to worst. [Update: a reader points to data showing there is actually a 125-point spread from best to worst. That is a difference in expected perfect games from one in 2,000 for Jered Weaver to one in 300,000 for Derek Lowe. Thanks Jonathan]
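The sensitivity argument can be sketched the same way, again assuming independent batters. The second function inverts the model to back out the on-base percentage against implied by the update's one-in-2,000 and one-in-300,000 figures; the post does not quote those two pitchers' actual numbers, so the implied values are only a rough consistency check against the 125-point spread, and the function names are illustrative.

```python
BATTERS = 27
LEAGUE_OBP = 0.320


def starts_per_perfect_game(obp_against: float, batters: int = BATTERS) -> float:
    """Expected starts per perfect game ("one in N") under the independence model."""
    return 1.0 / (1.0 - obp_against) ** batters


def implied_obp_against(one_in_n: float, batters: int = BATTERS) -> float:
    """Invert the model: the OBP against that yields one perfect game per one_in_n starts."""
    return 1.0 - (1.0 / one_in_n) ** (1.0 / batters)


for label, obp in [("30 points better", LEAGUE_OBP - 0.030),
                   ("league average", LEAGUE_OBP),
                   ("30 points worse", LEAGUE_OBP + 0.030)]:
    print(f"{label:>16}: OBP against {obp:.3f} -> one in {starts_per_perfect_game(obp):,.0f}")

best = implied_obp_against(2_000)      # odds quoted for Jered Weaver
worst = implied_obp_against(300_000)   # odds quoted for Derek Lowe
print(f"implied spread: {1000 * (worst - best):.0f} points ({best:.3f} to {worst:.3f})")
```

The implied spread should come out in the neighborhood of 125 points, which is what ties the update's odds back to the reader's data.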

Update: There have been 278 no-hitters in MLB history, or 12 times the number of perfect games. The odds of getting through 27 batters based on a .320 on-base percentage are one in 33,000. The odds of getting through the same batters based on a .255 batting average (which counts hits but not other ways of reaching base, exactly parallel with the definition of a no-hitter) are just one in 2,830. The ratio between these odds is 11.7 to one, nearly perfectly explaining the ratio of no-hitters to perfect games on pure stochastics.
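The no-hitter comparison is the same calculation with batting average in place of on-base percentage. A minimal sketch, keeping the post's simplification that a no-hitter also involves exactly 27 batters (in reality walks add batters); the figure of 23 perfect games is the 20 before this season plus this year's three.

```python
BATTERS = 27
LEAGUE_OBP = 0.320   # reaching base by any means (hits, walks, hit batsmen, ...)
LEAGUE_AVG = 0.255   # hits only: the post's proxy for breaking up a no-hitter
NO_HITTERS = 278     # MLB history, as quoted in the post
PERFECT_GAMES = 23   # 20 before this season plus this season's three

p_perfect = (1.0 - LEAGUE_OBP) ** BATTERS
p_no_hitter = (1.0 - LEAGUE_AVG) ** BATTERS

model_ratio = p_no_hitter / p_perfect        # no-hitters per perfect game, model
actual_ratio = NO_HITTERS / PERFECT_GAMES    # no-hitters per perfect game, history

print(f"perfect game: one in {1 / p_perfect:,.0f}")
print(f"no-hitter:    one in {1 / p_no_hitter:,.0f}")
print(f"no-hitters per perfect game: model {model_ratio:.1f}, actual {actual_ratio:.1f}")
```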

15 Comments

Everything you need to know about the counterintuitiveness of probability:

It is very, very, VERY unlikely that a person playing the lottery will win a multi-state megajackpot. Yet it is quite rare for more than a month (16 drawings, give or take one or two based on the precise length of the month in question, and where the drawing days fall in it) to go by without it happening to somebody or other.
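The lottery point follows from the same multiply-the-misses logic. In this sketch the jackpot odds and ticket sales are hypothetical round numbers chosen for illustration, not figures from any actual lottery, and every ticket is treated as an independent random pick (real players often choose the same numbers).

```python
import math

# Hypothetical illustrative inputs, not real lottery statistics.
JACKPOT_ODDS = 175_000_000        # assumed: one winning combination in 175 million
TICKETS_PER_DRAWING = 20_000_000  # assumed ticket sales per drawing
DRAWINGS_PER_MONTH = 16           # per the comment

p_ticket = 1.0 / JACKPOT_ODDS
tickets_per_month = TICKETS_PER_DRAWING * DRAWINGS_PER_MONTH

# P(no ticket wins all month) = (1 - p)^tickets; log1p keeps this accurate for tiny p.
p_no_winner = math.exp(tickets_per_month * math.log1p(-p_ticket))
p_some_winner = 1.0 - p_no_winner

print(f"one ticket: {p_ticket:.1e}; somebody wins within a month: {p_some_winner:.0%}")
```

With these assumed inputs, a single ticket is hopeless while the chance that somebody wins within a month is high, which is exactly the contrast the comment describes.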

2430 games per year as it still takes two to tango on the diamond.

Coyote: Yet three perfect games happened, without any evidence or even any theoretical basis for arguing that the mean is somehow shifting.

It's called a Poisson Distribution. Assuming perfect games are randomly distributed, then if the average rate of perfect games is 1/3 per year, then the probability of at least three perfect games in a single year is about 0.48%. Rare, possibly due to chance, but more likely that the assumption of random distribution is false. If you look at the distribution of perfect games, you will see that they lean towards the present.
http://mlb.mlb.com/mlb/history/rare_feats/index.jsp?feature=perfect_game
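The Poisson figure can be reproduced as follows. This sketch takes the commenter's assumed rate of one perfect game every three seasons and, like the comment, treats perfect games as independent random events; the helper name is illustrative.

```python
import math


def poisson_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


RATE_PER_SEASON = 1 / 3   # commenter's assumed average rate
p_three_or_more = poisson_at_least(3, RATE_PER_SEASON)
print(f"P(3+ perfect games in a given season) = {p_three_or_more:.2%}")
```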

I have no doubt that writers from Paul Krugman on down would be writing about how three perfect games this year is such an unlikely statistical fluke that it can't be natural, and must have been caused by [fill in behavior of which author disapproves].

Not to defend Krugman, but the blank might be:

The end of the steroid age. However, that wouldn't explain why we didn't see more perfect games BEFORE the steroid age.

Warren:

Our intuition about probabilities makes sense in the context in which humans evolved. Back in the stone ages, our observations were pretty much limited to where we were physically. If you take just the events that you personally experience, that is a pretty decent predictor of future events that will happen to you personally.

Once we start taking into account events that happen to other people far away, our intuition breaks down. I suspect this is because we equate things that we hear about with things that happen to us (maybe discounting for credibility). When we were cavemen, this still made some sense. We would hear about really rare events from far away, and about things that happened to grandpa (who, by the way, would probably be dead, given a life expectancy in the late 20s). So that small amount of additional data would probably not make much difference in our perception of statistically rare events. Now we hear about them all the time.

Max

@Zachriel. Unless you've actually run a statistical significance test, you can't draw that conclusion. The chance of us seeing 1 season with 3 perfect games in the history of MLB is roughly 50-50 given the straight probabilities. It's notoriously difficult to identify a statistically significant regime change when you're talking about extremely low probability events.

It's been a while since graduate school, but my recollection is that you'd actually want to model the underlying stochastic process by looking at OBP and BA. Then do a trend analysis on these component variables to see if there's a significant trend.
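The roughly 50-50 figure comes from asking how likely it is that at least one season in the whole history reaches three perfect games, rather than one particular season. A rough sketch, assuming a constant rate of 1/3 per season across about 130 independent seasons (a simplification, since the number of teams and games has grown over time):

```python
import math

RATE_PER_SEASON = 1 / 3  # assumed constant average rate
SEASONS = 130            # "over 130 years of history", per the post

# Poisson probability that one given season has 3 or more perfect games
p_season = 1.0 - sum(
    math.exp(-RATE_PER_SEASON) * RATE_PER_SEASON**i / math.factorial(i)
    for i in range(3)
)
# Probability that at least one of the seasons does, treating seasons as independent
p_any_season = 1.0 - (1.0 - p_season) ** SEASONS
print(f"per season: {p_season:.2%}; at least once in {SEASONS} seasons: {p_any_season:.0%}")
```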

What are the odds that you would have 3 no-hitters in the same season all in one stadium, Safeco Field, in Seattle? 2 perfect games and 1 combined no-hitter this year.
Anybody want to tackle that one?

Wait! I thought the tornadoes and hurricanes indicated AGW. But this July was one of the lowest for tornadoes in history, and so far hurricanes aren't doing all that great. The drought is due to a La Nina weather pattern. This is cyclical, and we are in one again. We got droughts with all the rest of them, the worst in 1934. The good news is that La Nina weather patterns don't last long. It could continue next year, no one knows, but in general they last one to three years. I can't predict when the La Nina will end, but I can predict that when it does and our weather turns rainy from an El Nino weather cycle, the Warmies will say "See, all this rain is a result of man-caused global warming, and if you just allow us to tax you to death, cede all your rights and give us your first-born child, we will do magic and stop the end of the world." What I want to know is when the high priests of the Warmie religion will begin wearing those cool robes and making incantations and cool stuff like that???

Point 1: Clustering is a well-known statistical phenomenon. For example, if you flip a coin 1000 times, there is a 50:50 chance of getting 4 sets of 8 consecutive heads or tails and 1 set of 10 consecutive heads or tails. Low probability events, such as perfect baseball games, also can cluster.
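The coin-flip clustering is easy to check by simulation. This sketch counts maximal runs of identical outcomes of length at least 8 and at least 10 in 1,000 fair flips and averages over many trials; it estimates the typical number of such runs (which should come out near 4 and near 1) rather than the exact 50:50 probability quoted in the comment. The function names, trial count, and seed are illustrative choices.

```python
import random


def count_long_runs(flips: list[bool], min_len: int) -> int:
    """Count maximal runs of identical outcomes whose length is at least min_len."""
    if not flips:
        return 0
    runs, length = 0, 1
    for prev, cur in zip(flips, flips[1:]):
        if cur == prev:
            length += 1
        else:
            if length >= min_len:
                runs += 1
            length = 1
    if length >= min_len:  # close out the final run
        runs += 1
    return runs


def average_long_runs(min_len: int, n_flips: int = 1000,
                      trials: int = 1000, seed: int = 1) -> float:
    """Monte Carlo estimate of the expected number of long runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        total += count_long_runs(flips, min_len)
    return total / trials


for min_len in (8, 10):
    print(f"runs of {min_len}+ in 1,000 flips: {average_long_runs(min_len):.2f} on average")
```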

Point 2: You cannot use random chance statistics to describe athletic competitions. An average pitcher would have to pitch far more than 33,000 games to have a 50:50 chance of a perfect game. A superb pitcher may only have to pitch 330 games to have a 50:50 chance of a perfect game. A mediocre pitcher will never pitch a perfect game.
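For comparison, here is what the post's pure-chance model (which this comment disputes) implies for the number of starts needed to reach even odds of at least one perfect game. Treating each start as an independent trial with success probability p, that number is ln(0.5) / ln(1 - p), about 0.69 / p, i.e. roughly 0.69 times the "one in N" figure. The OBP-against values are the post's league average and its hypothetical pitchers 30 points better and worse; the function name is illustrative.

```python
import math

BATTERS = 27


def starts_for_even_odds(obp_against: float, batters: int = BATTERS) -> float:
    """Starts needed for a 50% chance of at least one perfect game,
    assuming independent batters and independent starts (the post's model)."""
    p = (1.0 - obp_against) ** batters         # chance of a perfect game per start
    return math.log(0.5) / math.log1p(-p)      # solve (1 - p)^n = 0.5 for n


for obp in (0.290, 0.320, 0.350):
    print(f"OBP against {obp:.3f}: about {starts_for_even_odds(obp):,.0f} starts for 50:50")
```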

Kevin Dick: The chance of us seeing 1 season with 3 perfect games in the history of MLB is roughly 50-50 given the straight probabilities.

You're right. We only calculated for it appearing in a given year.

Is it possible for a rain-shortened game to count as a perfect game? That could theoretically only require 15 at-bats....

Gloobnib,

At one point, rain-shortened games (or 8-inning games in the case of a losing pitcher on the road) were considered no-hitters in MLB, but MLB changed the definition in the early '90s, wiping out Harvey Haddix's 12-inning masterpiece against Milwaukee and depriving me of my in-person rain-shortened no-hitter (Pascual Perez against the Phillies) in late 1987.

Max is on to something. When I was a kid, the national news on TV was all about Washington DC, with only the occasional story from CA or the Midwest. Now, with all sorts of media outlets and global monitoring, we hear about rare events from all over the world and internalize them as common.

A common example: in any year there is going to be drought somewhere in the northern hemisphere. In my yout' I would never hear about it unless it was severe or local. Now I hear there is a drought in the Midwest; last year there was a drought in Russia, and the year before in Europe. Hearing about drought all the time, without context, when I didn't hear about it at all in my yout' 'cept for California cases, makes me think that drought is much more common today, when the reality is that I am just hearing about it more because news gathering and distribution have become more efficient.

It is the same with crime. I used to hear about local crime; now I hear about any unusual crime event in the whole country. 30 years ago, I wouldn't have even heard of George Zimmerman - that would have been local. So now, because I hear about the interesting crime from all over the country rather than just my local burg, I assume there is a lot more crime (even though statistically there is much less!)

I think most folks do not have the capacity to deal with the problems of the world. We are more geared to dealing with the problems of just our own cave, and maybe the one next door.

MingoV: You could argue that several perfect games have been thrown by mediocre pitchers. See: Dallas Braden and Phillip Humber.

The analysis in the original post didn't account for errors, which aren't included in a batter's OBP. In other words, some of that 68% of the time that a batter doesn't reach base (according to an OBP of .320), he actually reaches on an error, thus negating a perfect game.

Per Wikipedia (http://en.wikipedia.org/wiki/Perfect_game#No-hit.2C_no-walk.2C_no.E2.80.93hit_batsman_games) there have been 8 no-hit, no walk, no hit-batsmen games. There may have been more that would have been perfect games if not for an error (e.g., an error facing the 27th batter followed by a hit).
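One way to fold errors into the post's model is to add a per-batter chance of reaching on an error to the on-base percentage. The error rate below is a hypothetical value chosen only for illustration, not a league statistic; the sketch just shows the direction and rough size of the effect, and the names are illustrative.

```python
LEAGUE_OBP = 0.320
BATTERS = 27
HYPOTHETICAL_ERROR_RATE = 0.010  # assumed chance per batter of reaching on an error


def perfect_game_prob(obp: float, error_rate: float = 0.0,
                      batters: int = BATTERS) -> float:
    """Chance every batter is retired when batters can also reach on errors."""
    return (1.0 - obp - error_rate) ** batters


without_errors = perfect_game_prob(LEAGUE_OBP)
with_errors = perfect_game_prob(LEAGUE_OBP, HYPOTHETICAL_ERROR_RATE)
print(f"ignoring errors: one in {1 / without_errors:,.0f}")
print(f"with errors:     one in {1 / with_errors:,.0f}")
```

Even a small assumed error rate stretches the odds noticeably, because it is compounded across all 27 batters.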