Begin with a data set, preferably one in which many people are interested. Let’s say, World Series results from 1903 to the present.

Now ask a question about the data, one that should be easy to answer with a highly simplified model. Our question will be: have World Series teams, historically, been evenly matched?

Our model will ignore home-field advantage. In baseball the home team wins 53% or 54% of the time; nonetheless, we will assume that each team has a probability of 0.5 of winning each game. This gives the following expected probabilities for a best-of-seven series running four, five, six, or seven games:

P(4) = 0.125
P(5) = 0.250
P(6) = 0.3125
P(7) = 0.3125
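
For anyone who would rather check the arithmetic than trust it, here is a minimal sketch of where those four numbers come from. It assumes exactly what the model assumes (independent games, a fixed per-game probability, no home-field effect); the function name `series_length_probs` is mine, invented for illustration.

```python
from math import comb

def series_length_probs(p=0.5, wins_needed=4):
    """Probability that a best-of-(2*wins_needed - 1) series lasts n games.

    p is the chance that one fixed team wins any single game. Games are
    assumed independent, with no home-field effect (the model's assumption).
    """
    q = 1.0 - p
    probs = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The series winner takes game n plus exactly wins_needed - 1
        # of the first n - 1 games; either team may be the winner.
        ways = comb(n - 1, wins_needed - 1)
        losses = n - wins_needed
        probs[n] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return probs

for n, prob in series_length_probs().items():
    print(f"P({n}) = {prob:.4f}")
```

Passing a value of p other than 0.5 shows how quickly sweeps become more likely once one team is even modestly better.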

Remember that if the model is too simple to fit the data, you can clean the data. Since 1903, the World Series has been played every year but two. There were a few best-of-nine series and a few more that included ties, which are too complicated to deal with. Throw them out. This leaves 95 series. Draw up a little chart comparing actual and expected probabilities, like so:

Possible outcomes   P(Expected)   P(Actual)
4-0                 0.125         0.179
4-1                 0.250         0.221
4-2                 0.3125        0.242
4-3                 0.3125        0.358

Now answer your own question. If the teams were evenly matched, the results would hew reasonably closely to the expected probabilities from the model. In fact there are anomalies. There are always anomalies. The World Series has been swept 17 times, five more than the model would predict. Plug this into the BINOMDIST function in Excel. (Understanding how this function works is optional and may in some cases be a disadvantage.) You find that, if the probabilities in the model were correct, there would be 17 or more sweeps in 95 occurrences only 8% of the time. A rotten break: you’re three lousy percent under statistical significance. But that aside, eleven of those were won by the team with the better regular-season record, several by teams considered among the all-time greats, including the 1927, 1939 and 1998 Yankees. That probably means something. On the other hand, the team that held the American League record for wins before 1998, the 1954 Indians, was swept by the Giants. Conclude judiciously that, on the whole, the data imply an occasional mismatch.

Look for any bonus anomalies. It doesn’t matter if they have nothing to do with your original question. Our data set turns up a nice one; the series went to seven games 34 out of 95 times — five too many, according to the model. This would occur randomly, assuming correct probabilities, only 20% of the time.

Damn, we’ve missed out on statistical significance again. Instead of looking at how often the series went seven, we can look at how often the team behind 3-2 won the sixth game. 34 out of 57, a somewhat more unusual result. Plug it back into BINOMDIST: we’re down to 9%, which is close but not close enough.
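
If you don't have Excel handy, the tail probabilities quoted so far can be approximated with a few lines of Python. This is a sketch, not a reconstruction of the original spreadsheet: the helper name `binom_upper_tail` is made up, and it sums the exact binomial terms for "k or more successes," which in Excel terms is 1 minus the cumulative BINOMDIST evaluated at k - 1.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Counts and model probabilities are the ones given in the text.
sweeps = binom_upper_tail(17, 95, 0.125)        # 17 or more sweeps in 95 series
sevens = binom_upper_tail(34, 95, 0.3125)       # 34 or more seven-game series in 95
game_six = binom_upper_tail(34, 57, 0.5)        # trailing team wins Game Six 34+ times in 57

print(f"sweeps:   {sweeps:.3f}")
print(f"sevens:   {sevens:.3f}")
print(f"game six: {game_six:.3f}")
```

These are one-sided probabilities, which is the most flattering way to count; a two-sided test would be even less impressed.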

It has become inconvenient to look at the entire data set; let’s take just a chunk of it, say, 1945 to 2002. In those 58 years the World Series lasted seven games 27 times, which would happen by chance a mere 1% of the time. Furthermore, the team behind 3-2 won the sixth game 27 of 39 times; again, a 1% chance. Statistical significance at last!
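
To see why picking your own chunk of the data is such a reliable route to significance, here is a rough sketch that does the picking automatically. It builds a made-up history of 95 series in which the model is true by construction, then scans every contiguous window of at least 20 series for the smallest p-value. Everything here is an assumption for illustration: the simulated data, the fixed seed, the 20-series minimum, and the function names.

```python
import random
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def smallest_window_p_value(outcomes, p, min_len=20):
    """Scan every contiguous window of at least min_len outcomes and return
    (p_value, start, end) for the window with the smallest upper-tail p-value."""
    prefix = [0]
    for x in outcomes:
        prefix.append(prefix[-1] + x)  # running count of "hits"
    best = (1.0, 0, len(outcomes))
    for start in range(len(outcomes)):
        for end in range(start + min_len, len(outcomes) + 1):
            hits = prefix[end] - prefix[start]
            p_value = binom_upper_tail(hits, end - start, p)
            if p_value < best[0]:
                best = (p_value, start, end)
    return best

P_SEVEN = 0.3125           # the model's chance that a series goes seven games
rng = random.Random(0)     # fixed seed so the run is repeatable
# Simulated history: 1 means the series went seven games, 0 means it did not.
fake_history = [1 if rng.random() < P_SEVEN else 0 for _ in range(95)]

whole = binom_upper_tail(sum(fake_history), len(fake_history), P_SEVEN)
best_p, start, end = smallest_window_p_value(fake_history, P_SEVEN)
print(f"whole data set:        p = {whole:.3f}")
print(f"best window [{start}, {end}): p = {best_p:.3f}")
```

Because the full data set is one of the windows scanned, the best window can never look less significant than the whole, and with a few thousand windows to choose from it will usually look a good deal more so.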

Next, concoct plausible explanations for your new, statistically significant anomaly. Maybe the team that is behind plays harder, with their backs against the wall. Maybe they use all of their best pitchers, holding nothing in reserve for the seventh game. Maybe the team that is ahead chokes and cannot close it out.

Under no circumstances should you test these explanations. In the World Series the team that won Game Six also won Game Seven 18 times out of 34 — not likely if they had squandered their resources to win Game Six. In basketball, in the NBA Finals, the team that led 3-2 won Game Six 26 times out of 45. This is the opposite of what we found in baseball, in a sport that rewards hard play more and is far more conducive to choking, as anyone knows who has tried to shoot a free throw in a big game. In other words, your explanations, though plausible, are false. The result is probably due to random variation. This should not discourage you from completing your article. Write up your doubts in a separate note several months later.

Finally, check the literature to make sure your idea is original. If it isn’t, which is likely, mention your predecessor prominently in your acknowledgements, and include a footnote in which you pick a few nits.

Submit to suitable journals. Repeat unto death, or tenure, whichever comes first.

Update: Actual professional statisticians comment. Evolgen, who may or may not be a professional statistician, comments.

Five years ago, after the 1999 season, a fellow fantasy league baseball owner and I fell into an argument about Roger Clemens. Clemens was 37 years old. In 1998 he had a brilliant season with Toronto, winning the pitching triple crown — ERA, wins, and strikeouts — and his fifth Cy Young Award. In 1999, his first year with the Yankees, he slipped considerably, finishing 14-10 with an ERA higher than league average for the only time since his rookie season. His walks and hits were up, his strikeouts were down, and my friend was sure he was washed. He argued that Clemens had thrown a tremendous number of innings, that old pitchers rarely rebound from a bad season, and that loss of control, in particular, is a sign of decline. I argued that Clemens is a classic power pitcher, a type that tends to hold up very well, that his strikeout ratio was still very high, that his walks weren’t up all that much, and that his diminished effectiveness was largely traceable to giving up more hits, which is mostly luck.

Of course Clemens rebounded vigorously in 2000 and won yet another Cy Young in 2001. He turned out not to be finished by a long shot, and still isn’t. Does this mean I won the argument? It does not. Had Clemens hurt his arm in 2000 and retired, would my friend have won the argument? He would not.

Chamberlain wasn’t wrong about “peace in our time” in 1938 because the history books tell us Hitler overran Europe anyway. He was wrong because his judgment of Hitler’s character, based on the available information in 1938, was foolish; because, to put it in probabilistic terms, he assigned a high probability to an event (Hitler settling for Czechoslovakia) that was in reality close to an engineering zero. He would still have been wrong if Hitler had decided to postpone the war for several years or not to fight it at all.

“Time will tell who’s right” is a staple of the barroom pedant. Of course it will do no such thing: time is deaf, blind, and especially, mute. Yet it is given voice on blogs all the time; here’s Richard Bennett in Radley Balko’s comments section: “Regarding the Iraq War, your position was what it was and history will be the judge.” It’s not an especially egregious instance, just one I happened to notice.

Now you can take this too far. If your best-laid predictions consistently fail to materialize, perhaps your analyses are not so shrewd as you think they are. You might just be missing something. Or not. But this should be an opportunity for reflection, not for keeping score.

We fumble in the twilight, arguing about an uncertain future with incomplete knowledge. Arguments over the future are simply differences over what Bayesian probability to assign the event. There is a respectable opposing school, frequentism, which holds that Bayesian probability does not exist, and that it makes no sense to speak of probabilities of unique events; but it has lost ground steadily for the last fifty years, and if it is right then most of us spend a great deal of time talking about nothing at all. Like Lord Keynes, one of the earliest of the Bayesian theorists, we are all Bayesians now.

This, for argument, is good news and bad news. The good news is that history won’t prove your opponent out. The bad news is that it won’t prove you out either. You thrash your differences out now or not at all. Then how do you know who won the argument? You don’t. Argument scores like gymnastics or diving, not football. It will never, for this reason, be a very popular American indoor sport.

Congratulations, to begin with, to all Red Sox and Cubs fans, who burnished their reputations as lovable losers, with their teams both snatching defeat from the jaws of victory in dramatic fashion. There is a lesson for them in the plight of the Rangers fan. For decades New York Rangers fans had to endure the mocking chants of 1940! 1940! — the last time they won the Stanley Cup — until 1994, when they finally won it again, only to relapse almost immediately into the mediocrity in which they are still mired today. Now the Rangers fan has no mocking chants to endure, because no one cares; the Rangers have just become another average team that hasn’t won for a while. If you can’t always win, next best is to always lose, which is a distinction. I suspect that many Red Sox and Cubs fans secretly root for their teams to lose, or better, almost win.

Last night’s Yankees-Sox game was certainly thrilling (note to Floyd McWilliams: I’m not listening), although I took advantage of the break between the top and the bottom of the 11th to take out the trash and consequently missed Aaron Boone’s game-winning home run. But at various points Fox showed two players and several fans with their hands clasped together, as if in supplication. Yes, the big bearded man in the sky apparently concerns himself with whether the Yankees rally against Pedro in the bottom of the 8th. Aristotle had the first word on this subject:

[F]or while thought is held to be the most divine of things observed by us, the question how it must be situated in order to have [divine] character involves difficulties. For if it thinks of nothing, what is there here of dignity? It is just like one who sleeps…what does it think of? Either of itself or of something else; and if of something else, either of the same thing always or something different…Evidently, then, it thinks of that which is most divine and precious, and it does not change; for change would be change for the worse, and this would be already a movement…Therefore it must be of itself that the divine thought thinks (since it is the most excellent of things), and its thinking is a thinking on thinking.

Aristotle, Platonizing, makes God sound rather like Wittgenstein, but you catch his drift. Spinoza is blunter:

For the reason and will which constitute God’s essence must differ by the breadth of all heaven from our reason and will and have nothing in common with them except the name; as little, in fact, as the dog-star has in common with the dog, the barking animal.

And the last from a god, who ought to know, Dr. Manhattan of Watchmen, chastising Veidt for trying to kill him (note to Jim Henley: I am too a comics blogger!):

I’ve walked across the sun. I’ve seen events so tiny and so fast they hardly can be said to have occurred at all. But you…you are a man. And this world’s smartest man means no more to me than does its smartest termite.

Surely God, if He can rouse Himself to intervene in human affairs at all, will find beneath His dignity anything less than the World Series.

I have avoided writing about Kobe Bryant until now, and promise to do so forevermore, because I find it hard to understand how anyone, except a deeply interested party like a Laker fan, could possibly have a dog in this fight. In one corner is the superstar modern athlete, the closest thing one finds today to a Roman Emperor, except without the responsibilities or risk of assassination. Tens of thousands cheer him at mass rallies. Children adorn their clothing with his name. (Hey, where’s my “CALIGULA 44” Starter jersey?) Like Nero, he foists his art on an unsuspecting and indifferent public. He devotes his leisure to sexual excesses at which Tiberius would have blushed.

The superstar athlete has been surrounded since early adolescence with sycophants, handlers, agents, and coaches, all imparting the single message that, so long as he performs on the field, everything else will be taken care of. Kobe was playing in the NBA at an age when most of us are staggering home, retching, from our first kegger. As with the emperors, being protected from all of the consequences of one’s decisions is a bad character factory, turning ordinary people into brutes and marginal ones into criminals. A creditable federal cell block could be assembled from the early-90s Dallas Cowboys or the current Portland Trail Blazers.

The athlete, like the emperor, is bound by the law mostly in theory. Occasionally some particularly egregious offense draws hard time, but usually his well-paid shysters run rings around the local DA and he winds up getting away with murder, sometimes literally.

To disguise these facts sportswriters engage in ritual character inflation. Mean players are “fiery” or “intense.” Borderline-retarded players are “friendly” and “unpretentious.” Sociopaths are “misunderstood.” Players who have managed not to acquire a police record, like Kobe in his pre-sexual-assault days, are “role models.”

On the other hand, these barely-socialized, easily identifiable, and immensely rich young men are targets wherever they go. At bars yobs pick fights with them and file assault charges. Women throw themselves at them and file paternity suits. In the other corner of the Bryant case we have a 19-year-old girl of, shall we say, dubious judgment, whether one credits the accusations of “basketball groupie” or not. A professional athlete invites her up to his hotel room late one night. Did she think it was for Scrabble? If, let us plausibly suppose, a little voluntary foreplay ensued, is it really sexual assault when she changes her mind? If it’s always a crime when the woman says no and the man does yes, books and movies, just for starters, have an awful lot to answer for.

Of course I have no idea what really happened, and neither do you. But Kobe’s formerly pristine reputation may actually tell against him by making it more difficult for his lawyers to slander his accuser. Even most beauty queens and American Idol contestants understand that a midnight tête-à-tête with Mike Tyson is a poor idea. But Kobe, he seems harmless, and he looks so cute in his television ads! Ladies, male professional athletes are testosterone-generating machines of frightening efficiency: proceed at your own risk. That’s not much of a lesson, I grant, but this isn’t much of a morality tale.

(Update: George Wallace comments.)

Niceness counts, your mother used to tell you, and so it does, for you and me. When you are one of the best in the world at what you do, niceness stops counting. I am reminded of this by the sportswriters’ treatment of Barry Bonds.

Barry Bonds is one of the greatest hitters who ever lived, and his unearthly bat speed, unerring plate discipline and perfect balance make him a joy to watch. The pleasure he has given anyone who enjoys baseball, including some sportswriters, can never be repaid. He is also rather surly with the media and disinclined to give interviews. Tough. Nobody cares about Barry Bonds’ relations with the press except the press, and if they had any respect for greatness they would keep quiet about it.

Babe Ruth, in another era, was celebrated for promising to hit home runs for sick children, although by the authoritative account he was a lout. But really, does anything matter about him except the way he played baseball?

I have quoted Yvor Winters before on the relations between distinguished poets and scholars, but his words serve equally well to describe the relations between great athletes and sportswriters:

To the scholar in question, the poet is wrong-headed and eccentric, and the scholar will usually tell him so. This is bad manners on the part of the scholar, but the scholar considers it good manners. If the poet, after some years of such experiences, loses his temper occasionally, he is immediately convicted of bad manners. The scholar often hates him (I am not exaggerating), or comes close to hating him, but if the poet returns hatred with hatred (and surely this is understandable), he is labeled as a vicious character, for, after all, he is a member of a very small minority group.

David Halberstam, he’s talking to you.

Jacques Barzun, in The House of Intellect, has an anecdote about a distinguished jurist, a member of the Supreme Court, who was profiled in a newspaper article the largest point of which was that the jurist rose early every morning and cooked breakfast for his family. In the forty-odd years since Barzun’s book was published his anecdote has been reprised countless times, almost exactly in the case of Justice Rehnquist, about whom ten people could tell you that he put stripes on his gown and sings Christmas carols for every one who could tell you a thing about his jurisprudence. This is supposed to “humanize” great men. By “humanizing” is meant “making seem more like you and me,” although what is interesting about the great is precisely what makes them unlike the rest of us. These “human” qualities are attractive or unattractive, according to the disposition of the writer: they are always irrelevant. I don’t want to see great men humanized. I want to see them praised, or even damned, for the qualities that make them great. Everything else is pornography.

(Update: Howard Owens comments.)

I just finished Michael Lewis’s terrific book about Billy Beane, the Oakland A’s general manager who consistently fields a great team with one of the lowest payrolls in the major leagues. The A’s are baseball commissioner Bud Selig’s particular albatross. Selig harps on the need for more baseball socialism (“revenue-sharing”) because of the alleged “inability of small market teams to compete,” when in fact it is only incompetently managed small market teams who can’t, Selig’s own Milwaukee Brewers prominent among them. Beane must drive him to drink. Now to anyone who has played fantasy baseball and read Bill James, which seems to be half of the male portion of the blogosphere, how to put together a winning baseball team with little money is no secret. You exploit inefficiencies, which is to say, you take advantage of the fact that many baseball executives are stupid. Certain traits are overvalued by other teams, like sculpted physiques or blazing speed or cannon arms. These don’t translate very well into on-field success anyway, and you ignore them. Other, more useful traits, like a deceptive pitching motion or the ability to draw walks, are undervalued, and these are what you look for.

The golden rule is that past performance indicates future performance, and ugly doesn’t count. Essentially you work from the spreadsheet instead of the scouting report. Scouts hate that. So do fans, stat geeks like me excepted, because it slights any knowledge of the game that comes from actually watching it. When I played in a fantasy league I would regularly tell other owners that they watched too much baseball, and that they needed to stop believing their own eyes. I was delighted to note that Beane often tells his scouts the same thing.

Beane himself is a former major-league player and hot prospect of exactly the type that he has trained himself, and his staff, to ignore. He was a high-school “tools” player, the type who looks better playing than he actually plays, and so highly regarded that many scouts and executives wanted to draft him first in his class, ahead of such future luminaries as Darryl Strawberry. But Beane’s tools never translated into major-league success. By his own account, his temper destroyed him as a player: he couldn’t cope with failure, and one bad at-bat would wreck his game, or his week.

In other words, Beane, instead of hiring in his own image, has become a brilliant success by doing the opposite. If there are other executives who have done this, I don’t know who they are.

(Dr. Manhattan reviews the book at greater length.)

(Update: Floyd McWilliams comments.)

(Update: Robert Birnbaum has an interesting interview with Lewis.)

It requires a certain type of mind to excite itself over “fragments of fragments,” but the normally sober baseball analyst Rob Neyer exulted giddily over them in his column the other day.

The question at issue is how lucky the 2002 Detroit Tigers were. On the one hand, they lost 106 games. On the other, if you apply Pythagorean analysis to their run margin, they “should” have lost 112 games. So they were lucky. But on the third hand, as one of Neyer’s correspondents points out, they scored fewer runs than one would expect from their offensive components, and allowed more than one would expect from the offensive components of their opponents, and they really should have lost 98 games. So they were unlucky.

But why stop there?

All hits, for example, are not created equal. If two players hit 120 singles, we consider those accomplishments the same. But what if one of the players hit 80 line drives and 40 ground balls with eyes, and the other hit 120 line drives? Would we expect them to match performances the next season?

No, we wouldn’t. We’d expect the guy with 120 line drives to outperform the guy who got lucky with the grounders.

That is just one tiny example, of hundreds we could come up with. And for the people who care about such things, finding the fragments of the fragments of the fragments is the next great frontier.

Ah, fragments of fragments of fragments. Perennial employment for baseball analysts! More work for Rob Neyer!

Neyer analogizes this process to pricing financial derivatives, which I happen to know something about, having worked as a programmer for several years for a software company that did exactly that. On slow afternoons the analytics boys would quarrel over whether to construct the yield curve using a two- or three-factor Heath-Jarrow-Morton model. Sure, with a two-factor model you might be able to price the bond to four decimal points, but with a three-factor model you can price it to seven! Eventually someone, usually me, would have to rain on their parade by pointing out that bonds are priced in sixteenths (of a dollar), and that the bid/offer spread dwarfs anything beyond the first decimal point.

In baseball granularity is not measured in sixteenths, but in wins. Since it takes about eight to ten additional runs for each additional win, any variance below five runs or so is a big, fat engineering zero. And I can assure Rob Neyer without even firing up a spreadsheet that a team’s line drive/ground ball ratio when hitting singles won’t get you anywhere near five runs. It’s barely conceivable that it could help you draft a fantasy team. Knock yourself out.
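
For the skeptical, here is a small sketch of how little five runs buys, using the classic Pythagorean formula (runs scored squared, over runs scored squared plus runs allowed squared). The 750-run team is hypothetical, chosen only because it is dead average; the function names and the 162-game default are my own choices, and an exponent of 2 is the traditional value rather than the only one in use.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins over a season of the given length."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)

# Hypothetical, perfectly average team: 750 runs scored, 750 allowed.
base = expected_wins(750, 750)
# The same team after finding five extra runs somewhere.
bumped = expected_wins(755, 750)

print(f"baseline wins:        {base:.2f}")
print(f"with five extra runs: {bumped:.2f}")
print(f"difference:           {bumped - base:.2f}")
```

On these assumed numbers the five runs are worth a fraction of a single win, which is the granularity argument in miniature.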

Hitting has been well understood since John Thorn and Pete Palmer published The Hidden Game of Baseball twenty years ago. All work since has been on the margins. The new frontiers in baseball analysis lie elsewhere. Pitching is still imperfectly understood, because its results are mixed with fielding, which, until Bill James’s new book on Win Shares, was not understood at all. Voros McCracken (where do you sign up for a name like that?) recently demonstrated that a pitcher’s hits allowed, relative to balls in play, is almost entirely random. That’s serious work. Fragments of fragments is masturbation.

The lesson here, which applies more broadly to the social sciences, is not to seek more precision than is proper to your subject. Fortunately Professors Mises and Hayek have already given this lecture, and I don’t have to.

(Update: Craig Henry comments.)

It’s been a while since I’ve thrown a sop to my baseball-oriented readers and the season is under way, so I’m gonna make it up to you with a new statistic, because the one thing baseball suffers from is not enough statistics.

I was trying to explain the game to an Icelandic friend of mine the other day. What’s with guys charging the mound? he wanted to know. (This from a hockey fan.) Well, they get upset when pitchers throw at them, I said. So why do the pitchers throw at them? he asked. To instill fear, I said. It’s a lot harder to hit when you’re worrying that the next pitch might come at your head. Don’t pitchers get thrown out for doing that? he asked. Yes and no, I explained. It’s complicated. Can’t they at least keep track of the pitchers who do it all the time and punish them later? he asked. Why yes, I mused. Yes they can. And then and there I conceived the VI, or Viciousness Index.

VI relies on the premise that a pitcher’s true wildness can be roughly judged by the number of walks he allows. The fewer he allows, the better idea he has of where the ball is going most of the time. So if he allows very few walks and still hits a lot of batters, the way Pedro Martinez does, one can assume that it’s not entirely or even mostly by accident. Therefore VI = HBP/BB. I submit this will prove an excellent index to pitcher viciousness.
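
A minimal sketch of the calculation, for anyone who does have the data. The three pitchers and their totals are invented placeholders, not real stat lines, and the choice to return None for a pitcher with no walks is my own assumption about how to handle division by zero.

```python
from typing import NamedTuple, Optional

class PitcherLine(NamedTuple):
    name: str
    hbp: int  # batters hit by pitch
    bb: int   # walks allowed

def viciousness_index(hbp: int, bb: int) -> Optional[float]:
    """VI = HBP / BB. Returns None when the pitcher has no walks,
    since the ratio is undefined there (an assumption of this sketch)."""
    if bb <= 0:
        return None
    return hbp / bb

# Invented sample lines, purely for illustration.
sample = [
    PitcherLine("Pitcher A", hbp=15, bb=40),
    PitcherLine("Pitcher B", hbp=5, bb=90),
    PitcherLine("Pitcher C", hbp=3, bb=0),
]

# Rank the pitchers with a defined VI, most vicious first.
rated = [p for p in sample if viciousness_index(p.hbp, p.bb) is not None]
ranked = sorted(rated, key=lambda p: viciousness_index(p.hbp, p.bb), reverse=True)

for p in ranked:
    print(f"{p.name}: VI = {viciousness_index(p.hbp, p.bb):.3f}")
```

In practice you would also want a minimum number of walks or innings before ranking anyone, since a ratio built on a handful of walks is mostly noise.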

I’d like to oblige you with some actual numbers, but HBP pitcher data turns out to be scarce. It’s not in the Lahman database, Baseball Reference doesn’t have it, and that means I don’t have it either. In lieu of numbers, I offer two hypotheses. First, pitchers with headhunting reputations, like Bob Gibson and Don Drysdale, will have high VIs. Second, the VI leaders, seasonally and career, will be a better set of pitchers than the VI trailers. (This is of course largely because the trailers walk more hitters. A stronger version is that if you match pitchers with similar walk/inning ratios, the ones with the higher VIs will tend to be better.) If somebody out there has HBP data for pitchers and wants to share it with me so I can confirm or deny, I pledge that I will not only publish the lifetime and 2002 leaders for the Viciousness Index, but I will add the data to my pitching search engine. Now is that a deal or what?