Assuming that two different datasets were drawn from identical distributions, what is the likelihood that the results are as different as they are? Usually you set a significance level (5% is common): if results that different would occur less than 5% of the time, you reject the null hypothesis that they actually come from the same distribution.

(Note: the chi-squared test takes advantage of the fact that the binomial distribution is approximately normal when the true probability p is not too close to 0 or 1 and the sample size is large enough. These hypotheses are easily satisfied here.)

How does that work here?

See: http://en.wikipedia.org/wiki/Pearson's_chi-square_test

But here is the calculation:

Let Ei = expected number of 4-i series = (the probability given by aaron) × 95

Let Ai = actual (observed) number of 4-i series

Form the sum{ (Ai-Ei)^2/Ei }

In this case the sum equals 4.68.

This is a chi-squared variable with 3 degrees of freedom (there are 4 possible outcomes, but once you know the counts for 3 of them, the 4th is determined by subtraction from 95).

For a chi-squared variable with 3 degrees of freedom, the cutoff for p=0.05 significance is 7.82.
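The calculation above can be sketched in a few lines of Python. The coin-flip probabilities and N = 95 come from the discussion; the `observed` counts below are hypothetical placeholders, since the post doesn't list the raw data (the real data gave 4.68):

```python
from math import comb

N = 95  # number of World Series in the sample

# Under the null (independent fair-coin games), the probability that a
# best-of-7 series ends 4-i: the winner takes the last game plus 3 of
# the first 3+i, and either team can be the winner.
#   P(4-i) = 2 * C(3+i, i) * 0.5**(4+i)
p = {i: 2 * comb(3 + i, i) * 0.5 ** (4 + i) for i in range(4)}
expected = {i: N * p[i] for i in range(4)}

# Hypothetical observed counts (must sum to N); substitute the real ones.
observed = {0: 18, 1: 20, 2: 22, 3: 35}

chi2 = sum((observed[i] - expected[i]) ** 2 / expected[i] for i in range(4))
# Compare chi2 against the 3-degrees-of-freedom cutoff of 7.82 at p = 0.05.
```

For these made-up counts the statistic comes out around 6.7, still below the 7.82 cutoff, so even a sample this lopsided would not let you reject the coin-flip null.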

Thus we cannot reject the null hypothesis that the results were actually generated by independent coin flips.

Thanks muchly for your sidebar link! (I didn’t know you were even aware of my existence.) Trivial note: the first link, under my name, is broken.

Thanks again.

Jean-Luc pointed me to Anomaly Hunt; or, How To Write a Research Paper. This brings me to the vague topic of what is interesting. They say that you haven’t understood a concept until you have been able to explain it…

I believe there is more randomness in baseball results than in basketball because pitchers have such a huge influence on the outcome. It’s common to see matchups of starting pitchers where the team that is inferior overall has a big advantage in a single game due to sending, say, its #1 starter out against the better team’s #4 starter.

Albatross: Unfortunately you can usually find “significant” regressions in even completely random data sets. Fortunately there are more rigorous tests that can help to weed out spurious ones. Econometricians run into this problem all the time.
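That point is easy to demonstrate with a quick simulation (pure Python, all parameters hypothetical): regress one pure-noise series on another many times and count how often the slope comes out “significant” at the 5% level. By construction, roughly 5% of the spurious fits pass the test.

```python
import random

random.seed(0)
n, trials, crit = 100, 1000, 1.98  # crit ~ two-sided 5% t cutoff at ~98 df

hits = 0
for _ in range(trials):
    # Both series are pure noise: any "relationship" found is spurious.
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx  # OLS slope
    ssr = sum((yi - my - beta * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = (ssr / (n - 2) / sxx) ** 0.5  # standard error of the slope
    if abs(beta / se) > crit:
        hits += 1

rate = hits / trials  # roughly 0.05: "significant" slopes found by chance
```

Run enough regressions and you will always find a few; the more rigorous tests the comment mentions exist precisely to catch this.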

DavidB: At least 50, though not in the sense you mean.

The funny thing is that going all out in a baseball game just isn’t that hard, except for the pitchers. You’d think baseball players wouldn’t give up, but it looks like they sometimes do.

Still, when you work through the history of a sweep, you can see why the losers might pack it in.

In the past, when teams had four-man rotations in the regular season, they’d use their three best pitchers in the Series (there are off days after Games 2 and 5). If those three all won, that could be depressing to the team that was down.

For example, in the 1963 World Series, the mighty Yankees lost to Sandy Koufax in the first game 5-2 in Yankee Stadium, with Koufax striking out 15, then lost to Johnny Podres in the second 4-1. Then they went to Dodger Stadium, and Don Drysdale beat them 1-0.

So, now the Yankees are down 3-0 on the road, the Dodgers are giving up 1.3 runs per game, and the opposing pitcher in Game 4 is, oh crap, Sandy Koufax again, who went 25-5 during the season. And if they manage to beat Koufax, then they’ve got to beat Podres in Game 5, who had 5 shutouts during the season, and then beat Drysdale in Game 6, who had won 25 the year before.

And, then, even if they somehow won three straight, they’d still have to beat Koufax again in Game 7. Not surprisingly, they lost Game 4 2-1 and were swept.

So you can see how teams down 3-0 would get depressed.

Nowadays, with five-man rotations, a team up 3-0 is likely to send its number 4 starter out for the 4th game (assuming both teams won the LCS quickly), while the desperate trailing team might send its ace out on three days’ rest. So the immediate situation isn’t so dire, but the long-term situation is even worse, because your pitchers will all be on short rest for the rest of the series, unless it rains.
