<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Anomaly Hunt; or, How To Write a Research Paper</title>
	<atom:link href="http://www.godofthemachine.com/?feed=rss2&#038;p=603" rel="self" type="application/rss+xml" />
	<link>http://www.godofthemachine.com/?p=603</link>
	<description>Culling my readers to a manageable elite since 2002.</description>
	<lastBuildDate>Tue, 17 Aug 2010 10:28:15 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Josh Sher</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-80554</link>
		<dc:creator>Josh Sher</dc:creator>
		<pubDate>Thu, 30 Aug 2007 06:40:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-80554</guid>
		<description>Well, the usual test for this type of problem is the chi-squared test. The chi-squared test asks:

Assuming that two different datasets were drawn from identical distributions, what is the likelihood that the results are as different as they are? Usually you set a significance level (5% is common): if results this different would be less than 5% likely, you reject the null hypothesis that they are actually from the same distribution.

(Note: the chi-squared test takes advantage of the fact that the binomial distribution is approximately normal when the true probability p is not too close to 0 or 1, or the sample size is large enough. These hypotheses are easily satisfied here.)

How does that work here?

See: http://en.wikipedia.org/wiki/Pearson&#039;s_chi-square_test

But here is the calculation:
Let Ei=expected number of 4-i series=Probability given by Aaron*95
Let Ai=actual number of 4-i series observed among the 95

Form the sum{ (Ai-Ei)^2/Ei }

In this case this equals 4.68

This is a chi-squared variable with 3 degrees of freedom (there are 4 possible outcomes but if you know the number for 3 of them, the 4th is determined by subtraction from 95).

For a chi-squared variable with 3 degrees of freedom, the cutoff for p=0.05 significance is 7.82.

Thus we cannot reject the Null hypothesis that the distribution of results is actually generated by independent coin flips.</description>
		<content:encoded><![CDATA[<p>Well, the usual test for this type of problem is the chi-squared test. The chi-squared test asks:</p>
<p>Assuming that two different datasets were drawn from identical distributions, what is the likelihood that the results are as different as they are? Usually you set a significance level (5% is common): if results this different would be less than 5% likely, you reject the null hypothesis that they are actually from the same distribution.</p>
<p>(Note: the chi-squared test takes advantage of the fact that the binomial distribution is approximately normal when the true probability p is not too close to 0 or 1, or the sample size is large enough. These hypotheses are easily satisfied here.)</p>
<p>How does that work here?</p>
<p>See: <a href="http://en.wikipedia.org/wiki/Pearson's_chi-square_test" rel="nofollow">http://en.wikipedia.org/wiki/Pearson's_chi-square_test</a></p>
<p>But here is the calculation:<br />
Let Ei=expected number of 4-i series=Probability given by Aaron*95<br />
Let Ai=actual number of 4-i series observed among the 95</p>
<p>Form the sum{ (Ai-Ei)^2/Ei }</p>
<p>In this case this equals 4.68</p>
<p>This is a chi-squared variable with 3 degrees of freedom (there are 4 possible outcomes but if you know the number for 3 of them, the 4th is determined by subtraction from 95).</p>
<p>For a chi-squared variable with 3 degrees of freedom, the cutoff for p=0.05 significance is 7.82.</p>
<p>Thus we cannot reject the Null hypothesis that the distribution of results is actually generated by independent coin flips.</p>
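<p>The calculation above can be sketched in Python. This is only an illustration under stated assumptions: the expected probabilities are taken from a plain fair-coin model (which may differ from the figures in the original post), the observed counts are hypothetical placeholders rather than the real record, and the function names are made up for the example.</p>

```python
from fractions import Fraction
from math import comb

TOTAL_SERIES = 95          # number of series, as in the comment
CRITICAL_3DF_5PCT = 7.815  # chi-squared cutoff, 3 degrees of freedom, p = 0.05 (7.82 rounded)


def coin_flip_probabilities():
    """Chance that a best-of-seven ends 4-i (i = 0..3) if every game is a fair coin flip."""
    # Either team may win; the winner takes game 4+i and exactly 3 of the 3+i games before it.
    return [2 * comb(3 + i, 3) * Fraction(1, 2) ** (4 + i) for i in range(4)]


def chi_squared(observed, expected):
    """Pearson statistic: the sum of (Ai - Ei)^2 / Ei."""
    return sum((a - e) ** 2 / e for a, e in zip(observed, expected))


# Ei = probability of a 4-i series times 95
expected = [float(p * TOTAL_SERIES) for p in coin_flip_probabilities()]

# HYPOTHETICAL placeholder counts that sum to 95; NOT the real World Series record.
hypothetical_observed = [15, 20, 30, 30]

statistic = chi_squared(hypothetical_observed, expected)
reject_null = statistic > CRITICAL_3DF_5PCT
print(f"expected counts = {expected}")
print(f"chi-squared = {statistic:.2f}, reject null: {reject_null}")
```

<p>Substituting the actual counts of 4-0, 4-1, 4-2 and 4-3 series for the placeholder list gives the statistic to compare against the 3-degrees-of-freedom cutoff.</p>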
]]></content:encoded>
	</item>
	<item>
		<title>By: Gary Farber</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-20737</link>
		<dc:creator>Gary Farber</dc:creator>
		<pubDate>Tue, 16 Jan 2007 20:06:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-20737</guid>
		<description>Humble apologies for being off-topic; I&#039;d prefer to e-mail, but don&#039;t see an e-mail address.

Thanks muchly for your sidebar link!  (I didn&#039;t know you were even aware of my existence.)  Trivial note: the first link, under my name, is broken.

Thanks again.</description>
		<content:encoded><![CDATA[<p>Humble apologies for being off-topic; I&#8217;d prefer to e-mail, but don&#8217;t see an e-mail address.</p>
<p>Thanks muchly for your sidebar link!  (I didn&#8217;t know you were even aware of my existence.)  Trivial note: the first link, under my name, is broken.</p>
<p>Thanks again.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Statistical Modeling, Causal Inference, and Social Science</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-19584</link>
		<dc:creator>Statistical Modeling, Causal Inference, and Social Science</dc:creator>
		<pubDate>Wed, 10 Jan 2007 19:15:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-19584</guid>
		<description>&lt;strong&gt;Theories of information and interestingness&lt;/strong&gt;

Jean-Luc pointed me to Anomaly Hunt; or, How To Write a Research Paper. This brings me to the vague topic of what is interesting. They say that you haven&#039;t understood a concept until you have been able to explain it...</description>
		<content:encoded><![CDATA[<p><strong>Theories of information and interestingness</strong></p>
<p>Jean-Luc pointed me to Anomaly Hunt; or, How To Write a Research Paper. This brings me to the vague topic of what is interesting. They say that you haven&#8217;t understood a concept until you have been able to explain it&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Sailer</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-19475</link>
		<dc:creator>Steve Sailer</dc:creator>
		<pubDate>Tue, 09 Jan 2007 22:59:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-19475</guid>
		<description>&lt;p&gt;No NBA team has ever come back from being down 3-0 at any playoff level. Two NHL teams have. &lt;/p&gt;
&lt;p&gt;I believe there is more randomness in baseball results than in basketball because pitchers have such a huge influence on the outcome. It&#039;s common to see matchups of starting pitchers where the team that is inferior overall has a big advantage in a single game due to sending, say, its #1 starter out against the better team&#039;s #4 starter.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>No NBA team has ever come back from being down 3-0 at any playoff level. Two NHL teams have. </p>
<p>I believe there is more randomness in baseball results than in basketball because pitchers have such a huge influence on the outcome. It&#8217;s common to see matchups of starting pitchers where the team that is inferior overall has a big advantage in a single game due to sending, say, its #1 starter out against the better team&#8217;s #4 starter.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bill Kaplan</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-19437</link>
		<dc:creator>Bill Kaplan</dc:creator>
		<pubDate>Tue, 09 Jan 2007 16:42:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-19437</guid>
		<description>This is the excellent foppery of the world of baseball, that, when a team is sick in fortune -- often the surfeit of its own behavior -- it makes guilty of its disasters its prior disasters; and, despite they be champions, to lay its present circumstances on those immediately before.</description>
		<content:encoded><![CDATA[<p>This is the excellent foppery of the world of baseball, that, when a team is sick in fortune &#8212; often the surfeit of its own behavior &#8212; it makes guilty of its disasters its prior disasters; and, despite they be champions, to lay its present circumstances on those immediately before.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aaron Haspel</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-19435</link>
		<dc:creator>Aaron Haspel</dc:creator>
		<pubDate>Tue, 09 Jan 2007 15:56:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-19435</guid>
		<description>Steve: Before we buy into this theory of mailing it in, we should probably check it against other sports, like basketball. In the NBA (and ABA) finals teams down 3-0 have come back to win the next game 6 out of 13, approximately what you&#039;d expect. That&#039;s all I can be bothered to check, but I&#039;ll be less willing to credit the baseball results if they can&#039;t be reproduced in other sports. 

Albatross: Unfortunately you can usually find &quot;significant&quot; regressions in even completely random data sets. Fortunately there are &lt;a href=&quot;http://www.economics.unimelb.edu.au/rdixon/206/Dickey-Fuller.pdf&quot; rel=&quot;nofollow&quot;&gt;more rigorous tests&lt;/a&gt; that can help to weed out spurious ones. Econometricians run into this problem all the time.

DavidB: &lt;a href=&quot;http://www.baseball-reference.com/bio/&quot; rel=&quot;nofollow&quot;&gt;At least 50&lt;/a&gt;, though not in the sense you mean.</description>
		<content:encoded><![CDATA[<p>Steve: Before we buy into this theory of mailing it in, we should probably check it against other sports, like basketball. In the NBA (and ABA) finals teams down 3-0 have come back to win the next game 6 out of 13, approximately what you&#8217;d expect. That&#8217;s all I can be bothered to check, but I&#8217;ll be less willing to credit the baseball results if they can&#8217;t be reproduced in other sports. </p>
<p>Albatross: Unfortunately you can usually find &#8220;significant&#8221; regressions in even completely random data sets. Fortunately there are <a href="http://www.economics.unimelb.edu.au/rdixon/206/Dickey-Fuller.pdf" rel="nofollow">more rigorous tests</a> that can help to weed out spurious ones. Econometricians run into this problem all the time.</p>
<p>DavidB: <a href="http://www.baseball-reference.com/bio/" rel="nofollow">At least 50</a>, though not in the sense you mean.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: albatross</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-19430</link>
		<dc:creator>albatross</dc:creator>
		<pubDate>Tue, 09 Jan 2007 14:43:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-19430</guid>
		<description>So, what we need is a formula for setting the required significance level based on how long the researcher can afford to sift through the data, looking for an anomaly, and how many models he can test per unit of time?  Should the reviewer reject the paper if he can produce an equally significant observation from the data with no apparent meaning or theoretical significance?</description>
		<content:encoded><![CDATA[<p>So, what we need is a formula for setting the required significance level based on how long the researcher can afford to sift through the data, looking for an anomaly, and how many models he can test per unit of time?  Should the reviewer reject the paper if he can produce an equally significant observation from the data with no apparent meaning or theoretical significance?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David B</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-19397</link>
		<dc:creator>David B</dc:creator>
		<pubDate>Tue, 09 Jan 2007 09:20:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-19397</guid>
		<description>Why is it called the World Series?  How many countries participate?</description>
		<content:encoded><![CDATA[<p>Why is it called the World Series?  How many countries participate?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Sailer</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-19395</link>
		<dc:creator>Steve Sailer</dc:creator>
		<pubDate>Tue, 09 Jan 2007 08:43:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-19395</guid>
		<description>&lt;p&gt;It&#039;s not hugely uncommon for a team to lose the first two on the road, then come home, win game 3, and go on to take the series in seven or even six games. But losing game 3 at home seems to be a psychological death blow. It will be interesting over the next several decades to see if the Red Sox rally from down 3-0 in 2004 will change that psychology. &lt;/p&gt;
&lt;p&gt;The funny thing is that going all out in a baseball game just isn&#039;t that hard, except for the pitchers. You&#039;d think baseball players wouldn&#039;t give up, but it looks like they sometimes do.&lt;/p&gt;
&lt;p&gt;Still, when you work through the history of a sweep, you can see why the losers might pack it in. &lt;/p&gt;
&lt;p&gt;In the past, when teams had four-man rotations in the regular season, they&#039;d use their three best pitchers in the Series (there are off days after Games 2 and 5). If they all won, that could be depressing to the team that was down.&lt;/p&gt;
&lt;p&gt;For example, in the 1963 World Series, the mighty Yankees lost to Sandy Koufax in the first game 5-2 in Yankee Stadium, with Koufax striking out 15, then lost to Johnny Podres in the second 4-1. Then they went to Dodger Stadium, and Don Drysdale beat them 1-0. &lt;/p&gt;
&lt;p&gt;So, now the Yankees are down 3-0 on the road, the Dodgers are giving up 1.3 runs per game, and the opposing pitcher  in Game 4 is, oh crap, Sandy Koufax again, who went 25-5 during the season. And if they manage to beat Koufax, then they&#039;ve got to beat Podres in Game 5, who had 5 shutouts during the season, and then beat Drysdale in Game 6, who had won 25 the year before. &lt;/p&gt;
&lt;p&gt;And, then, even if they somehow won three straight, they&#039;d still have to beat Koufax again in Game 7. Not surprisingly, they lost Game 4 2-1 and were swept.&lt;/p&gt;
&lt;p&gt;So you can see how teams down 3-0 would get depressed.&lt;/p&gt;
&lt;p&gt;Nowadays, with 5 man rotations, a team winning 3-0 is likely to send their number 4 starter out for the 4th game (assuming both teams won the LCS quickly), while the desperate trailing team might send their ace out on 3 days rest, so the immediate situation isn&#039;t so dire, but the long term situation is even worse, because your pitchers will all be on short rest for the rest of the series, unless it rains.
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>It&#8217;s not hugely uncommon for a team to lose the first two on the road, then come home, win game 3, and go on to take the series in seven or even six games. But losing game 3 at home seems to be a psychological death blow. It will be interesting over the next several decades to see if the Red Sox rally from down 3-0 in 2004 will change that psychology. </p>
<p>The funny thing is that going all out in a baseball game just isn&#8217;t that hard, except for the pitchers. You&#8217;d think baseball players wouldn&#8217;t give up, but it looks like they sometimes do.</p>
<p>Still, when you work through the history of a sweep, you can see why the losers might pack it in. </p>
<p>In the past, when teams had four-man rotations in the regular season, they&#8217;d use their three best pitchers in the Series (there are off days after Games 2 and 5). If they all won, that could be depressing to the team that was down.</p>
<p>For example, in the 1963 World Series, the mighty Yankees lost to Sandy Koufax in the first game 5-2 in Yankee Stadium, with Koufax striking out 15, then lost to Johnny Podres in the second 4-1. Then they went to Dodger Stadium, and Don Drysdale beat them 1-0. </p>
<p>So, now the Yankees are down 3-0 on the road, the Dodgers are giving up 1.3 runs per game, and the opposing pitcher  in Game 4 is, oh crap, Sandy Koufax again, who went 25-5 during the season. And if they manage to beat Koufax, then they&#8217;ve got to beat Podres in Game 5, who had 5 shutouts during the season, and then beat Drysdale in Game 6, who had won 25 the year before. </p>
<p>And, then, even if they somehow won three straight, they&#8217;d still have to beat Koufax again in Game 7. Not surprisingly, they lost Game 4 2-1 and were swept.</p>
<p>So you can see how teams down 3-0 would get depressed.</p>
<p>Nowadays, with 5 man rotations, a team winning 3-0 is likely to send their number 4 starter out for the 4th game (assuming both teams won the LCS quickly), while the desperate trailing team might send their ace out on 3 days rest, so the immediate situation isn&#8217;t so dire, but the long term situation is even worse, because your pitchers will all be on short rest for the rest of the series, unless it rains.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aaron Haspel</title>
		<link>http://www.godofthemachine.com/?p=603&#038;cpage=1#comment-19163</link>
		<dc:creator>Aaron Haspel</dc:creator>
		<pubDate>Sun, 07 Jan 2007 05:26:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.godofthemachine.com/archives/00000602.html#comment-19163</guid>
		<description>&lt;p&gt;In the simple model a team could be expected to rally from a 3-0 deficit once in 16 tries. A team has fallen behind 3-0 in games 20 times. It has won the fourth game three times, and never the fifth, let alone the sixth or seventh. &lt;/p&gt;
&lt;p&gt;If the simple model were true — which it isn&#039;t quite — the team leading 3-0 should win Game 4 half the time. The actual result of 17 out of 20 would occur by chance approximately 0.6% of the time. (The chance that not a single team would reach Game 6 in 20 tries is even lower, 0.3%.) Yeah, I&#039;d call that unreasonably low. &lt;/p&gt;
&lt;p&gt;There are at least two plausible explanations. One, as Steve suggests, is that teams throw in the towel. Another is that the team down 3-0 is simply overmatched. If we use a 0.4 probability of winning instead of 0.5, the chance of the actual result rises to 5.1%, or just outside of statistical significance. But 0.4 is pretty low for a team that won a league championship. I&#039;m inclined to think that Steve&#039;s explanation is probably true. Can anyone think of a better one?
&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>In the simple model a team could be expected to rally from a 3-0 deficit once in 16 tries. A team has fallen behind 3-0 in games 20 times. It has won the fourth game three times, and never the fifth, let alone the sixth or seventh. </p>
<p>If the simple model were true — which it isn&#8217;t quite — the team leading 3-0 should win Game 4 half the time. The actual result of 17 out of 20 would occur by chance approximately 0.6% of the time. (The chance that not a single team would reach Game 6 in 20 tries is even lower, 0.3%.) Yeah, I&#8217;d call that unreasonably low. </p>
<p>There are at least two plausible explanations. One, as Steve suggests, is that teams throw in the towel. Another is that the team down 3-0 is simply overmatched. If we use a 0.4 probability of winning instead of 0.5, the chance of the actual result rises to 5.1%, or just outside of statistical significance. But 0.4 is pretty low for a team that won a league championship. I&#8217;m inclined to think that Steve&#8217;s explanation is probably true. Can anyone think of a better one?</p>
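<p>For anyone who wants to rerun these numbers, here is a rough Python sketch of the simple model. The function names are invented for the example, and the one-sided "at least 17 of 20" tail is an assumption about how the result was scored; a different tail convention or a different tally of series will give different percentages, so the output need not match the figures quoted above.</p>

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def no_game_six(n, p_trailing):
    """Chance that none of n trailing teams wins both Game 4 and Game 5."""
    return (1 - p_trailing**2) ** n


# The trailing team must win four straight coin flips to complete the rally.
rally = 0.5 ** 4
print(f"rally from 3-0: 1 in {1 / rally:.0f}")
print(f"no Game 6 in 20 tries: {no_game_six(20, 0.5):.1%}")

# 0.5 is the simple model; 0.6 is the "overmatched" case (trailing team at 0.4).
for p_leading in (0.5, 0.6):
    tail = upper_tail(20, 17, p_leading)
    print(f"leader wins Game 4 at least 17 of 20 (p = {p_leading}): {tail:.2%}")
```

<p>Changing the arguments to <code>upper_tail</code> makes it easy to try other win probabilities or an updated count of 3-0 series.</p>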
]]></content:encoded>
	</item>
</channel>
</rss>
