FanPost

Why Win Projections Fail

Inevitably, when one predicts or projects what will happen in an upcoming baseball season, a lot of those projections will miss the mark, some by impressive margins. There's a common belief that navel-gazing professional baseball analysts, a group in which I ended up after a series of improbable events, secretly gnash their teeth when a team drastically overperforms or underperforms expectations, every additional win or loss an additional twist of the knife in the abdomen.

In fact, anyone who analyzes baseball for a living knows (whether or not they admit it) that even the best and brightest will be wrong. A lot. And nothing shows more potential for missing hilariously than projecting the number of games a team will win over the course of a 162-game season.

But how accurate should we be at making these projections? There's a simple thought exercise I like to present to indicate why we're so infallible at being fallible.

Imagine a league in which every team is a .500 team. Not a guess based on complicated analysis of the individuals making up the team, but an absolute certainty, brought down from the heavens by a white-bearded, robed deity and written indelibly on a stone flash drive. And imagine that these teams will continue to have a 50/50 shot in every game they play, all 2,430 games played by 30 teams over the course of a baseball season.

Naturally, if you were projecting such a league, your best guess would be a 30-way tie at 81-81, given that you know for a fact, buttressed by an omnipotent deity, that all 30 teams are equal. So, how would your projections turn out in the end?

I've attached a table showing the probability of each win total for an individual team in that league (I accidentally left 0 wins out of the table). The "1 in" column restates each probability as odds, and "Cumulative" is the chance of finishing with that many wins or fewer. Remember, this assumes perfect, infallible, unchanging knowledge of each team's odds. The numbers come from a hypergeometric distribution. Binomial distributions tend to be used for this kind of thing, but they don't quite work in this situation, because wins in baseball are a zero-sum game: every win for one team is a loss for another. Binomial would leave open the possibility that, say, the average team ends up with 82 or 80 wins, which we know can't happen.

 
Wins	Probability	1 in	Cumulative	   
1	0.00%	5.41E+47	0%	   
2	0.00%	6.28E+45	0%	   
3	0.00%	1.10E+44	0%	   
4	0.00%	2.59E+42	0%	   
5	0.00%	7.69E+40	0%	   
6	0.00%	2.76E+39	0%	   
7	0.00%	1.16E+38	0%	   
8	0.00%	5.63E+36	0%	   
9	0.00%	3.09E+35	0%	   
10	0.00%	1.90E+34	0%	   
11	0.00%	1.30E+33	0%	   
12	0.00%	9.71E+31	0%	   
13	0.00%	7.94E+30	0%	   
14	0.00%	7.04E+29	0%	   
15	0.00%	6.74E+28	0%	   
16	0.00%	6.94E+27	0%	   
17	0.00%	7.65E+26	0%	   
18	0.00%	9.00E+25	0%	   
19	0.00%	1.13E+25	0%	   
20	0.00%	1.49E+24	0%	   
21	0.00%	2.10E+23	0%	   
22	0.00%	3.11E+22	0%	   
23	0.00%	4.86E+21	0%	   
24	0.00%	8.00E+20	0%	   
25	0.00%	1.38E+20	0%	   
26	0.00%	2.50E+19	0%	   
27	0.00%	4.74E+18	0%	   
28	0.00%	9.39E+17	0%	   
29	0.00%	1.94E+17	0%	   
30	0.00%	4.19E+16	0%	   
31	0.00%	9.44E+15	0%	   
32	0.00%	2.21E+15	0%	   
33	0.00%	5.38E+14	0%	   
34	0.00%	1.36E+14	0%	   
35	0.00%	3.58E+13	0%	   
36	0.00%	9.77E+12	0%	   
37	0.00%	2.76E+12	0%	   
38	0.00%	8.09E+11	0%	   
39	0.00%	2.45E+11	0%	   
40	0.00%	7.70E+10	0%	   
41	0.00%	2.50E+10	0%	   
42	0.00%	8.39E+09	0%	   
43	0.00%	2.91E+09	0%	   
44	0.00%	1.04E+09	0%	   
45	0.00%	3.85E+08	0%	   
46	0.00%	1.47E+08	0%	   
47	0.00%	57,835,898	0%	   
48	0.00%	23,461,479	0%	   
49	0.00%	9,809,139	0%	   
50	0.00%	4,225,484	0%	   
51	0.00%	1,874,789	0%	   
52	0.00%	856,498	0%
53	0.00%	402,785	0%
54	0.00%	194,928	0%
55	0.00%	97,054	0%
56	0.00%	49,704	0%
57	0.00%	26,176	0%
58	0.01%	14,173	0%
59	0.01%	7,888	0%
60	0.02%	4,511	0%
61	0.04%	2,651	0%
62	0.06%	1,601	0%
63	0.10%	993	0%
64	0.16%	632	0%
65	0.24%	414	1%
66	0.36%	278	1%
67	0.52%	191	2%
68	0.74%	135	2%
69	1.02%	98	3%
70	1.36%	73	5%
71	1.78%	56	6%
72	2.27%	44	9%
73	2.82%	35	12%
74	3.41%	29	15%
75	4.03%	25	19%
76	4.63%	22	24%
77	5.20%	19	29%
78	5.68%	18	35%
79	6.05%	17	41%
80	6.29%	16	47%
81	6.37%	16	53%
82	6.29%	16	60%
83	6.05%	17	66%
84	5.68%	18	71%
85	5.20%	19	76%
86	4.63%	22	81%
87	4.03%	25	85%
88	3.41%	29	89%
89	2.82%	35	91%
90	2.27%	44	94%
91	1.78%	56	95%
92	1.36%	73	97%
93	1.02%	98	98%
94	0.74%	135	99%
95	0.52%	191	99%
96	0.36%	278	99%
97	0.24%	414	100%
98	0.16%	632	100%
99	0.10%	993	100%
100	0.06%	1,601	100%
101	0.04%	2,651	100%
102	0.02%	4,511	100%
103	0.01%	7,888	100%
104	0.01%	14,173	100%
105	0.00%	26,176	100%
106	0.00%	49,704	100%
107	0.00%	97,054	100%
108	0.00%	194,928	100%
109	0.00%	402,785	100%
110	0.00%	856,498	100%
111	0.00%	1,874,789	100%	   
112	0.00%	4,225,484	100%	   
113	0.00%	9,809,139	100%	   
114	0.00%	23,461,479	100%	   
115	0.00%	57,835,898	100%	   
116	0.00%	1.47E+08	100%	   
117	0.00%	3.85E+08	100%	   
118	0.00%	1.04E+09	100%	   
119	0.00%	2.91E+09	100%	   
120	0.00%	8.39E+09	100%	   
121	0.00%	2.50E+10	100%	   
122	0.00%	7.70E+10	100%	   
123	0.00%	2.45E+11	100%	   
124	0.00%	8.09E+11	100%	   
125	0.00%	2.76E+12	100%	   
126	0.00%	9.77E+12	100%	   
127	0.00%	3.58E+13	100%	   
128	0.00%	1.36E+14	100%	   
129	0.00%	5.38E+14	100%	   
130	0.00%	2.21E+15	100%	   
131	0.00%	9.44E+15	100%	   
132	0.00%	4.19E+16	100%	   
133	0.00%	1.94E+17	100%	   
134	0.00%	9.39E+17	100%	   
135	0.00%	4.74E+18	100%	   
136	0.00%	2.50E+19	100%	   
137	0.00%	1.38E+20	100%	   
138	0.00%	8.00E+20	100%	   
139	0.00%	4.86E+21	100%	   
140	0.00%	3.11E+22	100%	   
141	0.00%	2.10E+23	100%	   
142	0.00%	1.49E+24	100%	   
143	0.00%	1.13E+25	100%	   
144	0.00%	9.00E+25	100%	   
145	0.00%	7.65E+26	100%	   
146	0.00%	6.94E+27	100%	   
147	0.00%	6.74E+28	100%	   
148	0.00%	7.04E+29	100%	   
149	0.00%	7.94E+30	100%	   
150	0.00%	9.71E+31	100%	   
151	0.00%	1.30E+33	100%	   
152	0.00%	1.90E+34	100%	   
153	0.00%	3.09E+35	100%	   
154	0.00%	5.63E+36	100%	   
155	0.00%	1.16E+38	100%	   
156	0.00%	2.76E+39	100%	   
157	0.00%	7.69E+40	100%	   
158	0.00%	2.59E+42	100%	   
159	0.00%	1.10E+44	100%	   
160	0.00%	6.28E+45	100%	   
161	0.00%	5.41E+47	100%	   
162	0.00%	9.39E+49	100%	 
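
For anyone who wants to check the table, here is a minimal sketch of one way to generate numbers like these. The post doesn't spell out the exact setup, so the parameters are my assumption: treat the season's 4,860 team-game results (2,430 wins and 2,430 losses) as the population and one team's 162 games as the draw. The function names are made up for illustration, and a plain coin-flip binomial is included only for comparison.

```python
from math import comb

TEAMS, GAMES = 30, 162
SLOTS = TEAMS * GAMES   # 4,860 team-game results in a season (assumed population)
WINS = SLOTS // 2       # 2,430 of them are wins, one per game played

def hypergeom_pmf(k):
    """Chance that one team's 162 results include exactly k of the league's wins."""
    return comb(WINS, k) * comb(SLOTS - WINS, GAMES - k) / comb(SLOTS, GAMES)

def binom_pmf(k):
    """Chance of k wins in 162 independent coin flips, for comparison."""
    return comb(GAMES, k) / 2 ** GAMES

if __name__ == "__main__":
    # Print the same four columns as the table, starting from 0 wins.
    print("Wins\tProbability\t1 in\tCumulative")
    running = 0.0
    for k in range(GAMES + 1):
        p = hypergeom_pmf(k)
        running += p
        print(f"{k}\t{p:.2%}\t{1 / p:.3g}\t{running:.0%}")
```

Under these assumptions the hypergeometric version is slightly more bunched around 81 wins than the binomial, which is what you'd expect once the league as a whole is forced to finish at exactly .500.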

What does this chart mean? Essentially, even with all the swanky perfect knowledge I described above, you're still going to miss by 7 games or more about 30% of the time, or for nearly a third of the teams. You'd also expect, on average, one of these perfectly average teams to win 69 or fewer games and another to win 93 or more, through no outside factors except random chance.
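
Those two figures can be checked with a few lines of code. This is only a sketch under the same assumption as the table (a hypergeometric draw of 162 results from 4,860 team-games, half of them wins), and the helper name `win_pmf` is hypothetical.

```python
from math import comb

def win_pmf(k, games=162, teams=30):
    """P(exactly k wins) for one team in an all-.500 league (assumed hypergeometric model)."""
    slots = teams * games    # every team-game result in the season
    wins = slots // 2        # half of those results are wins
    return comb(wins, k) * comb(slots - wins, games - k) / comb(slots, games)

# Chance that an 81-81 projection misses a given team by 7 or more wins.
miss_by_7 = sum(win_pmf(k) for k in range(163) if abs(k - 81) >= 7)

# Expected number of teams (out of 30) finishing with 69 or fewer wins.
# By symmetry the same number applies to 93 or more.
expected_low = 30 * sum(win_pmf(k) for k in range(70))

if __name__ == "__main__":
    print(f"P(miss by 7+): {miss_by_7:.1%}")
    print(f"Expected teams at 69 wins or fewer: {expected_low:.2f}")
```

The expected count only needs each team's own distribution, so it holds even though the teams' records are tied together by the schedule. Both results should land close to the 30% and one-team figures above.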

Now fast-forward to real baseball, in which no all-powerful entity is giving us perfect knowledge of how good teams are, and imagine how much harder it is to predict what will happen with any precision. Even perfect projections for the 25 players on each Opening Day roster would still leave us grasping at straws, because of all the events that can happen over the course of a season, most notably injuries and trades, that are difficult to predict with any kind of accuracy.

I'll wrap up this quick post with a little personal experience from one of my worst projections ever: the Seattle Mariners scraping out an AL West title in a weak division in 2010 with (I think) 84 wins, just enough to edge out the other teams. They ended up 61-101, and my projection was immortalized with a cover in the ESPN Magazine baseball preview issue that year. Nothing will make that projection a good one, but it illustrates just how much of a difference simply knowing exact rosters and playing time would make. That projection was run at the end of February. By the time the season actually started, playing time assumptions had pushed them down to 2nd behind the Rangers, and with actual knowledge of who ended up getting the PAs and IPs, the projection would have dropped all the way down to 73-89. That's about halfway to their actual record before you look at a single player projection (damn you, Chone Figgins!).

But there's no reason to be personally upset by these things. Being wrong is inevitable; getting angry about it is the equivalent of getting angry that the sun's future expansion means that life on Earth is past the halfway point.

Dan Szymborski covers baseball for ESPN Insider. He has written about the sport since 2001 for the Baseball Think Factory, where he is an editor. He is also the developer of the ZiPS projection system. You can find his ESPN archives here and follow him on Twitter here.

FanPosts are user-created content and do not necessarily reflect the views of the editors of Camden Chat or SB Nation. They might, though.