Ah, the win predictor. The holy grail for those who want to know how a team’s season is going to end. In 2015 I continued using Neil Paine’s technique to estimate the Orioles’ final winning percentage (PCT). I took measurements every week and posted the results in my Weekly Wrap article. For a review of the technique, some examples, and a look at how well it worked in 2014, see my post on this topic from last year.
Before the 2015 season I estimated the Orioles would win 86 games. To get this number I averaged 17 sources, such as the team’s actual 2014 record, its 2014 Pythagorean and BaseRuns records, and any specific prediction I could find of the team’s chances. This method has its flaws, chief among them that it gives equal weight to everyone’s pick, but it removes any bias I might have and it's a repeatable process with some logic behind it.
The highest input was the team’s actual 2014 total of 96 wins. The lowest came from somebody named Bruce Bukiet, who predicted the team would win 74 games. Of all my sources, Mike Oz of Yahoo! Sports hit the nail on the head with a predicted total of 81 wins. FanGraphs did well by predicting 80 wins. Chris Cwik, also of Yahoo!, and the Sports Illustrated panel of experts had 82 wins. The most optimistic picks were by Joe Sheehan and Ken Rosenthal, who both predicted 91 wins for the team. Close behind them was our own Mark Brown, who saw an 89-win season in the cards. (That’s right, Mark, I called you optimistic!)
I kept track of each week's prediction during the season. Here is the chart:
The model thought the most of the team early in the year and right before the All-Star Break. Keep in mind that through most of the first half, the team struggled to break .500. That's how regression to the mean works; teams can play well above or below their true talent level for weeks at a time. Forecasting results by using regression to the mean will smooth out bumps in results.
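The smoothing described above can be sketched in a few lines. This is a minimal illustration of one common way to regress a record toward a prior, not necessarily the exact formula behind the weekly numbers in this post. The function name, the 70-game weight on the prior, and the choice of the 86-win preseason estimate as the prior are all assumptions made for the example.

```python
def projected_wins(wins, losses, prior_pct, prior_games, season_games=162):
    """Project a final win total by regressing the current record toward a prior.

    prior_pct and prior_games are illustrative knobs: the winning percentage
    we expected before the season and how many games' worth of weight it gets.
    """
    games_played = wins + losses
    # Blend the observed record with `prior_games` games played at `prior_pct`.
    regressed_pct = (wins + prior_pct * prior_games) / (games_played + prior_games)
    # Wins already banked, plus expected wins over the remaining schedule.
    remaining = season_games - games_played
    return wins + regressed_pct * remaining


# Assumed inputs: the prior is the 86-win preseason estimate from this post;
# the 70-game weight is a hypothetical value chosen only for illustration.
PRIOR_PCT = 86 / 162
PRIOR_GAMES = 70

# Three of the records mentioned in this post.
for wins, losses in [(5, 4), (24, 29), (42, 37)]:
    estimate = projected_wins(wins, losses, PRIOR_PCT, PRIOR_GAMES)
    print(f"{wins}-{losses}: about {estimate:.1f} wins")
```

Early in the season the prior dominates, so a hot or cold week barely moves the estimate. As games pile up, the actual record takes over, and by game 162 the projection is simply the real win total.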
The predicted win total reached 86 wins three times: on April 17th when the team was 5-4, on June 26th when they were 39-34, and on July 3rd when they were 42-37. The latter two predictions reflect the team’s strong run in June, during which they went 18-10 and outscored their opponents by 46 runs. Unfortunately that good month came on the heels of a 10-10 April and a 13-16 May.
The low point in the model first came on June 5th with a prediction of 78 wins. At that time the Orioles were 24-29. The model next predicted a total that low after the August swoon. Here they received a 78-win prediction two weeks in a row: first when the team was 64-69 and again when it was 67-72.
The largest week-to-week changes were drops of four games. One occurred from May 29th to June 5th, a week in which the Orioles went 2-5. Another four-game drop occurred from August 28th to September 4th, when the team went 1-5. That week included the tail end of their miserable 3-15 stretch that all but eliminated them from contention. The next prediction, on September 11th, was for 78 wins as well.
Here is a different look at the weekly predictions. This chart shows the absolute error at each point in the season, measured against the actual total of 81 wins:
This year’s model had much less error than last year’s. In 2014 the model’s maximum error reached 18 games, and the predicted win total continually lagged behind the eventual win total of 96 games until the last month of the season. This year, the maximum error was just five games and the most commonly observed deviation was just one game. The next most common errors were three games and, fantastically, zero games. That’s very good, but I don’t think the improvement in the model was because of anything I did. I think the Orioles simply played to everyone's expectations to a much greater degree than in 2014.
In analyzing this model's performance, I took one final measurement. In order for a model to be useful, it has to be better than something. Baseball teams and their fans feed off the team's current record, so what if fans had simply assumed each week that whatever PCT the 2015 Orioles had at that point would be their final PCT? Would they have been better off doing this?
To test this I compared the root-mean-square error (RMSE) of this approach vs. relying on the regression model:
- RMSE of assuming the 2015 Orioles would’ve kept each week’s pace: .026
- RMSE of regression model: .015
So you’d have been considerably more accurate using the regression model this year: its error was a little over half that of the keep-the-pace approach.
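For readers who want to see the mechanics of that comparison, here is a minimal sketch of the RMSE calculation. It uses only the six record-and-prediction pairs quoted earlier in this post, not the full weekly series, so its output will not match the .026 and .015 figures above. The function and variable names are my own choices for illustration.

```python
import math

SEASON_GAMES = 162
ACTUAL_PCT = 81 / SEASON_GAMES  # the Orioles finished with 81 wins


def rmse(predictions, actual):
    """Root-mean-square error of several predictions against one actual value."""
    squared_errors = [(p - actual) ** 2 for p in predictions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# (wins, losses, model's predicted final wins) for the six snapshots
# mentioned in this post. The full weekly data set is not reproduced here.
snapshots = [
    (5, 4, 86),
    (24, 29, 78),
    (39, 34, 86),
    (42, 37, 86),
    (64, 69, 78),
    (67, 72, 78),
]

# Naive approach: assume the current PCT holds for the rest of the season.
pace_predictions = [w / (w + l) for w, l, _ in snapshots]
# Regression model: convert each predicted win total to a PCT.
model_predictions = [pred / SEASON_GAMES for _, _, pred in snapshots]

pace_rmse = rmse(pace_predictions, ACTUAL_PCT)
model_rmse = rmse(model_predictions, ACTUAL_PCT)

print(f"RMSE, keep-the-pace: {pace_rmse:.3f}")
print(f"RMSE, regression model: {model_rmse:.3f}")
```

Because RMSE squares each miss before averaging, one badly wrong week hurts more than several slightly wrong ones, which is why it punishes the keep-the-pace approach for its early-season swings.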
Overall I’m pleased with this two-year experiment, and I hope to continue it in 2016. Regression to the mean is a powerful force in baseball as it is in life. The model's performance this year was even better than last year's, showing us that you can use it to get a reasonable idea of where the team will finish when the season’s over. Hopefully that good performance, like the team on the field, continues.