Greetings, everyone. I'm pretty sure none of you know me or have seen me around much because I'm more tuned into Broad Street Hockey, SB's Philadelphia Flyers blog, especially this time of year. One of the things I've been doing while listening to the games for the past several weeks has been coding up a baseball simulation in Visual Basic. It took a while, and some hairs were pulled out as the bugs were worked out, but I finally got it to work and then ran 1000 iterations of the 1977 American League on it. Why '77? Not sure really, but it does offer some very good teams (KC, NY, BOS, BAL) and some very bad teams (TOR, SEA, OAK). After the jump, I'll discuss what the simulation considers and the results of the 1000 iterations of the 1977 American League.
The simulation runs using several different Visual Basic programs. The progression is fairly simple, and I imagine it is the same as in other sims out there. It goes as follows:
- Before each PA, determine if there was a SB, CS, pickoff, error, balk, wild pitch or passed ball. All these odds are based off the stats of the pitcher and baserunner.
- Determine odds of 15 possible outcomes of a PA based on platoon splits, park factors, and of course the pitcher and batter
- Based on the outcome of the PA, and the stats of the defense, determine if there was an error on the play, and place all baserunners. Placement is based on the team defense in the field, and individual baserunning of the baserunners involved. If there was an insufficient sample size (<25 instances) a weighted average of team stats and individual stats was used.
- Determine the pitcher for the next PA, based on pitch count, game situation and game performance
- Determine the batter for the next PA
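To make two of the steps above concrete, here is a minimal Python sketch (not the Visual Basic code actually used). It shows one way to draw a PA outcome from a set of odds, and one way to blend individual and team stats when the sample is small. The outcome names, the probabilities, and the linear weighting in `blend_rate` are all assumptions made for illustration; the real sim uses 15 outcomes and its weighting scheme isn't spelled out here.

```python
import random


def blend_rate(ind_successes, ind_chances, team_rate, min_sample=25):
    """Return a rate that leans on team stats when the individual sample is small.

    Assumption: the individual's weight grows linearly with sample size and
    reaches 1.0 at min_sample (25, matching the "<25 instances" cutoff).
    """
    if ind_chances >= min_sample:
        return ind_successes / ind_chances
    weight = ind_chances / min_sample
    ind_rate = ind_successes / ind_chances if ind_chances else 0.0
    return weight * ind_rate + (1 - weight) * team_rate


def sample_outcome(probs, rng):
    """Draw one PA outcome from a dict mapping outcome name -> probability."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


# Hypothetical odds for a single batter/pitcher matchup (six outcomes, not 15).
pa_probs = {
    "out": 0.68,
    "walk": 0.08,
    "single": 0.16,
    "double": 0.04,
    "triple": 0.01,
    "home_run": 0.03,
}

rng = random.Random(1977)  # fixed seed so a run can be repeated
print(sample_outcome(pa_probs, rng))

# A runner with only 10 chances: 40% individual weight, 60% team weight.
print(blend_rate(3, 10, team_rate=0.40))
```

Swapping the seed is all it takes to get a different "season" out of the same inputs, which is the whole point of running many iterations.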
At bat per at bat, inning per inning, game per game, the program simulated around 20-25 games per second, so a 1131-game slate takes a little over 50 seconds. After each season, the number of wins for each team was recorded. Here is a summary of the results, compared with the actual win totals for each team:
The large circles in those plots are the actual win totals from 1977.
There are several things worth exploring here, and I'm sure you'll get to some in the comments. I have a couple myself.
The biggest one, in my mind, is that not all the breaks will even out over the course of a 162-game season. It may seem like a long season, but there is still plenty of room for luck to enter the equation. All teams had a standard deviation of 6 games from their averages, so the difference between a 95-win contender and an 85-win also-ran could just be a roll of the dice. In extreme cases, it could be the difference between a bad 70-win season and an outstanding 90-win season. However, luck has its limits. A really bad team won't get to .500 even with luck consistently running its way: Seattle and Toronto never registered a winning season in the entire simulation. The opposite is also true. New York never had a sub-.500 season, and Texas had only 9 losing seasons in 1000. The most striking teams are the ones in the middle: Detroit, Minnesota and Cleveland. All of them won at least one division title, and all of them except Detroit also lost 100 games at least once. Of the three, Minnesota seems to be the strongest, and it did have the best actual record of the three.
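As a rough sanity check on that 6-game spread, here is a small Python sketch that treats every game as an independent coin flip with a fixed win probability. That is a big simplification and not how the simulation works (it plays out every PA), so the function names and the coin-flip model are purely illustrative assumptions. Still, the pure-chance spread for a .500 team over 162 games comes out to sqrt(162 x 0.5 x 0.5), or about 6.36 wins, which is right in line with what the sim produced.

```python
import math
import random
import statistics


def binomial_sd(games, win_prob):
    """Standard deviation of wins if each game is an independent coin flip."""
    return math.sqrt(games * win_prob * (1 - win_prob))


def simulate_win_totals(games, win_prob, seasons, seed=0):
    """Play `seasons` coin-flip seasons and return the win total of each."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < win_prob for _ in range(games))
        for _ in range(seasons)
    ]


analytic = binomial_sd(162, 0.5)
totals = simulate_win_totals(162, 0.5, seasons=1000)

print(round(analytic, 2))  # 6.36
print(round(statistics.pstdev(totals), 2))  # empirical spread over 1000 seasons
```

For a team with a true .600 talent level the same formula gives a slightly smaller number, but it stays close to 6 across the range of realistic teams.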
It's also fun to look at the pure flukes, the seasons where incredible luck is with a team for an entire year. New York topped out at 124 wins in one simulated season, broke the existing record for wins in a season (116) 9 other times, and tied it 4 times. So, if this simulation is accurate, there was about a 1% chance for the Yanks to have set the win record that year.
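For the curious, the 1% figure is just a count divided by the number of iterations, and with only 1000 iterations it carries some uncertainty of its own. The sketch below assumes the count is 10 (the 124-win season plus the 9 other record-breaking seasons, leaving out the 4 ties) and uses the usual binomial standard error; `estimate_with_se` is a made-up helper name.

```python
import math


def estimate_with_se(hits, trials):
    """Return the observed proportion and its binomial standard error."""
    p = hits / trials
    se = math.sqrt(p * (1 - p) / trials)
    return p, se


# 10 record-breaking seasons out of 1000 simulated seasons.
p, se = estimate_with_se(10, 1000)
print(f"{p:.1%} +/- {se:.1%}")
```

So the estimate is roughly 1%, give or take about 0.3 percentage points, from the sampling alone.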
Assuming this simulation is accurate (I don't think it's perfect, but it serves as a very good approximation), there are some significant deviations from the mean that didn't even out in 1977. For example, the Orioles got lucky, while the Rangers got very unlucky. Not only did the Rangers fall short of their expected win total, but the Royals may have been playing over their heads, and that combination cost Texas the division. In fact, the division races seem reversed. The East race, which was a tight thriller between three teams for most of the year, was dominated in the simulation by the Yankees, who won it over 80% of the time and had an average win total nine higher than anyone else in the division. The West race, where the Royals got ahead early and were able to more or less cruise into the post-season, looks like it should have been a lot closer, and the better team may not have won.
One of the reasons I did this silly little exercise was to see just how much luck comes into play in baseball, and the answer seems to be quite a bit. The only difference between these seasons was the random numbers generated. The same starting lineups, statistics and replacement player logic were used, and completely different results were sometimes seen. The thing is, though, that we only get to see one iteration play out, and for all the narratives that the talking heads come up with, it could all be just a bunch of coins landing the right way...