FanPost

Just how bad was it really? A look at 2018 using Pythagoras, Runs Created, and Visual Basic

Greetings everyone,

With another season mercifully over, I decided to look at how it went from a more analytical viewpoint. It seemed to me that the National League was a jumbled mess: no team was much stronger than any other, and with a couple of exceptions the wins and losses looked as if they came from a random number generator rather than from some teams being better than others. The American League was just the opposite, with a well-defined hierarchy and a couple of clearly separated tiers of teams. What would my spreadsheet say about it? How about the Runs Created formula? What about Pythagoras? There's but one way to find out for sure.

The three methods

The first method is the well-known Pythagorean ratio using actual runs scored and allowed. In this study I'm using the exponent 1.83, which has been shown to have the best correlation to actual wins and losses historically. The formula is elegant in its simplicity and is usually the first thing people, including the media, look at to determine whether a team was lucky or not. The formula is as follows:

Wins = games played x (runs scored)^1.83 / ((runs scored)^1.83 + (runs allowed)^1.83)
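For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above. The function name and the 162-game default are illustrative choices, not something taken from the spreadsheet. The example plugs in Baltimore's actual 2018 run totals from the AL East table further down.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=1.83):
    """Expected wins from runs scored and allowed (Pythagorean formula)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# 2018 Orioles: 622 runs scored, 892 runs allowed (actual totals).
baltimore = pythagorean_wins(622, 892)
print(round(baltimore, 2))
```

The result should land at roughly 55.2 wins, in line with the "Actual Pythag" figure for Baltimore in the comparison table.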

This is a good rule of thumb and works most of the time. However, runs scored and allowed are themselves often dictated by luck, so as an extra filter against luck, raw offensive stats can be used to calculate an expected number of runs. There are a number of calculation methods out there, but the one I'll be using in this study is the technical version of the Runs Created formula. That formula can be found here.

This can be done for individuals and for a team as a whole. In this study, I'll also be using this formula to calculate runs created allowed. The obvious flaw is that a team's defense (errors, outs on base, etc.) is not considered in this formula, so there is a chance that this isn't totally accurate. However, it should serve as a reasonable estimate.
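As a rough illustration, the sketch below uses the commonly published "technical" Runs Created formula, (A x B) / C. I am assuming that is the version linked above; the function name is hypothetical and the team totals fed into it are made up purely to show the mechanics. The last few lines show how a pair of Runs Created totals becomes an "RC pythag" win figure, using Toronto's RC numbers from the AL East table.

```python
def runs_created_technical(ab, h, tb, bb, ibb, hbp, sb, cs, sh, sf, gidp):
    """Runs Created, commonly published 'technical' version: (A * B) / C."""
    on_base = h + bb - cs + hbp - gidp                                  # A factor
    advancement = tb + 0.26 * (bb - ibb + hbp) + 0.52 * (sh + sf + sb)  # B factor
    opportunities = ab + bb + hbp + sh + sf                             # C factor
    return on_base * advancement / opportunities


# Made-up team totals, for illustration only (not any real club's stat line).
rc = runs_created_technical(ab=5500, h=1400, tb=2300, bb=500, ibb=30, hbp=50,
                            sb=80, cs=30, sh=30, sf=40, gidp=120)

# Runs Created totals fed into the Pythagorean formula (exponent 1.83).
# 712.23 and 818.53 are Toronto's "RC RS" and "RC RA" values from the table.
rs, ra = 712.23 ** 1.83, 818.53 ** 1.83
rc_pythag = 162 * rs / (rs + ra)
print(round(rc, 1), round(rc_pythag, 1))
```

The Toronto figure should come out at about 70.7 wins, which agrees with the "RC pythag" column.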

Finally, there is the simulation that I have developed in VB and Excel. It is the same spreadsheet and set of programs that I used in a study of the 2011 MLB season, and the methodology is the same. Before each at bat, the simulation determines what happens to the baserunners (do they steal? Was there a wild pitch, balk or passed ball? Does anyone get picked off or thrown out? And so on) based on the baserunners' and pitcher's season statistics. The result of the at bat is then determined, again using the stats of the pitcher and batter. After that, baserunners are placed according to their stats, the result of the at bat, and the defense in the field. After each at bat, a determination is made as to whether there is a pinch-hitter or pitching change, using usage data for each team and pitcher. I use the log5 method for each of these calculations, and compare a randomly generated number to a range of values to get the result of each event. You can find a more thorough discussion on the log5 method here.
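To make that last step concrete, here is a small Python sketch (not the VB code from the spreadsheet) of the standard three-way log5 formula and of picking an outcome by comparing a random number against cumulative ranges. The function names, the event rates, and the idea of lumping all leftover probability into "out" are assumptions made for illustration.

```python
import random


def log5(batter, pitcher, league):
    """Log5 estimate of an event's probability in a batter-pitcher matchup."""
    num = batter * pitcher / league
    den = num + (1 - batter) * (1 - pitcher) / (1 - league)
    return num / den


def pick_outcome(probabilities, roll):
    """Return the outcome whose cumulative range contains roll (0 <= roll < 1)."""
    cumulative = 0.0
    for outcome, p in probabilities.items():
        cumulative += p
        if roll < cumulative:
            return outcome
    return "out"  # assumption: any leftover probability is an out


# Hypothetical rates: (batter, pitcher, league) for each event type.
matchup = {
    "walk": log5(0.10, 0.08, 0.085),
    "hit": log5(0.27, 0.24, 0.25),
}
print(matchup, pick_outcome(matchup, random.random()))
```

A sanity check on log5: when the batter's rate equals the league rate, the formula simply returns the pitcher's rate, and vice versa. A fuller model would need to make sure the separate event probabilities stay consistent with each other; this sketch ignores that.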

The raw data comes courtesy of Retrosheet, which is a wonderful resource for information on past games and seasons. So, with that in mind, I'll do what I'm supposed to do and say this:

     The information used here was obtained free of
     charge from and is copyrighted by Retrosheet.  Interested
     parties may contact Retrosheet at "www.retrosheet.org".

Right, so what happens when we look at the 2018 season using these three methods? Let's look at each division. For each one, I'll present a histogram showing the results of 1,000 simulated seasons, a table showing some details of the simulation (the wild card, division and playoffs columns are counts out of those 1,000 seasons), and another table comparing the three methods with the actual number of wins for each team (RS is runs scored, RA is runs allowed, Sim is the simulation and RC is Runs Created).

AL East

[Figure: histogram of simulated win totals for the AL East over 1,000 seasons]

Team  Wild card  Division  Playoffs  Avg wins  St. dev  Min  Max  Actual
TOR        0.00      0.00      0.00     70.95     6.08   54   90      73
NYA      450.33    345.33    795.67     95.56     6.08   77  112     100
BAL        0.00      0.00      0.00     51.71     5.68   34   76      47
TBA      478.33    203.33    681.67     92.74     6.14   71  109      90
BOS      413.67    451.33    865.00     97.13     5.88   80  118     108

Team  Sim RA   RC RA  Actual RA  Sim RS   RC RS  Actual RS  Sim Pythag  RC Pythag  Actual Pythag  Actual W
TOR   840.20  818.53        832  741.68  712.23        709       71.80      70.75          69.23        73
NYA   726.06  656.06        669  895.60  820.91        851       96.37      97.38          98.55       100
BAL   957.97  902.41        892  628.64  618.44        622       51.24      54.06          55.21        47
TBA   638.43  593.44        646  761.20  725.75        716       93.92      95.75          88.60        90
BOS   687.28  665.64        647  863.41  845.63        876       97.67      98.46         102.90       108

Ooof. This doesn't paint a pretty picture, but it's nothing we should be surprised about. While the Orioles might not have been quite 115-loss bad in reality, they were still a very bad team. The 51.71-win average is the lowest in the 3.5 seasons I've simulated with my spreadsheet, and by a wide margin. In 49 of the 1,000 simulated seasons they lost at least 120, suggesting about a 5% chance at setting a new mark of futility. Perhaps the most extreme example was season 38 out of 1,000 in my simulation: in that season they finished 34-128, only 74 games behind Boston, with a run differential of -536, scoring only 548 runs while giving up 1,084. It can always be worse, folks. It appears that the offense performed as expected, although they may have been lucky to "only" give up 892 runs. They were a bit unlucky with Pythagoras, and taken together the numbers suggest they might be a tick better in 2019 than they were in 2018. However, without Gausman and Machado propping up the team's numbers, and with no obvious upgrades on the farm, another long summer in Baltimore seems like a foregone conclusion.

In the rest of the division, it appears Boston, while still a very good team, may not quite have been the world-beaters their gaudy real-life win total would suggest. They were still the class of the division, but the gap between them and New York (who also appear to have overachieved, but not by as much) and, surprisingly, Tampa Bay may not be as large as people think. Toronto's 73 wins were about right for them.

AL Central

[Figure: histogram of simulated win totals for the AL Central over 1,000 seasons]

Team  Wild card  Division  Playoffs  Avg wins  St. dev  Min  Max  Actual
KCA        0.00      0.50      0.50     66.31     6.18   45   87      58
MIN       21.17     17.00     38.17     78.72     6.23   57  100      78
CLE        2.50    982.50    985.00     99.40     5.86   79  116      91
CHA        0.00      0.00      0.00     67.69     5.82   50   88      62
DET        1.00      0.00      1.00     67.56     6.08   48   88      64

Team  Sim RA   RC RA  Actual RA  Sim RS   RC RS  Actual RS  Sim Pythag  RC Pythag  Actual Pythag  Actual W
KCA   786.26  838.09        833  637.50  633.20        638       65.64      60.67          61.62        58
MIN   739.52  774.73        775  714.84  709.28        738       78.48      74.47          77.38        78
CLE   682.01  672.99        648  872.38  799.63        818       98.94      93.67          98.01        91
CHA   823.27  797.50        848  672.34  648.81        656       66.16      65.89          62.32        62
DET   797.85  765.82        796  649.75  609.35        630       65.96      64.30          63.93        64

In the Central, the narrative of it being Cleveland and a bunch of bad teams holds water. The only thing that stands out here is that Cleveland should have been even more dominant: they appear to have been a high-90s win team instead of a low-90s win team. They won this division over 98 percent of the time, and I'm not seeing anything, at least in this study, to suggest that a substantial change is coming for 2019.

AL West

[Figure: histogram of simulated win totals for the AL West over 1,000 seasons]

Team  Wild card  Division  Playoffs  Avg wins  St. dev  Min  Max  Actual
TEX        0.00      0.00      0.00     62.49     5.98   45   85      67
HOU      162.67    720.50    883.17     97.44     5.82   77  115     103
ANA       35.83      3.50     39.33     79.47     6.02   59  101      80
SEA       50.00     14.00     64.00     80.59     6.06   63   96      89
OAK      384.50    262.00    646.50     92.22     5.95   70  110      97

Team  Sim RA   RC RA  Actual RA  Sim RS   RC RS  Actual RS  Sim Pythag  RC Pythag  Actual Pythag  Actual W
TEX   920.19  840.83        848  709.02  695.95        737       62.04      67.12          70.66        67
HOU   554.71  535.28        534  738.07  740.07        797      101.70     104.33         109.42       103
ANA   739.44  706.14        722  731.69  699.38        721       80.22      80.29          80.90        80
SEA   746.62  685.78        711  744.29  678.36        677       80.77      80.19          77.37        89
OAK   693.57  629.60        674  830.24  782.22        813       94.21      96.88          94.76        97


Elias's old team holds up well here, as Houston was the clear class of this division. Their all-world pitching staff was no fluke, and they had plenty of help from their offense, even if their underlying numbers may not be quite as good as their actual run total would indicate. Oakland was a worthy competitor, with an outstanding offense and an OK pitching staff. On the opposite end of the spectrum, Texas was undone by their pitching, and it would appear that was no fluke: it kept them out of the postseason in all 1,000 simulations. Seattle's fade out of the postseason race appears to have been a regression to the mean, since they cheated Pythagoras across all methods.

NL East

[Figure: histogram of simulated win totals for the NL East over 1,000 seasons]

Team  Wild card  Division  Playoffs  Avg wins  St. dev  Min  Max  Actual
NYN       88.25     42.67    130.92     81.22     5.92   60   98      77
WAS      255.08    463.00    718.08     91.22     6.01   72  109      82
MIA        0.00      0.00      0.00     63.21     5.92   40   80      63
ATL      254.08    419.50    673.58     90.66     6.14   71  111      90
PHI      137.67     74.83    212.50     82.91     6.01   65  103      80

Team  Sim RA   RC RA  Actual RA  Sim RS   RC RS  Actual RS  Sim Pythag  RC Pythag  Actual Pythag  Actual W
NYN   596.58  674.98        707  591.96  652.37        676       80.42      78.48          77.68        77
WAS   539.41  680.03        682  622.11  769.16        771       91.51      90.09          90.05        82
MIA   697.23  757.36        809  535.41  577.26        589       61.80      61.28          58.12        63
ATL   621.80  626.19        657  726.27  739.77        759       92.43      93.26          91.63        90
PHI   667.88  685.10        728  690.32  667.05        677       83.45      79.02          75.62        80


Our trip into the mess that was the NL starts in the East, where Miami was the one exception we figured they'd be. The Nationals appear to have gotten very unlucky: they should have been right there with the Braves, but they were on the wrong side of Pythagoras all year. What I still can't figure out is how my spreadsheet is knocking a good 150 runs off both their runs allowed and runs scored totals. That is obviously a major outlier in this study, but the two appear to have cancelled each other out in terms of wins and losses. The Braves acquitted themselves well here. It appears that they are ahead of some of the other rebuilding teams in this division and should be considered favorites in 2019, given their youth movement and the possible free agent departures in Washington. The Mets and Phillies were behind, although either team could have snuck into the postseason with a fair amount of luck.

NL Central

[Figure: histogram of simulated win totals for the NL Central over 1,000 seasons]

Team  Wild card  Division  Playoffs  Avg wins  St. dev  Min  Max  Actual
SLN      141.83    153.83    295.67     84.28     6.43   57  102      88
PIT       53.25     31.50     84.75     78.73     6.18   60  101      82
CHN      207.83    238.33    446.17     87.20     6.08   67  105      95
MIL      178.67    574.33    753.00     91.57     6.09   72  107      95
CIN        2.67      2.00      4.67     71.67     5.99   55   88      67

Team  Sim RA   RC RA  Actual RA  Sim RS   RC RS  Actual RS  Sim Pythag  RC Pythag  Actual Pythag  Actual W
SLN   640.17  668.35        691  685.23  715.56        759       86.03      86.05          87.94        88
PIT   693.12  701.04        693  677.55  679.29        692       79.32      78.67          80.89        82
CHN   668.77  653.84        645  735.92  758.61        761       88.07      91.95          93.16        95
MIL   630.23  632.94        659  737.83  737.82        754       92.60      92.29          90.93        95
CIN   830.46  816.87        819  722.84  710.35        696       70.77      70.70          69.03        67

This division was the most jumbled in the study, with all teams showing similar numbers except for the Reds, whose below-average pitching dragged them behind the other four teams. Everyone more or less did as they were supposed to here, although Chicago overachieved just enough to tie a slightly better Milwaukee team. There is a little separation among the teams in this division, and it wasn't the pure random number generation I thought it might have been.

NL West

[Figure: histogram of simulated win totals for the NL West over 1,000 seasons]

Team  Wild card  Division  Playoffs  Avg wins  St. dev  Min  Max  Actual
LAN       11.50    988.50   1000.00    108.77     5.76   91  123      91
SFN        0.00      0.00      0.00     65.44     6.01   47   85      73
ARI      399.50      7.50    407.00     86.64     5.91   70  106      82
SDN        0.00      0.00      0.00     61.02     6.00   45   78      66
COL      269.67      4.00    273.67     84.49     5.89   64  104      91

Team  Sim RA   RC RA  Actual RA  Sim RS   RC RS  Actual RS  Sim Pythag  RC Pythag  Actual Pythag  Actual W
LAN   550.23  600.66        610  849.54  811.24        804      111.60     102.73         101.04        91
SFN   704.49  683.24        699  556.71  590.64        603       63.82      70.27          70.12        73
ARI   619.39  639.93        644  668.44  663.34        693       86.64      83.66          86.43        82
SDN   776.91  742.07        767  565.24  595.17        617       58.07      64.87          65.08        66
COL   720.44  710.05        745  734.44  754.11        780       82.43      85.46          84.40        91

And now we come to the most perplexing aspect of this study: my spreadsheet's love affair with the Los Angeles Dodgers. Not only were they clearly the best team in this division, their simulated average of 108 wins was 6 wins better than that of any other team in the 3.5 seasons I've simulated (2011, 2015, 2018 and the 1977 AL). The old record was held by the... 2015 Dodgers. Other indicators show they should have won several more games than they did and run away with the division, but Colorado did just enough, and the Dodgers underperformed severely enough, to force a playoff. The 17-win difference between their actual record and their simulated average was also a record. Of course, we'll never know for certain, but beneath the surface this might be one of the all-time division races that should never have happened.

Closing remarks

Unlike the 2015 study, where every team made the playoffs at least once in the 1,000 years simulated, seven teams never did in this study, and an additional three teams made it fewer than 5 times. With more teams averaging more than 95 wins or 95 losses than in that study, it appears that the competitive balance of MLB has taken a step backwards in these three years.

One of the reasons I simulated this year as opposed to some other year was to see what having seemingly extremely good and bad teams did to the standard deviation of wins throughout the league. Previous simulations showed all teams right around 6 wins, and this year was no different. The standard deviation was a little lower for the Dodgers and the Orioles, but not enough to dent it significantly. There would be a point where a team would be so good or so bad that this number would start to drop, but it appears to be beyond anything seen in MLB, even with a team as bad as the 2018 Orioles.
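A rough way to see why that number barely moves is to treat a season as 162 independent games with a fixed win probability. That is a simplifying assumption, not how the simulation works. Under it, the standard deviation of wins is sqrt(162 x p x (1 - p)), which is flat near p = 0.5. The sketch below (the function name is hypothetical) evaluates it for a .500 team and for the simulated averages of the Orioles and Dodgers.

```python
import math


def binomial_sd(win_probability, games=162):
    """Standard deviation of season wins if games are independent coin flips."""
    return math.sqrt(games * win_probability * (1 - win_probability))


# Average wins taken from the simulation tables above.
teams = [(".500 team", 81.0), ("2018 Orioles", 51.71), ("2018 Dodgers", 108.77)]
for label, avg_wins in teams:
    print(label, round(binomial_sd(avg_wins / 162), 2))
```

Under this coin-flip model a .500 team comes out at about 6.4 wins, and the Orioles and Dodgers at about 5.9 and 6.0. Those are a little above the simulated 5.68 and 5.76, but they show the same small drop. By the same formula, a team would need to win or lose something like 130 games before the figure fell to 5.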

One more note on the Orioles: the most wins they got in the 1,000-year study was 76. In that year they scored 699 runs and gave up 879. Imagine that they had done this instead. Would they have cleaned house as they did? I would think they probably wouldn't have, although they'd still be the same team that won only 47 this year, just one that got really lucky.

On the other side of this coin are the Dodgers, who appear to be a lot better than their record indicated. Would Roberts be on the hot seat if they had simply played to their numbers? Tough to say. He might be responsible for this drastic underperformance to some extent, but the algorithm used in the spreadsheet takes managerial decisions into account, so... maybe?

None of the methods here used BABIP or FB% or anything like that; the simulation only considered absolute results in its model. This might be something I can work on in future simulation models.

Using the methods here to predict 2019 results might not be the best idea: teams age, for better or for worse; trades and free-agent signings happen; and injuries, rookies and all kinds of other unpredictable things come up during the offseason and the season. However, using this study we can gain a better perception of where a team's talent was last year and make assumptions as to which way a team can regress. With that in mind, on the strong upside I'd have the Dodgers (by a whole bunch), Washington, Cleveland, and Kansas City (but it won't matter), and on the strong downside I'd have Boston (but they'll still be very good), Seattle, the Cubs, and the Giants. Baltimore, absent any other factors, might have a very small improvement, but not enough to escape another potentially record-setting year of futility at Camden Yards.

tl;dr version

Yeah, they were that bad.

FanPosts are user-created content and do not necessarily reflect the views of the editors of Camden Chat or SB Nation. They might, though.