The predictive qualities of advanced pitching statistics have failed the Orioles

xFIP? More like xFAIL, amirite?

Mike Stobe

There were a great many things about the 2012 Orioles that led people to think that they couldn't repeat their success in 2013. Most of these things were related -- the team's record in one-run games, its otherworldly bullpen, its record in extra-inning games and its win-loss record being outrageously higher than its Pythagorean record (based on run differential). All of those factors really tied back to the same massively lucky stretch that the Orioles had in the first half of 2012, and they haven't translated into failure in 2013 because this year's club overall looks more like the post-All Star Break club of 2012 (which actually played to its talent level) rather than the mostly lucky first-half team.
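
For anyone who hasn't run into the Pythagorean record before, here is a minimal sketch of the idea in Python. It uses the classic version of the formula with an exponent of 2 (other versions use exponents closer to 1.83). The function names and the run totals are made up for illustration; they are not the Orioles' actual numbers.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from run totals alone."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_above_pythagorean(actual_wins: int, games: int,
                           runs_scored: float, runs_allowed: float) -> float:
    """How many more games a team won than its run differential suggests."""
    expected_wins = pythagorean_win_pct(runs_scored, runs_allowed) * games
    return actual_wins - expected_wins


# Hypothetical team: scores exactly as many runs as it allows, yet wins 90 games.
print(round(wins_above_pythagorean(90, 162, 700, 700), 1))  # 9.0
```

A team that breaks even on runs "should" finish 81-81, so the hypothetical club above is nine wins ahead of its run differential. That gap is the thing people point to when they call a record lucky.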

But another insidious narrative was also lurking around the edges of the commentariat, and it had to do with the starting rotation. In the corners of the internet that closely monitor such things, a common thread emerged: several Orioles starters had been lucky or unlucky in 2012, and were due to regress or improve this year simply based on a return to the statistical mean.

Ever since Billy Beane and the Moneyball A's famously exploited a market inefficiency by noticing that ballclubs undervalued walks and extra-base hits, a statistical movement in baseball has emerged to look for the next glaring weakness in baseball analytics. When it comes to pitching, the common application of these techniques is to look past ERA (the results on the field) and instead focus on factors like batting average on balls in play (BABIP), left on base percentage (LOB%), groundball-to-flyball ratio (GB/FB), and the like -- all of which inform fielding-independent pitching (FIP) and expected fielding-independent pitching (xFIP), metrics scaled to resemble ERA but designed to take luck out of the equation. The theory goes, then, that pitchers with an ERA below their FIP/xFIP are likely to regress, while pitchers with an ERA above their FIP/xFIP are likely to improve naturally. And guess what? This theory was wrong (at least partially) about every single member of the 2013 Orioles' Opening Day rotation. Let's take a look at them, one by one. All 2013 stats are as of Monday.
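
As a rough sketch of what is under the hood, the standard FIP formula counts only home runs, walks, hit batters and strikeouts, then adds a league constant so the result sits on the ERA scale. The Python below assumes a constant of 3.10 (the real one changes every season) and a 0.25-run "close enough" band that is purely my own invention for illustration; the stat line fed to `fip` is hypothetical.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding-independent pitching. The constant is an assumed league value."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def expected_direction(era: float, fip_value: float,
                       tolerance: float = 0.25) -> str:
    """The regression theory in one function; the tolerance is a made-up cutoff."""
    gap = era - fip_value
    if gap < -tolerance:
        return "ERA likely to rise"
    if gap > tolerance:
        return "ERA likely to fall"
    return "ERA roughly in line"


# Hypothetical stat line: 20 HR, 50 BB, 5 HBP, 150 K in 180 innings.
print(round(fip(20, 50, 5, 150, 180), 2))  # 3.79

# 2012 ERA and FIP pairs quoted later in this article.
print(expected_direction(2.93, 4.25))  # ERA likely to rise
print(expected_direction(6.20, 4.05))  # ERA likely to fall
```

Note what the formula leaves out: there is no term for who was on base when a home run was hit, and no term for how hard a ball in play was struck.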

Jason Hammel

2012 ERA/FIP/xFIP: 3.43/3.29/3.46

2013 ERA/FIP/xFIP: 5.20/5.09/4.67

Of all the unexpected contributions to the 2012 Orioles starting rotation, the sabermetric crowd would have had you believe that Hammel's 2012 performance was legit. His FIP and xFIP were essentially in line with his real results; he had discovered a new two-seam fastball that apparently enabled him to finally pitch to his true potential. His K/BB, GB/FB and BABIP numbers all supported the notion that Hammel was a sustainably good pitcher. And yet, so far in 2013, Hammel has gone back to being a below-average pitcher who coughs up too many longballs and doesn't strike out enough guys. The stats that were specifically designed to flush out fluky single seasons said that Hammel wasn't having a fluky single season. He was.

Chris Tillman

2012 ERA/FIP/xFIP: 2.93/4.25/4.34

2013 ERA/FIP/xFIP: 3.62/4.71/4.11

I'm going to give the advanced stats partial credit on this one; they forecast that Tillman would regress from his 2012 performance, and he has, a little bit. However, he hasn't fallen back to nearly the point of his FIP or xFIP in either season. Tillman illustrates one of the key failings of these advanced statistics, which assume that all home runs are created equal in terms of their likelihood to allow runs to score, when in fact Tillman has a propensity to yield solo homers. Whether this is because Tillman pitches better from the stretch than the windup, or because he focuses and executes better under pressure, he seems to be demonstrating at least some prolonged ability to defy regression to FIP's mean.

Wei-Yin Chen

2012 ERA/FIP/xFIP: 4.02/4.42/4.34

2013 ERA/FIP/xFIP: 2.78/3.70/4.72

The effect is least pronounced with Chen, but once again FIP and xFIP asserted that Chen got at least a little bit lucky in 2012. This demonstrates another failing of xFIP in particular -- it dislikes flyball pitchers like Chen, assuming that all flyballs are equally likely to be turned into home runs, when in fact the eye test can show that Chen excels at inducing medium-distance flyballs that aren't particularly scary. For a more pronounced version of this effect, check out Darren O'Day, who has a career xFIP more than a run higher than his actual ERA, because he makes a living out of inducing weak pop-ups with his rising fastball.
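
To make the flyball complaint concrete, here is a small sketch comparing FIP and xFIP. The only difference is that xFIP throws out a pitcher's actual home runs and substitutes his fly balls multiplied by a league-average home-run-per-flyball rate. The 10 percent rate and the 3.10 constant are assumed round numbers, and both pitchers are hypothetical.

```python
def fip(hr: float, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding-independent pitching with an assumed league constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls: int, bb: int, hbp: int, k: int, ip: float,
         league_hr_per_fb: float = 0.10, constant: float = 3.10) -> float:
    """Same as FIP, but home runs are replaced by an expected total."""
    expected_hr = fly_balls * league_hr_per_fb  # assumed league rate
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Two hypothetical pitchers, identical except for how many flies leave the yard.
shared = dict(bb=50, hbp=5, k=150, ip=180)
soft_contact_hr, hard_contact_hr, fly_balls = 14, 26, 200

print(round(fip(soft_contact_hr, **shared), 2))  # 3.36
print(round(fip(hard_contact_hr, **shared), 2))  # 4.23
print(round(xfip(fly_balls, **shared), 2))       # 3.79 for both pitchers
```

Because both pitchers allowed 200 fly balls, xFIP hands them the same number. If keeping fly balls in the park is a repeatable skill, xFIP cannot see it by construction.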

Miguel Gonzalez

2012 ERA/FIP/xFIP: 3.25/4.38/4.63

2013 ERA/FIP/xFIP: 3.69/4.27/4.34

If you're thinking that I'm picking on FIP unfairly in my first three examples, here's where it really starts to come into question. Adherents to the notion that FIP can predict regression were wildly insistent that Gonzalez was skating by on smoke and mirrors, whether it was due to a lack of available scouting, the inability of the league to adjust to his pitching or just sheer luck. In 2013, we were led to believe, Gonzalez would be first in line to fall back to earth (just ahead of Tillman). No such thing has happened. The only bump in the road for Gonzalez has been a short DL trip for a blister, aside from which, he has turned in consistent, #2-starter-like production, managing to keep opponents out of big innings without showing flashy stuff.

Jake Arrieta

2012 ERA/FIP/xFIP: 6.20/4.05/3.65

2013 ERA/FIP/xFIP: 7.23/4.60/4.86

If you think I've demonstrated that FIP and reality have their squabbles, here's where FIP cheats on reality with reality's sister, and reality burns all of FIP's stuff in the front yard while the neighbors watch in awe. Jake Arrieta, said FIP, got screwed over by a big heap of bad luck in 2012. FIP saw Arrieta's rising strikeout rate, falling walk and home run rates, high BABIP and low strand rate and said that he had the ultimate hard-luck season, with all of his hard hits and home runs just randomly coming at the most inopportune times. Unfortunately for those of us who actually watch baseball, it was obvious that the distribution of these events was in no way random, and that Arrieta had a tendency to serve up a juicy meatball whenever it mattered most. His BABIP was high because he gave up lots of hard contact, and his strand rate was low because he did it most with men on base. So it was no surprise to a lot of us when Arrieta's 2013 didn't simply snap into line with his FIP or xFIP, even if we were really rooting for it to do so.
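
For the spreadsheet-inclined, here is a quick tally of the five stat lines quoted above, asking a simple question: if you had used each pitcher's 2012 ERA, FIP or xFIP as a flat guess at his 2013 ERA, how far off would you have been? This is a back-of-the-envelope check on five pitchers and a partial season, not a study, and the helper names are my own.

```python
# (2012 ERA, 2012 FIP, 2012 xFIP, 2013 ERA), copied from the lines quoted above.
ROTATION = {
    "Hammel":   (3.43, 3.29, 3.46, 5.20),
    "Tillman":  (2.93, 4.25, 4.34, 3.62),
    "Chen":     (4.02, 4.42, 4.34, 2.78),
    "Gonzalez": (3.25, 4.38, 4.63, 3.69),
    "Arrieta":  (6.20, 4.05, 3.65, 7.23),
}


def mean_abs_error(column: int) -> float:
    """Average miss when a 2012 stat is used as a guess at 2013 ERA."""
    misses = [abs(row[3] - row[column]) for row in ROTATION.values()]
    return sum(misses) / len(misses)


def era_beat_fip() -> list[str]:
    """Pitchers whose 2012 ERA landed closer to 2013 ERA than 2012 FIP did."""
    return [name for name, (era, fip, _xfip, era_next) in ROTATION.items()
            if abs(era_next - era) < abs(era_next - fip)]


for label, column in (("ERA", 0), ("FIP", 1), ("xFIP", 2)):
    miss = mean_abs_error(column)
    print(f"2012 {label:<4} misses 2013 ERA by {miss:.2f} runs on average")
print("ERA was the better guess for:", ", ".join(era_beat_fip()))
```

On these numbers, plain old 2012 ERA misses by about 1.03 runs on average, against 1.61 for FIP and 1.71 for xFIP. Tillman is the lone starter for whom FIP came closer, and only barely.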


I'm sure this little illustration will cause some to lump me in with the anti-science/anti-math crowd, so let me be clear: I don't hate advanced statistics as such. Baseball has come a long way from using batting average, pitcher wins, RBIs and saves in a vacuum, and that's happened in large part because a statistical community has emerged to demonstrate that players can add value to ballclubs in ways that don't always show up in those luck/team-based traditional stats. But for a new peripheral-based stat (like FIP or xFIP) to displace a reliable one that reflects actual results between the lines (like ERA or ERA+), it needs to demonstrate that it can outperform that statistic with some consistency, in terms of either reflecting or predicting a player's actual value better than the old statistic. The fact that FIP and xFIP failed to do so not once or twice, but with every single member of the Orioles' 2013 rotation, tells me that they're not there yet.

I'm not a deep statistical mind; I can't tell you every detail of why FIP and xFIP aren't working as they're supposed to. Some have surmised that it's due to the league-wide increase in strikeouts. FIP and xFIP place a high premium on Ks, this logic says, and if the league starts to accept Ks from more and more of its hitters, FIP might overvalue mediocre pitchers who strike out a lot of guys. This seems valid to me, in addition to the arguments I made above that FIP undervalues flyball pitchers who can regularly keep the ball in the park, and refuses to acknowledge that pitchers may respond differently to high-leverage situations or pitch with different amounts of skill from the stretch or the windup. As it stands, I spent much of 2012 intrigued by the relationship of the Orioles staff to FIP, and until I see more players' reality start to resemble their FIP, I'm going to say that it's not the Miguel Gonzalezes of the world who get by on smoke and mirrors -- it's the stats that are used to put those very pitchers down.