Mark Twain is reputed to have popularized the saying, "[t]here are three kinds of lies: lies, damned lies, and statistics." Twain was right -- stats, when used in ways beyond their designs, can skew our understanding of information.
The purpose of this article is to identify common and popular stats that evaluate pitching performance and to understand their limitations. During the regular season, I will use many of the measurements below for opposing pitcher PDBs. No single number can tell us how well a pitcher performs, however.
Instead, PDBs will present a comprehensive picture where stats are just a piece in the puzzle. That means I will use game theory, external factors, stats, media reports, scouting observations, and Pitchf/x in PDBs.
Unless otherwise noted, all data is courtesy of Fangraphs.
-------------------------------------------------
In Parts I and II, I reviewed ways to understand pitching performance that weren't necessarily related to the pitcher's delivery of the ball. Some of these factors limit stats in important ways when trying to figure out how effective a hurler has been, is, or can be. I'll refer back to those articles where appropriate, but they aren't required to understand what's going on below.
Also, I imagine that the readership has a range of understanding on the topic. I've tried to review a swath of information to hopefully add something for everyone.
Sample Size
Before drawing conclusions on whether a statistic is valid -- that is, whether the measurement corresponds accurately to the real world -- the size of the underlying sample must be considered.
Take an extreme example. No one really believes that Stephen Strasburg's true talent is giving up 5 earned runs in 9 innings pitched. Yet, if you only reviewed his September 2012 splits, it would appear he wasn't awesome.
Pizza Cutter studied the reliability of pitching statistics at certain numbers of plate appearances in his 2008 "Statistically Speaking" study. He concluded that below certain sample sizes, a stat can't give you an accurate idea of a pitcher's true talent.
So, when can we start to trust certain pitching stats? Below, I've listed Pizza Cutter's observed stabilization rates:
As he summarized, "you can get a pretty good idea of how often [a pitcher] walks and strikes batters out, and what type of batted balls he gives up generally, but that's about it."
Nearly all at Federal Baseball probably know the refrain: when presented with pitching data, keep the sample size in mind.
Earned Run Average (ERA)
Explained
ERA "measures the mean of runs given up by a pitcher per nine innings pitched," as Marilyn Green at RedBird Rants explains. The stat can be calculated by multiplying the number of earned runs given up by nine, and then dividing the product by the number of innings a pitcher has thrown.
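For anyone who'd rather see the arithmetic, here is a minimal Python sketch of the calculation described above. The function names are my own, and the helper for box-score innings notation (where "6.1" means six and one-third innings) is an added convenience that assumes that notation, not part of the ERA definition itself.

```python
def innings_to_float(ip_notation: str) -> float:
    """Convert box-score innings ('6.1' = 6 1/3, '6.2' = 6 2/3) to a real number."""
    whole, _, outs = ip_notation.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip_notation}")
    return int(whole) + extra_outs / 3


def era(earned_runs: int, innings: float) -> float:
    """Earned runs times nine, divided by innings pitched."""
    return 9 * earned_runs / innings


# The Sample Size example: 5 earned runs over 9 innings is a 5.00 ERA.
print(f"{era(5, 9.0):.2f}")
```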
For your viewing pleasure, here is the average Major League ERA since 2008:
ERA does show the depression in run scoring, which is accurate. But, as we'll see, it's not great at capturing pitcher talent.
2012: Qualified Leaders and Followers
Limitations
I mentioned this in Part I, but ERA may include many different things that are outside of a pitcher's control. Defense, ballpark, and luck (also known as stat "noise") can all affect ERA, but those things don't tell us about talent level or performance. Because of this, ERA is not the best stat to use to evaluate pitchers.
But don't take my word for it. Derek Johnson, who recently accepted the Chicago Cubs' minor league pitching coordinator position and who previously was the pitching coach at Vanderbilt, has stated that he believes that "ERA is a relatively poor indicator of what the pitcher is actually doing or not doing in his outings," later emphasizing the role of defense and contact rates when trying to evaluate pitchers. Certainly, the industry is coming around to this view.
See also
ERA- is used by some to evaluate how far a pitcher's ERA is above or below league average, which is designated as 100. Because lower is better, a pitcher with an ERA- of 80 is considered to be 20% better (on an ERA basis) than the average big-league hurler.
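As a rough illustration of the scale, here is a simplified Python sketch. It assumes ERA- is just a pitcher's ERA divided by league ERA, times 100; published versions also adjust for park and league, which this sketch leaves out. The 3.20 and 4.00 figures are made-up inputs, not real data.

```python
def era_minus_simplified(pitcher_era: float, league_era: float) -> float:
    """Simplified ERA-: 100 is league average, lower is better.

    Assumption: no park or league adjustment is applied here.
    """
    return 100 * pitcher_era / league_era


# Hypothetical pitcher with a 3.20 ERA in a league averaging 4.00.
score = era_minus_simplified(3.20, 4.00)
print(round(score))  # about 80, i.e. roughly 20% better than average
```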
Our own dc_Roach also came up with a nifty measurement, (e)ERA, detailed here. This stat uses inherited runners and run probability percentiles to better evaluate ERA.
Walks Plus Hits per Inning Pitched (WHIP)
What it is
WHIP measures how many baserunners a pitcher allows per inning. It is calculated by adding walks and hits together, then dividing the sum by innings pitched. Any pitcher around or under 1.20 is doing pretty well.
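The calculation is short enough to sketch in a few lines of Python. The stat line below is hypothetical, chosen only to land on the 1.20 benchmark.

```python
def whip(walks: int, hits: int, innings: float) -> float:
    """Walks plus hits, divided by innings pitched."""
    return (walks + hits) / innings


# Hypothetical line: 50 walks and 190 hits over 200 innings.
print(f"{whip(50, 190, 200.0):.2f}")
```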
2012: Qualified Leaders and Followers
Limitations
WHIP isn't too bad a number to use when trying to get a rough idea of baserunners per inning, but it doesn't entirely help to understand a pitcher's talent for a few reasons. First, not all hits are created the same. A single is better for a pitcher to give up than a double or triple, for instance. Second, umpire strike zones -- discussed in Part I -- can influence whether a pitcher walks a batter, and they are outside the hurler's control. Finally, the next statistic can affect the underlying hit data, too.
Batting Average on Balls in Play (BABIP)
What it is
BABIP measures how frequently balls in play become hits. This number focuses only on balls that hitters put into the field of play, so strikeouts, walks, and home runs don't count. You can see from the table below that BABIP doesn't vary much by year; on average, once a ball is put in play, it will drop in for a hit about 29% of the time.
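The article doesn't spell out the formula, so here is a sketch using the commonly cited definition: hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. The inputs are invented for illustration and don't belong to any real pitcher.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """Hits on balls in play divided by balls in play.

    Home runs leave the park, so they come out of both the
    numerator and the denominator.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical season line (not real data).
value = babip(hits=180, home_runs=20, at_bats=750, strikeouts=180, sac_flies=5)
print(f"{value:.3f}")
```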
2012: Qualified Leaders and Followers
The Detroit Tigers had two hurlers with pretty high BABIPs. Rick Porcello took "honors" at .344, and Max Scherzer registered .333. Maybe this has something to do with Miguel Cabrera manning the hot corner (pure speculation).
On the flip side, interestingly, the Los Angeles Angels had pitchers with the lowest BABIPs in 2012. Jered Weaver led the league with a .241 mark, and his former teammate Ervin Santana checked in at .241 as well.
Limitations
BABIP isn't the holy grail (well, nothing is) because a pitcher's defense can influence hits on balls in play. Also, BABIP varies significantly when you break down hits by batted ball type:
For this reason, it is best to pair BABIP with batted ball rates to see whether a pitcher is closer to unlucky, or was giving up frozen ropes left and right.
Batted Ball Rates
What it is
Batted ball rates are pretty straightforward: the official scorer at each game categorizes each ball in play as either a line drive, ground ball, fly ball, or infield fly ball, according to the linked source. However, Fangraphs says that Baseball Info Solutions "tracks" the classification, and I am not sure if that is a distinction with a difference. Either way, the types are the same.
As the table directly above shows, line drives hurt. Although grounders go for hits more often, they go for extra bases less, and consequently are less likely than fly balls to lead to a run. In 2012, the average rates for each batted ball type were as follows:
- 21% line drives;
- 45% ground balls;
- 34% fly balls; and
- 10% infield flies (subsumed within fly balls, which is why it totals 110%. They're giving it their all!).
Between batted ball, strikeout, and walk rates, what combination leads to pitcher success? The following chart plots K/9 and GB% for 2012 pitchers. The horizontal bar represents the league average strikeout rate per 9 innings, 7.1. The vertical bar represents the league average ground ball rate. I've noted a few names, but generally speaking, if you strike a lot of guys out and get near average or better grounders, you're going to have success (unless you're a walk machine like Edinson Volquez).
Batted ball rates affect many of the other stats presented here. For that reason, they are a helpful cross-check to assess validity.
---------------------------------------------------
Following are ERA estimator stats, which pull together different bits of information in an attempt to isolate pitcher skill on an ERA scale. The phrase "ERA estimator" can be a little misleading; Matt Swartz (of game theory fame) describes it as an "estimate [of] how well a pitcher pitched in the present."
Fielding Independent Pitching (FIP)
What it is
FIP measures how a pitcher performed based on "true" outcomes (i.e., plays where defense is factored out): walks, strikeouts, hit by pitches, and home runs. The formula is a bit dense for this format, but the statistic uses a constant to scale to ERA so that it can be appreciated in the same way the longer-tenured ERA is. The notion is to cut out the variability of defense to have a better idea how talented a pitcher really is.
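For readers who do want the formula, here is a sketch based on the version Fangraphs publishes: 13 times home runs, plus 3 times walks and hit batters, minus 2 times strikeouts, all over innings pitched, plus a constant. The constant is set each season so that league FIP equals league ERA. The 3.10 used below is an assumed stand-in and the stat line is hypothetical.

```python
def fip_core(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Weighted 'true outcome' events per inning, before scaling to ERA."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era: float, lg_hr: int, lg_bb: int,
                 lg_hbp: int, lg_k: int, lg_ip: float) -> float:
    """Constant that makes league-wide FIP line up with league ERA."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float) -> float:
    return fip_core(hr, bb, hbp, k, ip) + constant


ASSUMED_CONSTANT = 3.10  # placeholder; the real value changes by season

# Hypothetical pitcher: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
example = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0, constant=ASSUMED_CONSTANT)
print(f"{example:.3f}")
```

Note how heavily home runs are weighted: each one costs as much as more than four walks, which is why a bad home run year can sink an otherwise solid FIP.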
What makes a good FIP? Fangraphs' Glossary assists:
2012: Qualified Leaders and Followers
Santana should probably consider himself fortunate that more runners weren't on base (thanks to his low BABIP) when he was giving up an average of 2 home runs per 9 innings in 2012 (the reason why his FIP is so poor).
Limitations
FIP can fluctuate based on small samples. It is not unusual for FIP and ERA to disagree; about a third of ML pitchers have a difference of 0.2 or more.
When ERA and FIP disagree, the tendency is to chalk it up to "luck" (meaning, BABIP and defense), or lack thereof. But some pitchers sequence well, or can strand runners on base more frequently than others. There have been many good pieces on the FIP/ERA gap, including this recent one by Glenn DuPaul at Beyond the Boxscore. When looking at the FIP-ERA difference, check out batted ball and strand rates to evaluate whether BABIP is giving us a picture that explains a high or low ERA and any associated gap between that and FIP.
Also consider that, as Pizza Cutter found, a pitcher's HR/FB ratio can vary wildly each year. The average pitcher gives up around 11 home runs per 100 fly balls, and the elite sit around 6-7% HR/FB. So, if a pitcher happens to have a particularly high ratio one year, and a low one the next, it can be challenging to determine where his true talent lies. For this reason, we should read FIP alongside a pitcher's career HR/FB, BB, and K rates, and compare those against his rates for the current year. If you're not feeling up for that, the next stat can help reduce your labor.
See Also
FIP- measures how far a pitcher's FIP sits above or below league average. 100 is average, and lower is better.
Expected Fielding Independent Pitching (xFIP)
What it is
Instead of using a pitcher's actual HR/FB rate, xFIP takes the league average HR/FB rate (again, about 11%) and multiplies it by the number of fly balls a pitcher allowed. In other words, xFIP tells us how many home runs a pitcher should have given up, assuming a league average HR/FB rate.
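Here is a sketch of that substitution in Python. It assumes the same FIP weights Fangraphs publishes (13 for home runs, 3 for walks and hit batters, 2 for strikeouts), the roughly 11% league HR/FB rate cited above, and a placeholder 3.10 constant. The stat line is hypothetical.

```python
LEAGUE_HR_PER_FB = 0.11   # approximate league rate cited in this article
ASSUMED_CONSTANT = 3.10   # placeholder; the real value changes by season


def expected_home_runs(fly_balls: int, lg_hr_per_fb: float = LEAGUE_HR_PER_FB) -> float:
    """Home runs a pitcher 'should' have allowed at a league average HR/FB."""
    return fly_balls * lg_hr_per_fb


def xfip(fly_balls: int, bb: int, hbp: int, k: int, ip: float,
         constant: float = ASSUMED_CONSTANT) -> float:
    """FIP with actual home runs swapped for expected home runs."""
    exp_hr = expected_home_runs(fly_balls)
    return (13 * exp_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher: 200 fly balls, 50 BB, 5 HBP, 200 K in 200 innings.
print(f"{expected_home_runs(200):.1f} expected HR")
print(f"{xfip(fly_balls=200, bb=50, hbp=5, k=200, ip=200.0):.3f}")
```

A pitcher who actually allowed fewer than 22 home runs on those 200 fly balls would show an xFIP above his FIP, and vice versa.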
Why is xFIP useful? Because (cue the broken record) HR/FB rates vary throughout a year, career, or part of a season, this statistic regresses the home run component to league average. This better frames what we could have reasonably expected performance-wise. Also, it correlates well with future ERA.
Limitations?
Some pitchers give up more, or less, HR/FB consistently. Is xFIP less helpful in this case, because these pitchers have shown that they are consistently above or below the HR/FB average? Sort of, as the following charts will show.
Here is a list of NL pitchers who have higher than average HR/FB rates over the last four years, meaning their xFIP appears to paint a shinier picture than FIP:
Because these HR/FB rates are higher than the league average of 11%, xFIP is telling us that their performance should have been better. On the other hand, this is four years of high HR/FB data. On the, uh, third hand, both Haren and Gallardo had one year of the four under the league average HR/FB rate in the sample.
So, we're still seeing some level of HR/FB fluctuation by season within the sample, and xFIP helps to temper the conclusion that these guys are inherently homer-happy pitchers (which I am glad to say in Haren's case, not so much the other three).
Now to look at NL pitchers who, in the aggregate, have lower than average HR/FB rates between 2009-2012:
Here, the xFIPs are higher than the FIPs because xFIP adjusts these hurlers' HR/FB rates northward. But, again, this is four years' worth of better-than-average performance (that is, below-average HR/FB rates).
What issues come up? Sanchez is an interesting case. Aside from a depressed HR/FB rate in 2010 (4.5%), he's been within a percentage point of league average each year of the sample (to the good side). Yet 2010 puts him far below average overall.
The point is that for some players, aggregate historical data can explain the FIP-xFIP distinction. But even a large sample can't be entirely trusted to show that a pitcher consistently out-pitches his xFIP, because a single outlier season can skew the aggregate. Because xFIP uses the league average HR/FB rate, it minimizes the influence of a volatile stat and reduces inaccurate conclusions about true pitcher talent.
Skill Interactive ERA (SIERA)
What it is
As I've gone through these stats, the measurements have gotten a little more complex mathematically. In a different way, though, their limitations have diminished.
SIERA completes the list and trend.
Like FIP and xFIP, SIERA is an ERA estimator -- it evaluates what happened independent of defense. Unlike FIP and xFIP, SIERA takes into account batted balls. Of course, batted ball data underlie many of the stats in this article, so the inclusion is a significant one.
SIERA values strikeouts more than FIP does, because strikeouts aren't just good in themselves; high strikeout pitchers generate weaker contact than low strikeout pitchers, and weaker contact turns into outs more often. SIERA also differentiates between high and low walk pitchers. Simply, walks don't hurt as much if you don't give up many of them. Finally, SIERA adjusts for batted balls. As Fangraphs states, "the higher a pitcher's groundball rate, the easier it is for their defense to turn those ground balls into outs."
Final note: SIERA adjusts for park and accounts for the modern run scoring environment.
2012: Qualified Leaders and Followers
You can see how SIERA loves strikeouts (Mad Max rocked a 29.4% K rate, best in the bigs) and low walks (Lee walked 3.3% of batters, or twenty-eight of eight-hundred forty-seven (!) batters faced). It hates the opposite (Romero was nearly even in K% and BB%).
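The K% and BB% figures are simple per-batter rates: events divided by batters faced. A quick Python check of Lee's walk rate, using the 28 walks and 847 batters faced quoted above (the function name is my own):

```python
def rate_pct(events: int, batters_faced: int) -> float:
    """Share of batters faced that ended in the given event, as a percentage."""
    return 100 * events / batters_faced


lee_bb_pct = rate_pct(28, 847)
print(f"{lee_bb_pct:.1f}%")  # prints 3.3%
```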
Limitations
Other than small sample size, not a lot.
And, mercifully, that's it! I didn't bring up Wins Above Replacement, because that is probably for another post, and we're pushing 2,700 words. I'll cover the basics of Pitchf/x in the next post. Most importantly, pitchers and catchers in four.