
Sunday, May 08, 2005 

Lies, Damn Lies, and ...

This is an example of how statistics can lie if you do not use a big enough sample set.

I started looking through some 2004 NFL team stats from nfl.com and began wondering whether "negative plays" have as much of an impact as we're led to believe. These would include turnovers, penalties, and even sacks. I could measure the last two in yards, and figured I could use some rules of thumb to convert turnovers into yards too. I tried the following:

Negative Yards = Penalty Yards + Sack Yards + (Fumbles Lost * 20) + (INT * 50)
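For anyone who wants to replicate this, the formula is easy to script. The sketch below is a hypothetical Python version: the function and argument names are mine, and the inputs in the usage line are made-up season totals, not any team's actual 2004 stats.

```python
# Rule-of-thumb yardage costs for turnovers (the weights used in this post).
FUMBLE_LOST_YARDS = 20
INTERCEPTION_YARDS = 50


def negative_yards(penalty_yards: int, sack_yards: int,
                   fumbles_lost: int, interceptions: int) -> int:
    """Combine penalties, sacks, and turnovers into a single yardage figure."""
    return (penalty_yards
            + sack_yards
            + fumbles_lost * FUMBLE_LOST_YARDS
            + interceptions * INTERCEPTION_YARDS)


# Hypothetical season totals, not a real team's numbers.
print(negative_yards(penalty_yards=850, sack_yards=250,
                     fumbles_lost=10, interceptions=15))  # 2050
```

Keeping the turnover weights as named constants makes it easy to try other conversions (for example, a different value per interception) and see how much the rankings move.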

My yardage numbers for Fumbles and INT are based loosely on Pete Palmer's research for his Hidden Game of Football and Pro Football Abstract books, plus some research from Football Outsiders. Using this formula, the results are as follows:

Team    Negative Yards
New York Jets 1524
Indianapolis 1550
San Diego 1624
Jacksonville 1866
Pittsburgh 1897
Minnesota 1902
Seattle 1925
Baltimore 1931
New England 1944
Philadelphia 1951
Detroit 1998
New York Giants 2126
Houston 2149
Denver 2150
Green Bay 2201

League Average 2205

Carolina 2236
Kansas City 2240
Atlanta 2265
Cincinnati 2329
Washington 2339
Buffalo 2352
New Orleans 2364
Arizona 2388
Tennessee 2430
Tampa Bay 2475
Dallas 2505
Oakland 2534
Cleveland 2536
San Francisco 2608
Chicago 2625
St. Louis 2795
Miami 2798

A simple first check of whether this statistic means anything is to compare the records of the teams on either side of the average. The 15 teams with less Negative Yardage than the league average won an average of 10 games during the 2004 season, while the other 17 teams averaged 6.2 wins.
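As a rough sketch (Python, with hypothetical names), here is the league average and the above/below split computed from the 2004 table. It only reproduces the 2205 average and the 15/17 split; the 10.0 and 6.2 win averages would also need each team's 2004 record, which isn't part of the table, so that step is left out.

```python
from statistics import mean

# 2004 Negative Yards, copied from the table above.
NEG_YARDS_2004 = {
    "New York Jets": 1524, "Indianapolis": 1550, "San Diego": 1624,
    "Jacksonville": 1866, "Pittsburgh": 1897, "Minnesota": 1902,
    "Seattle": 1925, "Baltimore": 1931, "New England": 1944,
    "Philadelphia": 1951, "Detroit": 1998, "New York Giants": 2126,
    "Houston": 2149, "Denver": 2150, "Green Bay": 2201,
    "Carolina": 2236, "Kansas City": 2240, "Atlanta": 2265,
    "Cincinnati": 2329, "Washington": 2339, "Buffalo": 2352,
    "New Orleans": 2364, "Arizona": 2388, "Tennessee": 2430,
    "Tampa Bay": 2475, "Dallas": 2505, "Oakland": 2534,
    "Cleveland": 2536, "San Francisco": 2608, "Chicago": 2625,
    "St. Louis": 2795, "Miami": 2798,
}


def split_by_average(neg_yards):
    """Return the league average plus the teams below (better) and at/above (worse) it."""
    avg = mean(neg_yards.values())
    better = [team for team, yds in neg_yards.items() if yds < avg]
    worse = [team for team, yds in neg_yards.items() if yds >= avg]
    return avg, better, worse


avg, better, worse = split_by_average(NEG_YARDS_2004)
print(round(avg), len(better), len(worse))  # 2205 15 17
```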

At first glance, the biggest flaw seems to be that poor teams like Detroit, Houston, and the Giants all finished on the good side of the average (fewer negative yards than the league as a whole) even though they finished below .500. However, 10 of the 12 playoff teams came from those top 15 teams. The exceptions were Atlanta and St. Louis. Interestingly, Atlanta finished 18th overall by this metric, while St. Louis, who were lucky to make the playoffs at 8-8, finished next-to-last. Clearly, this formula doesn't account for a team's ability to make big (positive) plays that overcome the negative ones.

When I dug a little deeper, the numbers looked better. Treating 10% as a reasonable margin of error, I figured an average team could be expected to land within +/- 10% of the league average in negative yards. That created 3 "buckets" of teams (above average, average, and below average), grouped by negative yards:

More than 10% below average    11.1 wins    10 teams
Within +/- 10% of average       7.8 wins    13 teams
More than 10% above average     4.9 wins     9 teams

Those 13 teams in the middle would all be considered to have "earned" an average amount of negative yardage, and their average record was nearly 8-8, as expected. The fact that the win averages for the best and worst groups spread further apart (11.1 vs. 4.9, compared with 10.0 vs. 6.2 for the simple above/below split) also suggested I was on the right track.
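The bucketing step can be sketched the same way. This hypothetical helper (the name `bucket` and the `margin` parameter are mine) groups teams by whether their negative yards fall more than 10% below, within, or more than 10% above the league average; on the 2004 table it reproduces the 10/13/9 team counts. Win averages per bucket would again need each team's record.

```python
from statistics import mean

# 2004 Negative Yards, copied from the table above.
NEG_YARDS_2004 = {
    "New York Jets": 1524, "Indianapolis": 1550, "San Diego": 1624,
    "Jacksonville": 1866, "Pittsburgh": 1897, "Minnesota": 1902,
    "Seattle": 1925, "Baltimore": 1931, "New England": 1944,
    "Philadelphia": 1951, "Detroit": 1998, "New York Giants": 2126,
    "Houston": 2149, "Denver": 2150, "Green Bay": 2201,
    "Carolina": 2236, "Kansas City": 2240, "Atlanta": 2265,
    "Cincinnati": 2329, "Washington": 2339, "Buffalo": 2352,
    "New Orleans": 2364, "Arizona": 2388, "Tennessee": 2430,
    "Tampa Bay": 2475, "Dallas": 2505, "Oakland": 2534,
    "Cleveland": 2536, "San Francisco": 2608, "Chicago": 2625,
    "St. Louis": 2795, "Miami": 2798,
}


def bucket(neg_yards, margin=0.10):
    """Group teams as more than `margin` below, within, or more than `margin` above the average."""
    avg = mean(neg_yards.values())
    low, high = avg * (1 - margin), avg * (1 + margin)
    groups = {"below": [], "average": [], "above": []}
    for team, yds in neg_yards.items():
        if yds < low:
            groups["below"].append(team)    # well under average: "good" teams
        elif yds > high:
            groups["above"].append(team)    # well over average: "bad" teams
        else:
            groups["average"].append(team)  # inside the +/- margin band
    return groups


groups = bucket(NEG_YARDS_2004)
print({name: len(teams) for name, teams in groups.items()})
# {'below': 10, 'average': 13, 'above': 9}
```

With the 2004 average of about 2205, the band runs from roughly 1984 to 2425, which is why Detroit (1998) lands in the middle group and Tennessee (2430) just misses it.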

Finally, I ran a standard correlation calculation in Excel. The correlation coefficient between Negative Yards and wins in 2004 was -0.705. The sign is negative because more negative yards go with fewer wins; the relationship isn't as strong as I would like, but it is still a fairly strong one.
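For reference, the coefficient Excel's CORREL function returns is the Pearson correlation. Below is a minimal Python sketch of that calculation; the function name is mine, and the five data points are invented purely to show a negative relationship, not real team numbers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance term and the two spread terms (the 1/n factors cancel out).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up example: more negative yards paired with fewer wins.
neg = [1600, 1900, 2200, 2500, 2800]
wins = [12, 10, 9, 6, 3]
r = pearson(neg, wins)
print(round(r, 3), round(r * r, 3))
```

Squaring the coefficient gives a rough sense of strength: for 2004, (-0.705)² is about 0.50, so negative yards lines up with roughly half of the variation in wins across teams. Python 3.10 and later also ship `statistics.correlation`, which computes the same Pearson coefficient.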

Then I ran the same analysis on 2003 data, and the results were radically different.

Below league average (fewer negative yards)    9.7 wins
Above league average (more negative yards)     6.7 wins

Correlation coefficient: -0.463

More than 10% below average    10.0 wins     7 teams
Within +/- 10% of average      10.3 wins    18 teams
More than 10% above average     6.7 wins     7 teams

The reasons were pretty clear when I looked at the data. Four playoff teams had more negative yards than the league average, with St. Louis and Baltimore the worst offenders, and half the playoff field landed in the +/- 10% range. But the Rams make a lot of big plays, and Baltimore has a very good defense. Either way, "negative yards" taken by itself is far from a perfect predictor of team performance, and a single season is too small a sample to judge it on.
