Better NFL Ratings
I've been a little tough on the guys over at
Football Outsiders for some questionable comments they've made. But it's time to acknowledge that they are doing some very good work over there.
A big part of the Football Outsiders website is the set of DVOA ratings derived by Aaron from an analysis of every single play of every single game. When I looked at them last year, the ratings did not seem very good. For example, when I looked at how he'd rated the teams as of Week 6 and how those teams had done since, I found that his top 16 teams combined for a record of 84-97 after that week. By contrast, my top 16 teams as of that week combined for a record of 102-82 after Week 6. It seemed to me that if you were going to claim you had a good rating system, it should at least be able to separate good teams from bad teams. (Correction: All the records shown are after Week 5 of last year's NFL season, not Week 6.)
Aaron and I debated this point on his website last year at some length. He stated that the purpose of his ratings was not to predict the future but to analyze the past, but of course that raises the question: Have you really analyzed the past well when your ratings don't have any relevance to the future?
I emphasized at the time that I thought Aaron's ratings would eventually be better than mine. All I do is enter the scores of the games into a spreadsheet, then translate those scores into an assumed score against an average opponent on a neutral field. My thought was that there might be something in the scores that was not always reflected in the Won/Lost records, and indeed, I have found that is true. So it seemed logical to me that Aaron's analysis of play-by-play data would find things that were not always reflected in the score of the game.
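For readers who want to see roughly what that kind of calculation looks like, here is a minimal sketch in Python of one common way to turn raw scores into a rating denominated in points against an average opponent on a neutral field. This is not necessarily the spreadsheet method I actually use; the games, the home-field value, and the damped iteration are all assumptions for illustration.

```python
# A minimal sketch of one common "points versus an average opponent on a
# neutral field" rating (a simple-rating-system style calculation). The
# games, the home-field value, and the damped iteration are illustrative
# assumptions, not necessarily the spreadsheet method described above.
from collections import defaultdict

# (home team, away team, home points, away points) -- hypothetical scores
games = [
    ("PHI", "NYG", 27, 10),
    ("NE",  "PIT", 24, 21),
    ("SEA", "SF",  34, 0),
    ("PIT", "PHI", 27, 3),
    ("SF",  "NE",  6, 30),
]

HOME_EDGE = 2.5  # assumed home-field advantage in points

results = defaultdict(list)          # team -> list of (neutral margin, opponent)
for home, away, hp, ap in games:
    margin = (hp - ap) - HOME_EDGE   # strip out the assumed home edge
    results[home].append((margin, away))
    results[away].append((-margin, home))

ratings = {team: 0.0 for team in results}
for _ in range(200):                 # damped fixed-point iteration
    new = {}
    for team, played in results.items():
        # average margin, adjusted for the strength of the opponents faced
        adjusted = [m + ratings[opp] for m, opp in played]
        new[team] = 0.5 * ratings[team] + 0.5 * (sum(adjusted) / len(adjusted))
    mean = sum(new.values()) / len(new)
    ratings = {team: r - mean for team, r in new.items()}  # average team = 0

for team, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {r:+.1f}")
```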
Well, I am pleased to say that either Aaron has made some improvements to his DVOA ratings, or he has gotten very lucky indeed, because his DVOA ratings this year are better than my Power Ratings, and by a significant enough margin that I can recommend them as the best way I've seen to rate the teams.
For starters, let's establish a baseline. The simplest method of rating the teams is by won/lost record. Teams are of course rated this way every day in the standings published in the newspaper. So if you wanted to rate teams, the simplest thing would be to say that New England, Pittsburgh and Philadelphia are the best teams in the league, and that San Francisco and Miami are the worst. And you wouldn't be far wrong, obviously.
So it seems to me that any method of rating the teams should at least do as well as simply using the won/lost record. The way to check how good a rating is involves a statistical method known as correlation. Correlation is just a way of checking how closely two sets of numbers fit each other. For example, if you had one set of the numbers 1, 2, 3, 4, and another set of the numbers 2, 3, 4, 5, it should be obvious that there is a high degree of correlation between the two sets. In fact, the correlation would be 100%, because the second set can be produced by adding 1 to each of the numbers in the first set.
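Here is a quick Python sketch that checks that toy example; the pearson() helper is just the standard correlation formula, included only for illustration.

```python
# Verify the toy example: two sets that differ by a constant are a perfect
# linear match, so their (Pearson) correlation is 1.0 -- i.e. 100%.
def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

print(pearson([1, 2, 3, 4], [2, 3, 4, 5]))  # 1.0
```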
What I did was look at the correlation between the won/lost percentage of each team as of Week 3 and their won/lost percentage in the games since then. For example, Philadelphia had one of the best won/lost percentages in the league (1.000) at 3-0 after the third week, and they have gone 9-1 (.900) since then, again one of the best marks in the league. Obviously there is a great deal of correlation looking just at Philly. But Seattle was also 3-0 (1.000), and they have gone 4-6 (.400) since, so the correlation isn't always high. Looking at all the records as of Week 3, the correlation between a team's record then and their record since is 33%. I then did the same calculation for the next four weeks as well (a small sketch of the calculation follows the list below):
Week 3 : 33%
Week 4 : 33%
Week 5 : 32%
Week 6 : 27%
Week 7 : 32%
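The sketch below shows the shape of that benchmark calculation. Philadelphia's and Seattle's records are the ones mentioned above; the other entries are hypothetical stand-ins, since the real calculation runs over all 32 teams' actual records.

```python
# Correlate each team's won/lost percentage through Week 3 with its won/lost
# percentage in the games since. Only Philadelphia and Seattle use records
# from the article; the rest are hypothetical placeholders.
from statistics import correlation  # Pearson correlation, Python 3.10+

def win_pct(wins, losses):
    return wins / (wins + losses)

# team: ((wins, losses) through Week 3, (wins, losses) since Week 3)
records = {
    "Philadelphia": ((3, 0), (9, 1)),
    "Seattle":      ((3, 0), (4, 6)),
    "Team C":       ((2, 1), (6, 4)),   # hypothetical
    "Team D":       ((1, 2), (3, 7)),   # hypothetical
    "Team E":       ((0, 3), (2, 8)),   # hypothetical
}

through_week_3 = [win_pct(*before) for before, _ in records.values()]
since_week_3   = [win_pct(*after) for _, after in records.values()]

print(f"{correlation(through_week_3, since_week_3):.0%}")
```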
Pretty consistent there. So that's the benchmark. How does my system do?
Week 3 : 40%
Week 4 : 41%
Week 5 : 33%
Week 6 : 26%
Week 7 : 33%
Better, except for that Week 6 blip. Must have been some funky games. Now, there are two opposing tendencies in the ratings. Ratings in later weeks should be more accurate than earlier ratings, because they are based on more games. However, the later-week correlations are also measured against fewer remaining games, so there is more room for random results to throw off the correlation.
But when I checked Aaron's DVOA ratings, the correlations were higher:
Week 3 : 47%
Week 4 : 46%
Week 5 : 43%
Week 6 : 45%
Week 7 : 47%
That's not to say my Power Ratings are useless. Indeed, because they are denominated in points, they have more immediate utility to people looking to place a bet on a game or fill in the office pool. But I gave Aaron a hard time about the correlation between his ratings and future won/lost percentages a year ago; it seems only fair to note now that he has improved his record markedly.