Saturday, February 13, 2016

Crunching Numbers: Developing Race and Power Ratings (Part I)



This is Part 1 of a multi-part series. I don't know how many parts... as many as I come up with.

Over the past year I’ve been working on a graded stakes database and a system for rating horses and the strength of individual races. It’s been a bear of a project because it involves a crapload of manual data entry, along with constant/weekly updates since the ratings change as more data is added to the population.  The database includes the top three finishers from every graded stakes race in North America since January of 2013, along with the field size, finish margins, an assortment of speed and pace figures, and other assorted ratings.

Piggybacking on my post from the other day, speed figures, pace figures, and ratings capture different elements of the performance of a horse in a specific race. A speed figure highlights the final time given the relative speed of the track; a pace figure describes the shape of the race; a rating incorporating weight views the performance of the horse in relation to the other horses in the race; and on and on and on. I like to look at all of those factors but wanted to try and come up with a way to represent performances across the entire spectrum of stakes races. Additionally, I wanted a rating that would change over time given the relative strength or weakness of subsequent races. Those goals led me to this database project.

After hard-keying in the top three finishers in every graded stakes race since 2013, I ranked and weighted the performances based on a variety of figure and rating systems. These ratings take into account essentially three kinds of rating systems: pure speed figures (no adjustment for pace), pace-adjusted speed figures, and a European-style rating expressed in weight relative to that of a horse's rivals. A horse will rate highest on the overall power scale if it a) runs a fast race with b) a strong pace scenario while c) carrying more weight relative to other horses and in which d) the other horses in the race also run well. I set up the formula so that final time (pure speed) carries the most weight, with pace a strong second and weight/competition, ironically, weighted third.
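As a rough illustration of the "rank, then weight" idea described above, here is a minimal sketch. It is not the actual formula behind these ratings: the weights (0.5 / 0.3 / 0.2) are hypothetical and only respect the stated ordering of speed first, pace second, weight/competition third. The field names, the use of percentile ranks, and the sample figures are likewise placeholders.

```python
from bisect import bisect_left

# Hypothetical weights: the post only gives the ordering
# (speed > pace > weight/competition), not the actual values.
WEIGHTS = {"speed": 0.5, "pace": 0.3, "weight": 0.2}


def percentile_rank(value, population):
    """Share of the population strictly below `value`, on a 0-100 scale."""
    ordered = sorted(population)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


def composite_rating(performance, population):
    """Weighted sum of per-figure percentile ranks for one performance."""
    total = 0.0
    for key, weight in WEIGHTS.items():
        column = [p[key] for p in population]
        total += weight * percentile_rank(performance[key], column)
    return total


# Placeholder figures for illustration only, not rows from the database.
population = [
    {"speed": 105, "pace": 98, "weight": 120},
    {"speed": 99, "pace": 101, "weight": 118},
    {"speed": 92, "pace": 90, "weight": 112},
    {"speed": 88, "pace": 95, "weight": 110},
]
ratings = [composite_rating(p, population) for p in population]
```

Because every rank is computed against the whole population, a scheme like this would shift existing ratings whenever new races are added, which matches the weekly-update behavior mentioned earlier.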

From the data, I produced three ratings: a Horse Rating, a Race Rating, and what I call a Harper Rating (short for Horse/Race Power Rating… or HRPR… or Harper).

  • The Horse Rating simply rates the individual performance of the horse in a race, regardless of the strength of the horses that finished second and third.
  • The Race Rating is based on the strength of the top three finishers in a race.
  • The Harper Rating is simply an adjustment of the Horse Rating based on the Race Rating.
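To make the relationship between the three ratings concrete, here is a small sketch under stated assumptions. The post does not spell out the arithmetic, so both steps below are guesses for illustration: the Race Rating is taken as the mean Horse Rating of the top three finishers, and the Harper Rating scales the Horse Rating by how the race compares with a baseline race. All numbers are placeholders.

```python
from statistics import mean


def race_rating(top_three_horse_ratings):
    """Assumed: Race Rating is the mean Horse Rating of the top three."""
    return mean(top_three_horse_ratings)


def harper_rating(horse_rating, race, baseline_race):
    """Assumed: scale the Horse Rating by race strength vs. a baseline."""
    return horse_rating * (race / baseline_race)


# Placeholder Horse Ratings, not taken from the database.
strong_race = race_rating([90.0, 85.0, 80.0])  # deep top three
weak_race = race_rating([90.0, 50.0, 40.0])    # winner beat little
baseline = mean([strong_race, weak_race])

# Same individual effort, different company behind it.
winner_strong = harper_rating(90.0, strong_race, baseline)
winner_weak = harper_rating(90.0, weak_race, baseline)
```

In this sketch the same Horse Rating of 90 is adjusted upward when the second and third finishers also ran well, and downward when they did not, which is the behavior the Harper Rating is meant to capture.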

My expectation prior to calculating ratings was for races in the Classic division to rate the highest, with juvenile races at the bottom (particularly juvenile fillies). That seems logical – the older handicap horses, along with the older sprinters and milers are, generally speaking, the best of the population. Fillies and 3yo colts fall somewhere into that mix, again generally speaking, with the juveniles at the bottom.  And again, the individual ratings are all relative to each other, meaning they will change slightly as more data is added into the mix. The top Harper Rating today may only rank 4th or 10th in the future if other top efforts exceed that performance. So there is a constant updating of the results each week in order to compare the horses and results against each other.

Overall, what I’m looking for is not a tool that will tell me that American Pharoah is great, or Curlin, or any of the other top horses in the sport. I think that’s relatively easy to point out – we don’t need elaborate data sets to confirm horses at the top of the heap. But what is helpful, at least to me, is identifying strong performances by second and third place horses which can slip under the radar. Or a Grade 3 race which turns out to be exceptionally strong. Maybe a race is exceptionally strong due to the fact that the top three all ran excellent races and the fourth place finisher is returning in a softer spot. Or perhaps we can identify a weakness in a current crop of 3yo colts or fillies. It’s in the grey areas that I’m hoping to find some additional light.*

*I produced a first set of ratings prior to last fall's Breeders' Cup and found them helpful in certain situations. On average, the dirt ratings did a good job of identifying the top contenders and actually led me to Wavell Avenue for the Filly & Mare Sprint. However, the turf ratings were pretty useless. So for the time being, I'm just focusing on these ratings for dirt races.

Okay, so the numbers are all crunched and it’s time to see if (first) things make sense. The first thing I did was calculate the average Power Rating for ten divisions: 2yo Colts, 2yo Fillies, 3yo Colts, 3yo Fillies, Classic, Dirt Mile, Dirt Sprint, Dirt Sprint Fillies, Distaff, and Marathon. Why did I include a Marathon division, you might be asking? Basically to get those extremely rare outliers out of the data; I don’t want a G3 event at 1 ¾ miles in the Classic mix because the race is essentially a freak of racing. So I simply carved out a different category for it. Anyway, here are the average Power Ratings for each division:

Median Harper Rating (ALL horses): 48.41
Average Harper Rating (ALL horses): 61.46

Classic: 117.43
Dirt Mile: 99.82
Dirt Sprint: 99.37
3yo Colts: 57.48
Distaff: 52.47
Dirt Sprint Fillies: 44.62
Marathon: 37.15
3yo Fillies: 25.86
2yo Colts: 21.57
2yo Fillies: 9.86
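One quick way to read the table is to compare each division average with the all-horse mean and median listed above. The sketch below uses only the published figures; the variable names are mine. The mean sitting well above the median suggests, though it does not prove, that a relatively small number of very high ratings pull the average up.

```python
# Division averages and overall figures as published above.
division_avg = {
    "Classic": 117.43,
    "Dirt Mile": 99.82,
    "Dirt Sprint": 99.37,
    "3yo Colts": 57.48,
    "Distaff": 52.47,
    "Dirt Sprint Fillies": 44.62,
    "Marathon": 37.15,
    "3yo Fillies": 25.86,
    "2yo Colts": 21.57,
    "2yo Fillies": 9.86,
}
overall_mean = 61.46
overall_median = 48.41

# Sort divisions from strongest to weakest by average rating.
ranked = sorted(division_avg.items(), key=lambda kv: kv[1], reverse=True)

# Divisions whose average beats the all-horse mean and median.
above_mean = [name for name, avg in ranked if avg > overall_mean]
above_median = [name for name, avg in ranked if avg > overall_median]

for name, avg in ranked:
    print(f"{name:<20} {avg:>7.2f}  ({avg - overall_mean:+.2f} vs. mean)")
```

Only the Classic, Dirt Mile, and Dirt Sprint divisions average above the all-horse mean, while 3yo Colts and Distaff land between the median and the mean.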

This distribution is almost exactly what I was hoping to see: Classic races at the top and juvenile fillies at the bottom, with a clear separation of Classic, Dirt Mile and Dirt Sprint at the top and a muddled middle of 3yo Colts and Distaffers. About the only thing that bothered me was the 3yo Colts rating as high as they did, until I realized that a late summer/early fall race limited to 3yo Colts (like the Pennsylvania Derby, etc.) could produce serious Classic contenders (like Bayern a couple of years ago; in fact, Bayern’s Penn Derby was the second highest Power Rating in the sample, after his BC Classic victory). As a result, those late season races are probably boosting the strength of the division. At least that’s my guess. I don’t add or subtract any weight based on division or age or sex; this is simply a consolidation of the figures and ratings.

Okay, that’s it for the first part. In Part 2 I’ll slice and dice some of the individual divisions and performances that rated the highest (and lowest) over the last three-plus years. Plus I’ll take a look at the Kentucky Derby prep races from 2013, 2014 and 2015.
