Wednesday, August 31, 2016

Racing Industry still out in left field with regard to data. In other news: the earth is round.



In an article over at the Paulick Report today, Jason Wilson, the President and COO of Equibase, answered a series of questions regarding the racing data and distribution company: its history and where it’s heading in the future. One of the questions posed to Mr. Wilson involved the high price of data within the horse racing industry and the comparison to the very large amount of free data generated by the major sports in America. Mr. Wilson’s response, while noble in effort, completely missed the mark and, in several instances, painted an inaccurate portrait of the current data landscape in American sports. It’s a very good and interesting interview and, if you haven’t read it, I urge you to click on over to the Paulick Report and take it in.

To be clear up front, I find the quantity, format and price of historical horse racing data to be not only completely out of line, but also a great hindrance to growing the sport itself. Additionally, the mere fact that the sport of horse racing depends on gambling dollars yet charges large sums to download basic information is completely backwards and counterproductive. I wrote a piece roughly a year ago regarding this issue in which I brought up a couple of the points discussed in today’s article.

So let’s get right down to brass tacks: here’s the key question from the interview in which racing data, and the value therein, is discussed.

Is it fair to compare horse racing data to data from other sports (baseball, football) and ask why racing charges for the same type of data that is widely available at no cost to sports fans? (For example, Equibase charges $8 for one horse's lifetime past performances, while a baseball or football player's lifetime statistics are available at numerous sites at no charge.)
And here are a few extracts from Mr. Wilson’s answer and my thoughts:
"Generally speaking, however, when you look at the information that is provided at the websites of other sports, it is the type of information used to provide more engagement for fans. It is not the type of in-depth information that serious Vegas gamblers or fantasy sports players would use to make decisions about their wagers or lineups."
There are so many things wrong with this answer that it’s tough to know where to start, so I’ll just go in random order:

1. The Founding Father of Sabermetrics, Bill James, built his analysis and stats off of basic data. He took the historical, raw data and used it to calculate Runs Created, Range Factor, Win Shares, etc., all of which were built off of basic box score statistics. Anyone can download baseball statistics from Baseball Reference and calculate his advanced stats.

2. Serious Vegas gamblers don’t use raw data? Did I read that right? Because that's essentially what Mr. Wilson is saying. If you’re a serious Vegas gambler betting on baseball, do you think you are buying someone's proprietary data or are you coming up with your own numbers to separate yourself from the crowd? That’s just... I can't find a good word to describe the lack of awareness.

If that’s the mindset the industry has – that the serious gamblers are the ones buying other people's stats, instead of coming up with their own creations – then, well, I guess I shouldn’t be surprised.

3. “More to provide engagement for fans…not used to make decisions about their wagers or lineups.” Oh for the love of... whatever.

When Mr. Wilson goes to FanGraphs, and he views stats like WAR (“Wins Above Replacement”) and xFIP (“Expected Fielding Independent Pitching”), does he think no one is setting their fantasy lineups or drafts off of those numbers? And, by the way, the formulas for those stats are provided on the website and any fan could calculate them using raw data downloaded for free. Or they could download those stats for free to Excel right from FanGraphs, and not as a PDF file.
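To make point 1 concrete, here is a minimal sketch of the basic version of Bill James's Runs Created, computed from nothing but box score counting stats. The season line is made up for illustration (it is not a real player's), the function names are my own, and later versions of the formula add more terms than this one.

```python
def total_bases(hits, doubles, triples, home_runs):
    """Total bases from box score counts; singles are hits minus extra-base hits."""
    singles = hits - doubles - triples - home_runs
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def runs_created(hits, walks, at_bats, bases):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    opportunities = at_bats + walks
    if opportunities == 0:
        return 0.0  # no plate appearances, nothing created
    return (hits + walks) * bases / opportunities


# Hypothetical season line (illustrative numbers only, not a real player).
tb = total_bases(hits=150, doubles=30, triples=5, home_runs=20)
rc = runs_created(hits=150, walks=50, at_bats=500, bases=tb)
print(f"TB = {tb}, RC = {rc:.1f}")
```

Every input here is a column in a free box score download. That is the whole point: no proprietary figures required.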

Speaking of PDF files, let's move on...
“Similarly, in recent years Equibase has greatly increased the amount of data available at equibase.com through its leaders lists and profile pages. You can view comprehensive annual and career statistics on horses, jockeys, trainers and owners — including every results chart for every race that has run since 1991.
We charge when we package this data in products for handicapping purposes. These products include proprietary speed and pace figures as well as information about how these athletes have performed in relation to the competitive environment of that particular race.” (Emphasis added)
This answer just goes from bad to worse in a matter of seconds.

1. The data that’s FREE on the leaders and profile pages is, again, typically in an almost unusable format and, quite frankly, generally useless for actual, hard-core handicapping purposes.

2. Mr. Wilson conveniently leaves out the fact that the results charts for every race are housed in secured PDF files. So if you don’t mind opening up each file and manually hard-keying data into your spreadsheet, then it’s the bee’s knees. Otherwise, yeah, not even close to what’s provided by other sports.

3. And finally, Mr. Wilson follows up his misleading statement on results charts with the crème de la crème of the whole interview: “We charge when we package this data in products for handicapping purposes ... [which] include proprietary speed and pace figures.” That is patently false.

Equibase, and everyone else, charges fees (and very high ones at that) even if all you want to download is result chart data in an Excel-friendly format. Result chart data does not contain proprietary pace and speed figures. Let me repeat that: Result charts do not contain proprietary pace and speed figures. At all. End of story. So there's no reason to charge for that data.

Result charts are the box scores of the horse racing industry. If the racing industry can’t figure out that basic fact (which they clearly can’t or choose not to acknowledge), then we’re not even speaking the same language.

What’s it all mean?

I’ll give Mr. Wilson credit for stepping in the box and taking his cuts on a difficult question but, at the end of the day, he whiffed like Dan Wilson against David Cone. [Former Seattle Mariner Dan Wilson faced David Cone 30 times in his career, striking out 13 times for an absurd 43% failure rate. Yep, pulled that up in about 30 seconds on Baseball Reference. And I didn't pay a thing. I'm sure no fantasy players set their lineups by how batters have fared historically against today's starting pitcher. Nah, now we're just talking crazy!]

Look, I would have had more respect for Mr. Wilson if he had just stated what we all know to be the truth:

“We don’t give away the raw results data (or even make it affordable) because then anybody could create their own speed and pace figures, and that would compete with our products.”

But, hey, keep on charging for what other sports offer for free. I'm sure any day now we'll see horse racing's version of Bill James or Nate Silver come along to offer new and innovative ways to look at horse racing, which they were able to create after shelling out thousands of dollars for what are essentially box scores. But I wouldn't bet on it.

Follow-up note: The post I wrote last August was in response to the Equibase-STATS partnership, which just recently released its new Race Lens product. (I even linked to it, guys. Happy?) Which was exactly what I thought was going to happen. Hey STATS, here's our data! Come up with another product that we can sell! It's just another, more expensive PP. Been there, done that.

1 comment:

  1. I agree 100% with this article. However, remember that STATS, Inc. came directly out of Bill James writing that, because the Elias Sports Bureau would not share their statistics, there should be a grassroots effort made to compile major league box scores and make them available to the public. His plea gave birth to Project Scoresheet, which gave birth to STATS and made websites like BaseballReference.com possible.

    Maybe someone should organize an effort to compile all of DRF's PPs and eventually make them available in a database.
