Racing and Data Analytics

James Knight, head of racing at Coral, last week put out three tweets that pretty much summed up where our sport is at in its relationship with data and analytics, writes Tony Keenan.

I’m biased of course but couldn’t agree more with the sentiment that racing is the best of the betting sports; it has a complexity that few, if any, other sports can match and this is one of its most appealing factors. This complexity lends itself to the creation of data, from nuts and bolts like ground, distance and form to deeper factors like breeding, times and run styles; the list really is endless. But racing, despite some progress lately, doesn’t exploit this extensive data to its full potential.

Much of this is cultural and I mean that not only within racing itself but with a broader Irish, British and European approach to engaging with sport. On this side of the Atlantic, statistics and numbers are not ingrained into the psyche of the sports fan as they are in America. This is changing, however. Take the company Football Radar for example – you can watch a clip introducing their methods here (https://www.youtube.com/watch?v=Y2ee1GoQdeI) – and you see what can be done with the analysis of soccer.

The Americans do it on a whole different level of course with data-heavy websites like Football Outsiders and Baseball Prospectus, though if you want to read a more palatable version of the numbers then Grantland is the place to go where writers like Bill Barnwell, Jonah Keri and Zach Lowe synthesise the vast array of statistics into cogent and well-written arguments. It’s all very mainstream in the States but it boils down to one thing; these numbers help explain why things happen and how sport works so this is something we should want for racing. And lest we forget, they have extensive betting utility too!

It’s important to differentiate between old and new data. By old data I mean the fundamentals that make up racing from age to weight carried to trainers. These basic details have been around forever but that’s not to say you can’t garner new insights from them; the book and film ‘Moneyball’, a prime example of sports analytics reaching the masses, shows this as Billy Beane/Brad Pitt exploits the perception that batting average was more valuable than on-base percentage in baseball.

We’re getting better at interpreting these old numbers in racing too and we now have access to the tools to do so; databases like Horse Race Base and, of course, Geegeez mean we can put our own filters on the data and find betting angles that were hitherto hard to calculate. We’ve learned that some numbers are better than others, and by better I mean they have more predictive value; pure strike rate is a fair indicator of success or otherwise but figures like impact value, actual over expected and percentage of rivals beaten give a truer insight.
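
For readers who want to see how those figures differ from plain strike rate, here is a rough sketch in Python. The definitions are one common formulation rather than the exact method of any particular database, and the `Run` record and its field names are hypothetical. Expected winners are taken straight from the odds (1 divided by the decimal price), which ignores the bookmaker's margin, so treat that as a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One horse in one race (hypothetical record layout)."""
    in_group: bool       # does the run match the angle being tested, e.g. one trainer?
    won: bool
    decimal_odds: float  # starting price as decimal odds
    position: int        # finishing position, 1 = winner
    field_size: int


def strike_rate(runs):
    """Winners divided by runners."""
    return sum(r.won for r in runs) / len(runs)


def actual_over_expected(runs):
    """Actual winners divided by the winners the odds implied.

    Assumption: implied chance is 1 / decimal odds, with no margin removed.
    """
    expected = sum(1 / r.decimal_odds for r in runs)
    return sum(r.won for r in runs) / expected


def impact_value(all_runs):
    """The group's share of winners divided by its share of runners."""
    group = [r for r in all_runs if r.in_group]
    share_of_winners = sum(r.won for r in group) / sum(r.won for r in all_runs)
    share_of_runners = len(group) / len(all_runs)
    return share_of_winners / share_of_runners


def pct_rivals_beaten(run):
    """Fraction of the opposition that finished behind this horse."""
    if run.field_size < 2:
        raise ValueError("need at least one rival")
    return (run.field_size - run.position) / (run.field_size - 1)
```

On these definitions, an actual over expected or impact value above 1 says the group is winning more often than the odds, or its share of runners, would suggest. Percentage of rivals beaten gives credit for a close second in a big field that strike rate throws away.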

But it’s the new data that really interests me. Again, the Americans have led the way. Football Outsiders, the doyens of NFL analysis, use volunteers to chart the minutiae of each play and you can now see data on all the moving pieces of on-field action, including the once-anonymous offensive linemen and cornerbacks, not just the skill positions like quarterback and wide receiver. Baseball is arguably even more advanced: each major league stadium has installed a PITCHf/x system which charts the trajectory and speed of every pitch thrown in the game. I have even read articles lately saying they now have the technology to tell how much spin each pitch has, and these are balls moving at upwards of 90 miles per hour.

Racing too has many areas where new data can be introduced, and chief among them has to be sectional timing. I have to admit to being a devotee of sectionals and an admirer of Simon Rowlands and his team at Timeform who have done so much in terms of education with the subject and in building a database of times for racing in the UK. I do some sectional timing of my own and they certainly have betting application with pace being so important in the outcome of a race.

Establishing sectional times at every track in Britain and Ireland would obviously be expensive but I would be surprised if it doesn’t come around eventually; in the interim racecourses need to get on board with people doing their own times and play ball in terms of getting the race distances right and advising of any changes as well as making furlong markers visible. The same applies to TV stations who can provide on-screen clocks and suitable camera angles that aid taking sectionals. Taking these figures can be a little laborious, especially when camera angles make things difficult, and I look forward to a day when the data is provided and I only have to interpret it.

An extension of sectional times is the use of GPS in tracking the exact movement of horses within a race as each animal carries a chip to relay back information about its race position. We have only really seen this used in Dubai (where there is obviously an unlimited pot of money to spend on racing) and at the Breeders’ Cup, with the American company Trakus charting the specific breakdown of how each race went, but the numbers are fascinating. Not only does this provide us with the times for each horse but it also reveals the cost in distance of racing wide, an often underrated aspect of race analysis over here. Simple physics suggests that the shortest distance between two points is a straight line but we have no way of quantifying the cost of racing away from the rail in Ireland and Britain.
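
Even without tracking data, a back-of-envelope estimate of the cost of racing wide is possible. The sketch below assumes an idealised bend that is a circular arc and a horse that holds the same distance off the rail all the way round it; real bends and real race positions are messier. The function name is hypothetical, and the figure of 2.4 metres per length is an assumption used only to put the answer in familiar terms.

```python
import math


def extra_distance_m(metres_off_rail, turn_degrees):
    """Extra ground covered by racing wide around a bend.

    Arc length is radius times angle (in radians), so a path that is
    d metres wider adds d times the angle, whatever the bend's own radius.
    """
    return metres_off_rail * math.radians(turn_degrees)


def extra_lengths(metres_off_rail, turn_degrees, metres_per_length=2.4):
    """Same cost expressed in lengths (metres_per_length is an assumed figure)."""
    return extra_distance_m(metres_off_rail, turn_degrees) / metres_per_length
```

On those assumptions a horse racing three metres off the rail around a full 180-degree bend covers about 9.4 metres more than one on the fence, which is the sort of number GPS tracking would measure rather than estimate.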

Horse weights are used extensively in Hong Kong, a jurisdiction that many believe is the ideal in terms of racing run for the betting public. Whereas installing sectional timing and/or GPS tracking systems at every track in Britain and Ireland would be costly, the weighing of horses would not. The scales are relatively inexpensive, costing between €3,000 and €5,000 each, and it’s not as if horses aren’t used to them with many trainers using them at home. Knight mentioned integrity in his tweets and the weighing of horses would be a massive tool in the policing of the sport as the best way to stop a horse is not to give it a ride where it can’t win but rather to leave it half-fit for the race.

The obvious plus to the latter option for the dishonest trainer is that there is no way of proving it with the current system. Were the weighing of horses to become widespread, this would build into a sort of big data set around the published numbers; we could compare animals not just against themselves but also against others and over the years could get a sense of optimum racing weights and what sort of figures suggest a horse is not fit or even too fit and ready to go off the boil.

As I mentioned earlier, there are some aspects of American sports where charters note down the data on each and every play, working within a common framework that standardises the numbers; Football Outsiders do this and the volunteers get access to the information while others have to pay for it. This could certainly apply to racing though perhaps in different areas on the flat and over jumps. On the level, charters could look at the keenness of horses within races. As things stand, we can read in-running lines that say a horse ‘raced keenly’ but there are degrees with this and perhaps a one-two-three scale would be better, with one being not perfectly settled, two taking a right grip, and three pulling the jockey’s arms out thus giving itself no chance.

When this data is compiled, it could be placed alongside other information and provide insights. We would know which trainers’ horses are more keen than others (and which can win being keen and which can’t) and which jockeys are best at settling their mounts. We could find that certain tracks or races run at slower paces produce more keenness or even that how horses race is random. Backers of Golden Horn on his next start would certainly be keen to know this after his hard-pulling effort in the Juddmonte International; what are the chances he does the same next time?
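
As a rough illustration of what that compilation might look like, the sketch below takes charted runs and summarises them by trainer. Everything about the layout is hypothetical: the tuple format, the use of 0 for a horse that settled (the scale suggested above only covers one to three), and the choice of a score of 2 or more as the cut-off for "keen".

```python
from collections import defaultdict


def keenness_by_trainer(charted_runs, keen_threshold=2):
    """Summarise charted keenness scores per trainer.

    charted_runs: iterable of (trainer, keenness_score, won) tuples, where
    keenness_score is 0 (settled, an assumed extra grade) to 3 (pulled very hard).
    """
    totals = defaultdict(
        lambda: {"runs": 0, "score_sum": 0, "keen_runs": 0, "keen_wins": 0}
    )
    for trainer, score, won in charted_runs:
        t = totals[trainer]
        t["runs"] += 1
        t["score_sum"] += score
        if score >= keen_threshold:
            t["keen_runs"] += 1
            t["keen_wins"] += int(won)

    return {
        trainer: {
            # Average keenness across all of the trainer's charted runs.
            "mean_keenness": t["score_sum"] / t["runs"],
            # Strike rate when keen; None if the trainer had no keen runners.
            "keen_strike_rate": (
                t["keen_wins"] / t["keen_runs"] if t["keen_runs"] else None
            ),
        }
        for trainer, t in totals.items()
    }
```

The same grouping could be done by jockey, track or early pace instead of trainer; the point is only that once the scores exist in a standard form, these questions become a few lines of counting.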

This could also apply over obstacles with a horse’s jumping ability graded one-two-three at each hurdle or fence. Again, we would find out which trainers’ horses jump best and whether bad jumping is repeated from one start to the next; we all have our own ideas on this but it would be better to put a number on it. It could also answer some difficult questions: was Zaarito, who fell three times in 2010 with races at his mercy, one of the unluckiest chasers in recent memory or simply a terrible jumper?

With all this data, there will be things that people get completely wrong, numbers that we use that really have little value. But these blind alleys don’t matter in the big picture as mistakes help push racing analytics on. Big data is here to stay in sport and as fans who have become accustomed to seeing it in other sports, many of us want it in racing too. Let’s hope we don’t get left behind: there is no reason why we should with the number of technical angles we could exploit.

- Tony Keenan

You can connect with Tony on Twitter at @RacingTrends