Sunday, April 04, 2010

Are computer models cricket?

Michael Jennings at Samizdata looks at computer models and how reliable they are in "Of cricket and climate".

It turns out, in cricket, determining whether or not a batter who has been hit by a ball is out depends on whether the ball would have hit the wicket, had the batter not been in the way. Making this call has been up to the judgment of the umpire, but technology keeps improving. Can a computer extrapolate the path of a ball any better than an umpire can?

The "Hawk-Eye" system was initially used by television companies, and there was then pressure for it to be used in assisting umpires as well. Basically, this system looks at a number of video replays, and from them constructs a three-dimensional model of the ball, the pitch, the bat, etc. From this model, the path of the ball is extrapolated going forward. Television viewers see a computer graphic image of the ball hitting (or not) a computer graphic image of the stumps, and are told whether the ball would have hit the stumps and whether the batsman was or was not out.

Ever since this system has been in place as part of television coverage, there has been pressure for it to be used in umpiring decisions. When people have asked me about this, I have stated my position with unexpected vehemence, particularly given that I am generally in favour of using video replays as part of the adjudication process. For I am, at present, unequivocally opposed to the use of Hawk-Eye and similar systems in umpiring decisions.

My reason for this is as follows.

Hawk-Eye is a system featuring a lot of complex computer code. The code is proprietary, so what follows is largely reasonably well informed speculation. Although we do know the laws of physics with respect to motion of cricket balls, air resistance, the effect of gravity, bounce when the ball hits the pitch, linear and angular momentum, etc etc etc, the complexity of even a relatively simple system such as a cricket ball moving in a cricket game is such that it is difficult to impossible to develop a useful model directly from the physics. In addition there are margins of error when triangulating the motion of the ball from video imagery. What this means is that Hawk-Eye's models are not really physical models per se. What they have likely done is a simpler matter of trial and error followed by extrapolation. The ball has been measured going through the air, perhaps half way down the pitch. Various ways of further extrapolating the position of the ball have been tried, and they have been compared with the actual motion of the ball further down the pitch. Trial and error has continued until prediction and reality have become close, and the resulting (statistically derived) algorithm has been used to predict the motion of the ball in cases where the ball has not actually traveled the full distance (because, perhaps, the batsman's leg was in the way).
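To make the "trial and error followed by extrapolation" idea concrete, here is a minimal Python sketch. It is not Hawk-Eye's method (that code is proprietary); the synthetic `ball_height` function, the two candidate rules and every number in it are assumptions made purely for illustration. It tries two ways of extrapolating from the first half of a delivery, scores each against where the ball actually ended up, and keeps whichever came closest.

```python
# Illustrative only: Hawk-Eye's code is proprietary and nothing here is its method.

def extrapolate_linear(samples, steps_ahead):
    """Candidate rule 1: assume the last observed velocity stays constant."""
    step = samples[-1] - samples[-2]
    return samples[-1] + steps_ahead * step

def extrapolate_quadratic(samples, steps_ahead):
    """Candidate rule 2: assume the last observed acceleration stays constant."""
    d1 = samples[-1] - samples[-2]
    d2 = samples[-1] - 2 * samples[-2] + samples[-3]
    m = steps_ahead
    return samples[-1] + m * d1 + m * (m + 1) / 2 * d2

def ball_height(t, v_up):
    """Synthetic 'measured' height in metres: released at 2 m, gravity only."""
    return 2.0 + v_up * t - 0.5 * 9.81 * t * t

def mean_error(rule, trajectories, n_observed):
    """Average miss (in metres) at the final sample, seeing only the first n_observed."""
    errors = []
    for traj in trajectories:
        seen = traj[:n_observed]
        steps_ahead = len(traj) - n_observed
        errors.append(abs(rule(seen, steps_ahead) - traj[-1]))
    return sum(errors) / len(errors)

# Hypothetical calibration deliveries, sampled every 0.05 s for half a second.
times = [i * 0.05 for i in range(11)]
calibration = [[ball_height(t, v) for t in times] for v in (-1.0, 0.0, 1.0)]

candidates = {"linear": extrapolate_linear, "quadratic": extrapolate_quadratic}
scores = {name: mean_error(rule, calibration, n_observed=6)
          for name, rule in candidates.items()}

# "Trial and error": keep whichever rule best matched reality on the calibration set.
best = min(scores, key=scores.get)
print(best, scores)
```

The selected rule is only as good as the deliveries it was scored against, which is the point the next paragraph picks up.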

This is all actually fine, except that there are circumstances in which the system can break down. Weather conditions that were never encountered in the development phase might be one. Balls made by different manufacturers might behave differently. Pitches in different places may be made of different kinds of grass. In cricket, local conditions matter a lot, and pitches in different countries are known to have different characteristics and favour different types of bowlers and bowling.
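A toy Python illustration of that breakdown, under loudly stated assumptions: the extrapolation rule below is a simple constant-acceleration guess (not anything known about Hawk-Eye), and the `unmodelled` cubic term is a made-up stand-in for conditions that never appeared during development.

```python
# Illustrative only: a rule tuned on one kind of delivery, applied to another.

def extrapolate_quadratic(samples, steps_ahead):
    """Assume the last observed acceleration stays constant."""
    d1 = samples[-1] - samples[-2]
    d2 = samples[-1] - 2 * samples[-2] + samples[-3]
    m = steps_ahead
    return samples[-1] + m * d1 + m * (m + 1) / 2 * d2

def height(t, unmodelled):
    """Synthetic height in metres; `unmodelled` scales a hypothetical extra effect."""
    return 2.0 - 0.5 * 9.81 * t * t + unmodelled * t ** 3

times = [i * 0.05 for i in range(11)]  # samples every 0.05 s

def final_error(unmodelled):
    """Miss at the last sample when only the first six samples are observed."""
    traj = [height(t, unmodelled) for t in times]
    return abs(extrapolate_quadratic(traj[:6], 5) - traj[-1])

calibrated_error = final_error(0.0)  # conditions the rule was developed on
unseen_error = final_error(3.0)      # conditions it never saw

print(calibrated_error, unseen_error)
```

On the conditions it was developed for, the rule is essentially exact; with the extra term it misses by several centimetres, and nothing in its output signals that anything has changed.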

Given that Hawk-Eye is not definitive, another nasty possibility rears its head. Like baseball, cricket is a game greatly suited to gambling. There are a huge number of statistics that followers of the game are interested in, and it is possible to bet on the outcomes of most of them.

Imagine then a situation where statisticians and programmers are running a system of computer code that nobody understands (and which, in fact, they are extremely secretive about the workings of, on the basis that it is "proprietary intellectual property" and a valuable trade secret) which has the ability to overrule umpires' decisions. Nobody outside the firm they work for (and indeed few people inside the firm) knows exactly who the people are who run this stuff - certainly not the official administrators of the game. The potential for corruption is obvious.

Far-fetched? Well, the best way of running such corruption would be to not be obvious about it. Decisions should still appear approximately right. You do not change the result of matches, but apply a systematic bias of 5% in the direction that improves your profitability. As it happens, I am not a bookmaker, but I am a financial analyst. Give me a 5% systematic bias in the financial markets that I can control, and I will shortly be a very rich man.
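A rough piece of arithmetic, in Python, on why a small edge matters so much. The reading of "5%" as a shift in win probability from 0.50 to 0.55 on even-money bets, the 10% stake and the 1,000 bets are all assumptions for illustration, not figures from the quoted post.

```python
import math

def expected_log_growth(p_win, fraction):
    """Expected log growth per even-money bet, staking a fixed fraction of bankroll."""
    return p_win * math.log(1 + fraction) + (1 - p_win) * math.log(1 - fraction)

def typical_bankroll(p_win, fraction, n_bets, start=1.0):
    """Bankroll after n_bets if each bet grows it at the expected log rate."""
    return start * math.exp(n_bets * expected_log_growth(p_win, fraction))

# Same staking plan, with and without the assumed five-point edge.
fair = typical_bankroll(0.50, 0.10, 1000)
biased = typical_bankroll(0.55, 0.10, 1000)

print(fair, biased)
```

Without the edge, the same staking plan steadily shrinks the bankroll; with it, the bankroll compounds upward. No single bet looks suspicious, which is what makes a small systematic bias hard to spot.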

Because of the possibility of bias, intentional or otherwise, and the possibility of corruption, he calls for computer models not to be used unless the code and data are open for anyone to examine.

The application to climate research is obvious.
