Sunday, December 2, 2012

Betting on horses and the resurrection of Jesus (I)

A friend of yours, a real horse-racing fanatic, has convinced you to come with him to the racecourse on a race day. While you are waiting for the races to begin, you decide that, after all, a small bet would be fine. So you read the program of the first race, and the name of a horse catches your eye: Soldatino!

You ask your friend if Soldatino is a good horse, but the answer is negative: in the 105 races in which it has run, it "showed" (i.e., it finished in the first three places) in just 7; that is, for each time it showed there were 14 times it did not! Well, not exactly the favourite of the race...

Seeing your disappointment, your friend tries to lift your spirits a little: "During the night it rained a lot, so the ground is very heavy." Noting your quizzical expression, your friend tells you that in 70% of the races in which Soldatino showed, the ground was heavy, while in the races in which it came in fourth or worse, the ground was heavy in only 10% of cases.

At this point, confusion is painted on your face: you need to know the probability that Soldatino shows in races on heavy ground, but what you know is how often it shows in all races (7 out of 105), and how likely the ground is to be heavy when it shows (70%) and when it does not (10%). How do you get the information you need from this?

You know that Soldatino showed 7 times, 70% of them on heavy ground, and that it failed to show 98 times (105 - 7), 10% of them on heavy ground. To find out how likely Soldatino is to show on heavy ground, you need to calculate the ratio between these two quantities, i.e. multiply the ratio "showed" : "not showed" (7:98, the odds of showing) by the ratio between 70% and 10%, i.e. 0.70 / 0.10, which is 7. And 7:98 multiplied by 7 gives 49:98, i.e. 1:2. Odds of 1:2 mean one show for every two failures to show, so on heavy ground Soldatino shows in one race out of three. Not bad!
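The same arithmetic can be checked with a few lines of Python. This is only a sketch: the variable names are invented for illustration, and the fractional race counts are simply the expected numbers implied by the percentages your friend quoted, not figures from an actual race record.

```python
from fractions import Fraction

races = 105
shows = 7
no_shows = races - shows  # 98 races finished fourth or worse

# Percentages quoted by the friend (illustrative variable names)
p_heavy_given_show = Fraction(70, 100)
p_heavy_given_no_show = Fraction(10, 100)

# Expected number of heavy-ground races within each group
heavy_and_show = shows * p_heavy_given_show            # 4.9 races
heavy_and_no_show = no_shows * p_heavy_given_no_show   # 9.8 races

# Odds of showing on heavy ground: shows versus non-shows
odds_show_on_heavy = heavy_and_show / heavy_and_no_show

# The matching probability: shows out of all heavy-ground races
prob_show_on_heavy = heavy_and_show / (heavy_and_show + heavy_and_no_show)

print(odds_show_on_heavy)  # 1/2, i.e. odds of 1:2
print(prob_show_on_heavy)  # 1/3, i.e. one race out of three
```

Exact fractions are used instead of floats so that the result comes out as exactly 1/2 rather than a rounded decimal.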

Delighted by this discovery, you rush to place a bet on Soldatino showing. It is a pity that the bookies know how to do the maths too: Soldatino showing pays only twice the stake, so the bet is not advantageous at all!
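Why the bet is not advantageous can be seen from a rough expected-value calculation. The sketch below assumes a probability of showing of 1/3 (the odds of 1:2 worked out for heavy ground) and tries two possible readings of "twice the stake", since the story does not say which one the bookies use: a net profit of twice the stake, or a total return of twice the stake (a net profit equal to the stake). The function name and parameters are invented for illustration.

```python
from fractions import Fraction

def expected_profit(p_win, net_payout_per_unit, stake=1):
    """Average profit per bet: winnings weighted by p_win, minus the stake lost otherwise."""
    return p_win * net_payout_per_unit * stake - (1 - p_win) * stake

p_show = Fraction(1, 3)  # assumed: Soldatino shows in one heavy-ground race out of three

# Reading 1: a winning bet pays a net profit of twice the stake
print(expected_profit(p_show, 2))  # 0

# Reading 2: a winning bet returns twice the stake in total (net profit = stake)
print(expected_profit(p_show, 1))  # -1/3
```

Under the first reading the bet merely breaks even in the long run; under the second it loses a third of the stake on average. Either way, there is no money to be made.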

You may not get rich, but you've just discovered a fundamental theorem of probability, Bayes' theorem. In this formulation, the theorem says that the odds of Soldatino showing on heavy ground, i.e. the ratio between the number of heavy-ground races in which it finishes within the first three places and the number in which it finishes fourth or worse (let's call it O (showing | heavy)), are equal to the odds of showing on any ground (let's similarly call it O (showing)) multiplied by the ratio between the probability that the ground is heavy when Soldatino shows, P (heavy | showing), and the probability that the ground is heavy when Soldatino does not show, P (heavy | not showing). This ratio is called the "likelihood ratio". In summary:
O (showing | heavy) = O (showing) * P (heavy | showing) / P (heavy | not showing)
or, in our case:
O (showing | heavy) = 7:98 * 0.70 / 0.10 = 7:98 * 7 = 49:98 = 1:2
This formulation of Bayes' theorem can be generalized to the case where we are interested not in Soldatino showing but in a general hypothesis 'H', and in how its odds change once we know that the event 'E' has occurred:
O (H | E) = O (H) * P (E | H) / P (E | ¬ H)
where O (H) are the odds of the hypothesis 'H' before taking event 'E' into account as evidence, P (E | H) is the probability of observing event 'E' when the hypothesis 'H' is true, P (E | ¬ H) is the probability of observing event 'E' when the hypothesis 'H' is false, and O (H | E) are the odds of hypothesis 'H' after taking event 'E' into account as evidence.
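As a minimal sketch, the general formula can be written as a small Python function. The names posterior_odds and odds_to_probability are made up for this example, and odds are represented as a single fraction (so 7:98 becomes 7/98).

```python
from fractions import Fraction

def posterior_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    """Odds form of Bayes' theorem: O(H | E) = O(H) * P(E | H) / P(E | not H)."""
    if p_e_given_not_h == 0:
        raise ValueError("P(E | not H) must be greater than zero")
    likelihood_ratio = Fraction(p_e_given_h) / Fraction(p_e_given_not_h)
    return prior_odds * likelihood_ratio

def odds_to_probability(odds):
    """Convert odds a:b, given as the fraction a/b, into the probability a / (a + b)."""
    return odds / (1 + odds)

# The Soldatino example: H = "showing", E = "heavy ground"
odds = posterior_odds(Fraction(7, 98), Fraction(70, 100), Fraction(10, 100))
print(odds)                       # 1/2
print(odds_to_probability(odds))  # 1/3
```

Keeping the likelihood ratio as a separate step mirrors the formula: the evidence enters only through that one factor, whatever the prior odds are.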

But what does this story have in common with the resurrection of Jesus? Something indeed...

Ian Pollock, «Odds again: Bayes made usable», Rationally Speaking, November 29th, 2012. The photo is «Horse racing», by Paolo Camera.
The article was slightly reworded on December 5th, to take into account some clarifications by Mr. Pollock.

3 comments:

  1. Nice article! I believe I'll start following your blog.

    I do want to mention one possible source of confusion. The notion of a prior and posterior conjures up an image of temporal relation, but this is best thought of as "prior to taking evidence into account" and "posterior to (after) taking evidence into account" - there is no necessary temporal relation between the events you condition on, themselves.

    For example, you might get your prior odds for a Soldatino win from watching his races in 2012, but get your evidence ratio P(E|H)/P(E|!H) from a newspaper article published in 2007.

    Also, one must be careful about the terms "a priori" and "a posteriori" - they carry a lot of philosophical baggage. In Kantian terms, essentially all probabilities (priors and posteriors) represent what philosophers would call a posteriori states of knowledge.

    1. I realize that the article could have been clearer, and I am going to revise it as soon as possible.

      Thank you very much for your clarifications!

    2. Oh, this is pretty minor stuff! The article gets the point across very well.