Can Statistics and Law ever learn to get along?

R v Adams

In 1996 a jury heard forensic testimony that a ‘match’ had been found between Denis Adams’ DNA and a sample found at the scene of a crime where a woman had reported being assaulted and raped. The probability of this match occurring by chance was described by the forensic expert as being ‘1 in 200 million’. In Adams’ defence, his lawyers told the jury that the victim herself had failed to pick the defendant out of a line-up and had even explicitly stated that Adams did not match her description of her assailant. Further, Adams’ girlfriend testified that he was with her on the night the incident took place. Despite this, the jury convicted Adams of the crime. The defence, convinced that the jury had given too much weight to the DNA evidence in their deliberations, immediately launched an appeal. Unsure of the correct way to combine the three pieces of evidence, they recruited a statistical expert, Peter Donnelly (Donnelly, 2007), to undertake a ‘Bayesian’ analysis of the case.

It was resolved by all parties involved that the statistical calculations must be undertaken by the jurors themselves. Donnelly, together with the statistical experts from the prosecution, devised a questionnaire to encourage the jurors to quantify the various pieces of evidence presented in the case. For example, with regard to the failure of the victim to identify Adams, the jurors were asked to give numerical answers to the questions ‘If he were the attacker, what’s the chance that the victim would say her attacker didn’t look anything like him?’ and ‘If he wasn’t her attacker, what’s the chance she would say this?’. Once the jurors had given numerical estimates for the value of each piece of evidence, they were guided in how to combine these using Bayesian techniques to arrive at a figure representing the value of all three pieces of evidence combined.
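
As a rough illustration of the kind of combination the jurors were guided through, the sketch below uses the odds form of Bayes' theorem: the prior odds are multiplied by one likelihood ratio per piece of evidence. Apart from the quoted ‘1 in 200 million’, every number here is hypothetical (the jurors' actual figures are not available), the function names are invented for this example, and treating the three pieces of evidence as independent is a simplifying assumption.

```python
def likelihood_ratio(p_if_attacker, p_if_not_attacker):
    """How much more likely the evidence is if he was the attacker than if he was not."""
    return p_if_attacker / p_if_not_attacker


def posterior_probability(prior_probability, likelihood_ratios):
    """Odds-form Bayes: posterior odds = prior odds x product of likelihood ratios.

    Assumes the pieces of evidence are independent given guilt or innocence.
    """
    odds = prior_probability / (1.0 - prior_probability)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical inputs, NOT the figures the jurors supplied in R v Adams.
prior = 1 / 200_000                                # assumed: one of 200,000 possible attackers
dna = likelihood_ratio(1.0, 1 / 200_000_000)       # match certain if guilty, else the quoted match probability
no_identification = likelihood_ratio(0.1, 0.9)     # assumed answers to the two questionnaire questions
alibi = likelihood_ratio(0.25, 0.5)                # assumed value of the girlfriend's testimony

posterior = posterior_probability(prior, [dna, no_identification, alibi])
print(f"Posterior probability that he was the attacker: {posterior:.3f}")
```

Working in odds keeps each juror answer as a separate multiplier, so changing one estimate (say, the weight given to the alibi) only changes one factor in the product.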

However, the attempt to guide the jurors and the judge through this process was described by Donnelly as rife with misunderstanding, mishaps and general difficulty, some of which Donnelly elucidates:

‘The episode had some amusing sidelines. It was suggested that it would be helpful to supply the jury (and judge) with basic calculators. Although the total cost was well under £100, this request was so unusual that it seemed to require clearance personally from the Lord Chancellor. Then, during my evidence, we walked the jury through a numerical example – the barrister would suggest token numbers in answer to the questions, and the jury and I entered them in the calculators which were eventually supplied. They seemed to have no difficulty in following this, but at an early stage in the calculation, when I said something to the effect that: “Your calculator should now show the value 31.6,” and the jurors all nodded, the judge rather plaintively said: “But mine shows zero.”’ Donnelly (2007)

The appeal was eventually rejected, with the appeal judge scathing of the statistical approach used. As a result of his experiences, Donnelly remains unconvinced that such an approach is a feasible future for the presentation of Bayesian reasoning in legal cases.

But can there be a future for statistics in the court room? Is there another way? And what even is Bayesian inference anyway?

 

‘What even is Bayesian Inference anyway?’

Bayesian inference is the mathematically-accurate method of updating a ‘prior’ probabilistic belief in a hypothesis (such as Adams being the attacker) in the light of new evidence (such as the DNA evidence, the alibi, and the line-up identification failure) to arrive at a ‘posterior’, or updated belief level in that hypothesis.

It might be clear that this general concept, of updating one’s beliefs in ‘something’ in the light of new information, is hardly restricted to the court room. Indeed, some believe this fundamental belief-updating process, and therefore Bayesian inference, is central to almost all human endeavours (McGrayne, 2011; Link, 2009; Gelman et al., 2014).

Bayes’ formula for undertaking this inference was published over 250 years ago (Bayes & Price, 1763). I don’t want to get bogged down in the algebra – there are many (many) thorough explanations of it elsewhere. Suffice to say at this point that what you get out of the formula (known as the ‘posterior’) is the updated belief level, and to calculate it you combine the prior (the old belief level) with the new information / evidence. Hopefully that makes some intuitive sense.
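
For readers who do want to see it, the formula can be written as follows, where H is the hypothesis (for example, that Adams was the attacker) and E is the new evidence. The symbols H and E are notation chosen here for illustration rather than taken from the original paper.

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
```

Here P(H | E) is the posterior, P(H) is the prior, and the ratio P(E | H) / P(E) captures how strongly the new evidence bears on the hypothesis.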


Bayes’ theorem has been extensively validated, and the statistical community no longer doubts that it is the correct approach to updating probabilities. As Fenton, Neil and Hsu (2014) stated:

‘The application of Bayes’ theorem to probabilities is akin to the application of addition or multiplication to numbers: probabilities are either correctly combined by this rule, or they are combined incorrectly by other means.’ Fenton, Neil & Hsu (2014)

So, if the numbers going into the formula are correct, or correspond to reality, then the number coming out will also be correct. But here, of course, lies almost all of the contention: the conversion of non-quantified beliefs (Adams’ guilt; your chance of catching a bus; a patient’s probability of having a given disease; how much your friend likes you; a football team’s chance of winning a match) into the quantified ones the formula requires. Nowhere is this conversion currently more contentious than in the legal realm. However, there are work-arounds: one can calculate a range of probabilities, for example, taking into account multiple feasible valuations of each piece of evidence (e.g. those most in favour of the prosecution and those most in favour of the defence). For example, while there is no access to the original figures calculated by the jurors in the Adams trial, a Bayesian post-analysis of the case by Dawid (2002) suggested that the probability of guilt, taking into account the three pieces of evidence, may be as low as 0.36 or as high as 0.98. He believed this analysis demonstrated that there was room for ‘reasonable doubt’. Perhaps this also demonstrates that the techniques can be informative in trials like this.
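
One way to picture this kind of work-around is to run the same calculation twice, once with the valuations most favourable to the defence and once with those most favourable to the prosecution, and report the resulting range. The sketch below does that with invented numbers. It is not a reconstruction of Dawid's analysis: the prior, the likelihood-ratio bounds and the independence of the three pieces of evidence are all assumptions chosen only for illustration.

```python
from math import prod


def posterior_from_odds(prior_probability, likelihood_ratios):
    """Posterior probability from a prior and a list of independent likelihood ratios."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical (low, high) likelihood-ratio bounds for each piece of evidence.
# "low" is the valuation most favourable to the defence, "high" to the prosecution.
evidence_bounds = {
    "dna_match": (2_000_000, 200_000_000),
    "identification_failure": (0.05, 0.25),
    "alibi": (0.25, 0.75),
}
prior = 1 / 200_000  # assumed pool of possible attackers

lowest = posterior_from_odds(prior, [low for low, _ in evidence_bounds.values()])
highest = posterior_from_odds(prior, [high for _, high in evidence_bounds.values()])
print(f"Posterior probability of guilt ranges from {lowest:.2f} to {highest:.2f}")
```

Because the posterior rises with every likelihood ratio, taking all the low bounds together and all the high bounds together is enough to bracket the range; no search over combinations is needed.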

 

‘Trial by Mathematics’

There are many opponents of the use of Bayesian inference in court cases, and many of them point to a now-classic paper by Professor Laurence Tribe (1971), entitled ‘Trial by Mathematics: Precision and Ritual in the Legal Process’, in the Harvard Law Review. Tribe begins the paper with an implicit comparison of modern attempts to ‘mathematize’ the legal process with those from the Middle Ages:

‘The system of legal proof that replaced trial by battle in Continental Europe during the Middle Ages reflected a starkly numerical jurisprudence. The law typically specified how many uncontradicted witnesses were required to establish various categories of propositions, and defined precisely how many witnesses of a particular class or gender were needed to cancel the testimony of a single witness of a more elevated order. So it was that medieval law, nurtured by the abstractions of scholasticism, sought in mathematical precision an escape from the perils of irrational and subjective judgment.’ Tribe, 1971

Tribe’s implied point here is: this was tried before, and it is as bad an idea now as it was back then. One of Tribe’s main functional arguments for this (apart from some compelling moral arguments) is that statistical evidence will be far more salient, or attractive to the jury, than the non-mathematical evidence that they will always be asked to combine it with, because statistical evidence exudes an ‘aura of precision’. Tribe argued throughout his paper against an article in the very same journal issue by two authors named Finkelstein and Fairley (1971), who were proposing the use of Bayesian inference in legal trials for the first time (presumably the journal had approached Tribe for his views prior to publication instead of this being some wonderful coincidence). Finkelstein and Fairley were proposing a system somewhat similar to that employed in R v Adams above, where the jurors convert their beliefs into numerical values. Tribe makes the point that:

‘Even assuming with Finkelstein and Fairley that the accuracy of trial outcomes could be somewhat enhanced if all crucial variables could be quantified precisely and analyzed with the aid of Bayes’ Theorem, it simply does not follow that trial accuracy will be enhanced if some of the important variables are quantified and subjected to Bayesian analysis, leaving the softer ones – those to which meaningful numbers are hardest to attach – in an impressionistic limbo. On the contrary, the excessive weight that will thereby be given to those factors that can most easily be treated mathematically indicates that, on balance, more mistakes may well be made with partial quantification than with no quantification.’ Tribe, 1971.

I have some sympathy for Tribe’s view that the legal process might be better if mathematics were kept out of it entirely, particularly at the time it was written. However, unfortunately for Tribe’s modern proponents, while they stand with their shoulders at the door of the courtroom, using Tribe’s arguments to keep the statisticians out, they have forgotten to look behind them. And behind them the real nightmare that Tribe hoped never to see has crept up over the last three decades: legal professionals with no statistical training misusing mathematical evidence and overweighting it, leading to numerous miscarriages of justice (e.g. Forrest, 2003; Mehlum, 2009; Donnelly, 2007).

The problem grew largely due to the rise of DNA evidence in court cases from the early 1990s. When forensic teams report a DNA match in court and want to get across the importance of the evidence, they generally report the probability of a random person from the population matching the sample found at the crime scene (the ‘random match probability’, or RMP, as it’s known). The most notorious error in legal practice, single-handedly responsible for a swathe of miscarriages of justice, is intimately entwined with this figure and is known as the Prosecutor’s Fallacy. To demonstrate, imagine you are a juror on a murder trial and you are told there is only one piece of evidence against the defendant: he matches a DNA sample found on the victim’s body which could only have been left by the murderer. You are then told that the probability of this match occurring by chance (the RMP) is extremely low: only 1 in 2 million people have this DNA type. Now, what do you think is the chance that this person is innocent? If the answer that pops into your head is also ‘1 in 2 million’, I’m afraid you’ve just committed the fallacy. Why is this a fallacy? Well, imagine the murder happened in London (with a population of about 8 million) and we determine that anyone in London could feasibly have done it in the time frame. How many matches for this DNA sample would we expect by chance? Four. We already know the defendant matches, so he is one more, and our best estimate of how many matches there are in London is now 5. Since we have no other evidence against the defendant, the best we can say is that he is one of these 5 people, one of whom must have committed the crime, so he has a 1/5 chance of being the assailant, or a 4/5 chance of innocence.


Now 4 / 5 is very different from 1 / 2 million. The mistake in reasoning here is to ignore the population size, or the ‘prior’ chance of guilt (which the population gives us). Before we did the DNA match, what was the defendant’s chance of guilt? 1 in 8 million – he had no more chance of being guilty than anyone else in London. So when we combine this population of 8 million with the DNA figure of 1 in 2 million we end up with 4 expected matches. But what if we were talking about an island with only 50 people on it? How many matches would we expect here? Not even 1 (0.000025 in fact). So if we were talking about this island and the DNA match occurred, it would be extremely likely he had committed the crime: the bigger the population, the smaller the prior and the less convincing the DNA evidence. As we saw above with Bayes’ formula, the new, ‘posterior’ belief level has to be a combination of both the prior and the new information (the DNA match) – the mistake in the Prosecutor’s Fallacy is to entirely ignore the prior, or the population size, and focus entirely on the new information.
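
The arithmetic in this example is simple enough to check in a few lines. The sketch below assumes, as the example does, that everyone in the population is equally likely to be the culprit before the DNA result, that there is no other evidence, and that the expected number of other matches is approximated by population × RMP. The function names are illustrative.

```python
def expected_other_matches(population, random_match_probability):
    """Expected number of people, besides the defendant, who would also match."""
    return population * random_match_probability


def probability_defendant_is_source(population, random_match_probability):
    """Defendant is one of (1 + expected other matches) equally likely candidates."""
    return 1.0 / (1.0 + expected_other_matches(population, random_match_probability))


rmp = 1 / 2_000_000  # the 1 in 2 million figure from the example

for place, population in [("London", 8_000_000), ("Island", 50)]:
    others = expected_other_matches(population, rmp)
    p_guilt = probability_defendant_is_source(population, rmp)
    print(f"{place}: {others:g} other expected matches, "
          f"P(guilt) = {p_guilt:.6f}, P(innocence) = {1 - p_guilt:.6f}")
```

With these inputs the London case gives a probability of guilt of about 0.2, nowhere near the near-certainty the fallacy suggests, while the island case gives a probability very close to 1.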

In this simplistic example we didn’t include any other evidence, and while there are cases where the prosecution rested entirely on DNA evidence (the Adams case we began with, for example), this is not often the case. Unfortunately, exactly as Tribe predicted, cases such as R v Adams as well as a swathe of research studies have now shown that people vastly overweight DNA evidence presented as a ‘random match probability’, typically due to the Prosecutor’s Fallacy (e.g. Thompson & Schumann, 1987; Koehler, Chia & Lindsey, 1995; Schklar & Diamond, 1999; Kaye et al., 2007).

 

‘Is there another way?’

Fenton, Neil and Hsu (2014) see trials like R v Adams as proof that any legal process which includes jurors or lawyers attempting to calculate the statistics behind a trial is doomed to failure. A recent experiment I conducted with them confirmed this view: we presented 100 general-population participants with a Bayesian legal problem with only a single piece of ‘match’ evidence, including all the variables (such as the possibility of error during forensic testing) they would have to take into account to accurately calculate the value of the evidence. Not a single person was able to get the correct result – and most trials include more than one piece of evidence.
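
To give a flavour of why even a single piece of match evidence is hard to evaluate by hand, the sketch below folds a laboratory false-positive rate into the likelihood ratio for a reported match. It is a simplified calculation under stated assumptions (a true source always produces a reported match, and lab errors are independent of the true DNA type), not the model or the numbers used in the experiment. All parameter values are hypothetical.

```python
def reported_match_likelihood_ratio(random_match_probability, false_positive_rate):
    """Likelihood ratio for a *reported* match when the lab can make false-positive errors.

    Assumes a match is always reported if the defendant is the source.
    If he is not the source, a match is reported either because he matches
    by coincidence, or because he does not match but the lab errs.
    """
    p_report_if_source = 1.0
    p_report_if_not_source = (
        random_match_probability
        + (1.0 - random_match_probability) * false_positive_rate
    )
    return p_report_if_source / p_report_if_not_source


rmp = 1 / 2_000_000  # hypothetical random match probability

for fpr in (0.0, 1 / 10_000, 1 / 100):  # hypothetical lab error rates
    lr = reported_match_likelihood_ratio(rmp, fpr)
    print(f"false-positive rate {fpr:g}: likelihood ratio ~ {lr:,.0f}")
```

Under these assumptions even a small error rate swamps a tiny random match probability, which is one reason the headline figure alone can mislead.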

So what other option is there? Fenton and Neil (2011) argued that if the jurors and the lawyers can’t do the maths themselves, then they are just going to have to trust validated computer programs to do it for them. While it would always be up to people to determine the numbers that go into the formula, once that is done, the computer program, running validated, mathematically-factual algorithms, should be trusted to produce the correct output. This, they argue, is comparable to the way we trust calculators to undertake large multiplications or divisions. What they argue for, in short, is a greater role for statistics in law.

While this approach may appear very much the polar opposite of Tribe’s more conservative view of keeping statistics entirely out of the court room, and we can’t be sure of his views on this, I think it is actually much more in the spirit of his classic rebuttal: his major fear was the misuse of statistics, particularly through overweighting, and that is exactly what is happening now. We are living in a half-way house, with untrained legal professionals (occasionally but not systematically with professional assistance) presenting statistics to untrained jurors, and expecting them to understand the calculations. This is not the best, but the worst, of both worlds. And, perhaps unfortunately, we can no longer return to Tribe’s non-mathematical utopia. DNA is here to stay, and with it comes the random match probability. The only way out of the mess, it seems, is forward, not backward.

References

Link, W. (2009). Bayesian Inference. Elsevier Ltd.

Dawid, A. P. (2002). Bayes’s theorem and weighing evidence by juries. In Bayes’s Theorem: Proceedings of the British Academy. R. Swinburne. Oxford, Oxford University Press. 113: 71-90.

Donnelly, P. (2007). Appealing statistics. Medicine, Science, and the Law, 47, 14–17. doi:10.1258/rsmmsl.47.1.14

Fenton, N., & Neil, M. (2011). Avoiding Probabilistic Reasoning Fallacies in Legal Practice using Bayesian Networks, (June), 1–44.

Fenton, N., Neil, M., & Hsu, A. (2014). Calculating and understanding the value of any type of match evidence when there are potential testing errors. Artificial Intelligence and Law, 22, 1–28. doi:10.1007/s10506-013-9147-x

Finkelstein, M. O., & Fairley, W. B. (1971). A Bayesian Approach to Identification Evidence. Harvard Law Review, 83, 489.

Forrest, A. R. (2003). Sally Clark – a lesson for us all. Science & Justice: Journal of the Forensic Science Society, 43, 63–64. doi:10.1016/S1355-0306(03)71744-4

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2014). Bayesian data analysis (Vol. 2). Boca Raton, FL, USA: Chapman & Hall/CRC.

Kaye, D. H., Hans, V. P., Dann, B. M., Farley, E., & Albertson, S. (2007). Statistics in the Jury Box: How Jurors Respond to Mitochondrial DNA Match Probabilities. Journal of Empirical Legal Studies, 4(4), 797–834. doi:10.1111/j.1740-1461.2007.00107.x

Koehler, J., Chia, A., & Lindsey, S. (1995). The random match probability (RMP) in DNA evidence: Irrelevant and prejudicial? Jurimetrics Journal, 201–220. Retrieved from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1432066

Mehlum, H. (2009). The Island Problem Revisited. The American Statistician, 63(3), 269–273. doi:10.1198/tast.2009.08107

Schklar, J., & Diamond, S. S. (1999). Juror Reactions to DNA Evidence: Errors and Expectancies. Law and Human Behavior, 23(2), 159–184.

McGrayne, S. B. (2011). The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy. Yale University Press.

Thompson, W. C., & Schumann, E. L. (1987). Interpretation of Statistical Evidence in Criminal Trials: The Prosecutor’s Fallacy and the Defense Attorney’s Fallacy. Law and Human Behavior, 11(3), 167–187. doi:10.2307/1393631

Tribe, L. H. (1971). Trial by Mathematics: Precision and Ritual in the Legal Process. Harvard Law Review, 84(6), 1329–1393. doi:10.2307/1339610

 
