Can Statistics and Law ever learn to get along?

R v Adams

In 1996 a jury heard forensic testimony that a ‘match’ had been found between Denis Adams’ DNA and a sample found at the scene of a crime where a woman had reported being assaulted and raped. The probability of this match occurring by chance was described by the forensic expert as ‘1 in 200 million’. In Adams’ defence, his lawyers told the jury that the victim had in fact failed to pick the defendant out of a line-up, and had even explicitly stated that Adams did not match her description of her assailant. Further, Adams’ girlfriend testified that he was with her on the night the incident took place. Despite this, the jury convicted Adams of the crime. The defence, convinced that the jury had overweighted the DNA evidence in their deliberations, immediately launched an appeal. Unsure of the correct way to combine the three pieces of evidence, they recruited a statistical expert, Peter Donnelly (Donnelly, 2007), to undertake a ‘Bayesian’ analysis of the case.

It was resolved by all parties involved that the statistical calculations must be undertaken by the jurors themselves. Donnelly, together with the statistical experts from the prosecution, devised a questionnaire to encourage the jurors to quantify the various pieces of evidence presented in the case. For example, regarding the failure of the victim to identify Adams, the jurors were asked to attach numbers to the questions ‘If he were the attacker, what’s the chance that the victim would say her attacker didn’t look anything like him?’ and ‘If he wasn’t her attacker, what’s the chance she would say this?’. Once the jurors had given numerical estimates for the value of each piece of evidence, they were guided in how to combine these using Bayesian techniques to arrive at a figure representing the value of all three pieces of evidence combined.

However, the attempt to guide the jurors and the judge through this process was described by Donnelly as rife with misunderstanding, mishaps and general difficulty, some of which Donnelly elucidates:

‘The episode had some amusing sidelines. It was suggested that it would be helpful to supply the jury (and judge) with basic calculators. Although the total cost was well under £100, this request was so unusual that it seemed to require clearance personally from the Lord Chancellor. Then, during my evidence, we walked the jury through a numerical example—the barrister would suggest token numbers in answer to the questions, and the jury and I entered them in the calculators which were eventually supplied. They seemed to have no difficulty in following this, but at an early stage in the calculation, when I said something to the effect that: “Your calculator should now show the value 31.6,” and the jurors all nodded, the judge rather plaintively said: “But mine shows zero.”‘ Donnelly (2007)

The appeal was eventually rejected, with the appeal judge scathing about the statistical approach used. As a result of his experiences, Donnelly remains unconvinced that such an approach is a feasible future for the presentation of Bayesian reasoning in legal cases.

But can there be a future for statistics in the court room? Is there another way? And what even is Bayesian inference anyway?


‘What even is Bayesian Inference anyway?’

Bayesian inference is the mathematically correct method of updating a ‘prior’ probabilistic belief in a hypothesis (such as Adams being the attacker) in the light of new evidence (such as the DNA evidence, the alibi, and the line-up identification failure) to arrive at a ‘posterior’, or updated, belief in that hypothesis.

It might be clear that this general concept, of updating one’s beliefs in ‘something’ in the light of new information, is hardly one restricted to the court room, and indeed some believe this fundamental belief-updating process, and therefore Bayesian inference, is central to almost all human endeavours (McGrayne, 2011; Link, 2009; Gelman et al. 2014).

Bayes’ formula for undertaking this inference was published over 250 years ago (Bayes & Price, 1763). I don’t want to get bogged down in the algebra – there are many (many) thorough explanations of it elsewhere. Suffice to say at this point that what you get out of the formula (known as the ‘posterior’) is the updated belief level, and to calculate it you combine the prior (the old belief level) with the new information or evidence. Hopefully that makes some intuitive sense.
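For the curious, the formula itself in modern notation (with H the hypothesis and E the evidence) is:

```latex
\underbrace{P(H \mid E)}_{\text{posterior}}
  \;=\;
  \frac{\overbrace{P(E \mid H)}^{\text{likelihood of the evidence}}
        \;\overbrace{P(H)}^{\text{prior}}}
       {P(E)}
```

The denominator P(E) simply normalises, so the posterior is driven by the product of the prior and the likelihood of the new evidence.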

Bayes’ theorem has been extensively validated, and among the statistical community its status as the correct approach in probability-updating situations is no longer in doubt. As Fenton, Neil and Hsu (2014) stated:

‘The application of Bayes’ theorem to probabilities is akin to the application of addition or multiplication to numbers: probabilities are either correctly combined by this rule, or they are combined incorrectly by other means.’ Fenton, Neil & Hsu (2014)

So, if the numbers going into the formula are correct – if they correspond to reality – then the number coming out will also be correct. But here, of course, lies almost all of the contention: the conversion of non-quantified beliefs (Adams’ guilt; your chance of catching a bus; a patient’s probability of having a given disease; how much your friend likes you; a football team’s chance of winning a match) into the quantified ones the formula requires. Nowhere is this conversion currently more contentious than in the legal realm. However, there are work-arounds: one can calculate probability ‘distributions’, for example, taking into account multiple feasible valuations of each piece of evidence (e.g. those most in favour of the prosecution and those most in favour of the defence). While there is no access to the original figures calculated by the jurors in the Adams trial, a Bayesian post-analysis of the case by Dawid (2002) suggested that the posterior probability of guilt, taking into account the three pieces of evidence, may be as low as 0.36 or as high as 0.98 depending on the valuations used. He believed this analysis demonstrated that there was room for ‘reasonable doubt’. Perhaps it also demonstrates that the techniques can be informative in trials like this.
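To make the mechanics concrete, here is a minimal sketch of this kind of sensitivity analysis in Python. The prior and the likelihood ratios below are hypothetical placeholders, not the actual figures from the case or from Dawid’s paper; the point is purely how a prior gets combined with several pieces of evidence.

```python
# Illustrative sketch of a Bayesian sensitivity analysis. All numbers are
# hypothetical placeholders, not the actual valuations from the Adams case.

def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by the likelihood ratio of each piece of evidence."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_prob(odds):
    """Convert odds in favour of guilt into a probability."""
    return odds / (1 + odds)

prior_odds_guilt = 1 / 200_000  # hypothetical: one of ~200,000 possible suspects
dna_lr = 2_000_000              # RMP of 1 in 2 million -> likelihood ratio of 2 million

# Two hypothetical juror valuations of the 'soft' evidence
# (failed identification, alibi), one friendly to each side:
scenarios = {
    "defence-friendly":     [dna_lr, 0.10, 0.25],
    "prosecution-friendly": [dna_lr, 0.90, 0.50],
}
for label, lrs in scenarios.items():
    p = odds_to_prob(posterior_odds(prior_odds_guilt, lrs))
    print(f"{label}: posterior probability of guilt = {p:.2f}")
```

The two scenarios produce quite different posteriors, which is exactly the ‘distribution’ idea: report the range of defensible answers rather than a single number.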


‘Trial by Mathematics’

There are many opponents of the use of Bayesian inference in court cases, and many of them point to a now-classic paper by Professor Laurence Tribe (1971), entitled ‘Trial by Mathematics: Precision and Ritual in the Legal Process’, in the Harvard Law Review. Tribe begins the paper with an implicit comparison of modern attempts to ‘mathematize’ the legal process with those from the Middle Ages:

‘The system of legal proof that replaced trial by battle in Continental Europe during the Middle Ages reflected a starkly numerical jurisprudence. The law typically specified how many uncontradicted witnesses were required to establish various categories of propositions, and defined precisely how many witnesses of a particular class or gender were needed to cancel the testimony of a single witness of a more elevated order. So it was that medieval law, nurtured by the abstractions of scholasticism, sought in mathematical precision an escape from the perils of irrational and subjective judgment.” Tribe, 1971

Tribe’s implied point here is: this was tried before, and it is as bad an idea now as it was back then. One of Tribe’s main functional arguments (apart from some compelling moral arguments) is that statistical evidence will be far more salient, or attractive, to the jury than the non-mathematical evidence they will always be asked to combine it with, because statistical evidence exudes an “aura of precision”. Tribe argued throughout his paper against an article in the very same journal issue by two authors named Finkelstein and Fairley (1971), who were proposing the use of Bayesian inference in legal trials for the first time (presumably the journal had approached Tribe for his views prior to publication rather than this being some wonderful coincidence). Finkelstein and Fairley were proposing a system somewhat similar to that employed in R v Adams above, in which the jurors convert their beliefs into numerical values. Tribe makes the point that:

‘Even assuming with Finkelstein and Fairley that the accuracy of trial outcomes could be somewhat enhanced if all crucial variables could be quantified precisely and analyzed with the aid of Bayes’ Theorem, it simply does not follow that trial accuracy will be enhanced if some of the important variables are quantified and subjected to Bayesian analysis, leaving the softer ones – those to which meaningful numbers are hardest to attach – in an impressionistic limbo. On the contrary, the excessive weight that will thereby be given to those factors that can most easily be treated mathematically indicates that, on balance, more mistakes may well be made with partial quantification than with no quantification.’ Tribe, 1971.

I hold some sympathy for Tribe’s view that the legal process might be better if mathematics were kept out of it entirely, particularly given the time it was written. However, unfortunately for Tribe’s modern proponents, while they stand with their shoulders at the door of the courtroom, using Tribe’s arguments to keep the statisticians out, they have forgotten to look behind them. And behind them the real nightmare that Tribe hoped never to see has crept up over the last three decades: legal professionals with no statistical training misusing mathematical evidence and overweighting it, leading to numerous miscarriages of justice (e.g. Forrest, 2003; Mehlum, 2009; Donnelly, 2007).

The problem grew largely due to the rise of the use of DNA evidence in court cases from the early 1990s. When forensic teams report a DNA match in court and want to get across the importance of the evidence, they generally report the probability of a random person from the population matching the sample found at the crime scene (the ‘random match probability’, or RMP). The most notorious error in legal practice, single-handedly responsible for a swathe of miscarriages of justice, is intimately entwined with this figure and is known as the Prosecutor’s Fallacy. To demonstrate: imagine you are a juror in a murder trial and you are told there is only one piece of evidence against the defendant: he matches a DNA sample found on the victim’s body which could only have been left by the murderer. You are then told that the RMP is extremely low: only 1 in 2 million people have this DNA type. Now, what do you think is the chance that this person is innocent? If the answer that pops into your head is also ‘1 in 2 million’, I’m afraid you’ve just committed the fallacy. Why is this a fallacy? Well, imagine the murder happened in London (with a population of about 8 million) and we determine that anyone in London could feasibly have done it in the time frame. How many matches for this DNA sample would we expect among the rest of the population? Four. We already know the defendant matches, so our best estimate of how many matches there are in London is five. Since we have no other evidence against the defendant, the best we can say is that he is one of these five people, one of whom must have committed the crime, so he has a 1-in-5 chance of being the assailant, or a 4-in-5 chance of innocence.

Now 4 in 5 is a very different figure from 1 in 2 million. The mistake in reasoning here is to ignore the population size, or the ‘prior’ chance of guilt (which the population gives us). Before we did the DNA match, what was the defendant’s chance of guilt? 1 in 8 million – he had no more chance of being guilty than anyone else in London. When we combine this prior with the DNA figure of 1 in 2 million we end up with around four other expected matches. But what if we were talking about an island with only 50 people on it? How many other matches would we expect there? Not even one (0.000025 in fact). So if we were talking about this island and the DNA match occurred, it would be extremely likely he had committed the crime: the bigger the population, the smaller the prior and the less convincing the DNA evidence. As we saw above with Bayes’ formula, the new, ‘posterior’ belief level has to be a combination of both the prior and the new information (the DNA match) – the mistake in the Prosecutor’s Fallacy is to entirely ignore the prior, or the population size, and focus entirely on the new information.
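The whole London argument can be written down in a few lines. A minimal sketch, assuming a uniform prior over the population and that the true culprit always matches:

```python
# The London example as a calculation: with a uniform prior over the
# population, the posterior probability of guilt after a DNA match
# depends heavily on the population size.

def prob_guilty_given_match(population, random_match_prob):
    """P(guilty | DNA match), with everyone equally likely a priori."""
    # Expected number of innocent people who also match:
    expected_other_matches = (population - 1) * random_match_prob
    # The defendant is one candidate among (1 + expected_other_matches):
    return 1 / (1 + expected_other_matches)

rmp = 1 / 2_000_000
print(prob_guilty_given_match(8_000_000, rmp))  # London: ~0.2 (much room for doubt)
print(prob_guilty_given_match(50, rmp))         # tiny island: ~0.99998 (near certain)
```

Same match, same RMP; only the prior (the population) changes, and the answer moves from reasonable doubt to near certainty.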

In this simplistic example we didn’t include any other evidence, and while there are cases where the prosecution’s case rested entirely on DNA evidence (the Adams case we began with, for example), this is not usually so. Unfortunately, exactly as Tribe predicted, cases such as R v Adams, as well as a swathe of research studies, have now shown that people vastly overweight DNA evidence presented as a ‘random match probability’, typically due to the Prosecutor’s Fallacy (e.g. Thompson & Schumann, 1987; Koehler, Chia & Lindsey, 1995; Schklar & Diamond, 1999; Kaye et al., 2007).


‘Is there another way?’

Fenton, Neil and Hsu (2014) see trials like R v Adams as proof that any legal process which asks jurors or lawyers to calculate the statistics behind a trial themselves is doomed to failure. A recent experiment I conducted with them confirmed this view: we presented 100 general-population participants with a Bayesian legal problem involving only a single piece of ‘match’ evidence, including all the variables (such as the possibility of error during forensic testing) they would have to take into account to accurately calculate the value of the evidence. Not a single person was able to get the correct result – and most trials include more than one piece of evidence.
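To give a flavour of why this calculation defeats people, here is a sketch with entirely hypothetical error rates (not the figures used in the experiment). Once the possibility of a false positive at the lab is allowed for, the evidential value of a reported match collapses from the RMP to something dominated by the error rate:

```python
# Hypothetical sketch: the value of a reported 'match' once testing error
# is allowed for. None of these rates are real figures from any case or study.

rmp = 1 / 1_000_000         # random match probability
false_positive = 0.001      # lab wrongly reports a match
false_negative = 0.01       # lab misses a true match

# P(reported match | defendant IS the source):
p_match_if_source = 1 - false_negative

# P(reported match | defendant is NOT the source): either he genuinely shares
# the profile and the test detects it, or he doesn't and the lab errs.
p_match_if_not_source = rmp * (1 - false_negative) + (1 - rmp) * false_positive

likelihood_ratio = p_match_if_source / p_match_if_not_source
print(round(likelihood_ratio))  # roughly 989 -- nowhere near the 1,000,000 the RMP suggests
```

With a false positive rate of 1 in 1,000, the strength of the match evidence is capped near 1,000 to 1, no matter how small the RMP is – a step that is very easy to omit when reasoning informally.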

So what other option is there? Fenton and Neil (2011) argued that if the jurors and the lawyers can’t do the maths themselves, then they are just going to have to trust validated computer programs to do it for them. While it would always be up to people to determine the numbers that go into the formula, once that is done, the computer program, running validated, mathematically-factual algorithms, should be trusted to produce the correct output. This, they argue, is comparable to the way we trust calculators to undertake large multiplications or divisions. What they argue for, in short, is a greater role for statistics in law.

While this approach may appear the polar opposite of Tribe’s more conservative view of keeping statistics entirely out of the court room, and we can’t be sure of his views on this, I think it is actually much more in the spirit of his classic rebuttal – his major fear was the misuse of statistics, particularly through overweighting, and that is exactly what is happening now. We are living in a half-way house, with untrained legal professionals (occasionally but not systematically with professional assistance) presenting statistics to untrained jurors and expecting them to understand the calculations. This is not the best but the worst of both worlds. And perhaps unfortunately, we can no longer return to Tribe’s non-mathematical utopia. DNA is here to stay, and with it comes the random match probability. The only way out of the mess, it seems, is forward, not backward.


Link, W. (2009). Bayesian Inference. Elsevier Ltd.


Dawid, A. P. (2002). Bayes’s theorem and weighing evidence by juries. In Bayes’s Theorem: Proceedings of the British Academy. R. Swinburne. Oxford, Oxford University Press. 113: 71-90.

Donnelly, P. (2007). Appealing statistics. Medicine, Science, and the Law, 47, 14–17. doi:10.1258/rsmmsl.47.1.14

Fenton, N., & Neil, M. (2011). Avoiding Probabilistic Reasoning Fallacies in Legal Practice using Bayesian Networks, (June), 1–44.

Fenton, N., Neil, M., & Hsu, A. (2014). Calculating and understanding the value of any type of match evidence when there are potential testing errors. Artificial Intelligence and Law, 22(September), 1–28. doi:10.1007/s10506-013-9147-x

Finkelstein, M. O., & Fairley, W. B. (1971). A Bayesian Approach to Identification Evidence. Harvard Law Review, 83, 489.

Forrest, A. R. (2003). Sally Clark – a lesson for us all. Science & Justice: Journal of the Forensic Science Society, 43, 63–64. doi:10.1016/S1355-0306(03)71744-4

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2014). Bayesian data analysis (Vol. 2). Boca Raton, FL, USA: Chapman & Hall/CRC.

Kaye, D. H., Hans, V. P., Dann, B. M., Farley, E., & Albertson, S. (2007). Statistics in the Jury Box: How Jurors Respond to Mitochondrial DNA Match Probabilities. Journal of Empirical Legal Studies, 4(4), 797–834. doi:10.1111/j.1740-1461.2007.00107.x

Koehler, J., Chia, A., & Lindsey, S. (1995). The random match probability (RMP) in DNA evidence: Irrelevant and prejudicial? Jurimetrics Journal, 201–220.

Mehlum, H. (2009). The Island Problem Revisited. The American Statistician, 63(3), 269–273. doi:10.1198/tast.2009.08107

Schklar, J., & Diamond, S. S. (1999). Juror Reactions to DNA Evidence: Errors and Expectancies. Law and Human Behavior, 23, 159–184.

McGrayne, S. B. (2011). The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy. Yale University Press.

Thompson, W. C., & Schumann, E. L. (1987). Interpretation of Statistical Evidence in Criminal Trials: The Prosecutor’s Fallacy and the Defense Attorney’s Fallacy. Law and Human Behavior, 11(3), 167–187. doi:10.2307/1393631

Tribe, L. H. (1971). Trial by Mathematics: Precision and Ritual in the Legal Process. Harvard Law Review, 84(6), 1329–1393. doi:10.2307/1339610



The Cognitive Psychology of Moral Discrepancy  


During the Second World War, under the German occupation, the French existentialist philosopher Jean-Paul Sartre co-founded the underground resistance group Socialisme et Liberté. He also contributed to several illegal newspapers and magazines, writing articles in opposition to the invaders. However, he also famously accepted a lecturing position which had been taken from a Jewish academic after the Nazis banned Jews from teaching in the country.


Further, after Sartre approached several people about joining Socialisme et Liberté and met with indecision and uncertainty, the group soon dissolved, and he took no further part in active resistance. Sartre’s philosophy espoused the value of freedom and the moral duty of human beings, and given this, there has been much debate regarding whether his actions, or lack of action, during this period were consistent with his professed beliefs.

The study of this relationship between ‘espoused’ moral values and actual behaviour has a long history and also continues to this day. Espoused Theory (developed principally by Chris Argyris e.g.  Argyris and Schön [1974]) states that:

When someone is asked how he would behave under certain circumstances, the answer he usually gives is his espoused theory of action for that situation. This is the theory of action to which he gives allegiance, and which, upon request, he communicates to others. However, the theory that actually governs his actions is this theory-in-use [actual behaviour]. (Argyris and Schön 1974: 6-7)

Jonathan Haidt (2001) goes further: espoused moral beliefs and actual behaviours are governed by completely different mental systems. In Haidt’s ‘Social Intuitionist Model’ (see his paper ‘The Emotional Dog and Its Rational Tail’), the vast majority of real, in-the-moment moral judgments and behaviours are driven by one’s intuitive reaction to the situation, rather than by step-by-step reasoning. Reasoning, Haidt states, is generally only used to make after-the-fact justifications for moral decisions that have already been made intuitively, or to explain one’s moral beliefs to others in a theoretical context.

So what is the reason for this discrepancy? Both Espoused Theory and the Social Intuitionist Model provide little explanation beyond the proposal that the two phenomena are governed by different ‘theories’ or ‘mental systems’. Why do these systems behave differently? One possibility comes from research on cognitive biases. First, check out the two versions of the ‘disease’ problem below:


Disease Problem: V1

Imagine you are in charge of the health department for a country experiencing a national disease outbreak. You have quarantined all the affected cases, 600 people in total. Your advisor presents you with the only two treatments available. You are told that treatment A will definitely save 200 lives, while treatment B has a 1-in-3 chance of saving all 600, but a 2-in-3 chance of saving no one.

Which treatment option do you choose?



Disease Problem: V2

The situation is the same; 600 people quarantined. However, regarding the treatments, you are now told that treatment A will definitely kill 400 people, while treatment B has a 1-in-3 chance that no one will die, but a 2-in-3 chance that all 600 people will die.

Now which treatment do you choose?


While the decisions to be made in the two versions of this problem are precisely equivalent, it has been consistently shown that a majority of people opt for treatment A in the ‘lives saved’ framing (V1), while a similar majority opt for treatment B in the ‘deaths’ framing (V2). The same effect has been found in many other experiments with related problems, and the general consensus is that when faced with ‘gains’ people tend to choose the safe, certain option, while when faced with ‘losses’ people tend to choose the risky option – even when the two decisions are precisely equivalent.
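It is easy to verify that the two framings really are the same decision. A quick check, computing expected survivors under each option in each framing:

```python
# The two framings of the disease problem describe identical outcomes.

def expected_survivors(outcomes):
    """outcomes: list of (probability, number of survivors) pairs."""
    return sum(p * s for p, s in outcomes)

# V1, the 'lives saved' framing:
a_v1 = expected_survivors([(1.0, 200)])
b_v1 = expected_survivors([(1/3, 600), (2/3, 0)])

# V2, the 'deaths' framing, restated as survivors (400 die == 200 survive):
a_v2 = expected_survivors([(1.0, 600 - 400)])
b_v2 = expected_survivors([(1/3, 600 - 0), (2/3, 600 - 600)])

assert a_v1 == a_v2 and b_v1 == b_v2  # same decision, different wording
print(a_v1, b_v1)  # both options save 200 lives in expectation
```

Both options also have the same expected outcome as each other; the only real difference is B’s variance, which is exactly what the framing flips people’s attitude towards.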

This pattern of risk attitudes, closely related to ‘loss aversion’, fed into Kahneman and Tversky’s 1979 ‘Prospect Theory’, a cornerstone of modern behavioural economics. In their 1974 paper (‘Judgment under Uncertainty: Heuristics and Biases’) they proposed that people are susceptible to a wide variety of other cognitive biases too (including ‘anchoring’, the ‘base rate fallacy’, the ‘conjunction fallacy’ and many others). Further, in his best-selling 2011 book ‘Thinking, Fast and Slow’, Kahneman lays out his belief that these biases are inherent to the design of the mental ‘System 1’ (Haidt’s ‘intuitive’ system) and can be overcome by greater use and education of the mental ‘System 2’ (Haidt’s ‘reasoning’ system). In Kahneman’s model, both systems have their virtues and vices: System 1 makes decisions quickly and can handle a large amount of complexity, but it makes mistakes. System 2 is slower but more methodical and so makes fewer mistakes. In the moment, System 2 will often be too slow to determine how to behave, so we rely predominantly on System 1.

So, perhaps we have the best intentions but are simply incapable of carrying them out in the moment due to the cognitive limitations of System 1?


An Experimental Test

In a recent paper, Schwitzgebel and Cushman (2015) wanted to test whether the degree of theoretical knowledge of moral situations would affect this in-the-moment decision making. To examine this, the authors decided to compare philosophers (people with philosophy degrees) to “similarly-educated” non-philosophers on the two disease problems. They also took data on the level of expertise in philosophy as well as whether ‘ethics’ was their area of speciality.




The study firstly replicated previous results, with a large majority of participants choosing the risky option when faced with ‘deaths’, and far fewer choosing the risky option when faced with ‘lives saved’. Furthermore, the effect was just as large for non-philosophers (83% vs 43%) as for philosophers (79% vs 32%), and no difference was seen even for philosophers specializing in ethics.

Another Approach

This all fits with Espoused Theory, the Social Intuitionist Model and the cognitive biases approach. Philosophers are trained to deal with ethical problems slowly and precisely (using ‘System 2’ in Kahneman’s language), but when faced with problems like the disease scenarios, their System 1 is just as vulnerable to the framing effect as anyone else’s.

But can this approach explain all moral discrepancy? Does it even explain the story we began with? Can Sartre’s actions during the war really be put down to cognitive biases and framing effects? He certainly would have had time to consider whether to disband his resistance group, as well as whether to take the lecturing post. Can we really class these as in-the-moment, intuitive decisions? Professor Schwitzgebel (of Schwitzgebel and Cushman) has another theory. He has devoted a large part of his life’s work to conducting empirical studies on the moral behaviour of professors of ethics in particular, to determine whether they are any kinder, fairer or more moral than other people.

Over the years, Professor Schwitzgebel and colleagues have looked at a vast range of behaviours, including donating to charity, responding to student emails, organ and blood donation, how often they call their mothers, eating meat, theft of library books, and so on. The overall finding? No difference. Professors of philosophy studying ethics were no better or worse on this range of behaviours than other people.

Further, especially with regard to eating meat and giving to charity, the ethics professors were significantly different to other groups in their espoused beliefs about how morally bad eating meat is (they thought it was worse) and how much of one’s salary should be given to charity (they thought it should be more). But when it came to actual behaviour? No difference.

So why the discrepancy here? The cognitive biases approach doesn’t seem any more relevant here than in Sartre’s case – there are no clear ‘framing’ effects, and people have all the time they need to make these decisions. From all his studies and interviews, Professor Schwitzgebel believes one fact clearly shines through: morally, he says, people just want to be about as good as the other people around them. Studying ethics will change your idea of what an ‘ideal person’ is – but it won’t change your desire to be that ideal person – you will still just aim to be about average, and no amount of theoretical expertise will change this fact. So it seems that even when we have time to employ our ‘System 2’ and really think about our behaviour, ‘good enough’ is good enough, and we shouldn’t expect those with a large amount of training in ethical philosophy, or even those who profess these beliefs, like Sartre, to stand by them in practice. Schwitzgebel calls this ‘Cheeseburger Ethics’, and you can find out why by reading his excellent essay of the same name.

God Speed!


Haidt, J. (2001). The Emotional Dog and Its Rational Tail: A Social Intuitionist Approach to Moral Judgment. Psychological Review, 108(4), 814–834. doi:10.1037//0033-295X.

Kahneman, D. (2011). Thinking, fast and slow. Macmillan.

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–292.

Argyris, C., & Schön, D. A. (1974). Theory in Practice: Increasing Professional Effectiveness. San Francisco: Jossey-Bass.

Schwitzgebel, E., & Cushman, F. (2015). Philosophers’ biased judgments persist despite training, expertise and reflection. Cognition, 141, 127–137. doi:10.1016/j.cognition.2015.04.015

Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124–1131. doi:10.1126/science.185.4157.1124







Chess and The Clash of Civilizations

Several years ago I read The Player of Games by the late Iain Banks. In the far distant future, Jernau Morat Gurgeh, one of the greatest game players in his galactic civilization, ‘The Culture’, is invited to travel to the distant and rival ‘Empire of Azad’ to play the most complex game ever created. The Azad use the game as the principal method of determining who their next emperor will be, and hold regular tournaments for this purpose. It is to one of these tournaments that Gurgeh is invited, where he is expected to be defeated easily. However, as the story unfolds it becomes clear that Gurgeh’s vastly different cultural background is deeply apparent in his approach to the game. This makes him unpredictable to the Azad players and gives him an edge in his play. As Gurgeh faces the reigning emperor in the final match, their cultures so deeply influence their styles of play that the game becomes a proxy for the war between The Culture and the Azad Empire.


So when I recently came across a paper entitled ‘Civilization differences in Chess Experts’ by Chassy and Gobet (2015) my mind immediately recalled The Player of Games and I couldn’t help but read on.


Chassy and Gobet examined the first move made by chess experts from across the globe. By far the two most frequent in expert play are e4 (king’s pawn forward two squares) and d4 (queen’s pawn forward two squares). This is because an important initial principle in chess is to control the centre of the board with your pieces. These two moves are played in around 80% of all expert games, so all other first moves were lumped together in a third category.


As it transpires, taking into account wins, draws and losses, e4 is a ‘riskier’ move than d4 at this expert level of chess. With d4 the game is slightly more likely to end in a draw, whereas with e4 one has a slightly greater chance of both victory and loss. Importantly, neither is clearly ‘better’ than the other, e4 is just slightly riskier and d4 slightly more conservative. The third category (all other moves lumped together) was intermediary, being slightly less risky than e4 but slightly more risky than d4.
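One simple way to make this notion of ‘risk’ concrete is to measure, for each opening move, the probability that the game ends decisively (a win or a loss) rather than in a draw. The win/draw/loss counts below are made up for illustration and are not Chassy and Gobet’s actual data:

```python
# Measuring the 'riskiness' of an opening as the chance of a decisive result.
# These win/draw/loss counts are invented for illustration -- they are NOT
# the frequencies reported by Chassy and Gobet.

def decisive_rate(wins, draws, losses):
    """Fraction of games ending in a win or a loss rather than a draw."""
    return (wins + losses) / (wins + draws + losses)

e4_risk = decisive_rate(wins=390, draws=320, losses=290)  # hypothetical counts
d4_risk = decisive_rate(wins=370, draws=380, losses=250)  # hypothetical counts

print(f"e4 decisive rate: {e4_risk:.2f}")  # the riskier opening
print(f"d4 decisive rate: {d4_risk:.2f}")  # the more conservative opening
```

On numbers like these, e4 buys a higher chance of winning at the cost of a higher chance of losing, with neither move ‘better’ overall – which is the sense in which the paper calls it riskier.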


The authors then divided the world up according to Huntington’s (1996) classic text The Clash of Civilizations and the Remaking of World Order. Huntington thought that in the post-cold war world, the primary source of global conflict would be people’s cultural and religious differences rather than specifically territorial boundaries (what a fool…). He divided the world up based on these cultural / religious differences. These included: Western, Orthodox (Russia and the eastern bloc), Islamic, African, Latin American, Sinic (Chinese and neighbouring countries), Hindu, Buddhist and Japanese. In the current paper, ‘Japanese’ had to be removed by the authors as they didn’t have enough chess games to analyse and ‘Jewish’ was added.

Chess is played globally, has the same rules all over the world, and has a single worldwide rating system (the Elo rating system). The authors were therefore able to extract data from games played between experts across the globe and compare them meaningfully.


Cold Hard Boring Reality

As much as I have enjoyed conflating real world research with fiction in this article so far, seeing the actual data forces me to put my researcher hat back on (it’s a very serious hat – no frilly bits at all).


Firstly, I actually have a fairly serious issue with this paper. It assumes that e4 is chosen slightly more often by some cultures ‘because’ it is a risky move, and further that this choice of a risky move somehow says something about that culture. The paper says that “the level of risk-taking varies significantly across cultures” and “we discuss which psychological factors might underpin these civilization differences”. But this is only valid if players know that e4 is riskier and play it because it is riskier. We really can’t be sure that that is why the move is chosen. Different cultures may have different opening habits, passed down from teacher to student for generations, which have more to do with tradition than optimization or risk. Or there might be another reason for the differences. We really can’t be sure, so we can’t confidently take the leap of inference from move choice to general approach to risk.


The second thing to say is that these differences are very small, and they are averages. If you pit two players from America and Russia against each other you aren’t going to see a Rocky IV style clash of cultures, let alone anything on the scale of the Gurgeh–Azad game. In fact, if the two are representative of the Western and Orthodox regions as a whole, the most probable outcome on this data is that they would both play d4 when playing white. You also won’t find any validation for classic stereotypes in this data.


And that’s the real bucket of cold water. We humans are all just too boringly similar to each other. I suppose we will have to wait until we meet a neighbouring chess-playing galactic empire before we can get some really interesting data.



Chassy, P., & Gobet, F. (2015). Risk taking in adversarial situations: Civilization differences in chess experts. Cognition, 141, 36-40.
Huntington, S. P. (1996). The clash of civilizations and the remaking of world order. Penguin Books India.




David Marr, Cognitive Science and the Middle Path

David Marr published Vision in 1982, and the work continues to influence research in cognitive science today. So much so, in fact, that Topics in Cognitive Science has published a special issue, ‘Thirty Years after Marr’s Vision’, including articles on the applications and relevance of Marr’s work to the modern cognitive scientist.


The Tri-Level Hypothesis

David Marr (1982) proposed in his ‘Tri-Level Hypothesis’ that when seeking to explain the behaviour of any information-processing system, such as the brain or its parts, there are three distinct levels of analysis, each of which must be understood: the computational, the algorithmic and the implementation levels. The computational level describes what the system does (the problems it solves, e.g. producing colour vision). The algorithmic level comprises the representations and processes used to solve these problems, and the implementation level is the way in which the system is physically realized (e.g. in the brain, the specific wiring and connections between the neurons in the system). The Tri-Level Hypothesis has been reformulated several times in the subsequent 30 years (e.g. Anderson, 1990; Newell, 1982; Pylyshyn, 1984) and remains a core tenet of cognitive science.


Reductionism and Vagueness

In their introduction to Topics in Cognitive Science’s special issue, Peebles and Cooper (2015) contend that the middle, algorithmic level is too often ignored in modern times, with ‘reductionist neuroscience approaches’ focusing entirely on the implementation level and vague ‘Bayesian approaches’ focusing overly on the computational level. While the latter approach may indeed succeed in solving a problem which the brain solves, little or nothing is learned of how the brain itself actually solves it. Noting Marr’s insistence on the necessity of understanding at all three levels, the authors therefore urge greater focus on theories of cognitive architecture, which operate at the middle, algorithmic level and decompose and explain the system through the interaction of functional cognitive components.


The Encroachment of Neuroscience

However, Bickle (2015), in the same issue, argues against this view. Peebles and Cooper’s attack, particularly on reductionist neuroscience, echoes Marr’s original attack on the inability of the reductionists of his time to explain vision using electrophysiological cellular readings (e.g. Barlow, 1972). Bickle argues that while Marr’s original attack on reductionism was justified, it (and by extension, Peebles and Cooper’s) is no longer tenable. A swathe of new techniques and tools, such as cortical microstimulation, has allowed neuroscientists to begin constructing causal-mechanistic explanations of the brain, including the dynamic interaction of parts and their organization, as well as explanations of how these interactions ‘solve the problem’ of interest. While reductionist approaches in Marr’s time were merely descriptive (and clearly operated only at the implementation level), modern neuroscience theories are genuinely explanatory and appear to encroach on the algorithmic level. A causal-mechanistic neuroscientific explanation of a system is indeed different from the kind of explanation given by cognitive science and advocated by Peebles and Cooper, but it is not clearly inferior, Bickle contends. Further, the interaction between, or equivalence of, these ‘higher level’ neuroscientific explanations of the brain and the more traditional cognitive explanations at the algorithmic level is not fully understood and will need further work. Marr did not anticipate this encroachment of neuroscience on the algorithmic level, Bickle states, and it is not clear what he would have made of it.


Synergistic Working

In the same issue, Love (2015) argues for greater cooperation between those working at different levels and proposes that findings at one level might be used to test theories at another. In fact there is already at least one good example of this synergistic working between neuroscience and cognitive psychology, which can be found in Smith, Kosslyn and Barsalou (2007, p. 16). It began before, and ended after, Marr’s 1982 work, and was in fact also within his own field of vision. In the 1970s there were two competing theories of how mental images (e.g. imagining a square) were represented in the mind. Pylyshyn (1973) claimed they were represented conceptually, similar to language (a mental image of a square would be represented simply as the concept ‘square’). However, Kosslyn and Pomerantz (1977) believed that such images were actually ‘depicted’ in the mind, mapping geometrically point for point with a real image (a mental image of a square would literally be represented by four joined equal-length lines with right angles between them). For over a decade this debate continued, with neither side able to disprove the other on cognitive evidence alone. However, in the late 1990s advances in neuroscience allowed careful examination of the areas of the brain underpinning mental imagery. Mental images were found to be represented ‘topologically’ in the primary visual cortex: mental imagery literally produced a ‘picture’ of activation on the surface of the cortex which, while it would not be recognizable to a naïve viewer as the original image, corresponded to the size and orientation of the imagined image (see figure 2, below), and which mapped one to one with the mental experience of imagining the object (Klein et al., 2004; Kosslyn et al., 1995; Kosslyn & Thompson, 2003). This provided strong evidence for the ‘depiction’ theory and demonstrated the potential value of multi-level working.


Figure 2. ‘A picture on the brain’: two sets of fMRI images from Klein et al (2004) demonstrating (a) the two stimuli used (b) the unique cortical activation for the horizontal image and (c) the unique cortical activation for the vertical image. Activation is shown for both direct perception of the images and for subsequent ‘mental imagery’.



It has been over 30 years since the publication of David Marr’s Vision and the work still remains central to cognitive science. In this time the work has been revised and the clear distinction between the three levels apparent in Marr’s time has become somewhat blurred by the encroachment of neuroscience on the algorithmic level previously monopolised by cognitive science. Further, in developments that would have surely pleased Marr, synergistic working between levels has produced advancements in understanding of the function of the brain. Finally, the Tri-Level Hypothesis still shows the capacity to provoke debate, even within a single publication, and it is perhaps this capacity which will ensure its centrality is maintained for the next thirty years.



Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Lawrence Erlbaum Associates.

Barlow, H. B. (1972). Single units and sensation: A neuron doctrine for perceptual psychology. Perception, 1, 371–394.

Bickle, J. (2015), Marr and Reductionism. Topics in Cognitive Science, 7: 299–311. doi: 10.1111/tops.12134

Klein, I., Dubois, J., Mangin, J. F., Kherif, F., Flandin, G., Poline, J. B., … & Le Bihan, D. (2004). Retinotopic organization of visual mental images as revealed by functional magnetic resonance imaging. Cognitive Brain Research, 22(1), 26-31.

Kosslyn, S. M., & Thompson, W. L. (2003). When is early visual cortex activated during visual mental imagery? Psychological Bulletin, 129(5), 723.

Kosslyn, S. M., Thompson, W. L., Kim, I. J., & Alpert, N. M. (1995). Topographical representations of mental images in primary visual cortex. Nature, 378(6556), 496-498.

Kosslyn, S. M., & Pomerantz, J. R. (1977). Imagery, propositions, and the form of internal representations. Cognitive Psychology, 9(1), 52-76.

Love, B. C. (2015), The Algorithmic Level Is the Bridge Between Computation and Brain. Topics in Cognitive Science, 7: 230–242. doi: 10.1111/tops.12131

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York, NY: Henry Holt and Co.

Newell, A. (1982). The knowledge level. Artificial Intelligence , 18(1), 87–127.

Peebles, D. and Cooper, R. P. (2015), Thirty Years After Marr’s Vision: Levels of Analysis in Cognitive Science. Topics in Cognitive Science, 7: 187–190. doi: 10.1111/tops.12137

Pylyshyn, Z. W. (1973). What the mind’s eye tells the mind’s brain: A critique of mental imagery. Psychological Bulletin, 80(1), 1.

Pylyshyn, Z. W. (1984). Computation and cognition: Toward a foundation for cognitive science. Cambridge, MA: MIT Press.

Smith, E. E., Kosslyn, S. M., & Barsalou, L. W. (2007). Cognitive psychology: Mind and brain. Upper Saddle River, NJ: Pearson Prentice Hall.














