by Joshua Banta, Jonathan Kaplan and Massimo Pigliucci
Why would the popular media be interested in a story about a historical argument surrounding measurement techniques and statistical summaries of human skull volumes? A technical scientific paper published by Lewis et al. in the journal PLoS Biology a few years ago was just that, and yet it was picked up by major news organizations, including the New York Times, Wired, and Nature, as well as countless science blogs (as a Google search of “Lewis et al. 2011 skulls” quickly confirms). Clearly, something else was going on that piqued reporters’ and bloggers’ interest.
Partially, perhaps, the impact of Lewis et al.’s paper can be attributed to the target of its attack: evolutionary biologist Stephen J. Gould. Gould spent much of his career at Harvard University, where he published technical scientific papers in paleontology, zoology, and evolutionary biology, as well as over 20 books for lay audiences about science. Gould was, and remains, a divisive figure. His strong opposition to “genetic determinism” led to some very public fights with other science popularizers, such as Richard Dawkins and E.O. Wilson, whose work he viewed as encouraging naïve views of the relationship between genes and development. Gould’s longstanding commitment to anti-racism came together with his concern about simple-minded genetic explanations offered by “hereditarianism,” the ultra-genetic determinist view that human behaviors are caused by specific genes that are fixed in their effects and impervious to changes in living and rearing conditions, and that genes that matter to important traits like intelligence vary among the “races.”
In one of his popular books, The Mismeasure of Man, Gould set his sights on Samuel G. Morton, a 19th-century American physician who catalogued and reported the cranial volumes of human skulls he collected while working at the University of Pennsylvania; importantly, these skull measurements were organized in Morton’s writings by race. Gould argued that Morton believed the races could be ranked by intellectual ability, and that Morton thought that his measurements proved it. (In fact, it isn’t clear what, if anything, Morton meant his skull measurements to prove; more on this later.) Gould also argued that Morton’s racial biases had led him, unconsciously, to mis-analyze the skulls in his collection in ways that systematically advantaged “whites” and systematically disadvantaged “blacks” (and, indeed, all the other races). Properly analyzed, Gould continued, the skulls in Morton’s collection revealed no differences in sizes worth mentioning, demolishing both Morton’s claims to objectivity and the contention that skull sizes varied significantly with “race,” and hence undermining Morton’s goal of linking intellectual ability and race.
Lewis et al. claimed that Gould was wrong, and that Morton was correct. We can’t help but think that at least part of the reason their paper garnered serious attention was the implication people drew from this conclusion: that Morton’s preconceived racial notions were not so wrong after all. This would definitely be welcome news to racists, and sure enough, members of the white supremacist website Stormfront immediately trumpeted Lewis et al.’s results as proving that Gould was a fraud, and took them to be broadly supportive of their explicitly racist agenda. And it is worth remembering that Nicholas Wade, as the science editor for the New York Times, was, at least in part, responsible for the unusual degree of attention that Lewis et al.’s paper received (getting written up in the Times is still a good way to get noticed!); it was only later that Wade’s explicitly racist agenda became common knowledge (with his publication of A Troublesome Inheritance: Genes, Race and Human History). Speculating that Wade publicized Lewis et al.’s paper to support his racist program seems, on the whole, not entirely unreasonable.
The arguments surrounding racial differences in intelligence are complex. Reasoning that invokes issues of nature versus nurture treads into biologically difficult waters, both conceptually and empirically. Genes are no doubt important mediators of the way we look, our behavior, our intelligence, etc., but so are environmental influences. Rigorous studies with animal models reveal that many things one might think of as genetically hard-wired, like intelligence, creativity, weight, and even hair and eye color, are sometimes drastically sensitive to environmental influences. For example, rats that are “dumb” in one environment can be “smart” when raised in a different environment, and they can have different colors of fur depending on the diet that was fed to their mothers. This complicated interrelationship of genes and environments is one of the reasons why studies on the malleability (or lack thereof) of a bewilderingly complex trait like human intelligence are so problematic.
But the history of hereditarian hypotheses isn’t just complex, it’s also ugly. Research programs that attempted to quantify differences in “innate” intelligence among humans and tie them back to racial differences were popular through the first half of the 20th century (growing into diverse and now discredited fields such as phrenology and eugenics), and these research programs were used to defend horrific practices like slavery and colonialism, segregation, and, more generally, the active creation of social and economic inequality surrounding racial ascriptions. The arguments in favor of these positions were found, on reflection, to be full of holes, based on poor justifications and incorrect assumptions, and were later resoundingly disavowed by the overwhelming majority of geneticists.
Echoes of these discredited ideas, however, return with dispiriting regularity. The 1994 book The Bell Curve by Richard Herrnstein and Charles Murray (neither of whom is a geneticist) revived popular interest in these theses, and, as noted above, Wade has recently toyed with these ideas as well. Contemporary researchers explicitly promoting racist science are rare, but seem to be a permanent part of the intellectual landscape.
Lewis and his colleagues explicitly disavowed these racist positions, and indeed praised Gould for his anti-racist work (though some of the authors were substantially less kind to Gould in interviews and their non-academic publications on these issues). But their article has had unfortunate (and predictable) resonances with these traditions. One cannot assert that Morton’s work on skulls was broadly accurate, and attack Gould’s work on the skulls, without at least implying that Morton’s claim that skulls of “Africans” were much smaller than skulls of “Caucasians” was right, and that average skull sizes really do vary in just that way. This cannot help but give at least some comfort to those who would jump from that to the conclusion that intelligence must differ across races as well.
Whatever Lewis and his colleagues were hoping to do with their paper, they pretty much made a hash of it in the end. Perhaps the single weirdest thing is that they became famous chiefly for carefully re-measuring skulls in Morton’s collection. That their measurements broadly agreed with Morton’s was the main result reported — the claim that was splashed across headlines and lit up the blogosphere.
But Gould never claimed that the measurements of Morton’s that Lewis et al. compared their results to were biased; indeed, Gould stated, quite plainly, that Morton’s physical measurements of his skulls were both reliable and accurate in the end, once Morton had cleaned up and refined his methods. So Lewis et al. were comparing their data to data that Gould said were accurate, and reporting the congruence as if it were news and somehow showed Gould to be wrong. What makes the whole enterprise odd is that Lewis et al. knew this, and yet they still portrayed their work in a way that allowed the media and the Internet to run wild.
Lewis et al.’s other criticism was that Gould’s choices about which skulls to include or exclude, and about how to identify groups of people, were less justifiable than Morton’s. They therefore argued that Gould’s analysis of Morton’s data was more biased, and more flawed, than Morton’s own. Morton, according to Lewis et al., was properly objective in his analysis, and it was Gould who let his biases undermine his objectivity. On this point, we believe that Gould, Lewis et al., and Morton were all hopelessly confused.
The problem, as we show in a paper recently published in Studies in History and Philosophy of Science, is at least threefold: (1) the sample of skulls in Morton’s collection is not necessarily representative of the groups from which they were drawn; (2) the “races” Morton identified are, from the standpoint of modern genetics, at best problematic; (3) the point of Morton’s collecting and measuring skulls isn’t at all clear, and without clear questions, arguments about the best methodology (for what purpose?) are beside the point.
First, note that Morton did not collect his skulls in a scientifically sound way. He got them from whatever places his associates found it most convenient to, let’s say, “appropriate” them. Some were stolen from graves, some were taken from archeological sites, etc. Skulls came to him with information on their provenance provided by the “suppliers.” How far we should trust these descriptions, how thorough and accurate they were, is of course debatable. In order to use a sample to estimate an average, one needs to have good evidence that the sample in question is appropriately representative of the groups from which it was drawn. Morton’s collection does not constitute a statistically sound sample in any sense. Garbage in, garbage out, as they say.
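To illustrate why the way a sample is assembled matters, here is a minimal Python sketch. Everything in it is hypothetical: the two “sites,” their values (in arbitrary units), and the sample sizes are invented for illustration and have nothing to do with Morton’s actual collection. It compares a convenience sample, taken from whichever site is easiest to reach, with a simple random sample of the whole population.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: two sites whose members differ on average.
# Values are in arbitrary units and are NOT real measurements.
site_a = [random.gauss(100, 5) for _ in range(5000)]
site_b = [random.gauss(120, 5) for _ in range(5000)]
population = site_a + site_b
true_mean = statistics.mean(population)

# Convenience sample: take only what is easy to obtain (site A).
convenience_sample = random.sample(site_a, 200)

# Simple random sample: every member of the population is equally likely.
random_sample = random.sample(population, 200)

convenience_error = abs(statistics.mean(convenience_sample) - true_mean)
random_error = abs(statistics.mean(random_sample) - true_mean)

print(f"population mean:          {true_mean:.1f}")
print(f"convenience sample error: {convenience_error:.1f}")
print(f"random sample error:      {random_error:.1f}")
```

Under these assumptions the convenience sample misses the population mean by roughly the full gap between site A and the overall average, however carefully each item is measured. The error comes from how the sample was collected, not from the measuring.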
Second, the racial groups to which Morton, Lewis et al., and Gould credulously assigned the skulls bear little relation to any biological reality. For example, the skulls from what Morton called the “Negro Race” represent a collection of peoples that are genetically heterogeneous. In part, this is because it includes “Native African” skulls, and Africa contains more genetic diversity than the rest of the world combined (groups of people from different places in Africa can be more genetically different from one another than Swedish people and Japanese people, for instance); and in part because Morton lumped other people with “black” skin, e.g. Australian Aboriginals, into the same “race,” despite the fact that they are not particularly closely related. Gould removed the Australian Aboriginals when he recalculated the “Negro Race” averages, but this falsely implies that there isn’t anything wrong with considering the “Native African race” a legitimate single entity. It isn’t. Again, genetic diversity within Africa is both extensive and subdivided. Or consider Morton’s “American Group.” On his analysis, this “race” contains the “Toltecan Family,” which consists of two “populations”: the “Mexicans” and the “Peruvians.” Not only does this bear no relationship to any biological reality, but it isn’t at all clear in this case what would.
Even if we re-assigned the racial groupings of the skulls based on genetic similarities, what would we gain? The seemingly simple project (estimate the average skull volume of each “race”) hides enormous complications. How should we average the skull sizes of different peoples when the groups in question vary in size (and were of very different sizes historically), are related to each other in different degrees, have varying degrees of gene flow between them, etc.? It is completely unclear how one should even begin to approach such a question.
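A small sketch can make the ambiguity concrete. The three subgroups below, their mean values (arbitrary units), the number of specimens sampled, and the population sizes are all invented for illustration; none of these numbers comes from Morton, Gould, or Lewis et al. The point is only that “the average” of a composite group depends on a weighting choice that the measurements themselves cannot make.

```python
# Hypothetical subgroups lumped into one "group". All numbers are invented.
# Each entry: (name, subgroup mean in arbitrary units,
#              specimens sampled, population size)
subgroups = [
    ("A", 80.0, 3, 1_000_000),
    ("B", 85.0, 20, 200_000),
    ("C", 90.0, 7, 5_000_000),
]


def weighted_mean(values, weights):
    """Return sum(v * w) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


means = [m for _, m, _, _ in subgroups]
sampled = [n for _, _, n, _ in subgroups]
populations = [p for _, _, _, p in subgroups]

# 1. Treat every subgroup as equally important.
mean_of_means = weighted_mean(means, [1] * len(means))
# 2. Pool all specimens, so heavily sampled subgroups dominate.
pooled_mean = weighted_mean(means, sampled)
# 3. Weight by population size, so populous subgroups dominate.
population_weighted_mean = weighted_mean(means, populations)

print(round(mean_of_means, 2))             # 85.0
print(round(pooled_mean, 2))               # 85.67
print(round(population_weighted_mean, 2))  # 88.23
```

Each figure answers a different question, and nothing in the data says which question is the right one. The sketch also ignores relatedness and gene flow between subgroups, which would complicate matters further.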
Third, since it isn’t at all clear what question Morton was trying to answer, if any, it isn’t a fortiori clear what evidence he even should have gathered. Gould presumed that Morton wanted to use skull volume as a proxy for intelligence, in order to prove that the “races” differed in native intellectual ability. Morton certainly did believe that the “races” differed in native intelligence, but it isn’t obvious that this was why he was obsessed with measuring skulls. Now, it is obvious that if this was Morton’s goal, his methods were hopelessly inadequate to the task. As noted above, even leaving aside the problems with his samples, and with the details of his analyses, the difficulties in teasing apart environmental effects on development from genetic differences (not that Morton knew anything about genes, of course!) stymie even contemporary researchers. Other writers argued that Morton had a different racist goal in measuring skulls — to prove “polygenesis,” that is, that the different “races” were each created by God as separate entities. We hope it is obvious that if this was Morton’s goal, it is a goal so at odds with contemporary biology that no evidence could possibly be relevant to it.
Some historians have argued that, again, while Morton had many racist beliefs, his work on skulls was just an attempt to gather data with no particular purpose. Indeed, during the same time he was producing his big Catalog of Skulls, he was also publishing detailed descriptions of fossilized crocodile skulls, of all things! And even his Catalog of Skulls contains a surprising number of descriptions of nonhuman (bird, reptile, fish, and other mammal) skulls. But if there is no question to be answered, and one is just measuring whatever skulls are available because one happens to have a fetish for skulls, it doesn’t make sense to ask what methods should be deployed (to do what, exactly?).
Taken together, these problems turn the whole debate surrounding who was most wrong into nonsense. The skull collection is doomed by being assembled in a way that makes any interesting scientific analysis of the groupings of peoples all but impossible. The basic conclusion at which we arrive regarding Lewis and colleagues versus Gould is “a pox on both your houses!” Morton’s data are simply not useful for anything, and talking about “races” as people perceived them at some point in history is not scientifically relevant.
What is troubling is that Lewis and colleagues’ paper passed through peer review in such a high-profile journal and picked up so much popular media attention, leaving many people with the erroneous impression that there is evidence suggesting that individuals of different “races” really do differ in their skull sizes, and that this then tells us anything of any interest at all. That Lewis and his colleagues’ work, surely unwittingly, gives cover to racists is even more unfortunate.
Joshua Banta is an integrative biologist at the University of Texas at Tyler who studies genetics, evolutionary biology, and conservation biology.
Jonathan Kaplan is a philosopher at Oregon State University. His main areas of interest are the philosophy of biology and political philosophy.
Massimo Pigliucci is a biologist and philosopher at the City University of New York. His main interests are in the philosophy of science and pseudoscience. He is the editor-in-chief of Scientia Salon, and his latest book (co-edited with Maarten Boudry) is Philosophy of Pseudoscience: Reconsidering the Demarcation Problem (Chicago Press).
 The Mismeasure of Science: Stephen Jay Gould versus Samuel George Morton on Skulls and Bias, by J.E. Lewis et al., PLOS Biology, 7 June 2011.
 Scientists Measure the Accuracy of a Racism Claim, by Nicholas Wade, New York Times, 13 June 2011.
 The mismeasures of Stephen Jay Gould, by Brandon Keim, Wired, 14 June 2011.
 Mismeasure for mismeasure, Nature editorial, 23 June 2011.
 Phenotypic Plasticity: Beyond Nature and Nurture, by Massimo Pigliucci, Johns Hopkins University Press, 2001.
 See, for example: Epigenetics: DNA Isn’t Everything, Science Daily, 13 April 2009 / The agouti mouse model: an epigenetic biosensor for nutritional and environmental alterations on the fetal epigenome, by Dana C. Dolinoy, Nutrition Reviews, August 2008. / Nurturing brain plasticity: impact of environmental enrichment, by L. Baroncelli et al., Nature, 18 December 2009. / Lab animals and pets face obesity epidemic, by Alla Katsnelson, Nature, 24 November 2010.
 Effects of enriched and restricted early environments on the learning ability of bright and dull rats, by R.M. Cooper and J.P. Zubek, Canadian Journal of Psychology, 1958.
 Nutrition and the Epigenome, Learn.Genetics.
 Holloway is quoted calling Gould a “charlatan” in Wade’s piece in the NYT linked above. The editorial in Nature (link above) has the following claim: “Although the new paper does not accuse Gould of intentionally misrepresenting Morton, some of its authors have raised this possibility in interviews, noting that Gould’s oversights would be less troubling were he known to be a less meticulous scholar.”
 Gould on Morton, Redux: What can the debate reveal about the limits of data?, by Jonathan M. Kaplan, Massimo Pigliucci and Joshua A. Banta, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 7 February 2015.
 African Genetics Study Revealing Origins, Migration And ‘Startling Diversity’ Of African Peoples, Science Daily, 2 May 2009. / The Evolution of Human Genetic and Phenotypic Variation in Africa, by Michael C. Campbell and Sarah A. Tishkoff, Current Biology, 27 September 2010.