Measuring skulls, hereditarianism, and what data is for

S.J. Gould

by Joshua Banta, Jonathan Kaplan and Massimo Pigliucci

Why would the popular media be interested in a story about a historical argument surrounding measurement techniques and statistical summaries of human skull volumes? A technical scientific paper published by Lewis et al. in the journal PLoS Biology a few years ago [1] was just that, and yet it was picked up by major news organizations, including the New York Times [2], Wired [3], and Nature [4], as well as countless science blogs (as a Google search of “Lewis et al. 2011 skulls” quickly confirms). Clearly, something else was going on that piqued reporters’ and bloggers’ interest.

Partially, perhaps, the impact of Lewis et al.’s paper can be attributed to the target of its attack: evolutionary biologist Stephen J. Gould. Gould spent much of his career at Harvard University, where he published technical scientific papers in paleontology, zoology, and evolutionary biology, as well as over 20 books for lay audiences about science. Gould was, and remains, a divisive figure. His strong opposition to “genetic determinism” led to some very public fights with other science popularizers, such as Richard Dawkins and E.O. Wilson, whose work he viewed as encouraging naïve views of the relationship between genes and development. Gould’s longstanding commitment to anti-racism came together with his concern about simple-minded genetic explanations offered by “hereditarianism,” the ultra-genetic determinist view that human behaviors are caused by specific genes that are fixed in their effects and impervious to changes in living and rearing conditions, and that genes that matter to important traits like intelligence vary among the “races.”

In one of his popular books, The Mismeasure of Man, Gould set his sights on Samuel G. Morton, a 19th century American physician who catalogued and reported the cranial volumes of human skulls he collected while working at the University of Pennsylvania; importantly, these skull measurements were organized in Morton’s writings by race [5]. Gould argued that Morton believed the races could be ranked by intellectual ability, and that Morton thought that his measurements proved it. (In fact, it isn’t clear what, if anything, Morton meant his skull measurements to prove — more on this later.) Gould also argued that Morton’s racial biases had led him, unconsciously, to mis-analyze the skulls in his collection in ways that systematically advantaged “whites” and systematically disadvantaged “blacks” (and, indeed, all the other races). Properly analyzed, Gould continued, the skulls in Morton’s collection revealed no differences in sizes worth mentioning, demolishing both Morton’s claims to objectivity and the latter contention that skull sizes varied significantly with “race,” and hence undermining Morton’s goal of linking intellectual ability and race.

Lewis et al. claimed that Gould was wrong, and that Morton was correct. We can’t help but think that at least part of the reason their paper garnered serious attention was the implication people drew from this conclusion: that Morton’s preconceived racial notions were not so wrong after all. This would definitely be welcome news to racists, and sure enough, members of the White Supremacist website StormFront immediately trumpeted Lewis et al.’s results as proving that Gould was a fraud, and took them to be broadly supportive of their explicitly racist agenda [6]. And it is worth remembering that Nicholas Wade, as the science editor for the New York Times, was, at least in part, responsible for the unusual degree of attention that Lewis’ paper received (getting written up in the Times is still a good way to get noticed!); it was only later that Wade’s explicitly racist agenda became common knowledge (with his publication of A Troublesome Inheritance: Genes, Race and Human History). Speculating that Wade publicized Lewis et al.’s paper to support his racist program seems, on the whole, not entirely unreasonable.

The arguments surrounding racial differences in intelligence are complex. Reasonings that evoke issues of nature versus nurture are treading into biologically difficult waters, both conceptually and empirically [7]. Genes are no doubt important mediators of the way we look, our behavior, our intelligence, etc., but so are environmental influences. Rigorous studies with animal models reveal that many things one might think of as genetically hard-wired, like intelligence, creativity, weight, and even hair and eye color, are sometimes drastically sensitive to environmental influences [8]. For example, rats that are “dumb” in one environment can be “smart” when raised in a different environment [9], and they can have different colors of fur depending on the diet that was fed to their mothers [10]. This complicated interrelationship of genes and environments is one of the reasons why studies on the malleability (or lack thereof) of a bewilderingly complex trait like human intelligence are so problematic.

But the history of hereditarian hypotheses isn’t just complex, it’s also ugly. Research programs that attempted to quantify differences in “innate” intelligence among humans and tie them back to racial differences were popular through the first half of the 20th century (growing into diverse and now discredited fields such as phrenology and eugenics), and these research programs were used to defend horrific practices like slavery and colonialism, segregation, and, more generally, the active creation of social and economic inequality surrounding racial ascriptions. The arguments in favor of these positions were found, on reflection, to be full of holes, based on poor justifications and incorrect assumptions, and were later resoundingly disavowed by the overwhelming majority of geneticists.

Echoes of these discredited ideas, however, return with dispiriting regularity. The 1994 book The Bell Curve by Richard Herrnstein and Charles Murray (neither of whom is a geneticist) revived popular interest in these theses, and, as noted above, Wade has recently toyed with these ideas as well. Contemporary researchers explicitly promoting racist science are rare, but seem to be a permanent part of the intellectual landscape.

Lewis and his colleagues explicitly disavowed these racist positions, and indeed praised Gould for his anti-racist work (though some of the authors were substantially less kind to Gould in interviews and their non-academic publications on these issues [11]). But their article has had unfortunate (and predictable) resonances with these traditions. One cannot assert that Morton’s work on skulls was broadly accurate, and attack Gould’s work on the skulls, without at least implying that Morton’s claim that skulls of “Africans” were much smaller than skulls of “Caucasians” was right, and that average skull sizes really do vary in just that way. This cannot help but give at least some comfort to those who would jump from that to the conclusion that intelligence must differ across races as well.

Whatever Lewis and his colleagues were hoping to do with their paper, they pretty much made a hash of it in the end. Perhaps the single weirdest thing is that they became famous chiefly for carefully re-measuring skulls in Morton’s collection. That their measurements broadly agreed with Morton’s was the main result reported — the claim that was splashed across headlines and lit up the blogosphere.

But Gould never claimed that the measurements of Morton’s that Lewis et al. compared their results to were biased; indeed, Gould stated, quite plainly, that Morton’s physical measurements of his skulls were both reliable and accurate in the end, once Morton had cleaned up and refined his methods. So Lewis et al. were comparing their data to data that Gould said were accurate, and reporting the congruence as if it were news and somehow showed Gould to be wrong. What makes the whole enterprise odd is that Lewis et al. knew this, and yet they still portrayed their work in a way that allowed the media and the Internet to run wild.

Lewis et al.’s other criticism was that Gould’s choices about which skulls to include or exclude, and the ways that Gould decided to identify groups of people, were less justifiable than Morton’s. They therefore argued that Gould’s analysis of Morton’s data was more biased, and more flawed, than Morton’s own. Morton, according to Lewis et al., was properly objective in his analysis, and it was Gould who let his biases undermine his objectivity. On this point, we believe that Gould, Lewis et al., and Morton were all hopelessly confused.

The problem, as we show in a paper recently published in Studies in History and Philosophy of Science [12], is at least threefold: (1) the sample of skulls in Morton’s collection is not necessarily representative of the groups from which they were drawn; (2) the “races” Morton identified are, from the standpoint of modern genetics, at best problematic; (3) the point of Morton’s collecting and measuring skulls isn’t at all clear, and without clear questions, arguments about the best methodology (for what purpose?) are beside the point.

First, note that Morton did not collect his skulls in a scientifically sound way. He got them from whatever places his associates found it most convenient to, let’s say, “appropriate” them. Some were stolen from graves, some were taken from archeological sites, etc. Skulls came to him with information on their provenance provided by the “suppliers.” How far we should trust these descriptions, how thorough and accurate they were, is of course debatable. In order to use a sample to estimate an average, one needs to have good evidence that the sample in question is appropriately representative of the groups from which it was drawn. Morton’s collection does not constitute a statistically sound sample in any sense. Garbage in, garbage out, as they say.
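To make the sampling point concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two "sites," their means, and the sample sizes are invented for illustration and have nothing to do with Morton's actual skulls. It simply compares the estimate from a random sample with the estimate from a convenience sample drawn only from the site that happened to be easiest to reach.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: two equally large sites whose true means differ.
# These values are invented; they are not measurements of anything.
site_a = [rng.gauss(80.0, 5.0) for _ in range(1000)]
site_b = [rng.gauss(90.0, 5.0) for _ in range(1000)]
population = site_a + site_b
true_mean = statistics.mean(population)

# Random sample: every individual has the same chance of being picked.
random_sample = rng.sample(population, 100)

# Convenience sample: only the site that happened to be easy to reach.
convenience_sample = rng.sample(site_a, 100)

random_error = abs(statistics.mean(random_sample) - true_mean)
convenience_error = abs(statistics.mean(convenience_sample) - true_mean)

print(f"error of random sample:      {random_error:.2f}")
print(f"error of convenience sample: {convenience_error:.2f}")
```

In this toy setup, the convenience sample misses the population mean by roughly the gap between the easy site and the whole population, and collecting more skulls from the same site would not shrink that gap.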

Second, the racial groups to which Morton, Lewis et al., and Gould credulously assigned the skulls bear little relation to any biological reality. For example, the skulls from what Morton called the “Negro Race” represent a collection of peoples that are genetically heterogeneous. In part, this is because it includes “Native African” skulls, and Africa contains more genetic diversity than the rest of the world combined (groups of people from different places in Africa can be more genetically different from one another than Swedish people and Japanese people, for instance [13]); and in part because Morton lumped other people with “black” skin, e.g. Australian Aboriginals, into the same “race,” despite the fact that they are not particularly closely related. Gould removed the Australian Aboriginals when he recalculated the “Negro Race” averages, but this falsely implies that there isn’t anything wrong with considering the “Native African race” a legitimate single entity. It isn’t. Again, genetic diversity within Africa is both extensive and subdivided. Or consider Morton’s “American Group.” On his analysis, this “race” contains the “Toltecan Family,” which consists of two “populations”: the “Mexicans” and the “Peruvians.” Not only does this bear no relationship to any biological reality, but it isn’t at all clear in this case what would.

Even if we re-assigned the racial groupings of the skulls based on genetic similarities, what would we gain? The seemingly simple project of estimating the average skull volume of each “race” hides enormous complications. How should we average the skull sizes of different peoples when the populations in question differ in size (and differed in size historically, too), are related to each other in different degrees, have varying degrees of gene flow between them, etc.? It is completely unclear how one should even begin to approach such a question.
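A small Python sketch shows why even the arithmetic is not innocent. The subgroup names, means, and counts below are hypothetical and are not Morton's figures. The same data give different "group averages" depending on whether one averages over individual skulls (so large subsamples dominate) or over subgroup means (so every subgroup counts equally).

```python
# Hypothetical subgroup data: (mean cranial volume in cubic inches, number of skulls).
# These numbers are invented for illustration and are not Morton's measurements.
subgroups = {
    "subgroup_a": (80.0, 3),
    "subgroup_b": (86.0, 50),
    "subgroup_c": (75.0, 155),
}


def pooled_mean(groups):
    """Average over individual skulls: each subgroup is weighted by its sample size."""
    total_volume = sum(mean * n for mean, n in groups.values())
    total_count = sum(n for _, n in groups.values())
    return total_volume / total_count


def mean_of_means(groups):
    """Average over subgroups: each subgroup counts equally, whatever its sample size."""
    return sum(mean for mean, _ in groups.values()) / len(groups)


pooled = pooled_mean(subgroups)
unweighted = mean_of_means(subgroups)
print(f"pooled mean:   {pooled:.2f}")
print(f"mean of means: {unweighted:.2f}")
```

Neither number is "the" right answer. Each one encodes a choice about weighting, and a principled choice would require knowing the sizes and relationships of the underlying populations, which is exactly what is missing here.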

Third, since it isn’t at all clear what question Morton was trying to answer, if any, it isn’t a fortiori clear what evidence he even should have gathered. Gould presumed that Morton wanted to use skull volume as a proxy for intelligence, in order to prove that the “races” differed in native intellectual ability. Morton certainly did believe that the “races” differed in native intelligence, but it isn’t obvious that this was why he was obsessed with measuring skulls. Now, it is obvious that if this was Morton’s goal, his methods were hopelessly inadequate to the task. As noted above, even leaving aside the problems with his samples, and with the details of his analyses, the difficulties in teasing apart environmental effects on development from genetic differences (not that Morton knew anything about genes, of course!) stymie even contemporary researchers. Other writers argued that Morton had a different racist goal in measuring skulls — to prove “polygenesis,” that is, that the different “races” were each created by God as separate entities. We hope it is obvious that if this was Morton’s goal, it is a goal so at odds with contemporary biology that no evidence could possibly be relevant to it.

Some historians have argued that, again, while Morton had many racist beliefs, his work on skulls was just an attempt to gather data with no particular purpose. Indeed, during the same time he was producing his big Catalog of Skulls, he was also publishing detailed descriptions of fossilized crocodile skulls, of all things! And even his Catalog of Skulls contains a surprising number of descriptions of nonhuman (bird, reptile, fish, and other mammal) skulls. But if there is no question to be answered, and one is just measuring whatever skulls are available because one happens to have a fetish for skulls, it doesn’t make sense to ask what methods should be deployed (to do what, exactly?).

Taken together, these problems turn the whole debate surrounding who was most wrong into nonsense. The skull collection is doomed by having been assembled in a way that makes any interesting scientific analysis of the groupings of peoples all but impossible. The basic conclusion at which we arrive regarding Lewis and colleagues versus Gould is “a pox on both your houses!” Morton’s data are simply not useful for anything, and talking about “races” as people perceived them at some point in history is not scientifically relevant.

What is troubling is that Lewis and colleagues’ paper passed through peer review in such a high-profile journal and picked up so much popular media attention, leaving many people with the erroneous impression that there is evidence suggesting that individuals of different “races” really do differ in their skull sizes, and that this then tells us anything of any interest at all. That Lewis and his colleagues’ work, surely unwittingly, gives cover to racists is even more unfortunate.


Joshua Banta is an integrative biologist at the University of Texas at Tyler who studies genetics, evolutionary biology, and conservation biology.

Jonathan Kaplan is a philosopher at Oregon State University. His main areas of interest are the philosophy of biology and political philosophy.

Massimo Pigliucci is a biologist and philosopher at the City University of New York. His main interests are in the philosophy of science and pseudoscience. He is the editor-in-chief of Scientia Salon, and his latest book (co-edited with Maarten Boudry) is Philosophy of Pseudoscience: Reconsidering the Demarcation Problem (Chicago Press).

[1] The Mismeasure of Science: Stephen Jay Gould versus Samuel George Morton on Skulls and Bias, by J.E. Lewis et al., PLOS Biology, 7 June 2011.

[2] Scientists Measure the Accuracy of a Racism Claim, by Nicholas Wade, New York Times, 13 June 2011.

[3] The mismeasures of Stephen Jay Gould, by Brandon Keim, Wired, 14 June 2011.

[4] Mismeasure for mismeasure, Nature editorial, 23 June 2011.

[5] Morton’s compilations are available for download here and here.

[6] Stephen J. Gould & the Evolutionary Fraud,

[7] Phenotypic Plasticity: Beyond Nature and Nurture, by Massimo Pigliucci, Johns Hopkins University Press, 2001.

[8] See, for example: Epigenetics: DNA Isn’t Everything, Science Daily, 13 April 2009 / The agouti mouse model: an epigenetic biosensor for nutritional and environmental alterations on the fetal epigenome, by Dana C. Dolinoy, Nutrition Reviews, August 2008. / Nurturing brain plasticity: impact of environmental enrichment, by L. Baroncelli et al., Nature, 18 December 2009. / Lab animals and pets face obesity epidemic, by Alla Katsnelson, Nature, 24 November 2010.

[9] Effects of enriched and restricted early environments on the learning ability of bright and dull rats, by R.M. Cooper and J.P. Zubek, Canadian Journal of Psychology, 1958.

[10] Nutrition and the Epigenome, Learn.Genetics.

[11] Holloway is quoted calling Gould a “charlatan” in Wade’s piece in the NYT linked above. The editorial in Nature (link above) has the following claim: “Although the new paper does not accuse Gould of intentionally misrepresenting Morton, some of its authors have raised this possibility in interviews, noting that Gould’s oversights would be less troubling were he known to be a less meticulous scholar.”

[12] Gould on Morton, Redux: What can the debate reveal about the limits of data?, by Jonathan M. Kaplan, Massimo Pigliucci and Joshua A. Banta, Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 7 February 2015.

[13] African Genetics Study Revealing Origins, Migration And ‘Startling Diversity’ Of African Peoples, Science Daily, 2 May 2009. / The Evolution of Human Genetic and Phenotypic Variation in Africa, by Michael C. Campbell and Sarah A. Tishkoff, Current Biology, 27 September 2010.


Categories: essay

57 replies

  1. Hi Jedi,

    Sorry, but one deviation does not a refutation make.

    Just as a bit of research by a couple of guys does not solid science make.

    Just look at the data and explanation provided by Weeden and Kurzban in their book (or in the Vox article for that matter): they don’t even claim that all people’s political positions can be predicted by their framework, only that a lot of the variation can.

    But they make this claim on their website, as I pointed out. The fact that they use such an unscientific gimmick for selling the book tells a good deal about their seriousness.

    But who cares about touchiness.

    Not I. But I do care about proper scientific evidence. I have seen too many fly by night theories evaporate in the morning.

    I have too much insight into the various ways that data can be made to fit the theory, even without any conscious manipulation.


  2. Dear Massimo. In 500 words, it seems difficult to carry out much of a sensible discussion. You suggest that GxE cannot be dealt with in nonexperimental designs – there is now a literature on multi-trait multi-environmental mixed models incorporating dense genomic data being currently applied in agricultural genetics (extensions of genomic selection models). These models can be shown to be equivalent to reaction norm models. Similar models can be used in human psychiatric/behavioural genetics, given the right environmental covariates have been collected. Studies such as that of Caspi et al. attempt to look at GxE in a longitudinal framework (environmental stressors are measured at Time 1, and the effects of these, and of their interactions with genotype, on risk of depression are measured at Time 2). The effects of the environmental stressors are measurable (and robust across studies), and causation is inferred through the longitudinal design. Similarly, Mendelian Randomization is an instrumental variables approach to testing causation, here for putative environmental risk factors. So we have a few tools that can be used to look at these problems which you imply to be impenetrable.
    Regarding the follies of the psychological sciences that you impute, I think we will just have to disagree, and wait for more results.
    Regarding race or ethnic difference, Neil Risch is one human geneticist I know who has pushed the utility of the concept for certain purposes. Differences in rates of hypertension between Americans of African and European descent might or might not be completely cultural/environmental in nature, but for albuminuria (an associated trait), there are important genetic differences at work. Whether this concept of race has much in common with pre-20th century concepts is a different matter.


  3. Curious debate. On one side, we have folks who deny objective science, who defend Marxism, who brag about being anti-racist while calling scientists and reporters racist, and who oppose saying anything that might encourage the anonymous posters on Stormfront. If presented with data or quotes to refute what they say, they brag how they are smarter and have a superior understanding.

    Gould’s book is an embarrassment to modern science. Defending it is like defending Soviet Lysenkoism. It is just bad science that is promoted for leftist ideological or self-interest reasons. And yes, it is in the self-interest of white male soft-subject professors who make a career out of denouncing racism and pseudoscience.


  4. I see a lot of strange and mostly irrelevant arguments here.

    To begin with, people are placing great stock in the question of whether, from a biological point of view, there are distinct races, and whether they map onto the “folk” races.

    But of course we know from clustering techniques that there exist distinct population groups at a high level, separated by natural boundaries, and that these population groups have had relatively little gene flow across those boundaries. These groups map reasonably well onto the folk races. These groups can be split into still smaller population groups, or lumped together at a still higher level. That they can be either thus split or lumped together doesn’t imply that they aren’t biologically distinct at the level at which there are five major groups; that is just one way of organizing human beings, and it is a way that has biological meaning. Yes, there are other ways of organizing human beings. But that doesn’t imply that the level of the 5 groups doesn’t have biological meaning. All such organizations have biological meaning; some may be more important for one purpose or another than other such schemes.

    Now another argument is that the so-called “races” blend very smoothly into each other at the boundaries, so that whatever distinctness they may have biologically when segmented by standard techniques just isn’t very important biologically or otherwise. It may be far better, some argue, to say that all human beings are clines of one species, and that the distinctness is mostly illusory, even if it can be recovered by clustering techniques.

    What’s certainly true is that, from a biological point of view, the amount of genetic separation between populations, and the differentials across boundaries or distances, can exhibit any set of continuous values. As is usual in biology, any number of ways exist to respond to the values actually found, in terms of understanding the populations. There probably isn’t an obligatory way to understand these values in general, or with regard to human beings in particular. Whether we say human beings are, across the standard geographical boundaries for the folk races, actually different races or different clines, is mostly irrelevant biologically. Biologically, what’s important is to know what the underlying values are, across these boundaries, and within these boundaries. All else is mostly a verbal argument.

    Now what does this have to do with “scientific racism”?

    Well, let’s just grant the point of those opposed to races, and say outright that there are no “races” of human beings, but only “clines”.

    What does that get them? I would argue that it means nothing for the thing they really care about: the potential for different frequencies of genes across population groups which affect, perhaps greatly, socially important traits. It simply doesn’t matter whether such genes or others are in smooth transition across boundaries. The two groups may still significantly differ on average. Gene flow across the distances can still be so low that the averages are quite disparate.


  5. Jedi Master,

    See (rather, listen to) Dylan’s “tweedle dee and tweedle dum” – so this doesn’t require any science, sorry.

    As to the ‘evolutionary psychology’ W&K presume (a pseudo-science, really), see John Dupre’s “Human Nature and the Limits of Science.”

    You want ‘science’ to validate your world-view; upon careful analysis, it just can’t do that.

    Really, you got nothing. All the W&K stats are jiggered; and to those who don’t buy the Ev-Psych paradigm, this is obvious.

    And there is your agenda – Ev-Psych political punditry to convince us to stop voting – which will result in Republican victories in elections.

    Absolutely no one doubts that politics and elections are about self-interest – SO WHAT?! That’s what politics is all about. If you don’t like it, don’t vote. I just don’t see the ‘science’ in this, I’m sorry.


  6. candid_observer,

    “Biologically, what’s important is to know what the underlying values are, across these boundaries” – no, biologically what’s important is to begin with this: that we are all of one species, with an indefinite capacity for variant expression of genetic material; which diversity increases through inter-mating. The effort to close any of this off is suspect, since it effectively argues that science a) should be used to a purpose, and b) needs to be shut down when it reaches the limit of this purpose.

    So candid_observer, what is your purpose?

    Schlafly, what is your purpose?

    C. Van Carter’s purpose is obvious – heighten suspicion of those with greater melanin in their skin. This achieves what? – besides greater antagonism and social repression of those you don’t like.

    Look, the science of this is in – we are all one species, and we inter-mate within that species to generate greater variation and diversity. I’ve already suggested what you can do if you don’t like that – they say there is the possibility of colonizing Mars in a few years. But such a colony would probably include those of African descent. Perhaps you have nowhere to go. But I can’t say I feel any sympathy for you.

  7. Fifth comment, ignoring those who didn’t respond to my rhetorical question, or the late arrival of the same ilk who thinks racialist-veneered talk is “objective science.” (That said, I’ve added both “racism” and “racialism” to my ongoing project of a modernization of Ambrose Bierce’s “Devil’s Dictionary.”)

    I’m going to take this back to essays of a few months back, a couple of great philosophers, and another issue: presentism.

    Either David Hume and Aristotle had some racist bones, or they didn’t. “Race” was no more scientific 250 years ago, or 2,350 years ago, than today. Aristotle’s general aversion to investigative natural philosophy is not an excuse. And Hume, a full century after Newton, and the developing of the scientific method, certainly had no excuse.

    And, on issues like this, it IS fair to make judgments of the past. If every single person, not in the world, but in Hume’s Britain or Aristotle’s Attica, held such sentiments, then their societies would never have evolved to (overall) accept the fact that race is not a scientific concept, but rather a sociological one created by certain groups for sociological reasons, whether said creation is more conscious or more unconscious.

    But said societies, and the US, did evolve, and to riff on Booker T. Washington, did move “Up from Racism.” (That said, Project Implicit shows that we’ve not come “up” as much as many of us would like to think. Take the tests at this link:

    Any sociological concept that is not scientific, but is held, or has been held in the past, by some individual or group that has at least partial occasion to know better, can be so adjudged. And should be.

    Else, somebody 250 or 2,350 years from now would look back on us, give us a half-sincere, half-mocking collective pat on the head and excuse the strands of racism that still exist today.

