Rosalind M Harding & John H Relethford. Encyclopedia of Life Sciences. Volume 23, Wiley, 2007.
The origins of modern human diversity have been debated within a framework set by two hypotheses: ‘Out of Africa’ versus ‘multiregional’ evolution. For a better understanding of how human populations evolved, a new framework for analysis of genetic data is needed.
‘Out of Africa’ versus the ‘Multiregional’ Hypothesis
The ‘Out of Africa’ hypothesis proposes that morphologically modern humans evolved as a new species (Homo sapiens) in Africa within the last 200,000 years and subsequently replaced all archaic hominid groups outside of Africa between 100,000 and approximately 30,000 years ago. This hypothesis assumes that the distinctive regional morphologies of hominid groups in the middle to late Pleistocene represent several taxonomic species. In the current controversy around human origins, the challenge to the ‘Out of Africa’ hypothesis comes from a model for ‘multiregional’ evolution of Homo erectus/sapiens. Under ‘multiregional’ evolution, groups of morphologically archaic humans living across the Old World are assumed to form a subdivided population within which a transition from archaic to modern morphology occurred. Morphological differences between groups in the middle to late Pleistocene are taken to represent population diversity within a single species that originated with the emergence of Homo in Africa about 2 million years ago.
Most analyses of genetic variation have concluded in favour of the ‘Out of Africa’ hypothesis. However, none of these studies disproves ‘multiregional’ evolution. Certainly, models for polyphyletic human evolution make predictions that genetic data can disprove. But the hypotheses of ‘Out of Africa’ and ‘multiregional’ evolution present far more subtle differences in expected patterns of genetic variation. One question that may be asked of genetic data concerns the relationship between contemporary human populations and the morphologically archaic hominids that occupied Europe and Asia, such as the Neanderthals who persisted in western Europe until perhaps as recently as 28,000 years ago. Are there any lines of descent in the genome from these non-African archaic hominids into populations living today? This question may be answered eventually, but without any guarantee that the answer would resolve the debate. Consider the possibility that such lineages are found, but that they trace to very few regions of the genome, such as might be expected from a low level of admixture between modern and archaic hominids. The debate would then have to continue until the function of those gene regions and their contribution to a ‘modern human’ phenotype had been determined.
Central to the debate so far have been issues concerning taxonomic relationships, human ‘racial’ variation, and evidence for continuity between modern and archaic hominids from the same geographic regions. One potential source of genetic evidence for continuity is DNA sequences from the bones of archaic and modern hominids. Three studies of Neanderthal specimens report a substantial level of divergence between DNA sequences from Neanderthals and living modern humans and conclude that these data provide evidence of replacement rather than continuity. (These studies will be discussed in more detail below.) However, a fourth study highlights a problem in using DNA sequences as evidence against continuity. In that study, DNA from a morphologically modern Australian, dated to 60,000 years ago, also revealed a remote relationship to contemporary DNA sequences (Adcock et al., 2001). In fact, both morphology and DNA lineages have continued to evolve since the first emergence of modern humans in Africa. The same simple model used to explain the evolutionary distance between modern humans living now and Neanderthals by a species replacement event predicts a close evolutionary relationship between a modern human from the past and modern humans of the present. This same simple model cannot explain the observed discontinuity in the Australian lineage. Better models and more data are needed.
Before anything was known about patterns of genetic diversity across human populations, one of the anticipated possibilities was continuity through time of ‘racial’ lineages within continental regions and deep ‘racial’ divisions between continental regions. In line with the lack of compelling evidence from ancient DNA for continuity within geographic regions, even when expected from morphology, are genetic data from contemporary populations that do not show a substantial partitioning of diversity between ‘racial’ groups. Most single-nucleotide polymorphisms (SNPs) in the human genome with minor allele frequencies greater than 15% (in a global sample) are likely to have global distributions. While some gene loci show strong population differentiation due to selection, the average pattern for nonfunctional polymorphism across loci reflects close evolutionary relationships among populations. Furthermore, the apportionment of diversity among individuals within populations is much greater (>85%) than the diversity contributing to differences between populations (10-15%). These patterns of genetic variation show that the global population of living humans shares ancestry that evolved within a single species. However, they do not tag our ancestors with their geographic locations.
Studies of evolutionary history can be based on either of the hypotheses suggested by the fossil morphology. Assuming the ‘Out of Africa’ hypothesis, the close relationships among globally distributed contemporary populations may be due to a short time available for divergence. Phylogenetic analyses of regional population differences put the oldest population divergence between African and non-African populations and date it to less than 200,000 years ago. These results appear to be compatible with ‘Out of Africa’. Alternatively, the close relationships between populations may be a consequence of extensive and on-going gene flow within a subdivided population compatible with ‘multiregional’ evolution. Some evidence from the β-globin gene on chromosome 11 (Harding et al., 1997) and also from the Y chromosome (Hammer et al., 1998) suggests that gene flow has not been restricted to a series of population splits. Further studies on patterns of gene flow assuming population subdivision are needed.
One argument that has been made against the ‘multiregional’ hypothesis is based on the commonly cited effective population size for modern humans of about 10⁴. This estimate from genetic polymorphism in the nuclear genome is much smaller than an estimate on the order of 10⁶ for Old World hominids, based on using archaeological evidence for their area of distribution and assuming densities observed for modern human hunter-gatherer groups. However, 10⁴ represents the size of an idealized population that can support an equilibrium level of genetic diversity equivalent to that currently observed for humans. In this idealized population, numbers are constant through time, everyone in every generation finds a mate randomly, and the probability of having offspring is the same for every individual. Effective population size estimates from genetic diversity do reveal a much greater potential for genetic drift in human evolution than may be suspected from census numbers. But they are not useful indicators of actual numbers of individuals within a species. Effective population size estimates do not provide enough information to resolve the debate about human origins.
Support for the ‘Out of Africa’ hypothesis from genetic data has been based on the recent estimates for phylogenetic divergence between African and non-African populations and observations of greater diversity within Africa. In phylogenetic models, both divergence between populations and diversity within populations are assumed to accumulate with time since a species’ origin. Population genetic models introduce limits on the accumulation of population divergence and diversity under equilibrium conditions, through the processes of gene flow and genetic drift. Qualitatively at least, the same patterns of population diversity used as evidence to support ‘Out of Africa’ also can be explained by a balance of mutation, genetic drift and gene flow, consistent with ‘multiregional’ evolution. To test between the ‘Out of Africa’ and ‘multiregional’ hypotheses with genetic data available mainly from living populations, a new set of questions must be posed to enquire into the demographic past of our direct ancestors.
Evidence from DNA Sequences
The most important evidence that has been claimed as proof of ‘Out of Africa’ and inconsistent with ‘multiregional’ evolution has come from studies that reconstruct ancestral genealogy using sequences of mitochondrial DNA (mtDNA). These studies suggest that a common ancestor for living humans, who has become known as ‘Mitochondrial Eve’, is likely to have lived less than 200,000 years ago. In the original study a tree was constructed from 133 mtDNA sequence types present in 147 individuals sampled from around the world (Cann et al., 1987). mtDNA sequences are amenable to tree construction because their inheritance is haploid and exclusively maternal. Without copies inherited from fathers, there are no opportunities for recombination to scramble allelic diversity. Consequently, tree structure generated by unique mutations is preserved. Sequence divergence across a primary split between a small cluster of sequence types found only in Africa and the rest of the sequences was calibrated assuming a molecular clock. Divergence rates of 2-4% per million years suggested a date for the common ancestor in the range of 290,000-140,000 years ago.
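The clock calibration described above is simple arithmetic, which the following minimal sketch reproduces. The divergence value of 0.57% is an assumption, back-calculated here from the quoted rates and dates rather than taken from the text, and the function name is illustrative only.

```python
def clock_age_years(divergence_pct, rate_pct_per_myr):
    """Age of a common ancestor under a strict molecular clock.

    divergence_pct: sequence divergence accumulated since the ancestor (%).
    rate_pct_per_myr: divergence rate in % per million years.
    """
    if rate_pct_per_myr <= 0:
        raise ValueError("rate must be positive")
    return divergence_pct / rate_pct_per_myr * 1_000_000


# Assumed divergence across the primary split (not stated in the text).
ASSUMED_DIVERGENCE_PCT = 0.57

oldest = clock_age_years(ASSUMED_DIVERGENCE_PCT, 2.0)    # slowest quoted rate
youngest = clock_age_years(ASSUMED_DIVERGENCE_PCT, 4.0)  # fastest quoted rate
print(f"{youngest:,.0f} to {oldest:,.0f} years")
```

Under that assumed divergence, the two rates bracket an interval that rounds to the quoted 140,000 to 290,000 years. The width of the interval comes entirely from uncertainty in the rate, not from the sequence data.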
The opportunity to examine trees from mtDNA sequences prompted an analysis using phylogenetic methods. Phylogenetic methods are not ideal, however, for estimating genealogical relationships among individuals sampled from within a population. First, whereas the data used to measure time in a phylogeny are the sites in the DNA where variants between species have gone to fixation, mtDNA data are polymorphisms from a sample of individuals. Even when a constant rate of substitution for fixed variants justifies an assumption of a molecular clock over divergence time, frequency change of polymorphic variants under drift does not occur at a constant rate. Second, phylogenetic analyses ignore the process of gene flow. Third, phylogenetic analyses represent the compound processes of mutation and genetic drift as a simple average evolutionary rate, estimated for humans over 5 million years of human-chimpanzee divergence. Polymorphism data derive from the last phase of this interval, when the contribution of genetic drift could have been much greater or much smaller than its average over the whole interval. Substantial information about genetic drift is available from polymorphism data, and ignoring this information in favour of using the average rate over divergence is not to be recommended. Fourth, because a phylogeny reconstructs a single evolutionary history, it should be based on data taken from many independent loci in the genome.
A more appropriate model of evolutionary history for mtDNA allows that the data are from a single gene, comprising a sample of one from a distribution generated by processes of mutation and genetic drift. Population genetic theory for the evolution of DNA sequences was developed by Tajima (1983), who described gene genealogy, and independently by Kingman (1982), who published the first papers on the coalescent. Because of the popularity of the term ‘coalescence’ in recent reviews of mtDNA studies, there may be a mistaken impression that Kingman’s coalescent has been widely used in application to mtDNA data. In fact, very few analyses of mtDNA sequences apply coalescent models. However, the use of coalescent models is increasing, and they have been applied in studies of sequence diversity from the nuclear genome, including several of Y chromosome diversity.
One of the important results of gene genealogy and coalescent theory is that divergence between clusters of sequences is shown to be a pattern that may be expected as a consequence of genetic drift in a random-mating constant-sized population. Interpreting the same pattern within a phylogenetic framework may incorrectly suggest divergence between subspecies or population groups. Looking backwards in time from the present, studies of gene genealogy for random-mating constant-sized populations show that typically, the total coalescence time for the two most divergent sequence lineages is twice as long as the coalescence time of an average pair of sequences. Looking forwards in time from the generation of the most recent common ancestor, many lineages are initially established by reproductively successful founders. But by the time of the present generation, all but two of the original lineages are expected to have been lost by drift. Those two lineages identify our most recent common ancestor for a particular locus. Over the time during which these two lineages evolve, they may accumulate twice as many mutational differences as a pair of lineages chosen randomly from the present generation. Gene trees for a number of loci in the nuclear genome do suggest that global human diversity evolved in a single population connected by gene flow and subject to genetic drift.
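The factor of two mentioned above can be checked with a minimal sketch of the standard coalescent for a random-mating population of constant size. Time is measured in units of N generations, where N is the number of gene copies in the idealized population; the function names, sample size and random seed are illustrative assumptions.

```python
import random


def expected_tmrca(n):
    """Expected total coalescence time for n sampled lineages (units of N generations)."""
    # While k lineages remain, the mean waiting time to the next
    # coalescence is 1 / (number of lineage pairs) = 2 / (k * (k - 1)).
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """One random draw of the total coalescence time for n lineages."""
    t = 0.0
    for k in range(n, 1, -1):
        pairs = k * (k - 1) / 2.0    # lineage pairs that could coalesce
        t += rng.expovariate(pairs)  # exponential waiting time, rate = pairs
    return t


rng = random.Random(1)
n = 50
replicates = 20000
mean_sim = sum(simulate_tmrca(n, rng) for _ in range(replicates)) / replicates
ratio = expected_tmrca(n) / expected_tmrca(2)  # whole sample versus a single pair

print(f"expected total/pairwise ratio for n={n}: {ratio:.2f}")
print(f"simulated mean total coalescence time: {mean_sim:.2f}")
```

The expected total coalescence time is 2(1 - 1/n), which approaches twice the pairwise expectation as the sample grows. About half of that time is spent waiting for the last two lineages to coalesce, which is why a deep split between two clusters of sequences can arise from drift alone.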
The choice of methods for analysing sequence data has changed, but the main conclusions from the original study of mtDNA sequence variation have not. Although recurrent mutation in some polymorphic sites, mainly in the hypervariable D-loop, creates conflicts in mtDNA data for tree resolution, a variety of tree and network construction methods that have been used since usually recover the same essential structure in mtDNA data. Many subsequent studies collectively confirm the estimate for ‘Mitochondrial Eve’ within the range originally given as 290,000-140,000 years ago, and also identify an African root. The implication of a recent species origin has been investigated in subsequent studies of ancestral demography. One study that used the coalescent to simulate exponential population expansion found that typical patterns of frequency distribution observed for mtDNA sequences taken pairwise suggest that the number of female founders for contemporary diversity could not have been much greater than 50 (Marjoram and Donnelly, 1994). The evidence of mtDNA data, taken on its own, does appear to support the ‘Out of Africa’ hypothesis, implying not only recent common ancestry but also an exponential growth phase out of a tight bottleneck.
Perhaps the most important feature of mtDNA is its abundance. Large numbers of mitochondria in the cytoplasm of each cell greatly enrich copy numbers of mtDNA sequences compared with single-copy DNA sequences in the nucleus. High copy number greatly increases the chances of detecting mtDNA in samples that retain only a few fragments of degraded DNA. Only for mtDNA have diversity studies of contemporary populations been augmented with sequences amplified from Neanderthal individuals. One mtDNA sequence is from the original Neanderthal type specimen found in Feldhofer Cave in Germany and assigned an uncertain date between 35,000 and 70,000 years ago. A second mtDNA sequence is from a specimen found in Mezmaiskaya Cave in the northern Caucasus and dated to 29,000 years ago. A third mtDNA sequence is from a Neanderthal bone found in Vindija Cave, Croatia, and dated to over 42,000 years before the present. These three Neanderthal sequences differ by more base pairs than a random pair of sequences from contemporary Europeans, but not by more than occurs between pairs of sequences sampled randomly from contemporary African populations (Krings et al., 2000).
Setting modern human and Neanderthal sequences within a phylogeny that includes sequences from a range of gorilla and chimpanzee populations provides further striking evidence for a bottleneck in the origin of human mtDNA diversity (Gagneux et al., 1999). Whereas chimpanzee and gorilla diversity divides into clades that distinguish several geographic subspecies, mtDNA lineages from living humans cluster as a single clade. This clade excludes the Neanderthal sequences. The level of divergence between modern humans and Neanderthals is similar, however, to that observed for mtDNA lineages between central African and west African subspecies groups of the common chimpanzee Pan troglodytes.
Within-species diversity for humans and the common chimpanzee, together with divergence between chimpanzee subspecies, has also been examined using polymorphism data from the X chromosome at Xq13.3 (Kaessmann et al., 1999a). Phylogenetic analysis shows that Xq13.3 sequences from different chimpanzee subspecies intermingle and do not segregate into regionally specific clades. Population genetic analyses of within-species diversity at Xq13.3 for both humans and chimpanzees suggest substantially greater time depth than estimated for mtDNA diversity. These studies suggest that mtDNA, by comparison with the X chromosome, has been subject to a greater loss of lineage diversity in recent evolutionary history, both for chimpanzees and humans. What may be the explanation for the loss of mtDNA diversity?
One way of explaining the different patterns in mtDNA diversity compared with X chromosome diversity in both humans and chimpanzees is with population genetic models that assume loss of diversity by genetic drift. In contrast, models that assume exponential population growth out of a small number of founders impose very little loss of diversity by drift and predict comparable total coalescence times for all loci irrespective of their location in mtDNA, or on X, Y or autosomal chromosomes. A model that assumes expansion out of a speciation bottleneck cannot explain both shallow lineages for mtDNA and deep relationships between Xq13.3 lineages unless Xq13.3 diversity is being maintained by selection. A model that assumes random mating and constant size predicts a loss of lineages by drift for mtDNA and the nonrecombining Y chromosome that is 3-fold greater than for loci on the X chromosome and 4-fold greater than for loci from autosomal chromosomes. The impact of genetic drift in models that assume population subdivision rather than random mating is even greater and increases the variance in total coalescence times among loci.
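The 3-fold and 4-fold figures follow from counting the gene copies that each breeding pair transmits, as the minimal sketch below illustrates. It assumes equal numbers of breeding males and females, random mating and constant size. The 20-year generation time is an assumption that does not come from the text, the effective size of 10,000 is the commonly cited value, and all names are illustrative.

```python
from fractions import Fraction

# Gene copies transmitted per breeding pair (one female, one male).
COPIES_PER_PAIR = {"autosome": 4, "X": 3, "Y": 1, "mtDNA": 1}

AUTOSOMAL_NE = 10_000   # commonly cited effective population size
GENERATION_YEARS = 20   # assumed generation time, not from the text


def relative_drift(locus, reference):
    """How many times faster lineages are lost by drift at `locus` than at `reference`."""
    # Drift is inversely proportional to the number of gene copies.
    return Fraction(COPIES_PER_PAIR[reference], COPIES_PER_PAIR[locus])


def expected_total_coalescence_years(locus):
    """Large-sample expected total coalescence time for a locus, in years."""
    # Autosomes: 4 * Ne generations; other loci scale with their copy number.
    generations = AUTOSOMAL_NE * COPIES_PER_PAIR[locus]
    return generations * GENERATION_YEARS


for locus in COPIES_PER_PAIR:
    years = expected_total_coalescence_years(locus)
    print(f"{locus:8s} {years:>9,d} years")
print("mtDNA vs X:", relative_drift("mtDNA", "X"))
print("mtDNA vs autosome:", relative_drift("mtDNA", "autosome"))
```

These are expectations for an idealized population, not estimates for any real locus. They show only that shallow mtDNA and Y chromosome genealogies alongside deeper X and autosomal genealogies are the ordinary prediction of a drift model, before population subdivision adds further variance.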
To accommodate all the data observed so far, a feasible model for the evolution of modern humans is one that assumes a subdivided population connected by gene flow. The level of sequence divergence between Neanderthals and contemporary modern humans does suggest that Pleistocene hominids formed a regionally subdivided population, possibly consistent with the ‘multiregional’ evolution of modern humans. Assuming subdivision, there is a reasonable likelihood that mtDNA data would not detect admixture between Neanderthals and modern humans, had it occurred (Nordborg, 1998). Consequently, mtDNA data alone are not sufficient to prove or disprove the ‘Out of Africa’ hypothesis, and data from the X and other chromosomes are needed.
Was There a Population Bottleneck?
All mtDNA sequence diversity traces back to a single common ancestor, but ‘Eve’ is not necessarily the same ancestor who would be identified by analysing diversity at any other locus. Obviously, tracing Y chromosome diversity back to ‘Adam’ adds another common ancestor. Did ‘Adam’ live at the same time as ‘Eve’ and does diversity in the rest of the nuclear genome trace back to a small founding group living about 130,000 years ago? Trees for both mtDNA and Y chromosome sequence diversity have characteristic features that are consistent with a bottleneck for modern humans followed by rapid population and geographic expansion. Often described as star-shaped, these trees show large numbers of rare allelic variants radiating from common sequence types and few, if any, pairs of common allelic variants that are distinguished by a relatively large accumulation of unique mutational differences. Evidence for a population bottleneck at about the same time as the evolution of modern human morphology would be consistent with speciation.
An expansion out of a very small bottleneck predicts the same time depth for diversity in the nuclear genome as for mtDNA or Y chromosome diversity. Support for this prediction comes from a population genetic analysis of microsatellite diversity sampled from the nuclear genome. Assuming expansion out of a founding bottleneck, the total coalescence time was estimated to be approximately 200,000 years (Gonser et al., 2000). The same diversity provided the usual estimate for effective population size of 10⁴ under assumptions of constant population size and random mating. This study shows that typical nuclear diversity could have accumulated within the same time frame that has been estimated from phylogenetic analyses of mtDNA sequences. But is it the best model to account for the patterns observed for polymorphisms in the nuclear genome?
Gonser et al. (2000) concluded that microsatellite data provided greater support for an expansion from a bottleneck than for a model with constant population size and random mating. The shapes of trees constructed with mtDNA, Y chromosome or Xq13.3 sequence variants also suggest that population expansion may be a more appropriate assumption than population constancy. However, an analysis of Y chromosome diversity that assumed expansion out of small numbers of founders found that ‘Adam’ might have lived much more recently than ‘Eve’, at perhaps only 50,000 years ago (Thomson et al., 2000). This finding cannot be explained by expansion out of a speciation bottleneck. Other population genetic forces must be important.
For most loci in the nuclear genome, recombination limits the length of DNA expected to reveal gene tree structure. However, sufficient nuclear DNA data have now been accumulated to show a wide range of gene tree structures. Some contrast strongly with the star shape for mtDNA. Estimates for total coalescence times also vary greatly. Analyses that assume constant population size have given estimates for total coalescence times of 535,000 years for Xq13.3 diversity (Kaessmann et al., 1999b); 1.86 million years for diversity in the pyruvate dehydrogenase E1α (PDHA1) locus, also on the X chromosome (Harris and Hey, 1999); 750,000 years for diversity at the β-globin locus on chromosome 11 (Harding et al., 1997); and 311,000 years for the apolipoprotein E (APOE) locus on chromosome 19 (Fullerton et al., 2000). Overall, these genetic data provide some evidence for regional bottlenecks and population expansions within the last 100,000 years but do not imply a small number of founders living at the same time, as would be consistent with a speciation bottleneck.
From a phylogenetic perspective, the lack of convergence of total coalescence times on a likely date for a species origin may seem deeply puzzling. Part of this variability is probably due to selection, but the dominant process influencing patterns of nonfunctional diversity in most loci has been genetic drift. Models that allow genetic drift and gene flow in a structured population predict substantial variability in total coalescence times and can account for recent ancestry of mtDNA and Y chromosome diversity as well as much older ancestry for diversity from many autosomal or X chromosome loci. From a population genetic point of view, a wide range of dates including a very recent date for a common ancestor of global Y chromosome diversity is not surprising. The expected variability for total coalescence times is high under assumptions of constant population size and random mating, and even higher if the size of the population varies over time and if the population is subdivided and connected by gene flow, rather than randomly mating. Genetic data combined over multiple loci are likely to be more easily explained with a model for recent regional expansions out of a subdivided population than with a model for a single expansion out of a small African founding population. But further studies are needed to explore the implications of ‘multiregional’ evolution across the Old World since the Middle Pleistocene and to judge whether a subdivided ancestral population might have extended beyond sub-Saharan Africa.
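The scale of the expected variability under the simplest model, with constant size and random mating, can be illustrated with a minimal sketch. It uses the standard coalescent result that the waiting times between successive coalescences are independent and exponentially distributed. Time is in units of N generations, with N the number of gene copies, and the function name and sample size are illustrative assumptions.

```python
import math


def tmrca_mean_and_sd(n):
    """Mean and standard deviation of the total coalescence time for n lineages."""
    mean = 0.0
    var = 0.0
    for k in range(2, n + 1):
        step = 2.0 / (k * (k - 1))  # mean waiting time while k lineages remain
        mean += step
        var += step ** 2            # an exponential's variance is its mean squared
    return mean, math.sqrt(var)


mean_t, sd_t = tmrca_mean_and_sd(50)
cv = sd_t / mean_t  # coefficient of variation across independent loci
print(f"mean = {mean_t:.2f}, sd = {sd_t:.2f}, cv = {cv:.2f}")
```

Even in this idealized case the standard deviation exceeds half the mean, so independent loci with the same demographic history are expected to differ severalfold in total coalescence time. Changes in population size and population subdivision, which this sketch does not model, would widen the spread further.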
Conclusions
Questions about the origins and evolution of human populations were originally motivated by phylogenetic studies of the fossil record. The application to genetic data of phylogenetic analyses has revealed evidence consistent with ‘Out of Africa’ but also has failed to evaluate fairly the compatibility of genetic data with ‘multiregional’ evolution. Phylogenies provide models for species divergence on time scales of millions of years but cannot reveal details needed for testing between the ‘Out of Africa’ and ‘multiregional’ hypotheses. Investigating these details requires population genetic models that focus on the turnover of transient polymorphism within intervals of tens to hundreds of thousands of years. Only with models for the balance of mutation, genetic drift, selection and gene flow can ‘Out of Africa’ be tested against the ‘multiregional’ hypothesis.
Overall, patterns of diversity in the nuclear genome are not consistent with predictions based on mtDNA for a recent origin within the last 200,000 years and an expansion of modern humans out of a population bottleneck at speciation. Scientific understanding has been distorted by biblical analogy. ‘Mitochondrial Eve’ was just one of a large number of individuals living at different times in the past who would turn out to be ancestors of most of the people living on earth in a distant generation. This is the reality of sex and recombination, which reassorts genes into multiple genetic backgrounds with different ancestral histories. However, whether ancestral lineages of diversity in living populations are judged to extend back exclusively into an African population more than 100,000 years ago, or into a subdivided multiregional Old World population, is a question that needs further study. Critical evidence from genetic data that clearly excludes either the ‘Out of Africa’ or the ‘multiregional’ hypothesis has not yet been presented.