How genetics is settling the Aryan migration debate


Tony Joseph
June 16, 2017

New DNA evidence is solving the most fought-over question in Indian history. And you will be surprised at how sure-footed the answer is, writes Tony Joseph.

The thorniest, most fought-over question in Indian history is slowly but surely getting answered: did Indo-European language speakers, who called themselves Aryans, stream into India sometime between 2,000 BC and 1,500 BC, when the Indus Valley civilisation came to an end, bringing with them Sanskrit and a distinctive set of cultural practices? Genetic research based on an avalanche of new DNA evidence is making scientists around the world converge on an unambiguous answer: yes, they did.

This may come as a surprise to many, and a shock to some, because the dominant narrative in recent years has been that genetics research had thoroughly disproved the Aryan migration theory. This interpretation was always a bit of a stretch, as anyone who read the nuanced scientific papers in the original knew. But now it has broken apart altogether under a flood of new data on the Y-chromosome (the chromosome transmitted through the male line, from father to son).

Lines of descent

Until recently, only data on mtDNA (mitochondrial DNA, which is passed down only through the maternal line, from a mother to her children) were available, and they seemed to suggest there was little external infusion into the Indian gene pool over the last 12,500 years or so. New Y-DNA data have turned that conclusion upside down, with strong evidence of external infusion of genes into the Indian male lineage during the period in question.

The reason for the difference between the mtDNA and Y-DNA data is obvious in hindsight: there was a strong sex bias in Bronze Age migrations. In other words, those who migrated were predominantly male, and therefore those gene flows do not really show up in the mtDNA data. They do, however, show up in the Y-DNA data: specifically, about 17.5% of Indian male lineages have been found to belong to haplogroup R1a (a haplogroup identifies a single line of descent), which is today spread across Central Asia, Europe and South Asia. The Pontic-Caspian Steppe is seen as the region from which R1a spread both west and east, splitting into different sub-branches along the way.

The paper that put all of the recent discoveries together into a tight and coherent history of migrations into India was published just three months ago in a peer-reviewed journal called ‘BMC Evolutionary Biology’. In that paper, titled “A Genetic Chronology for the Indian Subcontinent Points to Heavily Sex-biased Dispersals”, 16 scientists led by Prof. Martin P. Richards of the University of Huddersfield, U.K., concluded: “Genetic influx from Central Asia in the Bronze Age was strongly male-driven, consistent with the patriarchal, patrilocal and patrilineal social structure attributed to the inferred pastoralist early Indo-European society. This was part of a much wider process of Indo-European expansion, with an ultimate source in the Pontic-Caspian region, which carried closely related Y-chromosome lineages… across a vast swathe of Eurasia between 5,000 and 3,500 years ago”.

In an email exchange, Prof. Richards said the prevalence of R1a in India was “very powerful evidence for a substantial Bronze Age migration from central Asia that most likely brought Indo-European speakers to India.” The robust conclusions of Professor Richards and his team rest on their own substantive research as well as a vast trove of new data and findings that have become available in recent years, through the work of genetic scientists around the world.

“What’s happened very rapidly, dramatically, and powerfully in the last few years has been the explosion of genome-wide studies of human history based on modern and ancient DNA, and that’s been enabled by the technology of genomics and the technology of ancient DNA…”
David Reich, geneticist and professor, Harvard Medical School

Peter Underhill, a scientist in the Department of Genetics at the Stanford University School of Medicine, is one of those at the centre of the action. Three years ago, a team of 32 scientists he led published a massive study mapping the distribution and linkages of R1a. It used a panel of 16,244 male subjects from 126 populations across Eurasia. Dr. Underhill’s research found that R1a had two sub-haplogroups, one found primarily in Europe and the other confined to Central and South Asia. Ninety-six per cent of the R1a samples in Europe belonged to sub-haplogroup Z282, while 98.4% of the Central and South Asian R1a lineages belonged to sub-haplogroup Z93. The two groups diverged from each other only about 5,800 years ago.

Dr. Underhill’s research also showed that Z93, which is predominant in India, has further splintered into multiple branches. The paper found this “star-like branching” indicative of rapid growth and dispersal. So if you want to know the approximate period when Indo-European language speakers came and rapidly spread across India, you need to find the date when Z93 splintered into its various subgroups or lineages. We will come back to this later.

So, in a nutshell: R1a is distributed all over Europe, Central Asia and South Asia; its subgroup Z282 is distributed only in Europe, while another subgroup, Z93, is distributed only in parts of Central Asia and South Asia; and three major subgroups of Z93 are distributed only in India, Pakistan, Afghanistan and the Himalayas. This clear picture of the distribution of R1a has finally put paid to an earlier hypothesis that the haplogroup perhaps originated in India and then spread outwards. That hypothesis rested on the erroneous assumption that R1a lineages in India were far more diverse than those in other regions, which would have pointed to an origin here. As Prof. Richards puts it, “the idea that R1a is very diverse in India, which was largely based on fuzzy microsatellite data, has been laid to rest” thanks to the arrival of large amounts of genomic Y-chromosome data.

Gene-dating the migration

Now that we know there WAS indeed a significant inflow of genes from Central Asia into India in the Bronze Age, can we get a better fix on the timing, especially of the splintering of Z93 into its own sub-lineages? Yes, we can. The research paper that answers this question was published just last year, in April 2016, under the title “Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences.” This paper, which looked at major expansions of Y-DNA haplogroups within five continental populations, was lead-authored by David Poznik of Stanford University, with Dr. Underhill as one of its 42 co-authors. The study found “the most striking expansions within Z93 occurring approximately 4,000 to 4,500 years ago”. This is remarkable, because roughly 4,000 years ago is when the Indus Valley civilisation began falling apart. (There is no evidence so far, archaeological or otherwise, to suggest that one caused the other; it is quite possible that the two events simply coincided.)

The avalanche of new data has been so overwhelming that many scientists who were either sceptical or neutral about significant Bronze Age migrations into India have changed their opinions. Dr. Underhill himself is one of them. In a 2010 paper, for example, he had written that there was evidence “against substantial patrilineal gene flow from East Europe to Asia, including to India” in the last five or six millennia. Today, Dr. Underhill says there is no comparison between the kind of data available in 2010 and now. “Then, it was like looking into a darkened room from the outside through a keyhole with a little torch in hand; you could see some corners but not all, and not the whole picture. With whole genome sequencing, we can now see nearly the entire room, in clearer light.”

Dr. Underhill is not the only one whose older work has been used to argue against Bronze Age migrations by Indo-European language speakers into India. David Reich, geneticist and professor in the Department of Genetics at Harvard Medical School, is another, even though he was very cautious in his older papers. The best example is a study lead-authored by Reich in 2009, titled “Reconstructing Indian Population History” and published in Nature. This study used the theoretical construct of “Ancestral North Indians” (ANI) and “Ancestral South Indians” (ASI) to uncover the genetic substructure of the Indian population. It showed that the ANI are “genetically close to Middle Easterners, Central Asians, and Europeans”, while the ASI were unique to India. It also showed that most groups in India today can be approximated as a mixture of these two populations, with ANI ancestry higher among traditionally upper-caste groups and Indo-European speakers. By itself, the study didn’t disprove the arrival of Indo-European language speakers; if anything, it suggested the opposite, by pointing to the genetic linkage of ANI to Central Asians.




