- Nucleobases are the alphabet of DNA. There are four of them: adenine (A), thymine (T), guanine (G) and cytosine (C). They always come in pairs, A with T, and G with C. Such pairs are called “base pairs”.
- The 46 chromosomes of human DNA are composed of a total of 3,000 million base pairs.
- The Y chromosome possesses 60 million nucleobases, against 153 million for the X chromosome.
- Mitochondrial DNA is found outside the cell’s nucleus, and therefore outside of the chromosomes. It consists of only 16,569 bases.
- A SNP (single nucleotide polymorphism) is a mutation in a single base pair. At present, only a few hundred SNPs define all the human haplogroups for mtDNA or Y-DNA.
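The pairing rule and the definition of a SNP given above can be illustrated with a minimal Python sketch. The function names (`complement_strand`, `find_snps`) and the two short sequences are hypothetical, chosen only for illustration; they are not real genomic data, and real SNP calling works on far longer, carefully aligned sequences.

```python
# Pairing rule as described above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the base that pairs with each base of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand)


def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, sample base) wherever two aligned,
    equal-length sequences differ at a single position (0-indexed)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


# Hypothetical toy sequences (not real genomic data).
reference = "GATTACA"
sample = "GACTACA"

print(complement_strand(reference))  # CTAATGT
print(find_snps(reference, sample))  # [(2, 'T', 'C')]
```

In this toy case the two sequences differ at a single position, which is the kind of one-base difference that a haplogroup-defining SNP represents.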
Following the end of the last Ice Age approximately 12,000 years ago, European hunter-gatherers recolonised the continent from the Ice Age refugia in southern Europe. The vast majority of Mesolithic Europeans would have belonged to Y-haplogroup I. This included I*, pre-I1, I1, I2*, I2a*, I2a2, but the most widespread appears to have been I2a1, which was found in most parts of Europe. Northeast Europeans would have belonged mostly to haplogroup R1a. Other minor male lineages were certainly also present in parts of Europe, notably haplogroups A1a, C-V20, F-P96 and possibly even Q1a and R1b1* (P25).
Neolithic and Chalcolithic Europe
Hundreds of Neolithic samples from all over Europe (but especially Central Europe and Iberia) have been tested. The new lineages brought by these Near Eastern immigrants included mt-haplogroups HV, J1, J2, K1, K2, N*, N1, T1a, T2b, T2c, T2e, T2f, U3, W, X1, X2, and many subclades of H (including H2, H5, H7, H13 and H20). H4, H8 and H9 seem to have originated in the Near East as well, although no Neolithic sample has been identified in Europe yet.
The Bronze Age and the Indo-European migrations
The origin of the Indo-Europeans lies in the Pontic-Caspian steppe with (R1a) tribes to the north (forest-steppe and tundra) and (R1b) tribes to the south (open steppe) during the Chalcolithic and Bronze Age. Their migration both westward to Europe and south-eastward to Central and South Asia makes it easy to guess which mtDNA haplogroups they carried (=> see also Identifying the original Indo-European mtDNA from isolated settlements). The best matches for R1a are C4a, H1b, H1c, H2a1, H6, H11, K1b1b, K1c, K2b, T1a1a1, T2a1b1, T2b2, T2b4, U2e, U4, U5a1a, W, and several I subclades.
The R1b branch would have originated in eastern Anatolia and/or northern Mesopotamia/Syria during the Early Neolithic period, where they probably domesticated cattle and became primarily cattle herders. They would then have crossed the Caucasus to the Pontic Steppe in search of pasture for their cattle, where they mixed to some extent with southern R1a tribes. The maternal lineages of these Near Eastern R1b people would have included haplogroups H5a, H6, H8, H15, I1a1, J1b1a, K1a3, K2a6, U5, and some V subclades (like V15).
- K => 40,000 years ago (probably arose in northern Iran)
- T => 30,000 years ago (around the Red Sea or around the Persian Gulf)
- J => 30,000 years ago (in the Middle East)
- R => 28,000 years ago (in Central Asia)
- E1b1b => 26,000 years ago (in Northeast Africa)
- I => 25,000 years ago (in the Balkans)
- J1 => 20,000 years ago (in the Taurus/Zagros mountains)
- J2 => 19,000 years ago (in northern Mesopotamia)
- E-M78 => 18,000 years ago (in north-eastern Africa)
- R1b => 18,000 years ago (around the Caspian Sea or Central Asia)
- R1a => 17,000 years ago (in southern Russia)
- G => 17,000 years ago (in the Middle East)
- I2 => 17,000 years ago (in the Balkans)
- E-V13 => 15,000 years ago (in the southern Levant or North Africa)
- I2b => 13,000 years ago (in Central Europe)
- N1c1 => 12,000 years ago (in Siberia)
- E-M81 => 11,000 years ago (in Northwest Africa)
- I2a => 11,000 years ago (in the Balkans)
- G2a => 11,000 years ago (in the Levant or Anatolia)
- R1b1b2 => 10,000 years ago (north or south of the Caucasus)
- I2b1 => 9,000 years ago (in Germany)
- I2a1 => 8,000 years ago (in Southwest Europe)
- I2a2 => 7,500 years ago (in Southeast Europe)
- I1 => 5,000 years ago (in Scandinavia)
- R1b-L21 => 4,000 years ago (in Central or Eastern Europe)
- R1b-S28 => 3,500 years ago (around the Alps)
- R1b-S21 => 3,000 years ago (in Frisia or Central Europe)
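The chronology above lends itself to a simple lookup structure. As a rough sketch, the Python below stores a handful of the entries (ages copied from the list) and selects the lineages estimated to be older than a given threshold, here 12,000 years, the approximate end of the last Ice Age mentioned earlier. The names `HAPLOGROUP_AGES` and `older_than` are hypothetical, and the ages are the estimates quoted in the list, not independent data.

```python
# Estimated ages in years before present, copied from a few entries of the list above.
HAPLOGROUP_AGES = {
    "K": 40_000,
    "I": 25_000,
    "R1b": 18_000,
    "R1a": 17_000,
    "G2a": 11_000,
    "I1": 5_000,
    "R1b-L21": 4_000,
}


def older_than(ages: dict[str, int], threshold: int) -> list[str]:
    """Return haplogroups whose estimated age exceeds `threshold` years, oldest first."""
    selected = [name for name, age in ages.items() if age > threshold]
    return sorted(selected, key=lambda name: ages[name], reverse=True)


print(older_than(HAPLOGROUP_AGES, 12_000))  # ['K', 'I', 'R1b', 'R1a']
```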
Haplogroup C (Y-DNA)
Haplogroup C is an extremely old lineage thought to have appeared before or soon after the first migration of Homo sapiens out of Africa, some 70,000 years ago. Men belonging to haplogroup C would have departed from East Africa during the Ice Age and followed the coasts of the Indian Ocean, settling in the Arabian peninsula, the Indian subcontinent, south-east Asia, north-east Asia and Oceania.
Haplogroup L (Y-DNA)
Haplogroup L is found mostly in West Asia and South Asia. Its overall frequency ranges between 5 and 15% in Pakistan and western India, with a peak of 23% among the Kalash of northwest Pakistan, and from 1 to 10% in central Asia (mostly in Uzbekistan, Tajikistan and Afghanistan). It is also found in the Middle East (5% in Lebanon, 4.5% in Turkish Kurdistan, 4% in Iran, 3% in Syria), in parts of the Caucasus (7% in Azerbaijan and Chechnya, 3% in Armenia and Ingushetia), and in isolated parts of Europe (3.5% in north-east Italy, from 0.2% to 1% in the Balkans and Greece, 0.5% in Flanders).
Haplogroup H (Y-DNA)
Haplogroup H is typically found among Dravidian populations in the Indian subcontinent, especially in South India and Sri Lanka. In Europe it is found almost exclusively among the Gypsies (Romani), who belong predominantly (between 15% and 50%) to the H1a (M82) subclade of Indian origin. The highest frequencies of haplogroup H among non-Romani Europeans are found in regions with large Romani populations, such as Romania, Slovakia, the southern Balkans, and Andalusia, suggesting that these lineages are also of Romani origin. No subclade other than H1a has been found to date in Europe.
Haplogroup A (Y-DNA)
A is the oldest of all Y-DNA haplogroups. It originated in sub-Saharan Africa over 140,000 years ago, and possibly as much as 340,000 years ago if we include haplogroup A00. Modern populations with the highest percentages of haplogroup A are the Khoisan (such as the Bushmen) and the southern Sudanese.
There are only rare and isolated cases of European men belonging to haplogroup A. Commercial tests have identified a few Scottish and Irish families (surnames Boyd, Logan and Taylor) all belonging to the same A1b1b2 (M13) subclade. This subclade is normally found in East Africa (Ethiopia, Sudan), but has also been found in Egypt, the Arabian peninsula, Palestine, Jordan, Turkey, Sicily, Sardinia and Algeria. It was certainly brought to Europe by Levantine people, be it during the Neolithic or later (Phoenicians, Jews, immigration within the Roman Empire).
Haplogroup H & V (mtDNA)
Haplogroup H is by far the most common all over Europe, amounting to about 40% of the European population. It is also found (though at lower frequencies) in North Africa, the Middle East, Central Asia, Northern Asia, as well as along the East coast of Africa as far as Madagascar.
H1, H3 and V are the most common subclades of HV in Western Europe. H1 peaks in Norway (30% of the population) and Iberia (18 to 25%), and is also high among the Sardinians, Finns and Estonians (16%), as well as Western and Central Europeans in general (10 to 12%) and North-West Africans (10 to 20%).
Haplogroup U & K (mtDNA)
Haplogroup U is extremely old. It originated some 60,000 years ago at the junction of North-East Africa and the Middle East, soon after the first Homo sapiens ventured out of Africa. This is why each of its top-level subclades (U1, U2, U3…) can be seen as a haplogroup in its own right. The main European subclades are U3, U4, U5 and U8/K. U1 is mostly found in the Middle East, U6 in North Africa, U7 from the Near East to India, and the rare U9 from Ethiopia and the Arabian peninsula to Pakistan.
Haplogroup K is the main subclade of U8. It is found throughout Europe and Western Asia, as far away as India. Its highest concentration is in North-West and Central Europe, Anatolia and the southern Arabian peninsula. It is believed to have first arisen somewhere between Egypt and Anatolia approximately 16,000 years ago (estimates range from 22,000 years to as little as 10,000 years before present). It has the largest number of subclades of any haplogroup in spite of its fairly recent age. K1a is the largest subclade. The relatively important presence of K1a in the Near East suggests that it predates the Neolithic migration to Europe. This has been supported by the ancient mtDNA from Neolithic sites. Haplogroup K was never found in Europe prior to the Neolithic, then suddenly appears at a frequency (17%) much higher than in modern Europeans and similar to that of the present-day Levant. Most of the Neolithic K belongs to the K1a subclade.
Haplogroup J & T (mtDNA)
Haplogroup J originated in the Middle East 45,000 years ago, making it one of the oldest mitochondrial haplogroups in Europe and the Middle East. Haplogroups J1c and J2a1 might have been present in Southeast Europe since the Epipaleolithic, then were probably diffused by Neolithic farmers across the rest of Europe. J2b1a, a mostly Near Eastern subclade, has been found in Neolithic samples in Europe alongside J1c.
Haplogroup W (mtDNA)
Present at low frequencies in most of Europe, in Anatolia, around the Caspian Sea, and from the Indo-Pakistani border to Xinjiang, haplogroup W is one of the best maternal markers of Indo-European ancestry (mtDNA equivalent of R1a and R1b). Its highest frequency is in Ukraine, European Russia, Baltic countries and Finland (3 to 5% overall), as well as in northern Pakistan (15%), Punjab (9%) and Gujarat (12%). In India, it is considerably more common among the upper castes and among Indo-European speakers.
Haplogroup I (mtDNA)
Like haplogroup W, haplogroup I is found at low frequency over most of Europe, especially in northern and eastern Europe, and across Central Asia as far as Pakistan and North-West India, with a characteristic presence in the North Caucasus. Haplogroup I first appears in Europe with the arrival of Proto-Indo-European cultures, notably the Unetice culture associated with Y-haplogroup R1b. The absence of haplogroup I from Paleolithic, Mesolithic and Neolithic sites, and from modern non-Indo-European speaking populations such as the Saami, the Basques and the Maghrebians, argues in favour of an Indo-European origin.
Haplogroup X (mtDNA)
Haplogroup X is a very old and scattered haplogroup found all over Eurasia and North Africa, as well as among Native North Americans. Its frequency rarely exceeds 5% of the population in any ethnic group, and is more often restricted to 1 or 2%. X1 is found almost exclusively in North Africa, while X2a is the only lineage present among Amerindians. X2d, X2e, X2n and X4 are found in Europe and Central Asia, and could therefore have been spread at least partially by the Proto-Indo-Europeans.
The strong presence of X2 around the Caucasus, progressively fading towards the Near East and Mediterranean, hints that it could be related to the spread of Y-DNA haplogroup G2a. Since R1b1b and G2a both have origins around the Caucasus, it is unsurprising to find X2 alongside these two Y-DNA haplogroups.
Haplogroup R (mtDNA)
Haplogroup R is the main subclade of N, the one that was to generate the 6 most common European haplogroups (H, V, J, T, U, K). At the time of writing, R subclades were numbered from R0 (a.k.a. pre-HV) to R31. Most of them are found in South Asia (R5, R6, R7, R8, R30, R31), Southeast Asia (R9, R21, R22, R24), East Asia (R9/F, R11/B), and even among Papuans (R14) and Australian Aborigines (R12). R0a peaks in the southern Arabian peninsula and is common among Arabs and Middle-Easterners. R1a (not to be confused with the homonymous Y-chromosome haplogroup) is found among the Adygei people from the North Caucasus (related to the Maykop culture => see R1b section), Brahmins from northern India, northwestern Russians and Poles – basically all people closely related to the Indo-European expansion. R2 is found from northwest India and Pakistan to Iran, Georgia and Turkey. It could be connected to the Indo-Iranians.
Finno-Uralic people have an overall mtDNA admixture similar to other Europeans, with a higher percentage of W and U5b, and a small percentage of Siberian haplogroups such as N or A. The Sami are characterised by a high percentage of haplogroups U5b1 and V.
The Berbers are the indigenous population of north-west Africa. Although their Y-DNA is almost perfectly homogeneous, belonging to haplogroup E-M81, Berber maternal lineages show a much greater diversity, as well as regional disparity. At least half (and up to 90% in some regions) of the Berbers belong to some Eurasian lineages, such as H, HV, R0, J, T, U, K, N1, N2, and X2, mostly of Middle or Near Eastern origin. Between 5 and 45% of the Berbers carry sub-Saharan mtDNA (L0, L1, L2, L3, L4, L5). There are only three native North African lineages, U6, X1 and M1, representing 0 to 35% of the people depending on the region.
The Gypsies (Romani people) originated in the Indian subcontinent and mixed with local populations in the Middle East and Eastern Europe over the centuries. About half of the Gypsy population belongs to haplogroup M, and more specifically M5 (reflected by Y-haplogroup H1a), which is otherwise exclusive to South Asia. The other mtDNA haplogroups found among the Gypsy community are mostly of Eastern European, Caucasian or Middle Eastern origin, such as H (H1, H2, H5, H9, H11, H20, among others), J (J1b, J1d, J2b), T, U3, U5b, I, W and X (X1b1, X2a1, X2f). The same diversity exists on the Y-DNA side (45% of H1a, followed by I1, I2a, J2a4b, E1b1b, R1b1b, R1a1a).