Thursday, April 23, 2009

Correspondence between mtDNA and Y-haplogroups - by Terry Toohill



I’ve just today noticed that it’s entirely possible to superimpose a coherent mtDNA chart over a Y-chromosome chart. In other words individual mtDNA haplogroups are, in fact, often associated with individual Y-chromosome haplogroups, especially when both are expanding into previously uninhabited environments. In that situation they obviously need each other for either line to survive.



If you look at the diagram you’ll see that all the M group of mtDNA haplogroups (bold, italics, underlined) are associated with Y-chromosome haplogroups enclosed within the dashed lines, a shape a bit like Australia in the middle of the diagram. On the other hand mtDNA haplogroup N (bold, underlined) is associated with the Y-haps at the left of the diagram and with those at the top right. Some descendants of mtR (smaller, bold, underlined) are associated with a particular Y-chromosome haplogroup at the bottom right.


From the combined diagram it seems that mtM expanded into India with Y-haps F and K-T (and possibly C5), but not with the other F-derived haplogroups (G/H and I/J). Those Y-haps teamed up with mtN, specifically mtI/X and mtW, somewhere along the route between Africa and India. I’ll return to mtN soon but a branch of W seems to have reached Australia at some time.


As mtM expanded with Y-haps F and K-T both the mtDNA and Y-chromosome haplogroups changed over time. For example mtQ developed in New Guinea with Y-hap S. MtG, along with mtD, developed in East Asia with Y-hap F2. But mtD went much further north than Y-hap F2. To some extent D’s distribution corresponds with that of Y-hap O, but again the mt-hap went much further than the Y-hap. Eventually mtD reached America, presumably with yet another Y-haplogroup. To be consistent I should join mtC/Z with Y-hap T somewhere at the western end of the K-T geographic range. In this case, though, their eventual distributions bear very little relationship to each other. Lastly we have mtE in Southeast Asia with Y-haps K and M. But in this case the Y-haps moved further than the mtDNA. Who were Y-haps K2, K3, K4, M1, M2 and M3 breeding with once they had moved beyond mtE’s distribution? I’d guess that members of mt haplogroup N had already spread along the east coast of Asia and south into Australia.


As well as associating with Y-haps G/H and I/J, mtN had teamed up with Y-hap C, to finish up in Australia as mtS and Y-hap C4, in New Guinea as mtP and Y-hap C6, in Japan as mtY and Y-hap C1, and in Southeast Asia as mtR and Y-haps C* and possibly C2. Lastly mtA and mtKet N* developed somewhere between all these eastern Ns and the western mtI/X and mtW mobs, perhaps with Y-hap C3. Haplogroup C5 had finished up in India, where there were already plenty of women so they didn’t need to take any with them. Y-hap C5 may have gone with Y-haps F and K-T though, in which case it too could be associated with mtM.


The next step in the spread of all those mtNs began when they swapped sides: mtR teamed up with Y-hap N-R, and off they went. That’s why I’ve actually included mt-hap R twice, once as a subgroup of N (bold, underlined) associated with Y-hap C* and once as the base of a second expansion (smaller, bold, underlined) associated with Y-hap N-R. The route they followed is debatable but I suspect it forked. Mt-hap F didn’t move very far but mtB spread right along the East Asian coast, eventually reaching both America and Polynesia. By then B had swapped sides yet again and moved into Polynesia with Y-hap C2, and perhaps into America with Y-hap C3. As far as I’m aware mtB is spread throughout the latter continent though.


The mt lines H/V, J/T and U/K ultimately became entangled with Y-hap R, and even joined up again with their old mates, descendants of Y-haps G/H and I/J. MtH and Y-hap R have become the major haplogroups in Europe. Perhaps they were first in. It’s possible that some I/X or W mtDNA lines and some G/H and I/J Y-chromosome lines may represent remnant ancient haplogroups in Europe though. But they could well have arrived after Y-hap R and all these descendants of mtR. It seems that as Y-hap Q moved towards America they picked up women from different places along the way, to finish up being associated with mt-haps A, B, C, D and X.


Some of the M and N mtDNA lines have moved back into Africa. My guess would be with Y-hap E, although Y-haps R and T seem also to be involved. Y-hap D may have been associated with the impetus for an original Central Asia migration. In that case it too would be associated with various mtNs, perhaps mtA and mtY.



16 comments:

Maju said...

Just some quick thoughts before jumping into my bed:

"all the M group of mtDNA haplogroups (bold, Italics, underlined) are associated with Y-chromosome haplogroups enclosed within the dashed lines"

Not really. For example Y-DNA L that you place in "India" (where it does certainly exist) is more of a Pakistani and even Central Asian group and may correlate better, if anything, with mtDNA clades like U2.

I don't understand why you split Melanesian Y-DNA into (mtDNA) M-related and unrelated groups. Why is (Y-DNA) S related with (mtDNA) Q and not (Y-DNA) M or C6? What do you know that I don't? (If anything).

Anyhow, in South Asia there is more than just mtDNA M: R is abundant too (lots of small and not so small clades, plus U2 - and there is some N as well). You claim that Y-DNA H is associated with mtDNA R but in fact it spans areas of South Asia where mtDNA M is clearly dominant.

In East Asia too, you make Y-DNA D out to be unrelated to mtDNA M, but that is clearly not the case in Japan: M7, M8/CZ, M9, G. MtDNA G and D are also surely associated with Y-DNA D in Tibet/Central Asia (unsure right now about Tibetan mtDNA). All those are M-derived.

Much the same could be said about East Asian Y-DNA C (C3 and C2).

The overall pattern instead seems to be: mtDNA M (almost) did not participate in the colonization of West Eurasia and the same holds true for Y-DNA C and D. So you are confusing this peculiarity of West Eurasia with something else.

A last criticism: Y-DNA G/H is not as of now any single haplogroup but two. G is also not just "Caucasus" but, especially in the G2 sublineage, is much more widespread throughout West Eurasia, including Central Asia.

terryt said...

Thanks for putting that up Tim. Hopefully we will be able to improve it with time. I'm already tempted to put mt-haps D and G with Y-hap O. But the mt-haps probably first moved into East Asia with Y-hap F and only later expanded with O.

Luis claims the idea that Y-haps and mtDNA haps moved together is a simplification but the connections here are associated with first arrivals in each region. In those cases Y-haps and mt-haps are obviously connected.

terryt said...

Thanks for your contribution Maju.

"Y-DNA L that you place in 'India' (where it does certainly exist) is more of a Pakistani and even Central Asian group and may correlate better, if anything, with mtDNA clades like U2".

I'm quite prepared to change it to 'S Asia' or 'Central Asia' if you wish. India is just shorthand. The association with mtDNA U2 is presumably a later correlation as U2 is very much a downstream mutation within the R clade.

"Why is (Y-DNA) S related with (mtDNA) Q and not (Y-DNA) M or C6".

I agree that either Y-DNA C6 or S could be associated with either mtDNA Q or S. But associating Y-DNA C4 with S fits the other C Y-DNA pattern and both Y-hap S and mtDNA Q seem to have a South Asian origin (note). Y-DNA M is more Melanesian than New Guinean.

"Anyhow, in South Asia there is more than just mtDNA M: R is abundant too (lots of small and not so small clades, plus U2 - and there is some N as well)".

True, but again downstream clades. And they provide support for my contention that R moved through South Asia with Y-DNA P.

"In East Asia too, you make Y-DNA D out to be unrelated to mtDNA M, but that is clearly not the case in Japan: M7, M8/CZ, M9, G. MtDNA G and D are also surely associated with Y-DNA D in Tibet/Central Asia (unsure right now about Tibetan mtDNA). All those are M-derived".

MtDNA M need not have arrived there with Y-DNA D though. And D is not only associated with M haplogroups there. About half the Japanese mtDNA is N-derived. Same in Central Asia. I'm pretty much in agreement with your idea that mt-DNA M is South Asian in origin but Y-DNA D is absent in South Asia and through much of SE Asia. This suggests the Y-DNA and mtDNA haps have become associated later rather than having expanded in association. Likewise for C3 and C2.

"Y-DNA G/H is not as of now any single haplogroup but two. G is also not just 'Caucasus'".

Again a simple matter to change it. H would then appear on the line between F* and I-T, and would be included within the dashed lines. In other words it took part in the original movement into India along with mtDNA M.

Maju said...

"Y-DNA M is more Melanesian than New Guinean."

Only recently it has been found that the former Melanesian K1 and K7 are within M. M1 is typically Papuan (as well as other Melanesian) and comprises a good deal of the Y-DNA lineages of the island. The ISOGG brief comment on M reads: "Y-DNA haplogroup M reaches its known peak in Papua New Guinea, totaling one-third to two-thirds of population."

"True, but again downstream clades."

Wrong. 5 of all 15 R subclades are circumscribed to South Asia (R1, R5, R6'7, R12, R14 and R30), while three other lineages are shared inter-regionally between South Asia and West Eurasia (R2'JT and U) or South Asia and Australia (R31). Overall South Asia hosts by far the highest top-level diversity within macro-haplogroup mtDNA R and is therefore the most likely urheimat of this huge matrilineage.

It would seem that the local R subclades had little room for expansion within the subcontinent because M had already taken nearly all the available niches at an earlier moment. Still, lineages like R7 or U have some importance. Instead West Eurasian branches like R0, U(xU2) and JT (a downstream clade, in your terminology) found much more room for expansion in West Eurasia. Even East Asian R9'21'22'F and R11'B eventually found more room for expansion in East Asia too, which suggests that South Asia was already crowded by Paleolithic standards by then (not so long after the first expansive pulse).

But the top-level (underived) diversity for R is in South Asia without a shadow of doubt.

"MtDNA M need not have arrived there with Y-DNA D though."

Considering that both appear as the oldest lineages in East Asia it is hard to think otherwise.

"I'm pretty much in agreement with your idea that mt-DNA M is South Asian in origin but Y-DNA D is absent in South Asia and through much of SE Asia."

The latest research (Hong Shi, 2008 - search at Mathilda's) determines that the urheimat of D was in SE Asia with little room for doubt.

"This suggests the Y-DNA and mtDNA haps have become associated later rather than having expanded in association. Likewise for C3 and C2."

Wishful thinking and nothing else.

"H would then appear on the line between F* and I-T"

No. It is a distinct clade from IJK (your "I-T").

"In other words it took part in the original movement into India along with mtDNA M."

MtDNA M and N are comparable to Y-DNA C, D and F. I don't know why you place a downstream Y-DNA F subclade along with the major top-level mtDNA lineage. MtDNA M and N must have expanded when Y-DNA F, D and C did, even if it's not easy to make simplistic parallelisms.

This is because they expanded in a cocktail, not one here and one there, with strict separation of any sort. Distilling, drift, when it happened, came later, not in the moment of expansion: expanding populations do not drift significantly, only stable or contracting ones do (and the smaller the population, the more they drift too).

It's pure statistics. Try this: put some beans of several colors in a bag and blindly pick some of them, discarding the rest. Then repeat with the remainder (the survivors) as many times as you want. If there are many beans and you pick most of them, the color ratio will not vary and you will have to repeat a lot of times before noticing any difference at all. If instead there are few beans in the bag and you pick even fewer, then soon you will end up with a single-color population. That's drift: that's how our haplogroups were sorted around.
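The bean experiment can also be tried on a computer. The sketch below is only an illustration under assumptions that are not in the comment: instead of literally picking some beans and discarding the rest, each round draws a new bag of the same size by copying randomly chosen beans from the previous bag (a Wright-Fisher-style resampling scheme). The function names, bag sizes and seed are made up for the example.

```python
import random


def next_generation(bag, rng):
    """Build a new bag of the same size by copying randomly chosen beans."""
    return [rng.choice(bag) for _ in bag]


def generations_until_one_color(bag, rng, max_generations):
    """Return the round in which only one color is left, or None if that
    does not happen within max_generations rounds."""
    for generation in range(1, max_generations + 1):
        bag = next_generation(bag, rng)
        if len(set(bag)) == 1:
            return generation
    return None


rng = random.Random(0)  # fixed seed so a run can be repeated

small_bag = ["red", "white"] * 5      # 10 beans, assumed size
large_bag = ["red", "white"] * 1000   # 2000 beans, assumed size

# The small bag loses a color quickly; the large one keeps both
# colors over the same kind of time span.
small_result = generations_until_one_color(small_bag, rng, 10_000)
large_result = generations_until_one_color(large_bag, rng, 20)
print("small bag fixed after", small_result, "rounds")
print("large bag fixed within 20 rounds:", large_result is not None)
```

The contrast between the two bags is the point: the smaller the bag, the sooner chance alone removes a color.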

terryt said...

"M1 is typically Papuan (as well as other Melanesian)".

Isn't that what I've got? What's really interesting is that M3 tends to be found a little beyond M1, through northern Melanesia, and M2 further afield, even as far east as Fiji. In fact the obvious conclusion is that Y-hap F simply differentiates into geographic haplogroups from M in the east, through S, N-R, L to T in the west.

"No. It is a distinct clade than IJK (your 'I-T')".

But that makes no difference as to where it fits on the line, merely that it cannot be to the right of F*.

"I don't know why you place a downstream Y-DNA F subclade along with the major top-level mtDNA lineage".

Where did I do that?

"This is because they expanded in a cocktail, not one here and one there".

I doubt that very much. After all they were expanding into what we can probably assume had been till then a hostile environment, otherwise they would have already occupied it all. As I've pointed out elsewhere, at the margin of any species' expansion selection and drift are likely to be severe. Even if they were all present in some 'original' population it's extremely unlikely they all moved in the same direction at the same time.

Maju said...

"Isn't that what I've got? What's really interesting is that M3 tends to be found a little beyond M1, through northern Melanesia, and M2 further afield, even as far east as Fiji. In fact the obvious conclusion is that Y-hap F simply differentiates into geographic haplogroups from M in the east, through S, N-R, L to T in the west."

Uh?

M2 and M3 surely mean further founder effects within M expansion.

I really can't gather how you reach your conclusion in the second sentence anyhow (you seem to mean K, not F, by the way). M, S, NOP (not "N-R", the nomenclature is there for a reason), L and T are different founder effects in different regions. Within the major clades the highest diversity (NOP/P/R, L and T) is clearly in the northern South Asia - Central Asia - West Eurasia area. Agreed that minor K lineages can appear to (slightly) shift the balance to the SE but are these lineages really Sahulian or rather Austronesian with an East Asian origin? Do they share any upstream structure that we are missing?

"But that makes no difference as to where it fits on the line, merely that it cannot be to the right of F*."

Yes: F divides itself into: F1, F2, F3, F4, G, H and IJK, as far as we know - plus the paragroup F*. That is 7 clearly defined sublineages: 4 typically South Asian, 1 (G) typically West Eurasian, 1 (F2) seemingly East Asian and 1 (IJK) widely distributed.

"I don't know why you place a downstream Y-DNA F subclade along with the major top-level mtDNA lineage".

"Where did I do that?"

M: you place mtDNA M at what you call "I-T" (Y-DNA IJK).

R: you place mtDNA R (seemingly R(xB,F)) along with Y-DNA NO (your "N-R").

That's crazy, IMO.

"I doubt that very much. After all they were expanding into what we can probably assume had been till then a hostile environment, otherwise they would have already occupied it all."

Hmmm? They occupied it all in little time. We humans don't just drop many eggs around and let them fare by themselves: our reproduction style is to raise our children for many years and we can have only so many kids anyhow. So even in the best conditions we can only grow so much in each generation, maybe double or triple numbers?

Plus you need time to explore and get familiar with new areas so you can effectively exploit them. And I am not considering here other limitations like diseases or the naturally violent death rate. People didn't live very long on average in the Paleolithic, it seems. Nomadic peoples like the Bushmen breastfeed for up to 4 years as a natural contraceptive measure, so that women have to cope with only one toddler at a time.

So do the maths: an average woman living 30-35 years could have maybe 3-4 children, of whom surely not all survived. So even in the best of conditions doubling numbers in each generation was surely not an option. Let's assume a population growth of c. 50% per generation as reasonable (in optimal conditions, nearly zero in not so good ones).

Let's make it double every 2 generations (good conditions) and let's assume 25 years per generation. Each century the population multiplies by 4. So 100 original people could become c. 100,000 in 500 years and c. 100 million in one millennium. The latter figure (and surely the earlier one) is already well beyond the Malthusian limits for Paleolithic Eurasia.
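The compounding can be checked in a few lines. This sketch assumes exactly the figures given above (100 founders, one doubling every two 25-year generations, uninterrupted growth); the function name and its parameters are hypothetical.

```python
def projected_population(founders, years, doubling_years=50):
    """Population after `years` if it doubles every `doubling_years` years."""
    doublings = years // doubling_years
    return founders * 2 ** doublings


after_500 = projected_population(100, 500)    # 10 doublings
after_1000 = projected_population(100, 1000)  # 20 doublings

print(after_500)   # 102400
print(after_1000)  # 104857600
```

Under these assumptions the count passes 100,000 after 500 years and reaches roughly a hundred million after a millennium, which is the sense in which such growth could not have been sustained for long.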

So what do we have? That the main enemy were themselves: if they grew so fast, they would be pitted against each other within a few generations or centuries, even in the best external conditions, because such optimal demic growth was just not sustainable. So no wonder they migrated, and that they even dared to face the extremely strong Neanderthals for a piece of land.

But also we see that a small single group could well have colonized large areas, even most of the continent, in very few generations as well and that internal Malthusian pressure would have pushed marginal groups farther away easily too, creating a multitude of founder effects.

This does not explain mtDNA R's dynamism anyhow, so I guess it may be related to other elements, like some decisive ethnic technological advantage of the sort of the stone blade or the atlatl (maybe).

"Even if they were all present in some 'original' population it's extremely unlikely they all moved in the same direction at the same time."

East was the natural way to go: even if there could be some H. erectus remnants they were no match (no really big skull has been found in Asia besides Sapiens and Neanderthal ones, so they were surely rather limited intellectually). In the west instead there was the extremely strong and also expanding H. neanderthalensis, who surely hindered human expansion in that direction for a good while (also deserts are much more of a barrier than swamps but less important, I guess).

terryt said...

"M2 and M3 surely mean further founder effects within M expansion".

"M, S, NOP (not "N-R", the nomenclature is there for a reason), L and T are different founder effects in different regions".

Isn't that what I said? What's the difference between the two?

"you place mtDNA M at what you call 'I-T' (Y-DNA IJK)".

Actually at IJK-T.

"you place mtDNA R (seemingly R(xB,F)) along with Y-DNA NO (your "N-R")".

Both are downstream clades. And I don't exclude B and F.

"They occupied it all in little time".

We don't really know that. Besides which you then go on to explain why I am correct.

Maju said...

"Isn't that what I said? What's the difference between the two?"

That you wrote:

"In fact the obvious conclusion is that Y-hap F [you mean K obviously] simply differentiates into geographic haplogroups from M in the east, through S, N-R, L to T in the west."

You appear to suggest a distribution process from M (New Guinea) to T (West Asia), while the most logical thing is to look for the "gravity center" of this distribution as first hypothetical dispersal center.

"Actually at IJK-T"

Actually IJK. Full stop. You don't need to reinvent the wheel.

"Both are downstream clades. And I don't exclude B and F."

In your graph mtDNA B and F are paired with Y-DNA C2 somehow.

MtDNA R is a downstream clade (1 step) of N but NOP is a downstream clade of K (three steps from F).
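The step counts can be made explicit with a toy parent map. This is a sketch only: the links below are the ones implied in this thread (F to IJK to K to NOP on the Y-DNA side, N to R on the mtDNA side), and the dictionary and function names are invented for illustration.

```python
# Child -> parent links, limited to the clades mentioned in the comment.
Y_PARENT = {"IJK": "F", "K": "IJK", "NOP": "K"}
MT_PARENT = {"R": "N"}


def steps_below(clade, ancestor, parent):
    """Count the parent links separating `clade` from `ancestor`.

    Raises KeyError if `ancestor` is not upstream of `clade` in `parent`.
    """
    steps = 0
    while clade != ancestor:
        clade = parent[clade]
        steps += 1
    return steps


mt_steps = steps_below("R", "N", MT_PARENT)   # mtDNA R under N
y_steps = steps_below("NOP", "F", Y_PARENT)   # Y-DNA NOP under F

print(mt_steps)  # 1
print(y_steps)   # 3
```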

MtDNA R has lineages in places like Sahul (P especially, which seems pretty old there), where there is no meaningful Y-DNA NOP (but there are lots of K, which may make better sense). Maybe you're thinking of the Australian R(xR1)? If so, what about New Guinea?

Overall mtDNA R seems considerably older than Y-DNA P or NO and the NOP node seems really difficult to tell apart from the K node (just a single SNP). If I wanted to couple mtDNA R with anything else (not too strictly in any case), I'd do so with Y-DNA K - seems pretty obvious, as both are the only "explosive" derived macrolineages in their respective gender lines.

"They occupied it all in little time".

"We don't really know that."

Pretty obvious for the explosive nature and wide distribution of M and R (especially).

terryt said...

"You appear to suggest a distribution process from M (New Guinea) to T (West Asia), while the most logical thing is to look for the 'gravity center' of this distribution as first hypothetical dispersal center".

I may 'appear to suggest' such a thing but we cannot actually tell the direction of any movement, although India seems a likely dispersal centre. After all the dashed line is pretty closely associated with India although strictly it only contains mtDNA M and M-derived haplogroups. I'm quite prepared to accept two directions, east and west.

"In your graph mtDNA B and F are paired with Y-DNA C2 somehow".

Y-DNA C2 is certainly associated with mtDNA B in the remote Pacific, but this is obviously a more recent migration. But Y-DNA C2 is island SE Asian in origin and probably closely related to C*. MtDNA F may have spread north with C*.

"MtDNA R is a downstream clade (1 step) of N but NOP is a downstream clade of K (three steps from F)".

I doubt if separate individual Y-haps and mtDNA haps are all that closely associated except in any 'first' migration into an uninhabited region, which is what I am attempting to discover.

"MtDNA R has lineages in places like Sahul (P especially, which seems pretty old there), where there is no meaningful Y-DNA NOP".

That's why I have not placed NO with any R-derived mtDNA. And it makes it quite possible that Y-hap C6 became associated with mtDNA P. In fact I have mtDNA P associated with two different Y-haps. It's almost certain the populations in the region would have mixed, even if that took some generations.

"Pretty obvious for the explosive nature and wide distribution of M".

'Wide' distribution does not necessarily mean 'explosive'.

Maju said...

"... although India seems a likely dispersal centre... I'm quite prepared to accept two directions, east and west."

Good.

"India although strictly it only contains mtDNA M and M-derived haplogroups."

Not at all. South Asia contains the highest diversity of mtDNA R sublineages.

It is also home of several major Y-DNA lineages: C5, H, L and P (or at least R, which with very high likelihood originated in this region - P probably too in fact, even Q can be considered as maybe South Asian by origin).

It is also notable that J2b reaches one of its two peaks in South Asia (the other is in Europe, with West Asia being low on it) and, while pseudo-reasonings like "Neolithic" or even "Alexander's boys" have been thrown around as tentative explanations, I find them highly unlikely. J2b is a very old lineage and this spread surely happened in the Paleolithic. I would think of this Y-DNA J dispersal (based only on the patterns) as maybe associated with that of mtDNA U.

"I doubt if separate individual Y-haps and mtDNA haps are all that closely associated..."

Hmmm. I thought that was the point of this article of yours: to show how Y-DNA and mtDNA are coupled historically - always according to your viewpoint.

"'Wide' distribution does not necessarily mean 'explosive'."

35 top-tier sublineages means very much explosive. It is the most diverse starlike haplogroup I know of. I dwell on that issue on this post at Leherensuge. Only mtDNA H can compare AFAIK.

terryt said...

"South Asia contains the highest diversity of mtDNA R sublineages".

Diversity doesn't mean origin.

"It is also home of several major Y-DNA lineages: C5, H, L and P (or at least R".

All of which, apart from L, were probably immigrants to India after it had already been occupied by modern haplogroups.

"I thought that was the point of this article of yours: to show how Y-DNA and mtDNA are coupled historically".

That will only work for the first immigrants to a particular region. After that haplogroups get mixed. But coupled haplogroups can still expand together, especially through the adoption of some new technology. This secondary expansion need not necessarily involve the original pairing. My diagram provides several examples.

Maju said...

"Diversity doesn't mean origin."

Unless you have a very good argument against it, it is the best indicator we can have for that. It is only logical that when a haplogroup expands, most of its derived lineages remain in the nearby area. They don't need to be the more successful ones though.

"All of which, apart from L, were probably immigrants to India after it had already been occupied by modern haplogroups."

I disagree. C5 and H are only found in South Asia. Only P can be somewhat confusing but AFAIK you find more R* and R1* and even R1a* (and certainly R2) in South Asia than anywhere else. So R is almost for sure of South Asian origin. The overall estimated origin of P would only be a factor of the origin of R and the origin of Q, and Q also has good odds to be of South Asian origin or at most Central Asian.

terryt said...

"C5 and H are only found in South Asia".

Didn't you read what Ebizur wrote some time back? Both are found mainly to the northwest of India. Using your own logic they would have expanded from there. 'It is only logical that when a haplogroup expands, most of its derived lineages remain in the nearby area'.

"you find more R* and R1* and even R1a* (and certainly R2) in South Asia than anywhere else. So R is almost for sure of South Asian origin".

I totally accept that. It's P I have the problem with. P has as its closest relation (brother) NO, in eastern and northern Asia. It's possible, but unlikely, that N and O originated in India, but you'd have to postulate a huge amount of drift.

Maju said...

"Didn't you read what Ebizur wrote some time back? Both are found mainly to the northwest of India."

No. He said (from memory) that C* is found over there. Can't recall the exact locations of C5 but I believe they are something like eastern coasts: from Bengal to the far South.

I am much more sure about H, which is concentrated in the South, notably in Kerala. But also has an important presence in the center-north. I think these two areas are dominated by distinct subclades of H: H1 and H2, but not sure which is which atm.

"I totally accept that. It's P I have the problem with. P has as its closest relation (brother) NO, in eastern and northern Asia. It's possible, but unlikely, that N and O originated in India, but you'd have to postulate a huge amount of drift."

No. NO originated in SE Asia in all likelihood. This is distinct from where P originated, even if they are sibling clades. In the same way L, M, S and T originated in different areas (Pakistan, New Guinea, Australia and West Asia) even if all four are sibling K lineages.

You can join the dots and the resulting K core area must be somewhere between South and SE Asia. Where exactly? No idea.

terryt said...

"In the same way L, M, S and T originated in different areas (Pakistan, New Guinea, Australia and West Asia) even if all four are sibling K lineages".

So why are they the only haplogroups ever to have 'originated in different areas'? Surely NOP is part of the same clade. You've finished up with a huge gap between L in Pakistan/India and S in New Guinea. Nothing in between?

I give up. You're free to make up whatever you wish.

Maju said...

In between there was K - I guess. I am not the one distributing the clades around the people. They did that themselves.

But in between you also have NOP, true. Yah, good point: maybe just consider NOP as the main local continuator of the original K in the same homeland. While L, M, S and T would rather be early offshoots maybe.

You also have the minor Ks though: two appear to be SE Asian and one South Asian. But no matter if K coalesced in SE Asia or South Asia, there would always be a gap. It's not me who says that: it's the data, the facts - they are stubborn, you know.

The most likely explanation for the gap is the private lineage model. A private lineage can easily be "reabsorbed", drifted out. It happens all the time. Only when a private lineage expands does it gain some security against extinction (and we call it a haplogroup, because it's more than just a private lineage: it is a set of several, many of them, hierarchically structured in a genealogical tree).
