Genetics, History, DNA, and Genealogy Information: 2016

Saturday, December 31, 2016

On the Need for More Interdisciplinariness in "Interdisciplinary" Studies

Ah, if they were all as good as Luigi Luca Cavalli-Sforza. The pioneer of interdisciplinary studies, and a Renaissance man, he would thoroughly immerse himself in genetics, demography, history, archaeology, and linguistics -- or find collaborators who could augment his knowledge. Thus, his work SAW THE BIG PICTURE.

A new paper out shows that modern "interdisciplinary" studies aren't so interdisciplinary at all.

It's called Mapping European Population Movement through Genomic Research by Patrick J. Geary and Krishna Veeramah. You can read it by clicking here.

The authors show that many geneticists writing about history simply pick up some bogus two-bit history book. That is why you get so much pseudo-science out there.

I once talked to a guy, a fairly educated scientist from another discipline, who felt he saw some marker in European genes. So he did some Google searches to find which tribe had ever moved through the rough area where he found the marker. He then published a paper claiming he had found a Cimbri-specific marker. But he didn't read the rest of the history; had he done so, he might have grasped that that tribe was wiped out by Gaius Marius at the end of the second century BC....

The paper also points out that there isn't enough precision in genetics, because geneticists don't bother to understand that different regions have different histories. What good is knowing some person was French, without logging if that person is Provencal or Norman? Very little....

Best quote from the paper: "The Ralph and Coop study, while highly rigorous at the level of the population genetic analysis, included no historians or archaeologists, and the only historical literature cited, presumably to »identify« the Hunnic contribution to European population, was a general history of Europe, a survey of Slavic history, and two articles in the New Cambridge Medieval History. The Busby et al. study also included no historians or archaeologists on its team, and the only historical literature cited was a Penguin History of the World, Peter Heather’s survey of the Early Middle Ages, and a survey of Muslims in Italy. Unlike these studies, designed and executed exclusively by geneticists who then look through a few general historical handbooks to try to find stories that might explain their data..."

In other words, many scientific papers suffer from the same thing that plagues Anthrogenica or, even worse, Maciamo's horrifically bad Eupedia: "a LITTLE knowledge is dangerous." They don't bother grasping the big picture in genetics, demography, history, archaeology, and linguistics...

Sunday, October 30, 2016

We Are Our Brother's Keeper: Are All Men Cousins? And Is This The Root Of Prejudice?

Many of you already know the following concepts. Humans intuit a sense of community and family with those to whom they are related. This has been confirmed in study after study, on child abuse, on ingroup-outgroup dynamics, and on racial prejudices.

The percentage of relatedness needed to trigger that feeling of kinship need not be large. As the following chart shows, many of us have folks over to Thanksgiving dinner with whom we share only 1-3% identical DNA. But that identical DNA is hugely significant. It's identical. And that of course makes one much more "related" than this "we share most DNA with all humans and even chimpanzees." Indeed, it's the margins that seem to count. And again, studies, on stepfathers in particular, have confirmed this time and time again.

Relationship	Shared DNA
Parent / Child, Full Sibling	50%
Grandparent / Grandchild, Aunt / Uncle, Niece / Nephew, Half Sibling	25%
1st Cousin	12.5%
1st Cousin once removed	6.25%
2nd Cousin	3.13%
2nd Cousin once removed	1.5%
3rd Cousin	0.78%
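
The percentages in the chart follow from a simple halving rule, and a minimal sketch can make that visible. It assumes the standard textbook expectation that each parent-child link halves the shared autosomal DNA; the function name, its parameters, and the relationship list are illustrative and not from any testing company. Real pairs of relatives vary around these averages.

```python
def expected_shared_fraction(links: int, common_ancestors: int = 2) -> float:
    """Expected fraction of autosomal DNA two relatives share.

    links: number of parent-child links on the path that connects the two
           people through one common ancestor.
    common_ancestors: 2 when the relatives descend from an ancestral couple
           (full siblings, cousins), 1 for a direct line or half relatives.
    """
    return common_ancestors * 0.5 ** links


# (relationship, links, common ancestors) -- illustrative list
RELATIONSHIPS = [
    ("Parent / Child", 1, 1),
    ("Full Sibling", 2, 2),
    ("Grandparent / Grandchild", 2, 1),
    ("Half Sibling", 2, 1),
    ("Aunt / Uncle", 3, 2),
    ("1st Cousin", 4, 2),
    ("1st Cousin once removed", 5, 2),
    ("2nd Cousin", 6, 2),
    ("2nd Cousin once removed", 7, 2),
    ("3rd Cousin", 8, 2),
]

if __name__ == "__main__":
    for name, links, ancestors in RELATIONSHIPS:
        share = expected_shared_fraction(links, ancestors)
        print(f"{name}: {share:.2%}")
```

The values agree with the chart up to rounding.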

The weird quality of the Y-Chromosome makes what I am about to post intriguing:

A human genome, including the X and Y chromosomes, is about 3771 cM long.

The Y Chromosome makes up about 2% of that, by length, and about 1% by SNPs.

Because men in certain haplogroups have IDENTICAL Y-Chromosomes (except for the tiny recombining regions), and because, unlike the rest of DNA, those genes are passed on IDENTICALLY, all men in the same haplogroup share as much DNA as, say, 2nd Cousins Once Removed.

Could this be the explanation why, for example, Western European males, who do not have much Y-chromosome diversity, exhibit a powerful ingroup dynamic with each other?

Fascinating, to be sure.

Tuesday, October 11, 2016

How DNA Ancestry Testing Works and How Can I Know It's Accurate

When a commercial DNA testing site like Ancestry.com or 23andme or FTDNA tests your DNA, they do not know which snippet came from which of your parents.

For example, if at a given point (a gene, in popular parlance), you have a "C" from your dad and a "T" from your mom (meaning you have brown eyes, but carry the blue-eyes gene), the testing service doesn't know which "letter" came from which parent.

What they then do is try to guess, stringing your DNA out into small chunks or strings of letters.

They then compare these to DNA in their reference database. 23andme's reference database, which is one of the best, if not the best in the world, only has about 11,000 samples in it. To represent the whole world!

So if you have ancestry from a big country (like France or Germany) or a country that has pockets of deep isolation (like Italy), the odds -- that they have someone from your corner of the country, or your little isolated craggy valley in some mountain chain -- are small.

They then compare the little strings of letters and come up with a likelihood that you have ancestry from one of those reference populations.
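
To make the chunk-and-compare idea concrete, here is a deliberately toy sketch. It is not how any company actually does it (real pipelines use statistical phasing and far richer models); the function names, the made-up sequences, and the population labels `pop_a` and `pop_b` are all assumptions for illustration. It cuts a sample into fixed-size chunks, finds the reference individual each chunk most resembles, and reports the share of chunks assigned to each reference population.

```python
from collections import Counter


def split_into_chunks(sequence: str, size: int) -> list[str]:
    """Cut a string of letters into consecutive, non-overlapping chunks."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def similarity(a: str, b: str) -> int:
    """Count the positions at which two chunks carry the same letter."""
    return sum(x == y for x, y in zip(a, b))


def call_chunks(sample: str, reference_panel: dict[str, list[str]], size: int) -> list[str]:
    """Assign each chunk of the sample to the best-matching reference population.

    Assumes every reference sequence is as long as the sample.
    """
    calls = []
    for index, chunk in enumerate(split_into_chunks(sample, size)):
        best_population, best_score = None, -1
        for population, individuals in reference_panel.items():
            for individual in individuals:
                ref_chunk = split_into_chunks(individual, size)[index]
                score = similarity(chunk, ref_chunk)
                if score > best_score:
                    best_population, best_score = population, score
        calls.append(best_population)
    return calls


def ancestry_estimate(sample: str, reference_panel: dict[str, list[str]], size: int) -> dict[str, float]:
    """Fraction of chunks assigned to each reference population."""
    calls = call_chunks(sample, reference_panel, size)
    counts = Counter(calls)
    return {population: counts[population] / len(calls) for population in reference_panel}


if __name__ == "__main__":
    panel = {"pop_a": ["ACGTACGTCCCC"], "pop_b": ["GGGGGGGGTTGA"]}  # made-up data
    print(ancestry_estimate("ACGTACGTTTGA", panel, size=4))
```

Note what the toy cannot do: if your real population is missing from the panel, every chunk still gets assigned to whichever reference population happens to be closest.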

23andme has the most scientific test in the business, but it gets French/German/Belgian/Dutch/Swiss/Austrian/Luxembourgish ancestry wrong 92% of the time. It most often shows up as "generic Northwest European." Similarly, 23andme -- the best in the business -- can't identify Italian ancestry 50% of the time. It often shows up incorrectly as Middle Eastern or Generic Southern European.

The moral of this story is to be patient with the science. It's not 100% there yet.

If you have documented ancestry from one region, trust your documents.

If you don't have any cousins from a pool you were identified as belonging to, then chances are it was a miscall. For example, if you have documented Italian ancestry but the test says you are 1/8 Middle Eastern or 1/8 Spanish, then unless you have a known great-grandparent who is 100% such, it's probably a miscall. (This would mean your parent would test as 1/4, by the way.)
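
The 1/8 and 1/4 figures are just repeated halving, which a short sketch can confirm. It assumes the expected (average) share inherited from a single ancestor halves with each generation; actual inheritance varies around that average, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_ancestry_share(generations_back: int) -> Fraction:
    """Expected share of your DNA from one ancestor that many generations back.

    1 = parent, 2 = grandparent, 3 = great-grandparent, and so on.
    """
    return Fraction(1, 2) ** generations_back


if __name__ == "__main__":
    # One great-grandparent who is 100% from some region:
    print("you:", expected_ancestry_share(3))
    # That same person is your parent's grandparent:
    print("your parent:", expected_ancestry_share(2))
```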

Finally, there is a serious problem with testing sites, particularly FTDNA's, with the issue of timing. If you go back far enough, we are ALL Africans, right? Yet a DNA test telling you that you were African would not be too useful. Do they mean recently or in the past?

Similarly, as has been well-documented, most European ancestry can be broken down into 3 big chunks: ancient hunter gatherers (Ancient Western Europeans, most similar modern population = Lithuanians); ancient farmers (Ancient Near Easterners, most similar modern populations include Greeks, Sardinians, others); and ancient pastoralists/horse rearers (Ancient Eurasian Steppe Dwellers, most similar modern populations include Ukrainians). But the migrations were really, truly all over the place.

Ancient Near Easterners are NOT modern Near Easterners. Ancient hunter gatherers in France are NOT the modern French, etc.

If a test tells you that you have some Near Eastern blood, it often is sensing this ancient signal.

It doesn't do you much good for them to say that 6000 years ago, you had some ancestry in the Near East. Everyone did.

Tuesday, May 24, 2016

Neandertals Never Died; Just Their Direct Sirelines and Matrilines

From a piece by Faye Flam in none other than Bloomberg, comes this wonderfully succinct nugget that expresses something that readers of this blog know I subscribe to:

"Scientists have also revised their view of Neanderthal extinction – long attributed to some deficit on their part. Maybe nothing dramatic happened at all, said Hawks. They would have made up a small fraction of the world’s population, and when larger groups of modern humans joined them in Europe they might have simply been absorbed."

(emphasis added)

This is what I coined the "Demography not Drama" explanation.

It is likely the Neandertal population was tiny, and when modern humans entered Europe, they simply absorbed them, perhaps even absorbed multiple sub-populations (which the genetics data now supports too).

With each generation, there is a great chance that a male line or a female line will disappear. All it takes is for a man to have only daughters, or a woman to have only sons. Older lines (which have been around for more generations) face longer odds of appearing to have survived, because each generation increases the chances a line will appear to have died out. The patrilines and matrilines from a group starting with a smaller population size will also appear to have died out over time.

We have seen this occur in the modern world, both with surnames on isolated islands (the families didn't die out, but the number of surnames eventually fell sharply because of the randomness of males having male children) and with thoroughbreds (the original thoroughbred founding population included 30+ male horses, but only 3 sirelines, akin to surnames or Y-chromosome haplogroups, have survived).

This doesn't mean the others "died out." Like Neandertals, their genes live on among us.
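
A small simulation can illustrate how sirelines vanish by chance alone. This is a sketch under stated assumptions, not a model of real horses or humans: every male has exactly two children, each child is a son with probability 1/2, and nothing but luck distinguishes one line from another. The function name and the seed are arbitrary.

```python
import random


def simulate_lineages(founders: int, generations: int, seed: int = 0) -> list[int]:
    """Return the number of surviving male lines after each generation.

    Entry 0 is the founding generation; entry g is the count after g generations.
    """
    rng = random.Random(seed)
    carriers = [1] * founders          # living males carrying each founder's line
    history = [founders]
    for _generation in range(generations):
        next_carriers = []
        for count in carriers:
            sons = 0
            for _male in range(count):
                # Two children per male, each a son with probability 1/2.
                sons += (rng.random() < 0.5) + (rng.random() < 0.5)
            next_carriers.append(sons)
        carriers = next_carriers
        history.append(sum(1 for c in carriers if c > 0))
    return history


if __name__ == "__main__":
    history = simulate_lineages(founders=30, generations=50, seed=1)
    for generation in (0, 5, 10, 25, 50):
        print(f"generation {generation}: {history[generation]} lines left")
```

The count of surviving lines can only fall or stay flat, because a line with no male carriers never comes back, even though the overall population in this setup neither grows nor shrinks on average.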

Friday, February 5, 2016

The Sad Case of the Orthodoxy and the Posth Article on Pleistocene Demographics

Just a couple of months ago, in the context of the peopling of Ireland, I emphasized on Eupedia (and here) how important it is to put all the Theories Du Jour that are based on modern uniparental distributions through a model based on population demographics and sound logic.

Specifically, I emphasized that ancient population sizes were minuscule compared to modern ones, and that if a population started a long long time ago, with a size that was way way small -- compared to subsequent waves -- that it would give a false signal that the original population was "conquered" or "outcompeted" or "never existed" or originated somewhere incorrect. I cautioned against those four errors.

This engendered quite the debate on the Eupedia forums. When backed into a corner and shown the weakness of their "R1b Were Studly Conquerors Theory," the "blindly following the current orthodoxy" folks react badly.

Many "Interwebz Scientistz" fail to grasp these concepts. They favor their own wacky, biased theories based on what they see today only. If a land is populated by one people, they must be all conquering studs, right?

Today, Posth et al. put out an extensive paper on Pleistocene demographics.

Its shocking discovery? Just like Y DNA Hg C existed in Europe in tiny numbers among the very first Europeans, so did mtDNA Hg M.

M eventually disappeared for a simple reason: its initial population was tiny, and it had been there so long that, generation after generation, the odds of some women having no daughters eventually meant it was not passed on. Remember, we're talking uniparental markers here.

The authors commented exactly as I did: up to now, people mistakenly believed that Hg M never set foot in Europe -- or that if it did, it was killed off or whatever by a new wave. Sorry, both theories are wrong.

It is WONDERFUL to see another peer-reviewed, scholarly paper making this exact same point, and backing it up with newfound data.

As the paper indicates:

-These first hunter gatherers started with a TINY initial population size.

-Every generation, some lines are lost because some males have no sons and some females have no daughters.

-I've calculated the approximate odds of a male not having a male child or a female not having a female child (i.e. looking like their uniparental marker was "conquered") at 12.5%, each generation, totally random.

-The longer a population has existed in a locale (and being free of mutations), the more generations go by, the greater the chance that random happenstance, chance, etc. will make it appear that a Hg either never existed or was slaughtered in a mass killing/enslavement/mate preference.
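
The 12.5% per-generation figure can be reproduced with simple arithmetic, under an assumption that is mine and not stated above: a parent with three children, each equally likely to be a boy or a girl, has no child of their own sex with probability (1/2)^3 = 12.5%. The second function is a further simplification: it treats a line as hanging on a single carrier in each generation, which overstates the risk once a line has many carriers. Both function names are hypothetical.

```python
def p_no_same_sex_child(children: int) -> float:
    """Chance that none of a parent's children share the parent's sex,
    assuming each child is a boy or a girl with probability 1/2."""
    return 0.5 ** children


def chain_break_probability(p_break: float, generations: int) -> float:
    """Chance that a line resting on one carrier per generation is broken
    at least once within the given number of generations."""
    return 1 - (1 - p_break) ** generations


if __name__ == "__main__":
    p = p_no_same_sex_child(3)
    print(f"per-generation odds with 3 children: {p:.1%}")
    for g in (1, 5, 10, 20):
        print(f"after {g} generations: {chain_break_probability(p, g):.1%}")
```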

Now you have further proof of it.

I'm waiting to hear how Hg M died out because of some studly new more beautiful females who moved in. Oh whoops, Maciamo doesn't post here. And he doesn't himself bear Hg M. And M is not linked to R1b...

Saturday, January 30, 2016

In Praise of Roberta Estes and DNAeXplained.com

In a world of pseudo-science and echo chambers, a few blogs stick out for being mostly in touch with reality. In the world of Ancient DNA, Dienekes, although less active than before, has pioneered much in the field of DNA, and still has many serious scientists who comment there.

In the world of DNA for Genealogy, one blog sticks out. It is Roberta Estes' DNAeXplained.com. Of all the blogs and websites dedicated to disseminating information about DNA, hers is consistently factual, science-based, and yet easy to understand.

This scientist came across a few of her posts, and I daresay they are mandatory reading for anyone seeking a better understanding of their DNA. Below are links and highlights:

"Determining Ethnicity Percentages"

Step 1: Creation of the underlying population data base.
Don’t we wish this was as simple as it sounds. It isn’t. In fact, this step is the underpinnings of the accuracy of the ethnicity predictions. The old GIGO (garbage in, garbage out) concept applies here. . . .

The third way to obtain this type of information is by inference. Both Ancestry.com and 23andMe do some of this. Ancestry released its V2 ethnicity updates this week, and as a part of that update, they included a white paper available to DNA participants. In that paper, Ancestry discusses their process for utilizing contributed pedigree charts and states that, aside from immigrant locations, such as the United States and Canada, a common location for 4 grandparents is sufficient information to include that individuals DNA as “native” to that location. Ancestry used 3000 samples in their new ethnicity predictions to cover 26 geographic locations. That’s only 115 samples, on average, per location to represent all of that population. That’s pretty slim pickins. Their most highly represented area is Eastern Europe with 432 samples and the least represented is Mali with 16. The regions they cover are shown below. . .

No matter which calculations you use relative to acceptable Margin of Error and Confidence Level, Ancestry’s sample size is extremely light. . . .
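
To put rough numbers on the sample sizes quoted above, here is a back-of-the-envelope sketch. It uses the standard margin-of-error formula for a proportion estimated from a simple random sample, at 95% confidence (z = 1.96) and the worst-case proportion of 0.5. Treating a reference panel as a simple random sample is my simplifying assumption, not something the quoted post or Ancestry's white paper states.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a proportion estimated from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


if __name__ == "__main__":
    # Sample sizes quoted in the excerpt: smallest, average, and largest region.
    for label, n in (("Mali", 16), ("average region", 115), ("Eastern Europe", 432)):
        print(f"{label} (n={n}): +/- {margin_of_error(n):.1%}")
```

Under these assumptions, the average region's 115 samples give a margin of roughly 9 percentage points, and the 16 Mali samples roughly 25.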

"Are You Native American?"

"having Haplogroup Origins and Ancestral Origins indicating Native American ancestry does not necessarily mean you are Native American or have Native American heritage. This is a very pervasive myth that needs to be dispelled. . . .

The good news is that more and more people are DNA testing. The bad news is that errors in the system are tending to become more problematic, or said another way, GIGO – Garbage in, Garbage Out.

....

There are a very limited number of major haplogroups that include Native American results. For mitochondrial DNA, they are A, B, C, D, X and possibly M. I maintain a research list of the subgroups which are Native. Each of these base haplogroups also have subgroups which are European and/or Asian. The same holds true for Native American Y haplogroups Q and C.
In the Haplogroup Origins and Ancestral Origins, there are many examples where Non-Native haplogroups are assigned as Native American, such as haplogroup H1a below. Haplogroup H is European...

One of the problems we have today is that because there are so many people who carry the oral history of grandmother being “Cherokee,” it has become common to “self-assign” oneself as Native. That’s all fine and good, until one begins to “self-assign” those haplogroups as Native as well – by virtue of that “Native” assignment in the Family Tree DNA data base. That’s a horse of a different color.

Monday, January 25, 2016

Calculating Matches on Gedmatch: Why CentiMorgans (cM) are more important than SNPs

I have discovered that very very very few people know this, so it is worth posting.

The different testing companies, 23andme, Ancestry, FTDNA, etc. all test slightly different SNPs. In other words, the "points" on the genome, the "genes" that are tested vary from company to company.

I have seen some people on Gedmatch dismiss a match because "it doesn't have enough SNPs." Or because "it's not above the SNP threshold."

Gedmatch itself requires a match of at least 7 cM and 700 SNPs to qualify someone as a cousin.

The SNP part is faulty thinking.

Because the testing companies don't test the same SNPs, you can have long stretches that match with a low number of SNPs.

Case in point: Someone who tested on 23andme like I did matched me for 10.0 cM and 1024 SNPs. That same person on FTDNA matched me for 10.0 cM but just 510 SNPs. FTDNA tested half of the SNPs that 23andme did (or half of the same set).

This is key to grasp. Expect closer matches to you on Gedmatch if your kits start with the same letter (i.e. M for 23andme, F for FTDNA, and A for Ancestry.) DO NOT DISMISS LOW SNP MATCHES.
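
The case above can be written out as a small filter to show what a SNP threshold throws away. This is an illustrative sketch only: the `Match` class, the kit labels, and the filter functions are hypothetical and have nothing to do with Gedmatch's actual code; the thresholds are the 7 cM and 700 SNPs mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Match:
    kit: str              # hypothetical kit label
    centimorgans: float   # genetic length of the shared segment
    snps: int             # number of tested SNPs inside the segment


def passes_cm_only(match: Match, min_cm: float = 7.0) -> bool:
    """Judge a match on segment length alone."""
    return match.centimorgans >= min_cm


def passes_cm_and_snps(match: Match, min_cm: float = 7.0, min_snps: int = 700) -> bool:
    """Judge a match on both segment length and SNP count."""
    return match.centimorgans >= min_cm and match.snps >= min_snps


def snp_density(match: Match) -> float:
    """Tested SNPs per cM: a property of the chip overlap, not of the relationship."""
    return match.snps / match.centimorgans


if __name__ == "__main__":
    matches = [
        Match("M-example (23andme vs 23andme)", 10.0, 1024),
        Match("F-example (FTDNA vs 23andme)", 10.0, 510),
    ]
    for m in matches:
        print(m.kit, snp_density(m), passes_cm_only(m), passes_cm_and_snps(m))
```

Both rows describe the same person and the same 10.0 cM segment; only the SNP count differs, so only the SNP rule rejects the second one.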