Sunday, June 10, 2018

Ancestry DNA Issues Revised Ancestry Estimates, Finds that Germans Exist

Judy G. Russell, the Legal Genealogist, is out with a fantastic new post on AncestryDNA's new ethnicity estimate percentages.

As she wryly notes in the opening, she is delighted to find out that they have discovered that Germans exist.

We've written about this before, as have others.  The major testing sites -- some of which are run by people who seem hostile to Germans (America's biggest ethnic group) -- have written Germans off the map.  23andme is particularly bad at identifying German DNA.  They disclose it too, but they bury it in the fine print.

We have been repeatedly depressed by newbies who know from good paper records that they are a quarter German (or Swiss, or French, or Austrian) and yet say, "duh, gee, duh, this unscientific website tells me I am really 21.2% English wow gee duh am I adopted?"  NO!  The science isn't there yet.  As Judy Russell says, "it's not quite soup."

And it STILL isn't quite soup.  This post focuses on Germans, but the major testing services have an equal problem with Italians, another major American ethnic group.  Poor Italians who get tested often end up with anything but Italian.  (Spare me your pseudoscience on how Italy has been invaded.  EVERY country has been invaded.)  Italy is a long country with many peaks and valleys, and for much of its history was an exporter of population to surrounding areas.  The testing sites need more samples to identify all the different permutations of Italians.

Bottom line, as we've said before, and as every credible scientist says - DO NOT TRUST the ethnicity estimates of the testing services.

Tuesday, October 11, 2016

How DNA Ancestry Testing Works and How Can I Know It's Accurate

When a commercial DNA testing site like 23andme or FTDNA tests your DNA, they do not know which snippet came from which of your parents.

For example, if at a given point (a gene, in popular parlance), you have a "C" from your dad and a "T" from your mom (meaning you have brown eyes, but carry the blue-eyes gene), the testing service doesn't know which "letter" came from which parent.
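To make the point concrete, here is a minimal Python sketch. The function name is made up for illustration, and this is not any testing company's actual software. Whichever parent supplied which letter, the reported result looks the same:

```python
def unphased_genotype(from_dad, from_mom):
    """Return the two letters as an unordered pair, the way a raw result lists them.

    Sorting throws away the order, which is the whole point: the result
    carries no record of which parent contributed which letter.
    """
    return "".join(sorted(from_dad + from_mom))


# Dad gave C and mom gave T ...
print(unphased_genotype("C", "T"))  # CT
# ... or the other way around. The reported genotype is identical.
print(unphased_genotype("T", "C"))  # CT
```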

What they then do is try to guess, stringing your DNA out into small chunks or strings of letters.

They then compare these to DNA in their reference database.  23andme's reference database, which is one of the best, if not the best in the world, only has about 11,000 samples in it.  To represent the whole world!

So if you have ancestry from a big country (like France or Germany) or a country that has pockets of deep isolation (like Italy), the odds are small that they have someone from your corner of the country, or from your little isolated craggy valley in some mountain chain.

They then compare the little strings of letters and come up with a likelihood that you have ancestry from one of those reference populations.
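Here is a toy Python sketch of that comparison step. Everything in it is an assumption made for illustration: the letter strings, the two population labels, and the simple "count the matching letters" scoring are invented, and the real services use far more sophisticated statistics. It only shows why a thin reference panel matters: your chunk gets labeled with whatever reference sample happens to be closest, whether or not your real ancestors are in the panel.

```python
def similarity(chunk, sample):
    """Fraction of positions where two equal-length strings of letters agree."""
    matches = sum(a == b for a, b in zip(chunk, sample))
    return matches / len(chunk)


def closest_population(chunk, reference):
    """Return the reference population whose best sample is most similar to the chunk."""
    return max(
        reference,
        key=lambda pop: max(similarity(chunk, s) for s in reference[pop]),
    )


# Made-up reference panel: a couple of samples per label, nothing more.
reference = {
    "Generic Northwest European": ["ACGTTACG", "ACGTAACG"],
    "Italian": ["TCGATACC"],
}

# A made-up chunk from someone whose own valley is not in the panel.
my_chunk = "ACGTTACC"

print(closest_population(my_chunk, reference))  # Generic Northwest European
```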

23andme has the most scientific test in the business, but it gets French/German/Belgian/Dutch/Swiss/Austrian/Luxembourgish ancestry wrong 92% of the time.  It most often shows up as "generic Northwest European."  Similarly, 23andme -- the best in the business -- can't identify Italian ancestry 50% of the time.  It often shows up incorrectly as Middle Eastern or Generic Southern European.

The moral of this story is to be patient with the science.  It's not 100% there yet.

If you have documented ancestry from one region, trust your documents.

If you don't have any cousins from a pool you were identified as, then chances are it was a miscall.  For example, if you have documented Italian ancestry but the test says you are 1/8 Middle Eastern or 1/8 Spanish, then unless you have a known great-grandparent who is 100% such, it's probably a miscall.  (A true 1/8 would mean that one of your parents would test as 1/4, by the way.)
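The arithmetic behind those fractions is just halving once per generation. A minimal Python sketch (the function name is made up for illustration, and these are expected averages: the amount actually inherited from any one great-grandparent varies somewhat from person to person):

```python
from fractions import Fraction


def expected_share(generations_back):
    """Expected share of your DNA from one ancestor that many generations back.

    1 = parent (1/2), 2 = grandparent (1/4), 3 = great-grandparent (1/8).
    """
    return Fraction(1, 2 ** generations_back)


your_share = expected_share(3)   # one great-grandparent who is 100% from the pool
parent_share = your_share * 2    # the parent on that line is one generation closer

print(your_share, parent_share)  # 1/8 1/4
```

So a genuine 1/8 result should come with a 1/4 result in one parent and, ideally, a paper trail leading to one specific great-grandparent.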

Finally, there is a serious problem with testing sites, particularly FTDNA's: the issue of timing.  If you go back far enough, we are ALL Africans, right?  Yet a DNA test telling you that you were African would not be too useful.  Do they mean recently or in the past?

Similarly, as has been well-documented, most European ancestry can be broken down into 3 big chunks: ancient hunter gatherers (Ancient Western Europeans, most similar modern population = Lithuanians); ancient farmers (Ancient Near Easterners, most similar modern populations include Greeks, Sardinians, others); and ancient pastoralists/horse rearers (Ancient Eurasian Steppe Dwellers, most similar modern populations include Ukrainians). But the migrations were really, truly all over the place.

Ancient Near Easterners are NOT modern Near Easterners.  Ancient hunter gatherers in France are NOT the modern French, etc.

If a test tells you that you have some Near Eastern blood, it often is sensing this ancient signal.

It doesn't do you much good for them to say that 6000 years ago, you had some ancestry in the Near East.  Everyone did.

Saturday, January 30, 2016

In Praise of Roberta Estes and

In a world of pseudo-science and echo chambers, a few blogs stick out for being mostly in touch with reality.  In the world of Ancient DNA, Dienekes, although less active than before, has pioneered much in the field of DNA, and still has many serious scientists who comment there.

In the world of DNA for Genealogy, one blog sticks out.  It is Roberta Estes' blog.  Of all the blogs and websites dedicated to disseminating information about DNA, hers is consistently factual, science-based, and yet easy to understand.

This scientist came across a few of her posts, and I daresay they are mandatory reading for anyone seeking a better understanding of their DNA.  Below are links and highlights:

Step 1:  Creation of the underlying population data base.
Don’t we wish this was as simple as it sounds.  It isn’t.  In fact, this step is the underpinnings of the accuracy of the ethnicity predictions.  The old GIGO (garbage in, garbage out) concept applies here. . . .

The third way to obtain this type of information is by inference.  Both Ancestry and 23andMe do some of this.  Ancestry released its V2 ethnicity updates this week, and as a part of that update, they included a white paper available to DNA participants.  In that paper, Ancestry discusses their process for utilizing contributed pedigree charts and states that, aside from immigrant locations, such as the United States and Canada, a common location for 4 grandparents is sufficient information to include that individual's DNA as “native” to that location.  Ancestry used 3000 samples in their new ethnicity predictions to cover 26 geographic locations.  That’s only 115 samples, on average, per location to represent all of that population.  That’s pretty slim pickins.  Their most highly represented area is Eastern Europe with 432 samples and the least represented is Mali with 16.  The regions they cover are shown below. . .

No matter which calculations you use relative to acceptable Margin of Error and Confidence Level, Ancestry’s sample size is extremely light. . . .
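To put rough numbers on "extremely light," here is a minimal Python sketch using the sample counts quoted above. The formula is the standard textbook margin of error for a proportion estimated from a simple random sample, at 95% confidence with the worst-case proportion of 0.5. That choice is an assumption made for illustration; the excerpt does not say which calculation the author had in mind.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Textbook margin of error for a proportion from a simple random sample.

    n: sample size; p: assumed proportion (0.5 is the worst case);
    z: z-score for the confidence level (1.96 is roughly 95%).
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample counts taken from the excerpt above.
samples = {
    "average per location": 3000 // 26,  # 115
    "Eastern Europe": 432,
    "Mali": 16,
}

for label, n in samples.items():
    print(f"{label}: n={n}, margin of error about {margin_of_error(n):.1%}")
```

Under those assumptions the average location comes out at roughly plus or minus 9 percentage points, and a 16-sample region at roughly plus or minus 25.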

"having Haplogroup Origins and Ancestral Origins indicating Native American ancestry does not necessarily mean you are Native American or have Native American heritage. This is a very pervasive myth that needs to be dispelled. . . .

The good news is that more and more people are DNA testing.  The bad news is that errors in the system are tending to become more problematic, or said another way, GIGO – Garbage in, Garbage Out.


There are a very limited number of major haplogroups that include Native American results.  For mitochondrial DNA, they are A, B, C, D, X and possibly M.  I maintain a research list of the subgroups which are Native.  Each of these base haplogroups also have subgroups which are European and/or Asian.  The same holds true for Native American Y haplogroups Q and C.
In the Haplogroup Origins and Ancestral Origins, there are many examples where Non-Native haplogroups are assigned as Native American, such as haplogroup H1a below.  Haplogroup H is European...

One of the problems we have today is that because there are so many people who carry the oral history of grandmother being “Cherokee,” it has become common to “self-assign” oneself as Native.  That’s all fine and good, until one begins to “self-assign” those haplogroups as Native as well – by virtue of that “Native” assignment in the Family Tree DNA data base.  That’s a horse of a different color.

Monday, January 25, 2016

Calculating Matches on Gedmatch: Why CentiMorgans (cM) are more important than SNPs

I have discovered that very very very few people know this, so it is worth posting.

The different testing companies, 23andme, Ancestry, FTDNA, etc. all test slightly different SNPs.  In other words, the "points" on the genome, the "genes" that are tested vary from company to company.

I have seen some people on Gedmatch dismiss a match because "it doesn't have enough SNPs."  Or because "it's not above the SNP threshold."

Gedmatch itself uses a threshold of 7 cM and 700 SNPs to qualify someone as a cousin.

The SNP part is faulty thinking.

Because the testing companies don't test the same SNPs, you can have long stretches that match with a low number of SNPs.

Case in point: Someone who tested on 23andme like I did matched me for 10.0 cM and 1024 SNPs.  That same person on FTDNA matched me for 10.0 cM but just 510 SNPs.  FTDNA tested half of the SNPs that 23andme did (or half of the same set).
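A toy Python sketch shows why this happens. The positions below are made up, and this is not Gedmatch's actual code. The idea is only that a comparison can use just the SNPs both kits tested, so the same stretch of DNA yields a different SNP count depending on which chips are being compared:

```python
def comparable_snps(chip_a, chip_b, start, end):
    """Count SNP positions inside [start, end] that BOTH chips tested."""
    shared = set(chip_a) & set(chip_b)
    return sum(1 for pos in shared if start <= pos <= end)


# Made-up chips: the "sparse" one tests every other position of the "dense" one.
dense_chip = range(0, 2000, 2)
sparse_chip = range(0, 2000, 4)

# The same matching segment, seen through two different pairs of kits.
segment_start, segment_end = 0, 1999

print(comparable_snps(dense_chip, dense_chip, segment_start, segment_end))   # 1000
print(comparable_snps(dense_chip, sparse_chip, segment_start, segment_end))  # 500
```

The segment did not get any shorter in the second comparison. Only the number of checkpoints along it changed.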

This is key to grasp.  Expect more SNPs in your matches on Gedmatch when both kits start with the same letter (e.g. M for 23andme, F for FTDNA, and A for Ancestry).  DO NOT DISMISS LOW SNP MATCHES.
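Here is what that looks like with the two matches from my own case above, as a minimal Python sketch. The kit labels and field names are made up for illustration; the cM and SNP figures and the 7 cM / 700 SNP cutoffs are the ones quoted in this post.

```python
# The same person, matched once through each company's kit.
matches = [
    {"kit": "M-example", "cm": 10.0, "snps": 1024},  # 23andme vs. 23andme
    {"kit": "F-example", "cm": 10.0, "snps": 510},   # FTDNA vs. 23andme
]

MIN_CM = 7.0
MIN_SNPS = 700

# Filtering on both cM and SNPs throws out the cross-company match.
kept_with_snp_cutoff = [
    m["kit"] for m in matches if m["cm"] >= MIN_CM and m["snps"] >= MIN_SNPS
]

# Filtering on cM alone keeps both, which is right: it is the same cousin.
kept_with_cm_only = [m["kit"] for m in matches if m["cm"] >= MIN_CM]

print(kept_with_snp_cutoff)  # ['M-example']
print(kept_with_cm_only)     # ['M-example', 'F-example']
```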