Showing posts with label Ancestry Composition. Show all posts

Friday, August 18, 2017

Are Ethnicity Percentages and Ancestry Calculators from DNA Tests Accurate?

The media has blasted headlines this week that show an incredible ignorance of DNA testing for ethnic percentages.  One, which could have been pulled out of a 1980s tabloid for its ridiculousness, screeched, "Neo-Nazis are taking genetic tests and are deeply upset by the results!"

Neither of the two writers delved too deeply into the subject, because shorthand reporting is easier.  As we've posted again and again, most of the major DNA testing sites disclose quite openly that their science is far from perfect.  For example, they disclose that if you are German, French, Dutch, Belgian, Austrian, or Swiss, they cannot discern your ancestry 92% of the time.  (With the US having more people of German ancestry than even English or Irish, that's a big deal.)

In fact, it's quite common for someone taking a test from three different websites to receive three different results!  And as the post beneath this one shows, trying the 40 or so other "ethnicity calculators" available for free on Gedmatch produced...40 different results.

As I often say: if 5 different scales produced 5 (vastly different) weights, you would know that at least four of 'em don't work!  :-)

Anyway, for those looking for perspective, we shouldn't highlight the bad, so we've decided to highlight the good -- or the excellent, rather.

One of the best posts we have seen on the topic comes courtesy of a blog called The Legal Genealogist.  It's called, "Those Percentages If You Must" -- and is a Must Read for people curious about whether ancestry calculations from DNA tests are accurate.

It first, rather hilariously, goes into the various myths and misperceptions about DNA and human history.  Concepts like, "black Irish" or "I have some Native American in me."  Concepts that plague the world of pop-DNA-testing.

After it goes through the science (in easy to understand terms), it reveals what I have posted here time and time again:

DNA testing IS GREAT and REMARKABLY PRECISE for finding you cousins.  It CAN tell you if you are a third-cousin, once-removed (with that kind of precision).  If you don't know your heritage, and that cousin is, for example, 100% Native American -- then it follows that you too have a pair of Native American great-great-great grandparents.

DNA testing IS NOT good at ethnicity percentages and ethnic calculation.  As if anything could be so precise as to tell you that you are "4.2% Jewish."  The science is still just not ready for prime time, and many underrepresented populations, even in Europe, still confound the tests.

(As an aside, ancestry calculators should all produce nice and even results when people get back to the pre-travel era.  In other words, if you had 64 ancestors that were alive in 1500 AD, you should only see multiples of 1.56%, right?  Since no one is half a human!)
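To make the arithmetic in that aside concrete: 64 ancestors is six generations back (2^6 = 64), so each one has a nominal share of 1/64, or 1.5625%.  Here is a minimal Python sketch under that even-split assumption.  The function names are made up for illustration and do not come from any testing site.

```python
from fractions import Fraction

def nominal_share(generations_back: int) -> Fraction:
    """Nominal share per ancestor, assuming every generation splits evenly in half."""
    return Fraction(1, 2 ** generations_back)

def is_even_multiple(percent: float, generations_back: int, tol: float = 0.005) -> bool:
    """True if `percent` is (within `tol` percentage points) a whole multiple
    of the nominal per-ancestor share."""
    share = float(nominal_share(generations_back)) * 100
    multiples = percent / share
    return abs(multiples - round(multiples)) * share <= tol

print(float(nominal_share(6)) * 100)  # 1.5625
print(is_even_multiple(4.2, 6))       # False
print(is_even_multiple(12.5, 6))      # True
```

Under this assumption, a reported "4.2%" does not correspond to any whole number of the 64 ancestors, while 12.5% corresponds to exactly 8 of them.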

The article succinctly concludes with:

DNA testing is a wonderful tool. It can connect us with cousins we’d have never found otherwise to help us reconstruct our family histories.

But in terms of “am I Native American?” “what tribe did I come from in Africa?” “am I 25% Irish?” No. No, no, no.  That’s the absolute weakest aspect of DNA testing. 

Indeed.  Well said.

Friday, July 21, 2017

AND THE WINNER IS... (Comparing Admixture/Heritage Tests on Gedmatch)


  • We ran exhaustive tests of several commercial and free DNA-testing labs and ethnicity calculators.  
  • To test the sites, we used only individuals with well-documented, double confirmed, 100% known ancestry.  
  • We tested multiple males from multiple lines to rule out, as far as humanly possible, any extra-parental events (bastardy).  
  • We even tested minor nobility with documented ties to geographic locales.  
  • We used individuals who do not come from cities or places of cosmopolitanism (influx of foreigners).  
  • We tested only people with all four grandparents from the same locale.  
  • We tested multiple people from different countries in Europe.
As we've posted before, of the commercial labs, 23andme takes first prize, and is the worst.  23andme provides the most conservative and accurate ethnic ancestry approximations.

We have also completed our testing of all of the ancestry composition tests available on GEDMATCH.  Below is a summary, the results, and the rankings.

  • First of all, the specialty labs, Ethiohelix, Gedrosia DNA, puntDNAL, etc. do not even come close to being accurate, at least for individuals of European heritage.
  • None of MDLP's tests passed our accuracy gauntlet and correctly called west European DNA.
1. The overall winner, and the clear winner of all the tools currently available on Gedmatch, is the Eurogenes K13 test.  It was pretty darn good at distinguishing DNA from various western European lands, for people of "purebred" ancestry.

2. Coming in second was Eurogenes EUtest K15 v2, which also had a pretty darn good record of accurate calls.

3. An honorable mention, and a close third, with roughly as many accurate calls as the second-place finisher, was Dodecad's K12b test.

  • No other tests besides those three were even close to "often accurate."
  • No tests, including those three, were much use for accurately calling the ancestry of European "mutts."  We found that the same tests that were accurate for individuals with 100% heritage from one country were of limited value as an oracle, that is, for accurately predicting the ancestry of individuals of mixed European heritage.

Tuesday, October 11, 2016

How DNA Ancestry Testing Works and How Can I Know It's Accurate

When a commercial DNA testing site like 23andme or FTDNA tests your DNA, they do not know which snippet came from which of your parents.

For example, if at a given point (a gene, in popular parlance), you have a "C" from your dad and a "T" from your mom (meaning you have brown eyes, but carry the blue-eyes gene), the testing service doesn't know which "letter" came from which parent.

What they then do is try to guess, stringing your DNA out into small chunks or strings of letters.

They then compare these to DNA in their reference database.  23andme's reference database, which is one of the best, if not the best in the world, only has about 11,000 samples in it.  To represent the whole world!

So if you have ancestry from a big country (like France or Germany) or a country that has pockets of deep isolation (like Italy), the odds -- that they have someone from your corner of the country, or your little isolated craggy valley in some mountain chain -- are small.

They then compare the little strings of letters and come up with a likelihood that you have ancestry from one of those reference populations.
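The comparison step described above can be sketched in Python.  This is a toy illustration, not any company's actual algorithm: the two reference populations, their letter frequencies, and the function names are all invented for the example, and the scoring assumes simple Hardy-Weinberg proportions.  It takes one short chunk of unphased results (we only know how many copies of a given letter you carry at each point, not which parent gave which) and asks which reference population makes that chunk most likely.

```python
import math

# Hypothetical reference panel: frequency of the "alt" letter at each of 4 points.
REFERENCE = {
    "pop_A": [0.9, 0.8, 0.1, 0.7],
    "pop_B": [0.2, 0.3, 0.6, 0.4],
}

def genotype_log_likelihood(alt_count: int, freq: float) -> float:
    """Log-probability of carrying 0, 1, or 2 alt letters at one point.

    Assumes Hardy-Weinberg proportions. The genotype is unphased, so one
    alt letter (a heterozygote) covers both possible parent-of-origin orders.
    """
    probs = [(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2]
    return math.log(probs[alt_count])

def best_matching_population(chunk, reference=REFERENCE):
    """Return (population, log-likelihood) for the population that best explains the chunk."""
    scores = {
        pop: sum(genotype_log_likelihood(g, f) for g, f in zip(chunk, freqs))
        for pop, freqs in reference.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]

chunk = [2, 2, 0, 1]  # alt-letter counts at the 4 points
print(best_matching_population(chunk)[0])  # pop_A
```

Note that the answer can only ever be one of the populations in the panel, which is why the size and coverage of the reference database matters so much.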

23andme has the most scientific test in the business, but it gets French/German/Belgian/Dutch/Swiss/Austrian/Luxembourgish ancestry wrong 92% of the time.  It most often shows up as "generic Northwest European."  Similarly, 23andme -- the best in the business -- can't identify Italian ancestry 50% of the time.  It often shows up incorrectly as Middle Eastern or Generic Southern European.

The moral of this story is to be patient with the science.  It's not 100% there yet.

If you have documented ancestry from one region, trust your documents.

If you don't have any cousins from a pool you were identified as, then chances are it was a miscall.  (For example, if you have documented Italian ancestry, but it says you are 1/8 Middle Eastern or 1/8 Spanish), then unless you have a known great-grandparent that is 100% such, it's probably a miscall.  (This would mean your parent would test as 1/4, by the way).

Finally, there is a serious problem with testing sites, particularly FTDNA's: the issue of timing.  If you go back far enough, we are ALL Africans, right?  Yet a DNA test telling you that you were African would not be too useful.  Do they mean recently or in the past?

Similarly, as has been well-documented, most European ancestry can be broken down into 3 big chunks: ancient hunter gatherers (Ancient Western Europeans, most similar modern population = Lithuanians); ancient farmers (Ancient Near Easterners, most similar modern populations include Greeks, Sardinians, others); and ancient pastoralists/horse rearers (Ancient Eurasian Steppe Dwellers, most similar modern populations include Ukrainians). But the migrations were really, truly all over the place.

Ancient Near Easterners are NOT modern Near Easterners.  Ancient hunter gatherers in France are NOT the modern French, etc.

If a test tells you that you have some Near Eastern blood, it often is sensing this ancient signal.

It doesn't do you much good for them to say that 6000 years ago, you had some ancestry in the Near East.  Everyone did.

Wednesday, October 28, 2015

What Is the Best and Most Accurate Ancestry Calculator (DNA Testing)?

What Is the Best and Most Accurate Ancestry or Admixture Calculator from DNA Testing?

We Review 23andme, AncestryDNA, Family Tree DNA (FTDNA), DNA.Land, Dodecad, Eurogenes, etc.

Judging from community discussions in online forums, "Admixture" tests, where a company or entity takes your raw DNA data, puts it into a calculator, and then purports to tell you where your ancestors came from -- these are all the rage.  It is not rare for seemingly educated individuals to post on the Internet sheer and utter nonsense about their results, for example, assuming that a calculator identified their ancestry with something close to 100% accuracy.

In the online world, there is no such thing as perfect privacy.  And in DNA, there is no such thing as 100% accuracy for ancestry calculators. 

This is because all people are admixed, but not all ethnic groups form part of the samples.  Put another way, if your ancestors come from a valley in Switzerland where no one has ever been tested, you might show up in a test as French, German, Italian, Austrian, but not Swiss. 

You might say to yourself that you have documented ancestry back to the dawn of time that you are from Switzerland.  You may match other Swiss people exactly.  But because the Swiss are indeed mixes of the groups above, and because there are no specific, micro-targeted Swiss samples in the hypothetical database that match you more closely than those other nationalities, the test would be woefully inaccurate to YOU.  After all, you don't want a test to tell you you might be Northern Italian, if you are Swiss.  (For that matter, do you NEED a test to tell you that?  See below.)

In the online privacy world, they've named protections that are scientifically the best (and do their job pretty darn well) "Pretty Good Privacy."  In the DNA world, all we can hope for is "Pretty Good Accuracy" -- ancestry calculators that are scientifically grounded, don't make claims beyond what they can really do, and ones that get the broad regions correct in the very least.

The coolest benefit about living in a college town (Berkeley for this blogger) is that there are a ton of people from all over the world, with pretty well-defined ancestry.  For example, that Danish exchange student with 500 years of documented ancestors in Denmark?  That's a good candidate for testing some of these calculators.  Enough friends of mine have taken DNA tests, and we've plugged the results in the calculators across several paysites (testing companies) like 23andme and AncestryDNA, and free calculators, like the ones available on Gedmatch.  Who came out on top?

By far, the best and most accurate ancestry calculator is on 23andme.   Like all good scientists, they are humble instead of full of hubris.  They don't profess to give you one set of results and say, "this is it."  Instead, they give you three different results: standard, conservative, and speculative.  Each is pretty darn accurate for most of the people we know who have tested there and other sites.  Bottom line: 23andme's "Ancestry Composition" feature is outstanding, and the best, most accurate one online we could find.

It is our opinion that the least accurate ancestry calculator is at the new site  And the one on FTDNA is a close second.  Both are terrible.  Almost everyone who used the feature on reported that the calculator is way off; just not ready for prime time at time of writing this post.

How do these calculators work?  Well, remember, the data that comes out is only as good as the data that comes in.  It is always worth remembering the concept that computer programmers call "GIGO: Garbage In, Garbage Out."  What this means is that if the data on which a conclusion is based is faulty, the answer will also be faulty.  With calculators, this manifests itself in two ways: with a shifted focus, or faulty or incomplete baseline data.

By a shifted focus, we mean that several calculators concentrate on particular populations.  For example, MDLP, which is offered (along with Eurogenes, Dodecad, and Gedrosia) at Gedmatch, stands for Magnus Ducatus Lituaniae Project.  As you might have guessed from its name, it focuses on the people from lands that used to form the Grand Duchy of Lithuania: places in Northeast Europe, including Poland, Estonia, etc. 

MDLP seeks to be very good at calculating ethnic tidbits of interest to those populations.  But is it good for determining the difference between, say, a Catalonian Spaniard and a Northern Italian?  No, it's actually quite bad on that front.  That's simply not its focus. 

Similarly, there are other calculators on Gedmatch that exist to focus on and cater to Asians, Africans, even mixed race folks.  And within European populations, you have other focuses, like Dodecad, which seems Grecocentric, for lack of a better word.  None of these will do that great outside their focus areas.  So take the results from those ones with a grain of salt, unless you happen to hail from their regions of focus.

Don't believe that?  Think I'm being extreme?  If you are European, try putting your data in a calculator that is focused on another population.  Like the East Asian-focused calculators.  It won't tell you that you are NOT East Asian.  It will tell you which East Asian population you resemble the most.  To be clear: if all a calculator has is East Asian samples, a European will be told he or she is Japanese or Chinese.  This same concept applies within European focused calculators at the regional level.
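Here is a toy sketch of why that happens, assuming a calculator that simply assigns you to the closest reference population it has.  The two-number "genetic coordinates" and the panel contents are invented for illustration; they are not real data, and real calculators are more elaborate than this.

```python
import math

def closest_population(sample, panel):
    """Return the panel population nearest to the sample (Euclidean distance).

    There is no "none of the above" option: the answer always comes from
    whatever populations the panel happens to contain.
    """
    return min(panel, key=lambda pop: math.dist(sample, panel[pop]))

# Invented 2-D coordinates, for illustration only.
full_panel = {
    "European": (0.0, 0.0),
    "Japanese": (10.0, 1.0),
    "Chinese": (10.0, 3.0),
}
east_asian_only = {k: v for k, v in full_panel.items() if k != "European"}

european_sample = (0.5, 0.2)
print(closest_population(european_sample, full_panel))       # European
print(closest_population(european_sample, east_asian_only))  # Japanese
```

The sample has not changed between the two calls.  Only the panel has, and the second answer is simply the least-bad fit among the choices on offer.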

In terms of bad baselines, recall the Swiss example above.  Europe is filled with micropopulations that exhibit a high degree of population homogeneity (a little inbred, to use the pejorative term).  If a calculator does not have a sample from your micropopulation (the narrow region where your ancestor lived for millennia), then you will get a faulty reading. 

Put simply (to use a French example): It's a big country.  Normans are not Basques, Provencals are not Bretons, etc.  That is why the best calculators are HONEST.  23andme discloses quite readily that for the huge populations in the middle of Europe (French, Germans, but also Benelux countries, etc.), it cannot spot the DNA with certainty 92% of the time. 

Does the 23andme website have any drawbacks?  Sure it does.  But they are minor compared to the others. 

First, its "Countries of Ancestry" feature is not what it could be.  But it's important to understand three things:  (1) This is NOT their ancestry calculator, but another feature entirely, so perhaps it's unfair for us to even review it in this space.  (2) It's experimental, and they state that.  (3) They are wisely phasing it out.  What was the problem with that feature?  Well, it gave you the list of countries of people who have the most matches with you.  Let's say for example you are half Italian, half Polish (a common mix in Chicago).  In other parts of Chicago, another common mix is half Polish, half Irish.  For whatever reason, people of Irish heritage have tested themselves in far greater numbers than the others. Your Polish DNA would overlap (match) with the people who reported they were half Polish, half Irish.  And this feature would then tell you that "a high percentage of the people who have DNA similar to yours are from Ireland."  Do you understand?  It's a huge problem, especially for smaller populations, especially because so many Americans are now half this, half that.  It's just not that edifying then.
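The Chicago example can be put into a few lines of Python.  This is only a sketch of the counting problem, not 23andme's actual method: the match list and its numbers are hypothetical, chosen to reflect the assumption that people of Irish heritage are over-represented among testers.

```python
from collections import Counter

def countries_among_matches(matches):
    """Count self-reported countries across all DNA matches."""
    counts = Counter()
    for reported_countries in matches:
        counts.update(reported_countries)
    return counts

# Hypothetical matches for someone who is half Italian, half Polish.
# Assumption: testers with Irish heritage are heavily over-represented.
matches = (
    [("Poland", "Ireland")] * 8  # match on the Polish side, also report Ireland
    + [("Poland", "Italy")] * 1
    + [("Italy",)] * 2
)

counts = countries_among_matches(matches)
print(counts.most_common())
# [('Poland', 9), ('Ireland', 8), ('Italy', 3)]
```

Ireland outranks Italy in the tally even though this person has no Irish ancestry at all, purely because of who happened to test.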

23andme also suffers from the same sample issues as many of the other ancestry calculators.  For example, 85% of Italian Americans (TRANSLATION: potential customers, since most people who test are from Britain or the US) hail from just 3 regions in the deep south of Italy: Campania (Naples), Calabria, and Sicily.  Yet the population samples that most of these websites use are from Tuscany.  Even though Dante tried to meld them, Tuscans are not Sicilians and vice-versa. 

Often, when these calculators see Sicilian or rural Southern Italian genes, they in effect say: we don't know what you are!  You are kind of Italian, but you also resemble, a little bit, people from Cyprus or Jews.  So they give an odd result.  And then you have someone tested who says, "I might be Jewish."  No.  The answer is that your people were not included in the data-set by which the baseline was developed.  If they were, the calculator would recognize you as a run of the mill Sicilian.

All online ancestry calculators also suffer from lack of inter-operability and non-standardized terms.  For example, among the calculators on Gedmatch, some use the term "Caucasian" to mean "generalized European" (which is how it is used in common parlance, of course).  Others use it in the specific sense, meaning people from the Caucasus: Soviet Georgia, Armenia, etc.

Here's the bottom line: don't expect any ethnicity or ethnic-origins calculator to be 100% correct.  Don't expect new insights if you have confirmed records.  In other words, if you look just like your dad (you're not a bastard), and you're not adopted, and you have records going back centuries -- why do you need an ethnicity calculator to begin with?

These admixture tests can help if you were adopted, and want to have a sense of where to start.  But keep in mind, the largest plurality of Americans come from German heritage, and yet the best calculator currently cannot identify German DNA 92% of the time.

Avoid the mythology and those who oversimplify.  There are reliable sources out there in genetic genealogy, like Debbie Kennett -- and there are a lot of charlatans.  Be careful whenever someone oversimplifies to the point of exaggeration, falls into stereotypes, or tells you what you want to hear.  With DNA as with everything, the most parsimonious answer is often the best.  The exotic is often wrong.

As the science improves, you can't go wrong using the Standard or Conservative setting on the 23andme Ancestry Composition test.