Showing posts with label Gedmatch. Show all posts

Friday, July 21, 2017

AND THE WINNER IS... (Comparing Admixture/Heritage Tests on Gedmatch)

Methodology:


  • We ran exhaustive tests of several commercial and free DNA-testing labs and ethnicity calculators.  
  • To test the sites, we used only individuals with well-documented, double-confirmed, 100% known ancestry.  
  • We tested multiple males from multiple lines to ensure, as far as humanly possible, that no non-paternity events (bastardy) had occurred.  
  • We even tested minor nobility with documented ties to geographic locales.  
  • We used only individuals who do not come from cities or other cosmopolitan places (with a large influx of foreigners).  
  • We tested only people with all four grandparents from the same locale.  
  • We tested multiple people from different countries in Europe.
As we've posted before, of the commercial labs, 23andme takes first prize, and Ancestry.com is the worst.  23andme provides the most conservative and accurate ethnic ancestry approximations.

We have also completed our testing of all of the ancestry composition tests available on Gedmatch.  Below is a summary of the results and our rankings.

  • First of all, the specialty calculators (Ethiohelix, Gedrosia DNA, puntDNAL, etc.) do not even come close to being accurate, at least for individuals of European heritage.
  • None of MDLP's tests passed our accuracy gauntlet by correctly calling western European DNA.
1. The overall winner, and the clear winner of all the tools currently available on Gedmatch, is the Eurogenes K13 test.  It was pretty darn good at distinguishing DNA from various western European lands, for people of "purebred" ancestry.

2. Coming in second was Eurogenes EUtest K15 v2, which also had a pretty darn good record of accurate calls.

3. An honorable mention, and a close third, was Dodecad's K12b test, with a record of accurate calls roughly as good as the second-place finisher's.

  • No other tests besides those three were even close to "often accurate."
  • No tests, including those three, were much use for accurately calling the ancestry of European "mutts."  We found that the same tests that were accurate for individuals with 100% heritage from one country were of limited value as an oracle for (that is, for accurately predicting) the ancestry of individuals of mixed European heritage.
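For readers who want to run a similar bake-off, here is a minimal Python sketch of how a scoring pass like ours could work, assuming each calculator's result is boiled down to the single country it ranks highest. The `Tester` record, the calculator names `calc_a` and `calc_b`, and every data point are hypothetical placeholders, not our actual test subjects or results; the four-grandparent filter mirrors the methodology above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tester:
    name: str
    grandparent_countries: list[str]  # documented country of each of the four grandparents
    calls: dict[str, str]  # calculator name -> country that calculator ranked highest


def is_single_origin(t: Tester) -> bool:
    """True only if all four grandparents come from the same country."""
    return len(t.grandparent_countries) == 4 and len(set(t.grandparent_countries)) == 1


def accuracy(testers: list[Tester], calculator: str) -> float:
    """Share of single-origin testers whose top call matches their documented country."""
    eligible = [t for t in testers if is_single_origin(t) and calculator in t.calls]
    if not eligible:
        return 0.0
    hits = sum(t.calls[calculator] == t.grandparent_countries[0] for t in eligible)
    return hits / len(eligible)


# Hypothetical toy records, for illustration only.
testers = [
    Tester("T1", ["Denmark"] * 4, {"calc_a": "Denmark", "calc_b": "Germany"}),
    Tester("T2", ["Italy"] * 4, {"calc_a": "Italy", "calc_b": "Italy"}),
    Tester("T3", ["France", "France", "Spain", "Spain"], {"calc_a": "France", "calc_b": "Spain"}),
]

for calc in ("calc_a", "calc_b"):
    print(calc, accuracy(testers, calc))
```

With these toy records, `calc_a` scores 1.0 and `calc_b` scores 0.5. The mixed-heritage tester is filtered out before scoring, which is exactly why a high score on a test like this says nothing about how a calculator handles European "mutts."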



Monday, January 25, 2016

Calculating Matches on Gedmatch: Why centiMorgans (cM) are more important than SNPs

I have discovered that very very very few people know this, so it is worth posting.

The different testing companies, 23andme, Ancestry, FTDNA, etc., all test slightly different SNPs.  In other words, the specific "points" (positions) on the genome that are tested vary from company to company.

I have seen some people on Gedmatch dismiss a match because "it doesn't have enough SNPs."  Or because "it's not above the SNP threshold."

Gedmatch itself requires a matching segment of at least 7 cM and 700 SNPs to qualify someone as a cousin.

The SNP part is faulty thinking.

Because the testing companies don't test the same SNPs, two kits can share a long matching stretch (in cM) that contains only a low number of SNPs tested by both.

Case in point: someone who, like me, tested at 23andme matched me for 10.0 cM and 1024 SNPs.  That same person's FTDNA kit matched me for the same 10.0 cM, but over just 510 SNPs.  In that stretch, FTDNA tested only about half as many of the same SNPs that 23andme did.

This is key to grasp.  Expect matches with higher SNP counts on Gedmatch when both kits start with the same letter (e.g., M for 23andme, F for FTDNA, and A for Ancestry), because both were run on the same company's chip.  DO NOT DISMISS LOW SNP MATCHES.
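Here is the arithmetic of the 10.0 cM example as a short Python sketch. It assumes a simple two-part rule (a segment must reach both 7 cM and 700 SNPs); the function names are our own, and this is not Gedmatch's actual code.

```python
# Thresholds as described above; a simplified sketch, not Gedmatch's code.
MIN_CM = 7.0
MIN_SNPS = 700


def passes_threshold(cm: float, snps: int, min_cm: float = MIN_CM, min_snps: int = MIN_SNPS) -> bool:
    """A segment counts only if it clears BOTH the cM and the SNP minimum."""
    return cm >= min_cm and snps >= min_snps


def snps_per_cm(cm: float, snps: int) -> float:
    """SNP density: how many shared, tested SNPs fall in each centiMorgan of the segment."""
    return snps / cm


# The same 10.0 cM segment, seen through two different kits of the same cousin.
segments = {
    "my 23andme kit vs. cousin's 23andme kit": (10.0, 1024),
    "my 23andme kit vs. cousin's FTDNA kit": (10.0, 510),
}

for label, (cm, snps) in segments.items():
    print(f"{label}: {cm} cM, {snps} SNPs, "
          f"{snps_per_cm(cm, snps):.1f} SNPs/cM, passes={passes_threshold(cm, snps)}")
```

Both rows describe the same stretch of shared DNA. The only thing that changes is how many SNPs the two companies' chips have in common there, which is why the cM length is the number to watch.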


Wednesday, October 28, 2015

What Is the Best and Most Accurate Ancestry Calculator (DNA Testing)?

What Is the Best and Most Accurate Ancestry or Admixture Calculator from DNA Testing?


We Review 23andme, AncestryDNA, Family Tree DNA (FTDNA), DNA.Land, Dodecad, Eurogenes, etc.


Judging from community discussions in online forums, "Admixture" tests, where a company or entity takes your raw DNA data, puts it into a calculator, and then purports to tell you where your ancestors came from -- these are all the rage.  It is not rare for seemingly educated individuals to post on the Internet sheer and utter nonsense about their results, for example, assuming that a calculator identified their ancestry with something close to 100% accuracy.

In the online world, there is no such thing as perfect privacy.  And in DNA, there is no such thing as 100% accuracy for ancestry calculators. 

This is because all people are admixed, but not every ethnic group is represented in the reference samples.  Put another way, if your ancestors come from a valley in Switzerland where no one has ever been tested, you might show up in a test as French, German, Italian, or Austrian, but not Swiss.

You might have documents tracing your Swiss ancestry back to the dawn of time.  You may match other Swiss people exactly.  But because the Swiss are indeed mixes of the groups above, and because the hypothetical database has no specific, micro-targeted Swiss samples that match you more closely than those other nationalities, the test would be woefully inaccurate for YOU.  After all, you don't want a test to tell you that you might be Northern Italian if you are Swiss.  (For that matter, do you NEED a test to tell you that?  See below.)
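To make the Swiss example concrete, here is a toy Python sketch. It finds the mix of reference populations whose averaged allele frequencies best fit a tester, by brute-force grid search over proportions that sum to 100%. The allele frequencies are made up, and real calculators fit likelihood-based admixture models over many thousands of SNPs rather than doing this simple least-squares search; the point is only that the answer can contain nothing but the populations in the panel.

```python
from itertools import product

# Made-up allele frequencies at four hypothetical markers.
PANEL = {
    "French": [0.60, 0.30, 0.50, 0.20],
    "German": [0.70, 0.20, 0.40, 0.30],
    "Italian": [0.40, 0.50, 0.60, 0.10],
}


def mix(weights, refs):
    """Allele frequencies of a population formed by mixing `refs` in the given proportions."""
    return [sum(w * ref[k] for w, ref in zip(weights, refs)) for k in range(len(refs[0]))]


def best_fit(sample, panel, step=0.01):
    """Grid-search three-way proportions (summing to 1) that best reproduce `sample`.

    Assumes exactly three reference populations; uses squared error as the fit score.
    """
    names = list(panel)
    refs = [panel[name] for name in names]
    n = round(1 / step)
    best, best_err = None, float("inf")
    for i, j in product(range(n + 1), repeat=2):
        if i + j > n:
            continue  # proportions must not exceed 100% in total
        weights = (i / n, j / n, (n - i - j) / n)
        err = sum((m - s) ** 2 for m, s in zip(mix(weights, refs), sample))
        if err < best_err:
            best, best_err = weights, err
    return dict(zip(names, best))


# Hypothetical tester from an unsampled Swiss valley.
swiss = [0.60, 0.30, 0.48, 0.22]
print(best_fit(swiss, PANEL))
```

Run against this three-population panel, the hypothetical Swiss tester comes back as roughly 40% French, 40% German, and 20% Italian. No amount of documented Swiss ancestry can produce a "Swiss" line, because Swiss is not in the panel.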

In the online privacy world, one of the scientifically best protections (one that does its job pretty darn well) is called "Pretty Good Privacy."  In the DNA world, all we can hope for is "Pretty Good Accuracy": ancestry calculators that are scientifically grounded, that don't make claims beyond what they can really do, and that get the broad regions correct at the very least.

The coolest benefit of living in a college town (Berkeley, for this blogger) is that there are a ton of people from all over the world with pretty well-defined ancestry.  For example, that Danish exchange student with 500 years of documented ancestors in Denmark?  That's a good candidate for testing some of these calculators.  Enough friends of mine have taken DNA tests that we've plugged the results into the calculators at several paid sites (testing companies) like 23andme and AncestryDNA, and into free calculators, like the ones available on Gedmatch.  Who came out on top?

By far, the best and most accurate ancestry calculator is on 23andme.   Like all good scientists, they are humble instead of full of hubris.  They don't profess to give you one set of results and say, "this is it."  Instead, they give you three different results: standard, conservative, and speculative.  Each is pretty darn accurate for most of the people we know who have tested there and other sites.  Bottom line: 23andme's "Ancestry Composition" feature is outstanding, and the best, most accurate one online we could find.
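The idea behind those three settings can be sketched as confidence cut-offs: the stricter the setting, the more often a segment falls back to a broader region or goes unassigned. The cut-off values, the population hierarchy, and the segment probabilities below are hypothetical and chosen only for illustration; they are not 23andme's actual parameters or model.

```python
from __future__ import annotations

# Hypothetical confidence cut-offs for three settings; NOT 23andme's actual values.
THRESHOLDS = {"speculative": 0.50, "standard": 0.75, "conservative": 0.90}

# Hypothetical two-level hierarchy: specific population -> broader region.
PARENT = {
    "French & German": "Northwestern European",
    "British & Irish": "Northwestern European",
    "Italian": "Southern European",
}


def assign(probs: dict[str, float], setting: str) -> str:
    """Label one DNA segment, falling back to a broader region when confidence is too low."""
    cutoff = THRESHOLDS[setting]
    population, p = max(probs.items(), key=lambda kv: kv[1])
    if p >= cutoff:
        return population
    # Pool the probabilities of populations that share a parent region.
    regional: dict[str, float] = {}
    for child, q in probs.items():
        regional[PARENT[child]] = regional.get(PARENT[child], 0.0) + q
    region, r = max(regional.items(), key=lambda kv: kv[1])
    if r >= cutoff:
        return f"Broadly {region}"
    return "Unassigned"


# Made-up model output for a single segment.
segment = {"French & German": 0.55, "British & Irish": 0.33, "Italian": 0.12}
for setting in THRESHOLDS:
    print(setting, "->", assign(segment, setting))
```

The same segment gets a specific label under the speculative setting, a broad regional label under the standard one, and no label at all under the conservative one. That trade-off, less detail in exchange for fewer wrong calls, is the humility described above.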

It is our opinion that the least accurate ancestry calculator is at the new site DNA.land, and the one on FTDNA is a close second.  Both are terrible.  Almost everyone who used the feature on DNA.land reported that the calculator is way off; it is just not ready for prime time as of this writing.

How do these calculators work?  Well, remember, the data that comes out is only as good as the data that goes in.  It is worth always remembering the concept that computer programmers call "GIGO: Garbage In, Garbage Out."  This means that if the data on which a conclusion is based is faulty, the answer will also be faulty.  With calculators, this manifests itself in two ways: a shifted focus, or faulty or incomplete baseline data.

By a shifted focus, we mean that several calculators are built around particular populations.  For example, the MDLP Ethnicity Calculator, offered at Gedmatch alongside Eurogenes, Dodecad, and Gedrosia, is named for the Magnus Ducatus Lituaniae Project.  As you might have guessed from its name, it focuses on the people from lands that used to form the Grand Duchy of Lithuania and its surroundings in Northeast Europe, including Poland, Estonia, etc.

MDLP seeks to be very good at calculating ethnic tidbits of interest to those populations.  But is it good for determining the difference between, say, a Catalonian Spaniard and a Northern Italian?  No, it's actually quite bad on that front.  That's simply not its focus.

Similarly, there are other calculators on Gedmatch that exist to focus on and cater to Asians, Africans, even mixed race folks.  And within European populations, you have other focuses, like Dodecad, which seems Grecocentric, for lack of a better word.  None of these will do that great outside their focus areas.  So take the results from those ones with a grain of salt, unless you happen to hail from their regions of focus.

Don't believe that?  Think I'm being extreme?  If you are European, try putting your data in a calculator that is focused on another population.  Like the East Asian-focused calculators.  It won't tell you that you are NOT East Asian.  It will tell you which East Asian population you resemble the most.  To be clear: if all a calculator has is East Asian samples, a European will be told he or she is Japanese or Chinese.  This same concept applies within European focused calculators at the regional level.
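Here is that failure mode reduced to a few lines of Python: a calculator that simply reports the closest reference population in its panel. The panel, its allele frequencies, and the Euclidean-distance rule are hypothetical simplifications for illustration, not how any particular Gedmatch calculator is implemented.

```python
from __future__ import annotations

import math

# Hypothetical panel with East Asian references only (made-up allele frequencies).
EAST_ASIAN_PANEL = {
    "Japanese": [0.10, 0.80, 0.30],
    "Han Chinese": [0.15, 0.70, 0.35],
    "Korean": [0.12, 0.75, 0.25],
}


def closest_reference(sample: list[float], panel: dict[str, list[float]]) -> str:
    """Return the panel population nearest to the sample; there is no 'none of the above'."""
    return min(panel, key=lambda name: math.dist(sample, panel[name]))


# Hypothetical European tester, far from every reference in the panel.
european = [0.70, 0.20, 0.60]
print(closest_reference(european, EAST_ASIAN_PANEL))
```

The European sample is far from all three references, yet the function still returns an East Asian label, because "none of the above" is never an option.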

In terms of bad baselines, recall the Swiss example above.  Europe is filled with micropopulations that exhibit a high degree of population homogeneity (a little inbred, to use the pejorative term).  If a calculator does not have a sample from your micropopulation (the narrow region where your ancestors lived for millennia), then you will get a faulty reading.

Put simply (to use a French example): it's a big country.  Normans are not Basques, Provençals are not Bretons, etc.  That is why the best calculators are HONEST.  23andme discloses quite readily that for the huge populations in the middle of Europe (French, Germans, but also Benelux countries, etc.), it cannot spot the DNA with certainty 92% of the time.

Does the 23andme website have any drawbacks?  Sure it does.  But they are minor compared to the others. 

First, its "Countries of Ancestry" feature is not what it could be.  But it's important to understand three things:  (1) This is NOT their ancestry calculator, but another feature entirely, so perhaps it's unfair for us to even review it in this space.  (2) It's experimental, and they state that.  (3) They are wisely phasing it out.  What was the problem with that feature?  Well, it listed the countries reported by the people who had the most matches with you.  Let's say, for example, you are half Italian, half Polish (a common mix in Chicago).  In other parts of Chicago, another common mix is half Polish, half Irish.  For whatever reason, people of Irish heritage have tested themselves in far greater numbers than the others.  Your Polish DNA would overlap (match) with that of the people who reported they were half Polish, half Irish.  And this feature would then tell you that "a high percentage of the people who have DNA similar to yours are from Ireland."  Do you understand?  It's a huge problem, especially for smaller populations, and especially because so many Americans are now half this, half that.  It's just not that edifying.
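Here is a toy Python illustration of that over-representation problem. It compares a raw tally of matches by reported country with the same tally divided by how many customers in the database report each country. All counts are invented, and the per-tester normalization is our own illustration of the bias, not a description of how 23andme's feature actually worked.

```python
from collections import Counter

# Hypothetical: the country reported by each person who matches a half-Italian, half-Polish tester.
matches = ["Ireland"] * 40 + ["Poland"] * 25 + ["Italy"] * 15

# Hypothetical: how many customers in the database report each country.
database_size = {"Ireland": 200_000, "Poland": 25_000, "Italy": 30_000}

raw = Counter(matches)
top_raw = raw.most_common(1)[0][0]  # what a raw tally would headline

# Matches per customer reporting that country, which discounts heavily tested groups.
rate = {country: raw[country] / database_size[country] for country in raw}
top_rate = max(rate, key=rate.get)

print("Top country by raw match count:", top_raw)
print("Top country by matches per tester:", top_rate)
```

With these made-up numbers, the raw tally headlines Ireland, while the normalized rate points to Poland, which is closer to where the shared DNA actually comes from.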

23andme also suffers from the same sample issues as many of the other ancestry calculators.  For example, 85% of Italian Americans (TRANSLATION: potential customers, since most people who test are from Britain or the US) hail from just 3 regions in the deep south of Italy: Campania (Naples), Calabria, and Sicily.  Yet the population samples that most of these websites use are from Tuscany.  Even though Dante tried to meld them, Tuscans are not Sicilians and vice-versa. 

Often, when these calculators see Sicilian or rural Southern Italian DNA, they in effect say: "We don't know what you are!  You are kind of Italian, but you also resemble, a little bit, people from Cyprus or Jews."  So they give an odd result.  And then the person who tested says, "I might be Jewish."  No.  The answer is that your people were not included in the data set from which the baseline was developed.  If they were, the calculator would recognize you as a run-of-the-mill Sicilian.

All online ancestry calculators also suffer from a lack of interoperability and non-standardized terms.  For example, among the calculators on Gedmatch, some use the term "Caucasian" to mean "generalized European" (which is how it is used in common parlance, of course).  Others use it in the specific sense of the Caucasus region: Georgia, Armenia, etc.

Here's the bottom line: don't expect any ethnicity or ethnic-origins calculator to be 100% correct.  Don't expect new insights if you have confirmed records.  In other words, if you look just like your dad (you're not a bastard), and you're not adopted, and you have records going back centuries -- why do you need an ethnicity calculator to begin with?

These admixture tests can help if you were adopted and want a sense of where to start.  But keep in mind that the largest plurality of Americans are of German heritage, and yet the best calculator currently available cannot identify German DNA 92% of the time.

Avoid the mythology and those who oversimplify.  There are reliable sources out there in genetic genealogy, like Debbie Kennett -- and there are a lot of charlatans.  Be careful whenever someone oversimplifies to the point of exaggeration, falls into stereotypes, or tells you what you want to hear.  With DNA as with everything, the most parsimonious answer is often the best.  The exotic is often wrong.

While the science continues to improve, you can't go wrong using the Standard or Conservative setting on the 23andme Ancestry Composition test.


Sunday, August 30, 2015

The Top Ten Myths of Genetic Genealogy, Archaeogenetics, and DNA Testing (10 through 7)

Any scientist visiting the websites or online forums of Eupedia, Anthrogenica, or Apricity (to name a few) is mortified.  The number of shorthand claims, pseudo-science, pop-anthropology, and myths perpetuated there is truly astonishing, and quite sad.  Below we list the Top Ten myths of this world.  We will update the post over time to link to specific offenders, so you can share the laughs we shared.

Don't be an idiot.  Learn these myths, and for the love of all things holy, don't propagate them!

10.  If you are of Scandinavian heritage (Denmark, Norway, Sweden), you are a "Viking."  

Example post: "my gma is half Swedish and I am very adventurous; must be the Viking LOL."  

Vikings were the marauders sailing from Scandinavia who raided and invaded many parts of Europe from roughly the late 700s to the mid-1000s AD.  Those of Scandinavian blood are emphatically NOT "Viking."  The Vikings were the adventurous ones who left.  Scandinavians are descended from the ones who stayed home.

While Scandinavians may share a common origin with the Vikings dating back 1500 years, technically it's not correct to say they are descended from them.  And to the extent there is a gene for adventure-seeking, violence, or the so-called "warrior" gene, it's more probable that the ones who stayed in Scandinavia (as fishermen and barley farmers) do NOT have that gene.

Many Russians, Ukrainians, English, Scots, Calabrians, Sicilians, and Northern French have a better claim to be "directly descended from Vikings."  Sorry.



9.  You can determine by a test on Eurogenes or Gedmatch the precise percentages of EEF-ANE-WHG that you are.

For the uninitiated, these acronyms stand for "Early European Farmer," "Ancient North Eurasian," and "Western [European] Hunter Gatherer."

Example post: "Username: SteppeOverlord  EEF: 21.345%, ANE: 19.876%, WHG: 58.779%."

It's important to note that these hypothetical populations were reconstructed from...ONE SAMPLE EACH.  Thus, when you take the Eurogenes EEF ANE WHG test, you are comparing yourself to each of three skeletons: the EEF is the LBK sample found in Stuttgart, Germany.  The ANE is the Mal'ta boy found in Siberia.  The WHG is the Loschbour skeleton found in Luxembourg.  Citation.

These populations were themselves admixed, especially the Stuttgart sample.  It's not accurate to use one exemplar to represent an entire group, especially groups with the huge geographical ranges of the acronym populations.  It's much more accurate to say that you share a given percentage with Loschbour, Mal'ta, or Stuttgart.

Many of the genes inherited so many generations ago will be identical by state (IBS: more or less coincidence, or breeding back, in a way) rather than identical by descent (IBD).  Citation.  Europeans are a homogeneous lot, so these tests don't reveal much, if anything, and the terminology, turned into shorthand, stinks.


8.  Admixture percentages are due to a historical event.

Example post: "OMG!  I am English, Irish, German, and Polish.  But Dodecad says I have 6% Siberian; this must prove the legend in my family that my great-grandmother was a Cherokee princess!"

Or:

"I am South Italian.  But Eurogenes says I have 12% southwest Asian.  Must be the Greek blood!"

People tend to overestimate historical events (i.e., events we know about because they were recorded in writing), but tend to underestimate non-historical events.  This is a recency bias that comes from a little knowledge about history, often expressed in shorthand (e.g., "South Italy was Greek").

It is, however, almost never true.  In the first example above, many Europeans, especially Northern Europeans, test positive for some Siberian/ANE/even Native-American-like ancestry, but this is almost certainly the result of ancient admixture from the first Indo-Europeans from the steppe, who had substantial Asian-like ancestry.  In the second example, the people who populated Italy in prehistoric times were in many cases descended from the first farmers, who came from the southeast fringes of Europe.  Such signals in modern ancestry are far more likely to indicate ancient admixture from source populations that share ancestry with the historical ones.

Sorry, but the boring is almost always more true than the interesting.



7.  People from places with many years of recorded history are more admixed than people with less history.

Example post: "If you are of South Italian ancestry, you're probably part Roman, Greek, Scandinavian, Arab, and Jewish."

This one is so obvious it is painful to have to post.  But it's the corollary of number 8 above: a little historical knowledge being dangerous.

Imagine two regions: Region 1 is fairly remote, but has had extensive writing for 2600 years, and every marauder, political shift, kingdom, invasion, battle, language spoken, and petty dukedom is recorded in glorious detail.  Imagine another region, Region 2, that has had extensive writing and civilization for only about 1100 years.  There are large gaps in knowledge of what happened there, because of the lack of historians.

I just described Basilicata, Italy and Hesse, Germany.  Yet so many online "mytholographers" perpetuate the notions that people like Italians, Jews, and Greeks (i.e., those with 25+ centuries of intense recorded history) are more admixed than those without such extensive documentation (i.e., Germans, French, etc.)

You can't escape this on any online forum: people speculate about exotic sources in Italian ancestry, yet almost no one does this for Germans and French.

Just because we don't know who was invading another area during prehistory or the Dark Ages, does it mean it didn't happen?  Just because we don't know the name of the king who pillaged a territory, does it make him any less historical?  Because there is no Trojan War story for Hesse, Germany, does it mean there was no warfare, invasion, or exotic influences?

The French and Germans are so "admixed" (i.e., generic European) that 23andme cannot identify their DNA 92% of the time.  Citation.  Yet in every discussion, the poor Greeks have to tolerate excruciating detail and speculation about every single exotic strain in their blood.

Aside from the peoples of the remotest, hard-to-get-to, isolated regions of Europe (Finns, Northwest Irish, Basques, and Sardinians), everyone has been invaded, repeatedly, and everyone is very, very admixed.  The paradigm of focusing only on certain peoples for this has to change, because it's simply not accurate.

Check back soon for the rest of the Top 10 list.