Saturday, January 30, 2016

In Praise of Roberta Estes and

In a world of pseudo-science and echo chambers, a few blogs stand out for being mostly in touch with reality. In the world of ancient DNA, Dienekes, although less active than before, has pioneered much in the field, and his blog still draws comments from many serious scientists.

In the world of DNA for genealogy, one blog stands out: Roberta Estes' blog. Of all the blogs and websites dedicated to explaining DNA, hers is consistently factual and science-based, yet easy to understand.

This scientist came across a few of her posts, and I daresay they are mandatory reading for anyone seeking a better understanding of their DNA.  Below are links and highlights:

Step 1:  Creation of the underlying population data base.
Don’t we wish this was as simple as it sounds.  It isn’t.  In fact, this step is the underpinnings of the accuracy of the ethnicity predictions.  The old GIGO (garbage in, garbage out) concept applies here. . . .

The third way to obtain this type of information is by inference.  Both Ancestry and 23andMe do some of this.  Ancestry released its V2 ethnicity updates this week, and as a part of that update, they included a white paper available to DNA participants.  In that paper, Ancestry discusses their process for utilizing contributed pedigree charts and states that, aside from immigrant locations, such as the United States and Canada, a common location for 4 grandparents is sufficient information to include that individual's DNA as “native” to that location.  Ancestry used 3000 samples in their new ethnicity predictions to cover 26 geographic locations.  That’s only 115 samples, on average, per location to represent all of that population.  That’s pretty slim pickins.  Their most highly represented area is Eastern Europe with 432 samples and the least represented is Mali with 16.  The regions they cover are shown below. . .

No matter which calculations you use relative to acceptable Margin of Error and Confidence Level, Ancestry’s sample size is extremely light. . . .
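To put those sample sizes in perspective, here is a rough back-of-the-envelope check in Python. It uses the textbook margin-of-error formula for estimating a proportion from a simple random sample, assuming a 95% confidence level (z = 1.96) and the worst case p = 0.5. Those choices are assumptions for illustration only: ethnicity estimation is not a simple survey of proportions, and this is not how Ancestry evaluates its reference panels.

```python
import math

Z_95 = 1.96  # assumed: z-score for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Worst-case (p = 0.5) margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(e, p=0.5, z=Z_95):
    """Smallest n whose margin of error is at most e (no finite-population correction)."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


# Figures quoted from Ancestry's V2 white paper in the excerpt above.
total_samples, regions = 3000, 26
print(f"average per region: {total_samples / regions:.0f}")

for label, n in [("average region", 115), ("Eastern Europe", 432), ("Mali", 16)]:
    print(f"{label:15s} n={n:4d}  +/-{margin_of_error(n):.1%}")

print("n needed for +/-5%:", required_sample_size(0.05))
```

Under these assumptions, the average region (about 115 samples) comes out to roughly ±9.1%, Eastern Europe (432) to about ±4.7%, and Mali (16) to about ±24.5%, while a ±5% margin would call for about 385 samples per region.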

"Having Haplogroup Origins and Ancestral Origins indicating Native American ancestry does not necessarily mean you are Native American or have Native American heritage. This is a very pervasive myth that needs to be dispelled." . . .

The good news is that more and more people are DNA testing.  The bad news is that errors in the system are tending to become more problematic, or said another way, GIGO – Garbage in, Garbage Out.


There are a very limited number of major haplogroups that include Native American results.  For mitochondrial DNA, they are A, B, C, D, X and possibly M.  I maintain a research list of the subgroups which are Native.  Each of these base haplogroups also has subgroups which are European and/or Asian.  The same holds true for Native American Y haplogroups Q and C.
In the Haplogroup Origins and Ancestral Origins, there are many examples where Non-Native haplogroups are assigned as Native American, such as haplogroup H1a below.  Haplogroup H is European...

One of the problems we have today is that because there are so many people who carry the oral history of grandmother being “Cherokee,” it has become common to “self-assign” oneself as Native.  That’s all fine and good, until one begins to “self-assign” those haplogroups as Native as well – by virtue of that “Native” assignment in the Family Tree DNA data base.  That’s a horse of a different color.

Monday, January 25, 2016

Calculating Matches on Gedmatch: Why CentiMorgans (cM) are more important than SNPs

I have discovered that very very very few people know this, so it is worth posting.

The different testing companies (23andme, Ancestry, FTDNA, etc.) all test slightly different SNPs. In other words, the specific "points" on the genome that are tested vary from company to company.

I have seen some people on Gedmatch dismiss a match because "it doesn't have enough SNPs."  Or because "it's not above the SNP threshold."

Gedmatch itself requires a segment of at least 7 cM and 700 SNPs to count someone as a cousin match.

The SNP part is faulty thinking.

Because the testing companies don't test the same SNPs, Gedmatch can only compare the SNPs that both kits have in common. That means a long matching stretch (plenty of cM) can still contain a low number of SNPs.

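Case in point: someone who, like me, tested at 23andme matched me for 10.0 cM and 1024 SNPs. That same person's FTDNA kit matched me for the same 10.0 cM but just 510 SNPs. In that segment, the overlap between the two companies' SNP sets was only about half as large, even though the length of matching DNA was identical. Under Gedmatch's 700-SNP rule, the second comparison would not count at all.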

This is key to grasp. Expect stronger-looking matches (more SNPs) on Gedmatch when both kits come from the same company, which you can tell from the first letter of the kit number (e.g., M for 23andme, F for FTDNA, and A for Ancestry). DO NOT DISMISS LOW SNP MATCHES.
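A toy Python sketch may help show why cM and SNP counts diverge across companies. Everything in it is hypothetical: the evenly spaced genetic map, the two "chips", and the assumption that the stretch being compared is already known to match. It is not Gedmatch's actual matching algorithm. It only assumes that a comparison can use just the SNPs both kits tested, and that a segment's cM length comes from the genetic-map positions of its first and last compared SNPs. The 7 cM / 700 SNP thresholds are the ones quoted in the post.

```python
# Hypothetical sketch, NOT Gedmatch's real algorithm: it only shows why the
# same stretch of matching DNA can report the same cM but different SNP counts
# when two kits were tested on different SNP sets.

MIN_CM = 7.0    # thresholds quoted in the post
MIN_SNPS = 700


def genetic_position(snp_index):
    """Hypothetical genetic map: SNPs spaced evenly, 0.01 cM apart."""
    return snp_index / 100


def compare(kit_a, kit_b, start, end):
    """Compare a stretch [start, end] where both people share identical DNA.

    Only SNPs tested by BOTH kits can be compared, so the SNP count is the
    size of the overlap. The cM length comes from the genetic-map positions
    of the first and last compared SNPs, not from how many SNPs lie between.
    """
    shared = sorted(i for i in kit_a & kit_b if start <= i <= end)
    if not shared:
        return 0.0, 0
    cm = genetic_position(shared[-1]) - genetic_position(shared[0])
    return cm, len(shared)


def is_match(cm, snps, min_cm=MIN_CM, min_snps=MIN_SNPS):
    """Apply both thresholds, as the post describes Gedmatch doing."""
    return cm >= min_cm and snps >= min_snps


# Hypothetical chips: "chip A" tests every SNP index 0..1000,
# "chip B" tests only every other one.
chip_a = set(range(0, 1001))
chip_b = set(range(0, 1001, 2))

same_company = compare(chip_a, chip_a, 0, 1000)   # both kits on chip A
cross_company = compare(chip_a, chip_b, 0, 1000)  # one kit on each chip

print(same_company, is_match(*same_company))
print(cross_company, is_match(*cross_company))
```

Both comparisons cover the same 10 cM of DNA, but the cross-company one sees only about half as many SNPs (501 vs. 1001) and fails the 700-SNP cutoff, which mirrors the 1024 vs. 510 SNP example in the post.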