Some of us a few years back started to decry the ever-ongoing ISOGG
renaming process, which coupled with the discovery of new subclades,
meant that one year, someone might be deemed
R1b1b1a2bab2ba11babd12ba2b1c, and the next year
R1b1b2bab2f1faf1fafaf1f1f1a.
People started saying that it would
probably be better to give the first couple of letters and the major
terminal SNP. For example, R1b-U106 or I2-M26. This was logical and
good. Unlike the terminology, the SNPs never change. And they're
shorter to write.
Here I humbly propose a new terminology for ancient autosomal samples. I
think picking terms like "WHG" was a mistake, and now that I have read about
EHG and CHG, I am sure of it. For the uninitiated, these acronyms stand for
"Western Hunter Gatherer," "Eastern Hunter Gatherer," "Caucasus Hunter
Gatherer," etc.
People compare their modern genomes, or the genomes of modern
populations or ethnic groups, to these ancient samples. And then they
use shorthand like "Scottish average 19% CHG." This is highly
misleading.
Let me give the reasons why I think the current terminology is deficient. Tell me if you disagree.
1. As we get more samples over time, it will be hard to keep coining
new names every time a sample forms a distinct component. We just saw
this with the recent CHG finds. Imagine if we find a detectable signal
of ancient genes from Iberia. What will we call that component?
"Really Western Hunter Gatherer?"
2. The shorthand is deeply misleading (e.g., "Scottish are 19% CHG"). This to me is the most important point. Most
people reading this are experts. But I see on so many other boards
people who seem to think that some scientist somewhere surveyed a
bunch of ancient samples and "averaged" them, and that we are comparing
populations to populations.
We're not. We are not comparing Scots to Western Hunter Gatherers. We
are comparing Scots (or any other modern individual or group) to ONE SAMPLE. For WHG, it's Loschbour. For EEF, it's Stuttgart. For ANE, it's Mal'ta. Etc.
3. We don't know that that one sample will turn out to be representative of
"Western Hunter Gatherers" any more than we know that Danny
DeVito or the Harlequin model Fabio is representative of modern
Italians. Indeed, as the number of samples grows, we are learning that the
situation is far more complex.
We all remember, for example, when the first farmers sampled had highly
unusual mtDNA. For a while, people tried to read too much into it. "OMG,
what if all farmers bore this odd mtDNA?" was the refrain. But it
turned out to be a one-off. This can and will happen again and again as
we get more samples over time.
4. The acronyms will get repetitive real fast. We are talking
about aDNA, remember? Before farming, everyone in the world was a
hunter-gatherer. So many (most) aDNA samples will eventually have -HG after
them if we follow the current convention.
I imagine a world where we have found 26 slightly different hunter
gatherer samples, and thus we have a different -HG for every letter of
the alphabet! That'd be just silly.
For these reasons, but primarily numbers 2 and 3, I think the current
practice is misleading and doomed to failure. Europe is a very
complicated place. We will find ancient samples with distinctive
genomes that are detectable in modern populations. They will all be
slightly different from one another, because one sample is, well, one
sample... It is highly misleading to say that "John Smith..." or
"Estonians are more Western Hunter Gatherer than..." because we have not
sampled all, most, or even many Western Hunter Gatherers. (I don't
mean to pick on WHG. This applies equally, indeed MORE, to EEF and
ANE!)
So, what is the solution?
I think if we purport to be scientific, we need to speak with scientific precision.
If an individual or a modern population bears resemblance to an ancient
genome, we should state that it has a percentage similarity to that one
sample, and not try to make it more than it is with a very official-sounding
and expansive term like "Eastern Hunter Gatherers."
As for the sample, we should also include the year discovered, the site
of the discovery, and the years Before Present (BP).
Remember, many of these sites are
caves where there have been and will be more discoveries. In other
words, I expect there will be many more Loschbours, more Stuttgarts,
etc., and it will get quite confusing unless we speak with specificity
about when something was discovered and what period it dates from.
Let's avoid a situation like we had with terms like R1b1b1b1a2a1b2bc3d,
which lose meaning. Let's refer to things with scientific precision.
Examples:
Instead of, "Scots are 19% Ancient North Eurasian."
SAY: "On average 19% of the genes of the modern Scottish population match 2013Mal'ta-24,000BP."
Instead of, "Southern European populations have a lot more CHG blood than I expected."
SAY: "Southern European populations bear many genes matching 2015Kotias-10,000BP."
Instead of, "Sardinians are 45% WHG."
SAY: "Approximately 45% of the genes in the modern Sardinian population resemble 2013Loschbour-6,000BP."
This convention is much more accurate.
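
For anyone who keeps their comparisons in a script or spreadsheet export, here is a minimal sketch of how the labels above could be generated consistently. The function names `sample_label` and `similarity_statement` are made up for illustration, and the layout (year, then site, then a hyphen, then years BP with a thousands separator) is just my reading of the examples above, not any established standard.

```python
def sample_label(year_discovered: int, site: str, years_bp: int) -> str:
    """Build a label such as 2013Mal'ta-24,000BP.

    Assumed layout (taken from the examples in the post):
    <year discovered><site>-<years before present, with commas>BP
    """
    return f"{year_discovered}{site}-{years_bp:,}BP"


def similarity_statement(population: str, percent: float, label: str) -> str:
    """Phrase a result as similarity to ONE named sample, not to a whole 'people'."""
    return (
        f"On average {percent:g}% of the genes of the modern "
        f"{population} population match {label}."
    )


if __name__ == "__main__":
    malta = sample_label(2013, "Mal'ta", 24000)
    print(similarity_statement("Scottish", 19, malta))
```

If a second sample turns up at the same site, the year and the BP date keep the two labels distinct, which is the whole point of spelling them out.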