This is the second post in a short series responding to an article by Tomkins and Bergman, part of their ongoing effort to downplay the genetic similarity between humans and chimpanzees. The subject of this post is a paper by Ingo Ebersberger and colleagues, published in 2002, in which they report a sequence difference (excluding indels) of 1.24%.
But let’s first look at what Tomkins and Bergman say about this paper:
“Researchers selected two-thirds of the total sequence for more detailed analyses. One-third of the chimp sequence would not align to the human genome and was discarded. […] Not surprisingly, they report only a 1.24% difference in only highly similar aligned areas between human and chimp. A more realistic sequence similarity based on the researchers’ own numbers for discarded data in the alignments alone is not more than 65%.”
Tomkins and Bergman seem to be taking the one-third of chimp sequence that supposedly does not align, which gives a maximum identity of about 66.7%, and then knocking off the 1.24% substitution rate to arrive at their 65% approximation.
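To make that arithmetic explicit, here is a minimal sketch of how the 65% figure appears to be derived. This is my reconstruction, not a calculation published by Tomkins and Bergman, and the variable names are my own:

```python
# Reconstruction (an assumption) of the arithmetic behind the 65% figure.
discarded_fraction = 1 / 3   # "one-third ... would not align", per Tomkins and Bergman
substitution_rate = 0.0124   # 1.24% difference reported in the paper

# Treat all discarded sequence as 0% identical to human.
max_identity = 1 - discarded_fraction

# Then subtract the substitution rate from what is left.
estimate = max_identity - substitution_rate

print(f"Maximum identity: {max_identity:.1%}")                 # 66.7%
print(f"After subtracting substitutions: {estimate:.1%}")      # 65.4%
```

The whole calculation rests on the first step: it only works if every discarded base pair genuinely has no counterpart in the human genome.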
Now, let’s have a look at what the paper actually says:
“Twenty-eight percent of the total amount of sequence was excluded from the analysis, since the entire sequence, or parts of it, displayed more than one match in the human genome that was not due to known families of repeated sequences. For 7% of the chimpanzee sequences, no region with similarity could be detected in the human genome.”
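A quick breakdown of the paper's own numbers shows how the excluded sequence splits up. This sketch assumes the 28% and 7% figures are non-overlapping fractions of the same total, which is how I read the quoted passage; the variable names are mine:

```python
# Fractions of the total chimpanzee sequence, as quoted from the paper.
multi_match = 0.28   # excluded: more than one match in the human genome
no_match = 0.07      # no similar region detected in the human genome

# Assumption: the two categories do not overlap.
excluded = multi_match + no_match
analysed = 1 - excluded

# Share of the excluded sequence that actually found no match at all.
share_without_match = no_match / excluded

print(f"Excluded in total: {excluded:.0%}")
print(f"Analysed: {analysed:.0%}")
print(f"Share of excluded sequence with no match: {share_without_match:.0%}")
```

On these figures, only about a fifth of the excluded sequence failed to match anything; the rest was set aside because it matched in more than one place.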
More than one match? Really? But I thought Tomkins and Bergman said it would not align at all? Surely an honest mistake by these two competent researchers.
But what about the 7% for which no match could be found at all?
I think I have a reasonable explanation for that. In the Materials and Methods section, the authors mention that they used a “draft version of the human genome (freeze August 6, 2001)”.
It’s hard to find solid data on that version of the human genome (known as hg8), but as best I can tell, it was somewhere between 90% and 95% complete. So when the Ebersberger paper was published, around 5% to 10% of the human genome was still missing from the draft. It’s therefore not surprising that a sizeable percentage of the chimpanzee sequences found no match.
Can we compare these chimp sequences to GRCh38?
Unfortunately, no. I contacted Ingo Ebersberger in late 2014 and asked if he still had the data:
“I don’t think that I will be able to find the original data and no other co-author will have them.”