Tomkins Human-Chimp DNA – Britten 2002

Way back in 2012, Jeff Tomkins and Jerry Bergman teamed up to “re-evaluate” the published scientific literature related to the overall genetic difference between humans and chimpanzees. Their article can be found here:

http://creation.com/human-chimp-dna-similarity-re-evaluated

Their claim, in a nutshell, is that the results of many of these papers are overstated for various reasons, and Tomkins and Bergman have taken it upon themselves to correct the results.

Britten, 2002

One such study is by Roy Britten, in 2002, and can be found here:

http://www.pnas.org/content/99/21/13633.full

As you can see, Britten reported an overall DNA similarity of around 95.2% for his small sample, which consisted of the only five completed chimpanzee BACs in existence at the time.

In the paper, you will notice that Britten only reports on around 779kb of sequence, while Tomkins and Bergman quite rightly point out that the sum of the lengths of those BACs is around 846kb. Does the excluded sequence align to the human genome? Tomkins and Bergman say no, and use that to scale back the overall DNA similarity to 87.7%.
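For reference, the 87.7% figure is what simple proportional scaling produces. The snippet below is a minimal sketch, assuming the adjustment counts every excluded base as a mismatch (an interpretation of the description above, not a formula quoted from their article); the variable names are made up for illustration.

```python
# Assumption: sequence Britten did not report on is scored as 0% identical.
reported_identity = 0.952   # Britten's overall similarity
analysed_kb = 779           # sequence Britten reports on (kb)
total_kb = 846              # summed length of the five BACs (kb)

scaled_identity = reported_identity * analysed_kb / total_kb
print(f"{scaled_identity:.1%}")
```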

But did they actually check?

No. Of course they didn’t. But I did! Now, Britten doesn’t really give a good explanation for why some of the sequence was excluded, and unfortunately he passed away at the age of 92 not long after Tomkins and Bergman published their article.

So here are the alignment results from the nucmer/MUMmer software package when I compared the five BACs to the human genome.

AC006582 – 186,092bp

As you can see from the S2 and E2 columns, the entire length of the BAC aligns to the human genome.

    [S1]     [E1] |   [S2]   [E2] | [LEN 1] [LEN 2] | [% IDY] 
=========================================================================
20789281 20810917 | 186092 164450 |   21637   21643 |   98.05 
20813855 20821723 | 164445 156609 |    7869    7837 |   96.71 
20822132 20891846 | 156608  87016 |   69715   69593 |   98.06 
20891781 20896388 |  87017  82414 |    4608    4604 |   98.72 
67928585 67976610 |      1  48038 |   48026   48038 |   98.32 
67976609 68010902 |  48159  82411 |   34294   34253 |   98.46
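If you would rather not eyeball the S2 and E2 columns, the coverage can be checked by merging the query intervals. This is only a sketch: the function name `query_coverage` is hypothetical, and the pairs are copied by hand from the table above rather than parsed from nucmer output.

```python
def query_coverage(pairs, query_length):
    """Return (covered_bases, gaps) for alignments given as (S2, E2) pairs."""
    # Reverse-strand hits list S2 > E2, so normalise each pair first.
    intervals = sorted((min(s, e), max(s, e)) for s, e in pairs)
    merged = []
    for start, end in intervals:
        # Merge intervals that overlap or sit directly next to each other.
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    # Collect the stretches of the BAC that no alignment touches.
    gaps = []
    position = 1
    for start, end in merged:
        if start > position:
            gaps.append((position, start - 1))
        position = end + 1
    if position <= query_length:
        gaps.append((position, query_length))
    return covered, gaps

# (S2, E2) pairs for AC006582, copied from the table above.
ac006582 = [(186092, 164450), (164445, 156609), (156608, 87016),
            (87017, 82414), (1, 48038), (48159, 82411)]
covered, gaps = query_coverage(ac006582, 186092)
print(covered, gaps)
```

For these rows the merge leaves 185,966 of 186,092 bases covered, and the largest uncovered stretch is 120 bp.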

You can also see from the S1 and E1 columns that this BAC matches two locations on human chromosome 12, and Britten makes note of this in his paper. You could also check the synteny map from ensembl.org (reproduced below), which clearly shows the relevant breakpoints.

[Image: britten-synteny (synteny map from ensembl.org)]

So what is the overall percentage identity of this BAC to the human genome? Well, first of all we need to work out the number of nucleotides that match. We do this by multiplying the percentage identity of each match (“% IDY”) by the length of that match on the chimpanzee BAC (“LEN 2”) and adding up the results:

(21,643 * 98.05%) + (7,837 * 96.71%) + (69,593 * 98.06%) + ... = 182,545 nt

But do we use the overall length of the BAC as the divisor in our equation, or do we use the length of the human sequence that it spans? If we want to be as conservative as possible, we’ll use the larger of the two. As above, the length of the BAC is 186,092bp, while the combined length of the two syntenic regions of human chromosome 12 (columns S1 and E1) is 189,426bp.

Therefore our overall similarity for this BAC is 96.37%.
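The arithmetic for this BAC can be reproduced in a few lines. Treat this as a sketch under stated assumptions: the rows are typed in from the table above, the helper names are hypothetical, and `max_gap` is an illustrative threshold for splitting the hits into separate human loci, not a nucmer setting.

```python
# Rows copied from the AC006582 table above: (S1, E1, LEN 2, % IDY)
rows = [
    (20789281, 20810917, 21643, 98.05),
    (20813855, 20821723, 7837, 96.71),
    (20822132, 20891846, 69593, 98.06),
    (20891781, 20896388, 4604, 98.72),
    (67928585, 67976610, 48038, 98.32),
    (67976609, 68010902, 34253, 98.46),
]

def matched_nucleotides(rows):
    """Sum of LEN 2 * % IDY over all alignment rows."""
    return sum(len2 * idy / 100 for _s1, _e1, len2, idy in rows)

def reference_span(rows, max_gap=1_000_000):
    """Total human span, counting hits more than max_gap apart as separate loci."""
    intervals = sorted((min(s1, e1), max(s1, e1)) for s1, e1, _len2, _idy in rows)
    total = 0
    locus_start, locus_end = intervals[0]
    for start, end in intervals[1:]:
        if start - locus_end > max_gap:
            # Close the current locus and start a new one.
            total += locus_end - locus_start + 1
            locus_start, locus_end = start, end
        else:
            locus_end = max(locus_end, end)
    return total + (locus_end - locus_start + 1)

bac_length = 186_092
matches = matched_nucleotides(rows)
# Conservative choice: divide by whichever span is longer.
denominator = max(bac_length, reference_span(rows))
identity = matches / denominator
print(f"{matches:,.0f} / {denominator:,} = {identity:.2%}")
```

Dividing by the larger of the two spans means any sequence present in one species but absent in the other counts against the similarity.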

AC007214 – 154,685bp

    [S1]     [E1] |   [S2]   [E2] | [LEN 1] [LEN 2] | [% IDY] 
=========================================================================
20810108 20816825 | 147956 154685 |    6718    6730 |   97.50 
67987724 67997814 | 147926 137755 |   10091   10172 |   98.04 
67997818 68060869 | 137491  74530 |   63052   62962 |   98.45 
68060856 68124237 |  70532   7232 |   63382   63301 |   98.28 
68124247 68131497 |   7257      1 |    7251    7257 |   97.95

How many nucleotides did we match?

(6,730 * 97.50%) + (10,172 * 98.04%) + (62,962 * 98.45%) + ... = 147,841 nt

By looking at the S2 and E2 columns, you’ll note that there is quite a large indel, approximately 4,000 bases long (between positions 70,532 and 74,530 of the BAC), which Britten has already taken into account in his results. So, once again, the “missing” sequence that Tomkins and Bergman claim does not align clearly does align. I wonder if we will see that trend continue?

To calculate our identity, what is the longer span of the two? The chimpanzee BAC is 154,685 bases, and this is longer than the corresponding sequence on the human chromosome, which is only 150,491 bp.

Therefore our overall similarity for this BAC is 95.58%.

AC097335 – 148,984bp

     [S1]      [E1] |   [S2]   [E2] | [LEN 1] [LEN 2] | [% IDY]
===========================================================================
122584064 122621080 |      1  37111 |   37017   37111 |   97.40 
122621388 122657017 |  37102  72697 |   35630   35596 |   97.56 
122657439 122694830 |  72687 110041 |   37392   37355 |   97.74 
122694662 122718458 | 110000 133714 |   23797   23715 |   97.51 
122722001 122737262 | 133712 148984 |   15262   15273 |   98.00

How many nucleotides did we match?

(37,111 * 97.40%) + (35,596 * 97.56%) + (37,355 * 97.74%) + ... = 145,476 nt

And for our maximum span in this case, it is the human spanning sequence – 153,199bp – that is longer than the chimpanzee BAC – 148,984bp.

Therefore our overall similarity for this BAC is 94.96%.

AC096630 – 160,603bp

    [S1]     [E1] |   [S2]   [E2] | [LEN 1] [LEN 2] | [% IDY] 
=========================================================================
23929738 23942644 |      1  12893 |   12907   12893 |   97.54 
23942646 23943694 |  13054  14094 |    1049    1041 |   97.34 
23944199 23988365 |  14096  58297 |   44167   44202 |   98.29 
23987821 23990092 |  59219  61499 |    2272    2281 |   96.15 
23990410 24090254 |  60891 160595 |   99845   99705 |   98.10

How many nucleotides did we match?

(12,893 * 97.54%) + (1,041 * 97.34%) + (44,202 * 98.29%) + ... = 157,039 nt

Now, this one needs a little adjustment. Notice that the 4th and 5th results overlap in the S2 and E2 columns? We don’t want to double-count any nucleotide matches, so we need to scale this back a little. The weighted percentage identity across all five matches is 98.0746% and there are 609 bp of overlap, so we’ll reduce our nucleotide matches by 609 * 98.0746% ≈ 597, down to 156,442 nt.

Our longest span in this case is – by a narrow margin – the chimpanzee BAC at 160,603bp.

Therefore our overall similarity for this BAC is 97.41%.
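The overlap correction is easy to get wrong by hand, so here is a small sketch of it. It assumes, as in the text, that overlapping bases are discounted at the weighted identity of all the matches; the function name `overlap_bases` is hypothetical and the rows are copied from the AC096630 table.

```python
# Rows copied from the AC096630 table above: (S2, E2, LEN 2, % IDY)
rows = [
    (1, 12893, 12893, 97.54),
    (13054, 14094, 1041, 97.34),
    (14096, 58297, 44202, 98.29),
    (59219, 61499, 2281, 96.15),
    (60891, 160595, 99705, 98.10),
]

def overlap_bases(pairs):
    """Count query bases covered by more than one alignment."""
    intervals = sorted((min(s, e), max(s, e)) for s, e in pairs)
    overlap = 0
    reach = intervals[0][1]  # furthest query position covered so far
    for start, end in intervals[1:]:
        overlap += max(0, min(end, reach) - start + 1)
        reach = max(reach, end)
    return overlap

raw_matches = sum(len2 * idy / 100 for _s2, _e2, len2, idy in rows)
weighted_identity = raw_matches / sum(len2 for _s2, _e2, len2, _idy in rows)
overlap = overlap_bases([(s2, e2) for s2, e2, _len2, _idy in rows])

# Discount the double-counted bases at the average identity.
corrected = raw_matches - overlap * weighted_identity
bac_length = 160_603
print(f"overlap={overlap}, corrected={corrected:,.0f}, "
      f"identity={corrected / bac_length:.2%}")
```

Normalising each pair with `min` and `max` means the same function also works for reverse-strand hits, where S2 is larger than E2.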

AC093572 – 195,652bp

Well, this is our last BAC, and as we’ve seen, all the previous BACs have aligned completely to the human genome. I think we’re getting close to figuring out whether Tomkins and Bergman’s “re-evaluation” is valid! I’d give a spoiler alert, but the answer is literally the next thing you’ll see.

    [S1]     [E1] |   [S2]   [E2] | [LEN 1] [LEN 2] | [% IDY] 
=========================================================================
24117704 24126528 | 195644 186806 |    8825    8839 |   99.24 
24126542 24130967 | 186870 182455 |    4426    4416 |   99.30 
24133114 24222222 | 182455  93337 |   89109   89119 |   98.60 
24221913 24239932 |  92705  74808 |   18020   17898 |   97.56 
24239920 24251188 |  73900  62572 |   11269   11329 |   96.85 
24250869 24252922 |  61415  59363 |    2054    2053 |   98.54 
24253250 24270233 |  59964  42959 |   16984   17006 |   97.93 
24270373 24290547 |  42953  22803 |   20175   20151 |   98.17 
24290532 24312416 |  21904      1 |   21885   21904 |   98.40

How many nucleotides did we match?

(8,839 * 99.24%) + (4,416 * 99.30%) + (89,119 * 98.60%) + ... = 189,474 nt

Again in this case, we need to scale back that number of nucleotides because there is significant overlap (668bp) in the results. Applying the same correction as before removes 657 nucleotides, so our reduced nucleotide match is now 188,817 nt. The longer of the two spans is the chimpanzee BAC: 195,652bp.

Therefore our overall similarity for this BAC is 96.51%.

Conclusion

Surprise, surprise, Tomkins and Bergman didn’t bother to back up their assertions and they’ve been shown to be wrong. For completeness, here is the table that calculates the weighted average percentage identity of all five BACs.

BAC          Matches    MaxSpan   Identity
AC006582     182,545    189,426     96.37%
AC007214     147,841    154,685     95.58%
AC097335     145,476    153,199     94.96%
AC096630     156,442    160,603     97.41%
AC093572     188,817    195,652     96.51%
Total        821,121    853,565     96.20%
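The total row is a length-weighted average: total matches divided by total span, rather than a plain mean of the five percentages. A quick sketch, using the per-BAC figures from the table (the dictionary layout is just for illustration):

```python
# (matches, max span) per BAC, taken from the table above.
bacs = {
    "AC006582": (182_545, 189_426),
    "AC007214": (147_841, 154_685),
    "AC097335": (145_476, 153_199),
    "AC096630": (156_442, 160_603),
    "AC093572": (188_817, 195_652),
}

total_matches = sum(matches for matches, _span in bacs.values())
total_span = sum(span for _matches, span in bacs.values())
overall_identity = total_matches / total_span

print(f"{total_matches:,} / {total_span:,} = {overall_identity:.2%}")
```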

 
