This is a brief tutorial on how one goes about demonstrating the existence of a cryptic centromere on human chromosome 2. It is in response to this point from Jeff Tomkins:
“The purported cryptic centromere on human chromosome 2, like the fusion site, is in a very different location to that predicted by a fusion event.”
So, first of all, I should point out that Jeff Tomkins implicitly admits that there is such a putative centromere; his objection is that it is not where it should be. Nevertheless, I’ll show you how to find it, and then show that it is where we would expect it to be.
So what are we looking for?
“The DNA evidence in question is based on the fact that human, great-ape, and other mammalian centromeres are composed of a highly variable class of DNA sequence that is repeated over and over called alpha-satellite or alphoid DNA. Alphoid DNA, although found in centromeric areas, is not unique to centromeres and is even highly variable between homologous regions throughout the same mammalian genome.”
So basically what we are looking for is a large cluster of these alphoid sequences. As Tomkins states, alphoid sequences are not unique to centromeres, but we shouldn’t find large clusters elsewhere on the chromosome.
So let’s get a list of all the alphoid sequences that we can find on chromosome 2:
[glenn@macha] cat alphoid.fa
>gi|117911456|emb|CS444613.1| Sequence 51 from Patent WO2006110680
CATTCTCAGAAACTTCTTTGTGATGTGTGCATTCAACTCACAGAGTTGAACCTTCCTTTTCATAGAGCAG
TTTTGAAACACTCTTTTTGTAGAATCTGCAAGTGGATATTTGGACCGCTTTGAGGCCTTCGTTGGAAACG
GGAATATCTTCATATAAAAACTAGACAGAAG
… and then …
[glenn@macha] blastn -query alphoid.fa -subject /Users/glenn/Data/hg19/chr2.fa -outfmt '10 sstart send pident nident length evalue' -out alphoid.csv -task blastn -dust no -soft_masking false -word_size 7 -evalue 1e-30
This command searches chromosome 2 for anything that looks like an alphoid sequence and writes the results to a file named alphoid.csv. Here is what the file looks like after it has been sorted by start position:
70658558,70658701,86.806,125,144,1.10e-42
92272684,92272854,84.211,144,171,1.74e-46
92272855,92273025,88.304,151,171,2.95e-56
92273026,92273194,80.117,137,171,4.38e-35
92273195,92273363,85.294,145,170,1.74e-46
92273366,92273535,80.588,137,170,8.46e-38
92273537,92273707,85.380,146,171,3.37e-49
92274458,92274566,89.091,98,110,6.51e-33
92274567,92274738,83.721,144,172,7.42e-45
92274739,92274909,85.380,146,171,3.37e-49
92274910,92275079,84.706,144,170,1.43e-47
92275081,92275250,85.965,147,171,9.65e-50
...
That first field (“sstart” in our command above) is where the matching DNA starts on chromosome 2. If you look at the file in its entirety, you’ll see that there are 483 matches for this alphoid sequence across chromosome 2, and all but 2 of those 483 matches are clustered around two locations.
The first location is around the 92Mb mark, which corresponds to the beginning of the active centromere; the second is around the 133Mb mark.
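The post doesn't show how the sorting and grouping were done. The sketch below is one way to do both from the shell. It sorts the CSV numerically on the first field (sstart), then starts a new cluster whenever the next hit begins more than GAP bp after the previous one. The function name `cluster_hits` and the 1 Mb default gap are hypothetical choices for illustration, not the author's method.

```shell
#!/usr/bin/env bash
# cluster_hits (hypothetical helper): read BLAST -outfmt 10 rows with sstart
# in column 1 on stdin, sort them by position, and group hits whose starts
# lie within GAP bp of the previous hit.
# Prints one line per cluster: first_start,last_start,number_of_hits
# GAP defaults to 1,000,000 bp, an arbitrary illustrative threshold.
cluster_hits() {
  local gap="${1:-1000000}"
  sort -t, -k1,1n |
    awk -F, -v gap="$gap" '
      NR == 1 { first = $1; last = $1; n = 1; next }
      $1 - last > gap {              # big jump: close the current cluster
        print first "," last "," n
        first = $1; n = 0
      }
      { last = $1; n++ }             # extend the current cluster
      END { if (NR > 0) print first "," last "," n }
    '
}

# Usage on the file produced by the blastn command above:
#   cluster_hits < alphoid.csv
```

On the full alphoid.csv, a grouping like this should make the two large clusters near 92Mb and 133Mb stand out as lines with large hit counts, with any isolated matches showing up as small clusters of their own. The exact split depends on the gap you choose.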
Could this be our centromere?
Well, it certainly is a cluster of alphoid sequences, but is it in the right place? Let’s have a look at the genes on either side of this cluster:
What you should be looking at here are all the genes that precede the cryptic centromere (from PLEKHB2 down to ANKRD30BL) and their corresponding positions on chimpanzee chromosome 2B. A couple of the corresponding chimpanzee genes are found on scaffolds (the ones beginning with AACZ or GL), but the genes that have been placed on the chromosome are all around the 132Mb mark.
For the genes on the other side of the cryptic centromere (GPR39 and LYPD1) you’ll notice that the corresponding genes on chimpanzee chromosome 2B are found near the 136Mb mark.
And what, pray tell, is in that gap between 132Mb and 136Mb on chimpanzee chromosome 2B? The centromere!
- On human chromosome 2 there are two clusters of alphoid sequences.
- One of those clusters is the current active centromere.
- The other cluster corresponds well to the centromere on chimpanzee chromosome 2B.
I’m gonna say it’s our cryptic centromere …