Chromosome 2 Fusion – The Cryptic Centromere

This is a brief tutorial on how one goes about demonstrating the existence of a cryptic centromere on human chromosome 2. It is in response to this point from Jeff Tomkins:

“The purported cryptic centromere on human chromosome 2, like the fusion site, is in a very different location to that predicted by a fusion event.”

New Research Undermines Key Argument for Human Evolution

So, first of all, I need to mention that Jeff Tomkins implicitly admits that there is such a putative centromere; his objection is that it is not where it should be. Nevertheless, I’ll show you how to find it and then show that it is where it is expected to be.

So what are we looking for?

“The DNA evidence in question is based on the fact that human, great-ape, and other mammalian centromeres are composed of a highly variable class of DNA sequence that is repeated over and over called alpha-satellite or alphoid DNA. Alphoid DNA, although found in centromeric areas, is not unique to centromeres and is even highly variable between homologous regions throughout the same mammalian genome.”

So basically what we are looking for is a large cluster of these alphoid sequences. As Tomkins states, alphoid sequences are not unique to centromeres, but we shouldn’t find large clusters elsewhere on the chromosome.

BLAST away!

So let’s get a list of all the alphoid sequences that we can find on chromosome 2:

[glenn@macha] cat alphoid.fa
>gi|117911456|emb|CS444613.1| Sequence 51 from Patent WO2006110680

… and then …

[glenn@macha] blastn -query alphoid.fa -subject /Users/glenn/Data/hg19/chr2.fa \
 -outfmt '10 sstart send pident nident length evalue' -out alphoid.csv \
 -task blastn -dust no -soft_masking false -word_size 7 -evalue 1e-30

This command searches chromosome 2 for anything that looks like an alphoid sequence and writes the results to a file named alphoid.csv. This is what the file looks like after it has been sorted:


That first field (“sstart” in our command above) is where the matching DNA starts on chromosome 2. If you look at the file in its entirety, you’ll see that there are 483 matches for this alphoid sequence across chromosome 2, and the vast majority of them (all but 2 of the 483) are clustered around two locations.
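The exact commands used to sort and count the hits aren’t shown here, so this is just one way to do it: a minimal sketch, assuming alphoid.csv is the comma-separated output of the blastn command above (so field 1 is sstart). The function names sort_hits and count_hits are hypothetical.

```shell
# Sort BLAST hits numerically by "sstart", the first comma-separated field
# of the -outfmt 10 output. Reads the named file(s), or stdin if none given.
sort_hits() {
  sort -t, -k1,1n "$@"
}

# Count the hits: one per line, since -outfmt 10 writes no header line.
# tr strips the leading spaces some wc implementations add.
count_hits() {
  cat "$@" | wc -l | tr -d ' '
}
```

For example, `sort_hits alphoid.csv > alphoid.sorted.csv` gives the sorted view, and `count_hits alphoid.csv` should report the 483 matches mentioned above, assuming the same query sequence and hg19 chromosome 2.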

The first location is around the 92Mb mark, which corresponds to the beginning of the active centromere; the second is around the 133Mb mark.
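Rather than scrolling through the whole file, you could make the grouping explicit by splitting the sorted hits wherever neighbouring hits are far apart. This is a rough sketch of a simple gap-based grouping, not necessarily how the clusters above were identified; the function name cluster_hits and the 1 Mb maximum gap are assumptions.

```shell
# Group BLAST hits into clusters of nearby start positions.
# Input: -outfmt 10 CSV with "sstart" as field 1 (file name(s) or stdin).
# A new cluster starts whenever the gap to the previous hit exceeds
# max_gap bases (1 Mb here, an arbitrary assumption).
# Output: one line per cluster: first hit, last hit, number of hits.
cluster_hits() {
  sort -t, -k1,1n "$@" |
    awk -F, -v max_gap=1000000 '
      NR == 1 || $1 - last > max_gap {
        if (NR > 1) print first, last, count   # close the previous cluster
        first = $1; count = 0                  # open a new one
      }
      { last = $1; count++ }
      END { if (NR > 0) print first, last, count }'
}
```

Running `cluster_hits alphoid.csv` should, if the description above holds, show two large clusters near the 92Mb and 133Mb marks, with the two remaining hits falling outside them.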

Could this be our centromere?

Well, it certainly is a cluster of alphoid sequences, but is it in the right place? Let’s have a look at the genes on either side of this cluster:


What you should be looking at here are all the genes that precede the cryptic centromere (from PLEKHB2 down to ANKRD30BL) and their corresponding positions on chimpanzee chromosome 2B. A couple of the corresponding chimpanzee genes are found only on scaffolds (the ones beginning with AACZ or GL), but the genes that have been placed on the chromosome are all around the 132Mb mark.

For the genes on the other side of the cryptic centromere (GPR39 and LYPD1) you’ll notice that the corresponding genes on chimpanzee chromosome 2B are found near the 136Mb mark.

And what, pray tell, is in that gap between 132Mb and 136Mb on chimpanzee chromosome 2B? The centromere!
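The same comparison can be sketched in the shell: sort the placed chimpanzee genes by position and look for the widest gap between neighbours. This assumes a hypothetical input format (not the table shown above) of one gene per line, whitespace-separated: human gene name, chimpanzee sequence name, chimpanzee position in bp, all on chimpanzee chromosome 2B. The function name widest_gap is also hypothetical.

```shell
# Find the widest gap between consecutive genes placed on one chimp chromosome.
# Input (hypothetical format): <human gene> <chimp sequence name> <chimp position>
# Genes found only on scaffolds (names starting AACZ or GL) are skipped,
# since they have no chromosome coordinate.
# Output: gene before the gap, gene after the gap, gap size in bp.
widest_gap() {
  awk '$2 !~ /^(AACZ|GL)/ { print $3, $1 }' "$@" |
    sort -n |
    awk 'NR > 1 && $1 - prev > best { best = $1 - prev; left = prev_gene; right = $2 }
         { prev = $1; prev_gene = $2 }
         END { if (best > 0) print left, right, best }'
}
```

If the flanking genes sit around the 132Mb and 136Mb marks as described, the widest gap reported should be the roughly 4Mb stretch between those two groups.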

To recap

  1. On human chromosome 2 there are two clusters of alphoid sequences.
  2. One of those clusters is the current active centromere.
  3. The other cluster corresponds well to the centromere on chimpanzee chromosome 2B.

I’m gonna say it’s our cryptic centromere …


8 thoughts on “Chromosome 2 Fusion – The Cryptic Centromere”

    1. Hey it’s Eugene!! The *EXACT* position is given in the link with the label “the file in its entirety”. It’s around the 133Mb mark in hg19. You can then go look up the genes surrounding it on whatever browser you prefer.

      1. I looked at the file and it’s very interesting that the region you propose to be the cryptic centromere is inside a protein coding gene, ANKRD30BL. How do you suppose a centromere becomes a protein?

    2. From the transcript table on Ensembl:

      #202 – 14kb, “Protein coding”, does NOT span the centromere
      #201 – 14kb, “Nonsense mediated decay”, does NOT span the centromere
      #204 – 110kb, “No protein”, DOES span the centromere
      #203 – 5kb, “No protein”, does NOT span the centromere
      #205 – 8kb, “No protein”, does NOT span the centromere
      #206 – 14kb, “No protein”, does NOT span the centromere

      Which one of these transcripts would you like to discuss further?

    1. You mean the promoter region / transcription start site? That will be different depending on which transcript you’re looking at. So again, which transcript are we discussing?

  1. I forgot how difficult it was to get straight answers from you and I have way too many questions for you to answer them with yet more questions. Finding my own answers is way easier.

    1. Ell. Oh. Ell.

      You want to know how a transcript that DOES code for a protein but does NOT span the centromere … has a centromere in it?

      Or you want to know how a transcript that DOES span the centromere, but does NOT code for a protein … codes for a protein?

      We both know what just happened here 🙂
