Month: May 2016

Chromosome 2 Fusion – Dead in a Day?

Ian Juby has stated explicitly that DDX11L2 is “critical for life” both here and here, where he refers to a paper written by Dr Jeffrey Tomkins in 2013.

But just how important is this DDX11L2 gene anyway? Well, we can get some clues just by looking at the name.

A rose by any other name?

Let’s break down the name of the gene itself – DDX – 11 – L – 2

DDX is short for “DEAD Box”, a family of RNA helicase genes. Helicases are incredibly important, since these are the machines that unwind the DNA or RNA so that other molecular machines can read the underlying information.

There is an enormous family of DDX genes across our genome – DDX1, DDX2A, DDX2B, DDX3X, DDX3Y, DDX4, DDX5, DDX6, DDX10, DDX11, DDX17, DDX18, DDX19A, DDX19B, DDX20, DDX21, DDX23, DDX24, DDX25, DDX27, DDX28, DDX31, DDX39A, DDX39B, DDX41, DDX42, DDX43, DDX46, DDX47, DDX48, DDX49, DDX50, DDX51, DDX52, DDX53, DDX54, DDX55, DDX56, DDX58, DDX59 and DDX60.

Yup, that’s a grand total of 41 DEAD Box Helicase protein-coding genes in the human genome.

DDX11 is just one of those 41 DEAD Box Helicase protein-coding genes. There’s nothing obviously special about DDX11 that makes it stand out from all the other DDX genes.

L stands for “Like”. Back in 2009, Valerio Costa reported on a family of transcripts with 18 members whose nucleotide sequence bears a strong resemblance to DDX11. In other words, it looks “Like” DDX11, but it does not code for a protein – it’s a pseudogene. Note that all of these sequences are found in subtelomeric regions. Except for one. Can you guess which one?

2. Yup, DDX11L2 isn’t found near telomeres in the modern human genome; it’s all by itself in the middle of chromosome 2 – make of that what you will. To break things down even further, there are actually two transcripts for this pseudogene: NR_024004.1, the longer of the two, which straddles the putative fusion site, and NR_024005.2, the shorter transcript, which does not cross the fusion site. Even if we hypothetically split chromosome 2 at the fusion point, the DDX11L2 pseudogene would still exist. All we would lose is the longer transcript.

So just to recap, what we are talking about here is a solitary alternatively spliced transcript from a pseudogene that is fairly similar to 17 other pseudogenes, and those pseudogenes as a whole bear some resemblance to an actual protein-coding gene – DDX11 – which is itself part of a larger family of 41 genes.

Now we have a rough idea of where this DDX11L2 pseudogene sits in the scheme of things, let’s talk hard numbers.

Droppin’ English. Express Yourself.

Gene expression data is often a good guide to the relative importance of a particular gene or transcript. If you had a bacterial infection, would you go to the doctor to get some antibiotics (with enough penicillin molecules to go around killing off all the bacteria) or would you go the homoeopathic option, where the solution has been diluted so many times that there is virtually no active ingredient left?

So, let’s look at how frequently DDX11L2 is expressed. If you click on this link you’ll see that cells in the testes express DDX11L2 at a rate of 3.905 RPKM (Reads Per Kilobase per Million mapped reads), in the pituitary gland at 0.936 RPKM, in the prostate at 0.893 RPKM and in the spleen at 0.882 RPKM.

If you then go up a step and look at the expression data for DDX11 itself, you’ll see that in the testes it is expressed at a rate of 8.477 RPKM, in the pituitary gland at 8.068 RPKM, in the prostate at 9.212 RPKM and in the spleen at 10.047 RPKM.

Already we can see that the DDX11 gene expression levels dwarf those of DDX11L2, but don’t forget we have 40 other DDX genes being transcribed as well! Let’s look at some other DDX genes:

                  DDX1    DDX3X    DDX5    DDX6    DDX11  DDX11L2
Testes           33.980  50.016  138.808   8.816   8.477    3.905
Pituitary Gland  35.394  34.213  321.470   7.100   8.068    0.936
Prostate         24.214  36.621  240.220   9.998   9.212    0.893
Spleen           23.889  50.052  347.019  10.811  10.047    0.882

Wow, DDX5 is expressed almost 400 times more often in the spleen than DDX11L2!
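To check the ratios behind that comparison, here is a minimal Python sketch that uses only the RPKM values from the table above. The names `GENES`, `RPKM` and `fold_over_ddx11l2` are made up for illustration; nothing here queries GTEx itself.

```python
# RPKM values copied from the table in this post (not fetched from GTEx).
GENES = ["DDX1", "DDX3X", "DDX5", "DDX6", "DDX11", "DDX11L2"]
RPKM = {
    "Testes":          [33.980, 50.016, 138.808,  8.816,  8.477, 3.905],
    "Pituitary Gland": [35.394, 34.213, 321.470,  7.100,  8.068, 0.936],
    "Prostate":        [24.214, 36.621, 240.220,  9.998,  9.212, 0.893],
    "Spleen":          [23.889, 50.052, 347.019, 10.811, 10.047, 0.882],
}


def fold_over_ddx11l2(tissue, gene):
    """Return how many times higher `gene` is expressed than DDX11L2 in `tissue`."""
    row = dict(zip(GENES, RPKM[tissue]))
    return row[gene] / row["DDX11L2"]


# Print the ratio for every gene in every tissue.
for tissue in RPKM:
    for gene in GENES[:-1]:
        ratio = fold_over_ddx11l2(tissue, gene)
        print(f"{tissue:16s} {gene:6s} {ratio:7.1f}x")
```

For the spleen, the DDX5 ratio comes out at roughly 393, which is where the “almost 400 times” figure comes from.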

But remember how I said there were two transcripts? Well, the expression data above are from GTEx and, going by the locus given for DDX11L2, they cover only the shorter transcript – the one that does not overlap the fusion site.

If we look at AceView we can actually get a breakdown of how frequently the introns are sequenced in RNA-seq studies. The intron that corresponds to the fusion site was sequenced 682 times, while the intron common to both transcripts was sequenced a total of 3,186 times, implying that the longer transcripts make up only around 21.4% of the total DDX11L2 transcripts.

So, what are we to make of these numbers?

When Tomkins claims that DDX11L2 is a “highly expressed gene“, we have to ask “highly expressed relative to what exactly?” According to the AceView link above, this gene is expressed at “only 26.8% of the average gene“. If you then take into account the fact that the transcript that spans the fusion site makes up only 21.4% of transcripts for this gene, then it is expressed only 5.7% as frequently as an average gene.
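The arithmetic behind the 21.4% and 5.7% figures can be reproduced in a few lines. This is only a sketch: the variable and function names are mine, and it assumes (as the reasoning above does) that intron read counts are a fair proxy for how often each transcript is produced.

```python
# Figures quoted in this post from AceView.
FUSION_INTRON_READS = 682          # intron that corresponds to the fusion site
SHARED_INTRON_READS = 3186         # intron common to both transcripts
RELATIVE_TO_AVERAGE_GENE = 0.268   # "only 26.8% of the average gene"


def long_transcript_share(fusion_reads, shared_reads):
    """Fraction of DDX11L2 transcripts that span the fusion site.

    Assumption: reads over each intron scale with transcript abundance.
    """
    return fusion_reads / shared_reads


share = long_transcript_share(FUSION_INTRON_READS, SHARED_INTRON_READS)
# Expression of the long transcript relative to an average gene.
combined = RELATIVE_TO_AVERAGE_GENE * share

print(f"Long transcript share of DDX11L2: {share:.1%}")
print(f"Relative to an average gene:      {combined:.1%}")
```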

If you were to hypothetically split this chromosome in half at the fusion site, you wouldn’t be “dead in a day” as Ian Juby likes to say, you would just lose a very lowly expressed transcript of a pseudogene. That pseudogene is part of a family of 17 other similar pseudogenes (DDX11L), which as a group, bear some resemblance to an actual protein-coding gene (DDX11). That protein-coding gene is then part of a much larger group of protein-coding genes (DDX).

Chromosome 2 Fusion – The Low Hanging Fruit

Jeffrey Tomkins has written a number of articles in previous years that attempt to cast doubt on the claim that human chromosome 2 is the result of a head-to-head fusion of two ancestral chromosomes.

For the sake of brevity, I will address only a few of the more egregious errors that Dr Tomkins made in his articles; I will address the others when I have the time and the inclination.

Comparative Scale, huh.

In this article Dr Tomkins posts a diagram of the fusion supposedly drawn to scale. The desired effect here is obviously to have people believe that the human chromosome 2 doesn’t align to its chimpanzee counterparts. Here is Dr Tomkins’ diagram:

[Image: TomkinsToScale]

And here is my diagram:

[Image: FusionToScale]

The difference is that the PostScript code used to produce my diagram is freely available, and from that you are able to verify the genome coordinates I have used.

Dr Tomkins also claims here that the combined chimpanzee chromosomes are some 10% larger than the human chromosomes. However, according to the most recent chimpanzee assembly – named “panTro4” – the combined length of chimpanzee chromosomes 2A and 2B is 247.5Mbp while human chromosome 2 is 243.2Mbp. This is a difference of only 4.3Mbp, or 1.8%. It should be noted that the centromeres in the chimpanzee assembly are manually placed on the chromosome and are of an arbitrary fixed length of 3Mbp. This introduces some uncertainty in the true length of combined chimpanzee chromosomes. Human centromeres are known to range in length from 0.3Mbp to 5.0Mbp, and if the centromere on chimpanzee chromosome 2B is (in reality) at the lower end of that range, then the size difference would be easily less than 1%.
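Here is a rough sketch of that size comparison in Python, using only the lengths quoted above. The function names are made up for illustration, and the adjustment swaps just one 3Mbp placeholder centromere for an assumed “true” length, which is a simplification rather than a measurement.

```python
# Lengths quoted in this post (Mbp).
CHIMP_2A_PLUS_2B = 247.5
HUMAN_CHR2 = 243.2
PLACEHOLDER_CENTROMERE = 3.0   # arbitrary fixed length used in the assembly


def percent_larger(chimp_mbp, human_mbp):
    """How much larger the chimp total is, as a percentage of human chr2."""
    return (chimp_mbp - human_mbp) / human_mbp * 100


def percent_larger_with_centromere(true_centromere_mbp):
    """Same comparison, replacing ONE placeholder centromere with an
    assumed true length (hypothetical adjustment, for illustration only)."""
    adjusted = CHIMP_2A_PLUS_2B - PLACEHOLDER_CENTROMERE + true_centromere_mbp
    return percent_larger(adjusted, HUMAN_CHR2)


print(f"As assembled:          {percent_larger(CHIMP_2A_PLUS_2B, HUMAN_CHR2):.1f}%")
print(f"With 0.3Mbp centromere: {percent_larger_with_centromere(0.3):.1f}%")
print(f"With 5.0Mbp centromere: {percent_larger_with_centromere(5.0):.1f}%")
```

Either way the difference stays far below the 10% that Dr Tomkins claims.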

What should we see at the fusion site?

Tomkins predicts that under the fusion model “thousands of intact TTAGGG motifs in tandem should exist” yet he is fully aware that the function of telomeres is to prevent such fusions in the first place. To expect thousands of intact telomere motifs at the fusion site is to expect that intact telomeres somehow failed.

The most parsimonious explanation is that the telomeres were already missing (or severely shortened) and this allowed the fusion to occur. Lab experiments where components of the telomere nucleoprotein complex have been disabled demonstrate the ease with which head-to-head fusions occur when the telomeres are depleted.

Moteefs y’all

Jeffrey Tomkins says explicitly that forward telomere motifs (‘TTAGGG’) should only be found on the left side of the fusion site, and reverse telomere motifs (‘CCCTAA’) should only be found on the right of the fusion site.

Yet by my calculations, any 6 base pair sequence should occur entirely by chance approximately every 4,096 base pairs (4 ^ 6 = 4,096). Dr Tomkins produced a table showing the breakdown of forward and reverse motifs found both to the left and to the right of the fusion site.
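For anyone who wants to produce that kind of breakdown themselves, here is a minimal sketch of a tally in Python. The function names and the toy sequence are invented for illustration; this is not the RP11-395L14 sequence and not Dr Tomkins’ method, just one simple way to count motifs on either side of a chosen position.

```python
FORWARD = "TTAGGG"
REVERSE = "CCCTAA"


def count_motif(seq, motif):
    """Count occurrences of `motif` in `seq`, including overlapping ones."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def tally(seq, fusion_index):
    """Count forward and reverse motifs left and right of `fusion_index`.

    Motifs that straddle the split point itself are not counted.
    """
    left, right = seq[:fusion_index], seq[fusion_index:]
    return {
        "left": {m: count_motif(left, m) for m in (FORWARD, REVERSE)},
        "right": {m: count_motif(right, m) for m in (FORWARD, REVERSE)},
    }


# Toy sequence, NOT real chromosome 2 data.
left_part = "ACGT" + FORWARD * 3 + "GCCCTAAT"
right_part = REVERSE * 2 + "GTTAGGGC"
toy = left_part + right_part

result = tally(toy, len(left_part))
print(result)
```

Even in this tiny made-up example, a stray reverse motif turns up on the left and a stray forward motif on the right.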

On the RP11-395L14 BAC used by Tomkins, there are 108,569 base pairs to the left of the fusion site. Mathematically, I would expect around 26 reverse telomere motifs to be present (108,569 / 4,096 ≈ 26.5), while Dr Tomkins expects zero; there are 18. To the right of the fusion site are 68,167 base pairs. I would expect around 17 forward telomere motifs (68,167 / 4,096 ≈ 16.6), Dr Tomkins expects zero, and there are 18.
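The chance expectation is easy to reproduce. This sketch assumes all four bases are equally likely and independent at every position, which real DNA is not, so treat the output as a ballpark figure only. The function name is my own.

```python
MOTIF_LENGTH = 6
LEFT_BP = 108_569    # base pairs left of the fusion site on RP11-395L14
RIGHT_BP = 68_167    # base pairs right of the fusion site


def expected_by_chance(length_bp, motif_length=MOTIF_LENGTH):
    """Expected number of hits for one specific motif in random sequence.

    Assumption: uniform, independent bases, so a given 6-mer turns up
    about once every 4**6 = 4,096 positions.
    """
    return length_bp / 4 ** motif_length


print(f"Left of fusion site:  {expected_by_chance(LEFT_BP):.1f} expected, 18 observed")
print(f"Right of fusion site: {expected_by_chance(RIGHT_BP):.1f} expected, 18 observed")
```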

In your hands

As indicated by the title, these are just the low-hanging fruit; claims made by Tomkins that can be addressed quite easily. Some of the other claims are a little more complex and/or take more time to address. If there is a particular claim that anyone would like me to address, please let me know in the comments.