But just how important is this DDX11L2 gene anyway? Well, we can get some clues just by looking at the name.
A rose by any other name?
Let’s break down the name of the gene itself – DDX – 11 – L – 2
DDX is short for “DEAD Box” – the name of a family of RNA helicase genes. Helicases are incredibly important, since these are the machines that unwind DNA or RNA so that other molecular machines can read the underlying information.
There is an enormous family of DDX genes across our genome – DDX1, DDX2A, DDX2B, DDX3X, DDX3Y, DDX4, DDX5, DDX6, DDX10, DDX11, DDX17, DDX18, DDX19A, DDX19B, DDX20, DDX21, DDX23, DDX24, DDX25, DDX27, DDX28, DDX31, DDX39A, DDX39B, DDX41, DDX42, DDX43, DDX46, DDX47, DDX48, DDX49, DDX50, DDX51, DDX52, DDX53, DDX54, DDX55, DDX56, DDX58, DDX59 and DDX60.
Yup, that’s a grand total of 41 DEAD Box Helicase protein-coding genes in the human genome.
DDX11 is just one of those 41 DEAD Box Helicase protein-coding genes. There’s nothing obviously special about DDX11 that makes it stand out from all the other DDX genes.
L stands for “Like”. Back in 2009, Valerio Costa reported on a transcript family with 18 members whose nucleotide sequences bear a strong resemblance to DDX11. In other words, each one looks “Like” DDX11, but it does not code for a protein – it’s a pseudogene. Note that all of these sequences are found in subtelomeric regions. Except for one. Can you guess which one?
2. Yup, DDX11L2 isn’t found near telomeres in the modern human genome; it’s all by itself in the middle of chromosome 2 – make of that what you will. To break things down even further, there are actually two transcripts for this pseudogene: NR_024004.1, the longer of the two, which straddles the putative fusion site, and NR_024005.2, the shorter transcript, which does not cross the fusion site. Even if we hypothetically split chromosome 2 at the fusion point, the DDX11L2 pseudogene would still exist. All we would lose is the longer transcript.
So just to recap, what we are talking about here is a solitary alternatively spliced transcript from a pseudogene that is fairly similar to 17 other pseudogenes, and those pseudogenes as a whole bear some resemblance to an actual protein-coding gene – DDX11 – which is itself part of a larger family of 41 genes.
Now that we have a rough idea of where this DDX11L2 pseudogene sits in the scheme of things, let’s talk hard numbers.
Droppin’ English. Express Yourself.
Gene expression data is often a good guide to the relative importance of a particular gene or transcript. If you had a bacterial infection, would you go to the doctor to get some antibiotics (with enough penicillin molecules to go around killing off all the bacteria) or would you go the homoeopathic option, where the solution has been diluted so many times that there is virtually no active ingredient left?
So, let’s look at how frequently DDX11L2 is expressed. If you click on this link you’ll see that cells in the testes express DDX11L2 at a rate of 3.905 RPKM (Reads Per Kilobase per Million mapped reads), in the pituitary gland at 0.936 RPKM, in the prostate at 0.893 RPKM and in the spleen at 0.882 RPKM.
If you then go up a step and look at the expression data for DDX11 itself, you’ll see that in the testes it is expressed at a rate of 8.477 RPKM, in the pituitary gland at 8.068 RPKM, in the prostate at 9.212 RPKM and in the spleen at 10.047 RPKM.
Already we can see that the DDX11 gene expression levels dwarf those of DDX11L2, but don’t forget we have 40 other DDX genes being transcribed as well! Let’s look at some other DDX genes:
Expression (RPKM) by tissue:

Tissue            DDX1      DDX3X     DDX5       DDX6      DDX11     DDX11L2
Testes            33.980    50.016    138.808    8.816     8.477     3.905
Pituitary Gland   35.394    34.213    321.470    7.100     8.068     0.936
Prostate          24.214    36.621    240.220    9.998     9.212     0.893
Spleen            23.889    50.052    347.019    10.811    10.047    0.882
Wow, DDX5 is expressed almost 400 times more often in the spleen than DDX11L2!
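If you’d like to check that ratio yourself, here is a minimal Python sketch. The numbers are simply the RPKM values quoted in this post; the dictionary layout and the function name fold_over_ddx11l2 are my own choices for illustration, not anything provided by GTEx.

```python
# RPKM values as quoted in this post (tissue -> gene -> RPKM).
rpkm = {
    "Testes": {"DDX1": 33.980, "DDX3X": 50.016, "DDX5": 138.808,
               "DDX6": 8.816, "DDX11": 8.477, "DDX11L2": 3.905},
    "Pituitary Gland": {"DDX1": 35.394, "DDX3X": 34.213, "DDX5": 321.470,
                        "DDX6": 7.100, "DDX11": 8.068, "DDX11L2": 0.936},
    "Prostate": {"DDX1": 24.214, "DDX3X": 36.621, "DDX5": 240.220,
                 "DDX6": 9.998, "DDX11": 9.212, "DDX11L2": 0.893},
    "Spleen": {"DDX1": 23.889, "DDX3X": 50.052, "DDX5": 347.019,
               "DDX6": 10.811, "DDX11": 10.047, "DDX11L2": 0.882},
}


def fold_over_ddx11l2(tissue):
    """Return each gene's RPKM divided by the DDX11L2 RPKM in that tissue."""
    row = rpkm[tissue]
    baseline = row["DDX11L2"]
    return {gene: value / baseline
            for gene, value in row.items() if gene != "DDX11L2"}


# Print the fold difference for every gene in every tissue.
for tissue in rpkm:
    ratios = fold_over_ddx11l2(tissue)
    print(tissue, {gene: round(ratio, 1) for gene, ratio in ratios.items()})
```

For the spleen, the DDX5 entry comes out at roughly 393, which is where the “almost 400 times” figure comes from.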
But remember how I said there were two transcripts? Well, the expression data above are from GTEx, and judging by the locus GTEx gives for DDX11L2, those figures only cover the shorter transcript – the one that does not overlap the fusion site.
If we look at AceView we can actually get a breakdown of how frequently the introns are sequenced in RNA-seq studies. The intron that corresponds to the fusion site was sequenced 682 times, while the intron common to both transcripts was sequenced a total of 3,186 times, implying that the longer transcript makes up only around 21.4% of the total DDX11L2 transcripts.
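The 21.4% figure is just the ratio of those two intron counts. As a quick sketch (the variable and function names are mine, and it assumes, as the paragraph above does, that the shared intron count stands in for all DDX11L2 transcripts):

```python
# Intron read counts quoted from AceView in this post.
fusion_intron_reads = 682    # intron corresponding to the fusion site
shared_intron_reads = 3186   # intron common to both transcripts


def long_transcript_share(fusion_reads, shared_reads):
    """Estimate the fraction of transcripts that span the fusion site."""
    return fusion_reads / shared_reads


share = long_transcript_share(fusion_intron_reads, shared_intron_reads)
print(f"Longer transcript share: {share:.1%}")
```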
So, what are we to make of these numbers?
When Tomkins claims that DDX11L2 is a “highly expressed gene”, we have to ask “highly expressed relative to what exactly?” According to the AceView link above, this gene is expressed at “only 26.8% of the average gene”. If you then take into account the fact that the transcript that spans the fusion site makes up only 21.4% of transcripts for this gene, then that transcript is expressed only about 5.7% as frequently as an average gene (26.8% × 21.4% ≈ 5.7%).
If you were to hypothetically split this chromosome in half at the fusion site, you wouldn’t be “dead in a day” as Ian Juby likes to say; you would just lose a very lowly expressed transcript of a pseudogene. That pseudogene is part of a family of 17 other similar pseudogenes (DDX11L), which, as a group, bear some resemblance to an actual protein-coding gene (DDX11). That protein-coding gene is then part of a much larger group of protein-coding genes (DDX).
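For anyone who wants to see the 5.7% arithmetic spelled out, here is a tiny sketch. Both inputs are the percentages quoted in this post; the variable names are my own.

```python
# Percentages quoted in this post, expressed as fractions.
gene_vs_average = 0.268         # AceView: DDX11L2 relative to the average gene
long_transcript_share = 0.214   # share of transcripts spanning the fusion site

# Expression of the fusion-spanning transcript relative to an average gene.
long_vs_average = gene_vs_average * long_transcript_share
print(f"Fusion-spanning transcript vs. average gene: {long_vs_average:.1%}")
```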