Chromosome 2 Fusion – It’s a Binding Site. Whoopty-frikkin-do.

Both Jeff Tomkins:

“Clearly, the putative 800 base fusion site is not a degenerate fusion sequence, but a transcriptionally functional and active DNA binding motif read on the minus strand inside the DDX11L2 gene.”

Alleged Human Chromosome 2 “Fusion Site” Encodes an Active DNA Binding Domain Inside a Complex and Highly Expressed Gene—Negating Fusion

… and Cornelius Hunter over at Darwin’s God:

“Genes shouldn’t be there, regardless of expression level, and TFs shouldn’t be binding there.”

The Naked Ape: BioLogos on Human Chromosome Two

… seem to make much of the fact that the fusion site on Human chromosome 2 contains a transcription factor binding site.

The Evidence

In Tomkins’ paper, he posts an image of the UCSC Genome Browser that shows the evidence for transcription factor binding activity at the fusion site. I’ve reproduced the image below, but zoomed in on the relevant sections:


The first section (in red) shows where the 798 base pair fusion site sits on the chromosome. The next section shows the two DDX11L2 transcripts, transcribed from right to left. As you can see, the longer transcript completely encompasses the fusion site. So far, so good.

But what about the green bumps and the grey bars?

The Green Bumps

Without getting into the details of ChIP-seq, the bumps seen here are a good proxy for the relative strength of the binding site. A high peak signifies a strong and/or frequent bond, while a low peak signifies a weak and/or infrequent bond. The particular transcription factor here is named CTCF, and the highest peak we can see in the image above is 0.0222.

But how big is 0.0222? What do we have to compare it to? Back on the Wikipedia page for CTCF, it says that there are “anywhere between 15,000 – 40,000 CTCF binding sites” in the human genome. That would imply that there are somewhere between 1,200 and 3,100 CTCF binding sites on chromosome 2 alone. Maybe the binding site in the fusion sequence is counted among them?
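As a rough check on that estimate, here is a minimal sketch that scales the genome-wide range by chromosome 2's share of the genome. The two lengths (about 243 Mb for chromosome 2, about 3,100 Mb for the whole genome) are approximate figures I'm assuming rather than anything taken from the Wikipedia page, and the calculation assumes binding sites are spread evenly along the genome, which is a simplification.

```python
# Rough scaling of genome-wide CTCF site counts down to chromosome 2.
# Assumed, approximate lengths in megabases (not taken from the post).
CHR2_MB = 243
GENOME_MB = 3100


def scale_to_chr2(genome_wide_sites):
    """Estimate sites on chr2, assuming sites are spread evenly by length."""
    return genome_wide_sites * CHR2_MB / GENOME_MB


low = scale_to_chr2(15_000)
high = scale_to_chr2(40_000)
print(f"chr2 share of genome: {CHR2_MB / GENOME_MB:.1%}")
print(f"estimated CTCF sites on chr2: {low:.0f} to {high:.0f}")
```

With these assumed lengths the range comes out close to the rounded figures quoted above.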

No. Not even close. Fortunately the ENCODE data behind those green bumps is freely available. If you’re really keen, you can download the file here:

(Cell Line = H1-hESC; Antibody Target = CTCF (07-729); View = Peaks)

A cursory glance will tell you that a value of 0.0222 doesn’t even make it into the published data – the minimum value to be counted as a peak by ENCODE is 0.1000. If you’re not good at math, that’s 4.5 times taller than Dr Tomkins’ biggest green bump.

And how many peaks are there on chromosome 2 taller than 0.1000, you ask? Almost 6,000 of them. And how tall are they? Well, if I were to take the 1,000 tallest peaks on chromosome 2, the average height would be 2.1188. Yup, that’s almost one hundred times higher than the tallest peak in the fusion site. Can you see that little red pixel? That’s the binding site.


Kinda puts things into perspective, doesn’t it?
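For anyone who wants to repeat the comparison, here is a minimal sketch of the calculation. It assumes the downloaded peaks file is tab-separated in ENCODE's narrowPeak layout, with the chromosome in column 1 and the signal value in column 7. The function name and the file name in the usage comment are my own placeholders, so check them against the file you actually download.

```python
import heapq


def chr_peak_stats(lines, chrom="chr2", top_n=1000, signal_col=6):
    """Count peaks on one chromosome and average the top_n signal values."""
    signals = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank lines, header lines and peaks on other chromosomes
        if len(fields) <= signal_col or fields[0] != chrom:
            continue
        signals.append(float(fields[signal_col]))
    top = heapq.nlargest(top_n, signals)
    mean_top = sum(top) / len(top) if top else float("nan")
    return len(signals), mean_top


# Ratios quoted in the post, relative to the tallest fusion-site peak
FUSION_PEAK = 0.0222
threshold_ratio = 0.1000 / FUSION_PEAK
top_mean_ratio = 2.1188 / FUSION_PEAK
print(round(threshold_ratio, 1), round(top_mean_ratio, 1))

# Usage with a real file (hypothetical file name):
# with open("H1-hESC_CTCF_peaks.narrowPeak") as fh:
#     count, mean_top = chr_peak_stats(fh)
```

If the file uses a different column for the peak height, pass a different `signal_col`.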

The Grey Bars

This is where things get a lot more interesting. The grey bars at the bottom of the image represent the collated signals for all the different transcription factors. In a similar fashion to the CTCF data above, a black bar represents a strong and/or frequent bond, while a grey bar represents a weak and/or infrequent bond.

If you dig down into the data, you’ll see that the transcription factor that causes the bar to be black is RNA Polymerase II. Great. But what is such a strong signal doing in a region that was supposedly caused by a fusion of sub-telomeric DNA? Both Dr Tomkins and Dr Hunter seem to think that transcription factor binding sites and sub-telomeric DNA are mutually exclusive.

No. No they are not.

Let’s move away from DDX11L2 for a minute and have a look at DDX11L1. It’s at the beginning of chromosome 1.


Would you like to see the DNA from the binding site immediately upstream? Here it is:

>hg19_wgEncodeRegTfbsClusteredV2_Pol2-4H8 range=chr1:10134-10362

Does it remind you of anything? Telomere repeats, perhaps?
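One way to make that resemblance concrete is to measure how much of a sequence is covered by the canonical human telomere hexamer, TTAGGG, or its reverse complement, CCCTAA. The sketch below is only an illustration: the example sequence is made up for the demonstration and is not the actual hg19 sequence for the range above, and degenerate repeats (which are common in sub-telomeric DNA) would not be counted by this exact-match approach.

```python
import re

# Canonical telomere repeat and its reverse complement
TELOMERE_MOTIFS = ("TTAGGG", "CCCTAA")


def telomere_fraction(seq):
    """Return the fraction of bases covered by perfect telomere hexamers."""
    seq = seq.upper()
    covered = [False] * len(seq)
    for motif in TELOMERE_MOTIFS:
        # Lookahead pattern so that overlapping occurrences are all found
        for m in re.finditer(f"(?={motif})", seq):
            for i in range(m.start(), m.start() + len(motif)):
                covered[i] = True
    return sum(covered) / len(seq) if seq else 0.0


# Made-up example: five perfect repeats followed by unrelated bases
example = "CCCTAA" * 5 + "GATTACAGATTACA"
print(f"{telomere_fraction(example):.2f}")
```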

Let’s look at DDX11L5 on chromosome 9. I wonder what the binding site sequence looks like there.

>hg19_wgEncodeRegTfbsClusteredV2_Pol2-4H8 range=chr9:9965-10327

Yup. Looks awfully like telomere repeats. The transcription factor binding site for DDX11L9 – right at the end of chromosome 15 – looks like this:

>hg19_wgEncodeRegTfbsClusteredV2_Pol2-4H8 range=chr15:102521116-102521200

Should I go on, or are Dr Tomkins and Dr Hunter willing to concede the point?


16 thoughts on “Chromosome 2 Fusion – It’s a Binding Site. Whoopty-frikkin-do.”

  1. I am curious about this comment by Ken Miller. Why are there ‘some’ and not all databases the same?

    “Most of the genome databases show the DDX11L2 sequence off to one side of the fusion site – so it really does not span it. In some databases there are transcript variants that include the head-to-head telomere motifs as one of the introns in the primary transcript. This is where the claim comes that it ‘spans’ the site. But this is explained by the variability in the termination of transcription so that occasionally a somewhat longer RNA is produced. The fact that transcription factor binding has been found says nothing about the specificity of binding or its biological importance.”

    1. “Why are there ‘some’ and not all databases the same?”

      Unsure – I know Ensembl make an effort to synchronise data between itself and NCBI. It might just be a glitch, or they could be filtering it out for valid reasons. Perhaps ask Ensembl?

      But this is explained by the variability in the termination of transcription so that occasionally a somewhat longer RNA is produced.

      That doesn’t make a whole lot of sense when the DNA is transcribed right to left in this case (see the image right at the top). That is, it’s not the termination of the transcript that is different, it’s the initiation. Maybe ask Ken Miller to double-check?

      The fact that transcription factor binding has been found says nothing about the specificity of binding or its biological importance.

      100% correct. Transcription factor binding sites are everywhere; all over the genome. And while it’s a necessary condition for a binding site to be present in order to assign some level of importance, it’s not a sufficient condition.

      1. Thanks for the response! Maybe Miller means that the whole process of transcription to make the protein is terminated. The variability in that termination sometimes means that a longer RNA is produced.

      2. I think it’s more likely that he just didn’t give the question as much attention as it deserved. If he’s talking about variability in termination, then he’s looking at the wrong end … like I said, best to check with the man himself – I don’t want to put words in his mouth 🙂

  2. If I am not mistaken it is the transcript variant that spans the site not the DDX11L2 – correct. If so, when transcription begins why is it transcribing the fusion site? Can it be that it just identifies the sequence as something that needs to be transcribed and treats it as an intron? Along these lines why does the exon at the beginning of transcription (right side of the fusion site) also get transcribed if the gene does not span the site?

    I appreciate your insight as I am not well versed in genetics. Thanks!

    1. … the transcript variant that spans the site not the DDX11L2 – correct.

      It comes down to what you call a “gene”. What most databases use are the beginning and end points of ALL the transcripts. So in the case of DDX11L2, the “gene” spans the fusion site simply because the longest transcript crosses the fusion site. The shorter transcript – which does NOT cross the fusion site – is expressed about 4 times more frequently than the longer transcript, and it is the shorter transcript that has the exons that identify it as part of the DDX11L family.

      If so, when transcription begins why is it transcribing the fusion site?

      “What would cause it to STOP transcribing the fusion site” is a better question 🙂

      Along these lines why does the exon at the beginning of transcription (right side of the fusion site) also get transcribed if the gene does not span the site?

      That particular transcript DOES span the fusion site.

      1. Thanks! That makes sense – why NOT transcribe it. If the shorter transcript’s exons are what identify it as part of the DDX11L family what is the first exon in the longer transcript? 🙂

      2. It’s the beginning of an unrelated transcript on what was previously a separate chromosome. Before the fusion, this transcript would have gone nowhere; off into the telomeres. After the fusion, it transcribes through the fusion, and picks up where the original DDX11L2 transcript would have started anyway.

        Does that make sense?

  3. I was just relaying this blog to someone who then responded with: “The assumption of a single universal ancestor is one of many assumptions built into the Basic Local Alignment Search Tool computer program. The results of this computer program are, of course, limited to it’s many built-in assumptions. As I say this is just a start. BLAST is only as good as it’s programing. ” And: “The program is open source, the parameters can be changed by anyone using it. Different parameters/assumptions = different results.”

    Well, I am not an expert on this program, but it is clear he ain’t liking the results and wants to discredit the program. Any thoughts?

    1. Clearly that’s a comment coming from someone that has never used BLAST and really doesn’t have much of an idea of what it does. It’s quite simple – BLAST looks for a small “query” sequence inside a larger “subject” sequence. Or a small “needle” in a large “hay stack”. It will find the closest match it can.

      There are no evolutionary assumptions written into the software – it’s just looking for the best match it can find. Like looking for a phrase in a book. You might not find an exact match, but it will find the closest match.

      Yes you can adjust the parameters – if the two sequences are quite different, then you can relax the parameters to find weaker matches; conversely if the two sequences are quite similar, then you can tighten the parameters to save time.

      If you can put me in touch with this person, I’d be happy to explain.
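To make the “needle in a haystack” idea concrete, here is a toy sketch of a closest-match search. It is not how BLAST is implemented (BLAST uses a faster seed-and-extend heuristic, a scoring scheme, and allows gaps); the function name and the sequences are made up for illustration. It simply slides the query along the subject and reports the window with the fewest mismatches, and nothing about ancestry enters into it.

```python
def closest_match(query, subject):
    """Return (position, mismatches) of the window in subject closest to query."""
    if not query or len(query) > len(subject):
        raise ValueError("query must be non-empty and no longer than subject")
    best_pos, best_mismatches = 0, len(query) + 1
    for pos in range(len(subject) - len(query) + 1):
        window = subject[pos:pos + len(query)]
        # Count positions where the query and the window disagree
        mismatches = sum(a != b for a, b in zip(query, window))
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


subject = "ACGTTTAGGGTTAGCGTTAGGGAC"
print(closest_match("TTAGGG", subject))  # an exact match exists
print(closest_match("TTAGCC", subject))  # no exact match; closest window reported
```

“Relaxing the parameters” corresponds, loosely, to accepting windows with more mismatches; “tightening” them means rejecting anything but near-exact hits.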

  4. Oh, this is fun, seeing how creationists respond, roohif. Now another tool, pardon the pun, responds with this nonsense that even I can tell is BS.

    “Maximus, you yourself introduced that both sides use the program to prove each of their own respective positions. Without getting into functionality and parameter programming details, that should be enough for you to question any given result.

    At any rate, the arrangement of ancestral gene clusters is an assumption of Evolution theory, which is an applied underlying assumption/parameter. Set up the desired arrangement (a variable), then find patterns that emerge. No wonder both sides can use the program in proving their own position.

    It’s a computer program. Start with a hypothesis, set the parameters based on that hypothesis, and then find the anticipated patterns. If at first you don’t succeed, reset the parameters, and try try again. Once the results, consistent with the hypothesis, are obtained, publish a paper proving your hypothesis is correct! Results do not lie, just look at the patterns/sequences for yourself.”

    Well, I asked him if he wanted to discuss it with you too, but frankly I am getting the picture that they are just throwing shit around hoping something sticks. Anyway, I see what both say. Creatools got more twists and turns than Cirque du Soleil.

      1. Now he does not want to engage on here but said that if you wanted to join the conversation over there you are welcome. Here is what he said:

        “The assumptions tailor the pattern search parameters. That is why the program can be used by both sides to support either position. One can search for a simple pattern, and thus a lot of false positives emerge, or a more detailed and robust search can be conducted finding significant patterns. The search can be curtailed or detailed to find just the patterns you want it to find. So there are a range of patterns that can be detected, from insignificant to significant. BLAST has limitations. There are better pattern finding searches–they take longer and cost more to conduct. btw, I’m discussing this with you Max. If your buddy wants to come here, he can join the discussion.”
        He thinks he is a programmer or geneticist – I at least recognize my limitations – so I asked if he would discuss it with you.
        Anyway, you are probably busy and frankly it might be a waste of time, but maybe you could register and post some stuff. Here is the Forum thread if you so wish.

        Thanks roohif, you have been very helpful!

  5. Oh yeah forgot – the thread is about the GULOP. But the issue for this guy is about the BLAST program. Just a heads up!
