Where did that guy get his data from?

Over at Evolution News and Views, I got a shout-out from Ann Gauger, in a tone suggesting that my data might be wrong. This post is for her.

Dear Ann,

BLASTN 2.2.30+


Query= 8 dna:chromosome chromosome:Galgal4:8:17537065:17580564:1

Length=43500

Subject= 1 dna:chromosome chromosome:GRCh38:1:78700000:78800000:-1

Length=100001


 Score =   291 bits (322),  Expect = 7e-79
 Identities = 361/483 (75%), Gaps = 22/483 (5%)
 Strand=Plus/Plus

Query  620    AATTATGAAAGCATACTTTTC-AGTGGTATTCCAGAGAAAGGACTTGCAAGAACTGGAAT  678
              || ||||||||  ||| |||| ||||   ||||| | | || ||||  |||| |||| ||
Sbjct  10940  AAATATGAAAG--TACATTTCTAGTGTATTTCCACA-ACAGTACTTAGAAGACCTGGGAT  10996

Query  679    AAGGATAAGAAGTGAAGTGGAAATTAGTGGTATTGGACCAAAACTTTGTCTTATTAGGGT  738
                  | || |||| |||| ||| | | ||| |  |  |  ||||   || |||||| |||
Sbjct  10997  GTAAACAAAAAGTAAAGTAGAAGTCACTGGCACAGATCTGAAACCAAGTTTTATTAAGGT  11056

Query  739    AAGAAAATTTATTCAATTTGAAAGGTAAAATTCTCTTGATACCAGTTTGTTGGGTTTTTT  798
              |||||||| ||| ||||||||||| |  ||||  |||        |||| |    |||| 
Sbjct  11057  AAGAAAATATATCCAATTTGAAAGCTGGAATTACCTTC-------TTTGATAACGTTTTC  11109

Query  799    TTTTTAAGCTTTTGGGAAGTAATTAAGTTTCATCATATGTTGTGCTTACTCAGGCAGAAT  858
                   ||||||| |  || |||||||||||||||||||||| |||||||||| |||||||
Sbjct  11110  -----AAGCTTTGGACAAATAATTAAGTTTCATCATATGTTTTGCTTACTCATGCAGAAT  11164

Query  859    GTAACTAACACTACTGTTTTTTTATT-CAGTGCTCTAAATTCTATTTG-CACTTT-GCCA  915
              ||||||||  ||  | |||||||| | |||    |||||||| ||||  |||| | ||||
Sbjct  11165  GTAACTAAGTCTTTTTTTTTTTTAATGCAGAAGCCTAAATTCCATTTCACACTGTAGCCA  11224

Query  916    GGTAATTCTCAGCTCAAGCCAACCTTGGGCTTGAAGGATTTCTTCTGCTTTGTGGCCAGG  975
              |  |||||||||||||||| |||||||||||||| |||||||||||||||||||  ||||
Sbjct  11225  GACAATTCTCAGCTCAAGCTAACCTTGGGCTTGAGGGATTTCTTCTGCTTTGTGCTCAGG  11284

Query  976    GAGACAATGGAATGTAATTTGAAATGCACAGTAATTTGTTATTGGATCAATCCAATTGTT  1035
              |||||||| ||   ||||||  ||||||||| | ||| ||    ||||||| ||||||| 
Sbjct  11285  GAGACAATAGAGCATAATTTTGAATGCACAGCAGTTTATTCCAAGATCAATTCAATTGTA  11344

Query  1036   CC-AAACTGTACAACCTAGGATTATTTAATCAACTGATTTCGTAGCCAGCAAACGAAAGG  1094
              || ||||  || ||| |  |||||||||||||||| ||||| ||  ||| |||| ||| |
Sbjct  11345  CCAAAACCATATAACTTTAGATTATTTAATCAACTTATTTCATAAGCAG-AAAC-AAATG  11402

Query  1095   CAA  1097
              |||
Sbjct  11403  CAA  11405


 Score =   223 bits (246),  Expect = 3e-58
 Identities = 287/390 (74%), Gaps = 31/390 (8%)
 Strand=Plus/Plus

Query  24111  TTTTTCTTGTTAAAGGACCAGAGCTGGCTCTTGCAGCCTATTTTTACAGTACCGTGTGAT  24170
              |||||  |||||||  ||||| ||   |||| || |||||||||||   | | || | ||
Sbjct  41719  TTTTTAATGTTAAAACACCAGCGCCAACTCTGGCTGCCTATTTTTATTATGCTGTATAAT  41778

Query  24171  TCTGCAGACATTGACATGTGTCACCTGTGATGCAGCTACATTTGTCG-GCTCTCTGTGCT  24229
              ||  |||  || ||||||||||||||||| |||||  ||||||| |  ||||| ||||||
Sbjct  41779  TCCACAGGTATCGACATGTGTCACCTGTGTTGCAGTCACATTTGGCCAGCTCTTTGTGCT  41838

Query  24230  CAACAGGGAGGAATCGATCTTCTACTTTCATTAGGTGGCAGGAGTAGACTATTGGCATAA  24289
              ||  ||||| | ||| ||||||  ||||||||| |||| || | ||||| ||||||||||
Sbjct  41839  CACTAGGGAAGCATCAATCTTCAGCTTTCATTAAGTGGTAGAAATAGACCATTGGCATAA  41898

Query  24290  AAAAT--ACTAAAAAAAAAAATGGAAAAGAAACCCCAGGGCTTTCTGCTTGGAGAC--CC  24345
              |||||    |  ||||| ||||||   ||||          ||||||| ||| |||  | 
Sbjct  41899  AAAATTATTTTTAAAAATAAATGG--GAGAA---------TTTTCTGCCTGGGGACTACA  41947

Query  24346  AGACTGCTGTTCAGTGGTCATTTGAATTATTGAATGGGATTAAAATAAAAGCCATTTC--  24403
                ||| ||||||  | || ||||||||||||||||||  |  ||||||||||||||||  
Sbjct  41948  CCACTACTGTTCTCTAGTAATTTGAATTATTGAATGGACT--AAATAAAAGCCATTTCTA  42005

Query  24404  --CTTTTT----TTTGTCCCTTTACTCAGATGAGCCATCTGAAATGCAAGTTGATTTGT-  24456
                ||||||    ||| |   ||| | ||||  |   | ||||||||||||||||||| | 
Sbjct  42006  TTCTTTTTATTCTTTTTTTTTTTGCCCAGACAA---ACCTGAAATGCAAGTTGATTTTTT  42062

Query  24457  -ATTTTCTTTTATCCCCTCAGCTTGTTAGG  24485
               |   |||||||||||||||||||||||||
Sbjct  42063  AAAAATCTTTTATCCCCTCAGCTTGTTAGG  42092


 Score =   219 bits (242),  Expect = 4e-57
 Identities = 269/362 (74%), Gaps = 19/362 (5%)
 Strand=Plus/Plus

Query  41677  AAAATTATGAAAGCTTTACCCATTCTCATTATGCAAACTTAATCTAAATGGATGTCCTAA  41736
              ||||| | | ||||||| | ||||||  ||| |||||| ||||||||| | | |  ||||
Sbjct  84967  AAAATCAGGGAAGCTTTGCACATTCTAGTTACGCAAACGTAATCTAAACGAAGGCTCTAA  85026

Query  41737  TATTCTTCCAGATGCAACGATAAACCTCCAACTATCTAAGAATATTTATTGGGAGGATGA  41796
              || |||||   || |   |||||||||||||  |  | ||||| |||||||| |||||||
Sbjct  85027  TACTCTTCTCTATACCTTGATAAACCTCCAAACAGTTGAGAATTTTTATTGGAAGGATGA  85086

Query  41797  GTATTATTATTCCATTTAGATTATTTCAGCATTAAGGGATATGGCTTATTCAAGCTGCTG  41856
               |  |||||||||||||||           ||||| |||||||||||||||| ||||| |
Sbjct  85087  ATCATATTATTCCATTTAG-----------ATTAAAGGATATGGCTTATTCAGGCTGCAG  85135

Query  41857  TTGATAAAGCAATGTGGTAAGGTTAATACTGCAACTGAC-AATGCTCTGCCAGATTTCAC  41915
               | |||||||   || || || ||||| ||  ||||||| |||||  |||||||||| | 
Sbjct  85136  ATAATAAAGCCGCGTAGTCAGCTTAATGCTAAAACTGACAAATGCGGTGCCAGATTTGAA  85195

Query  41916  AATATATGGCAAACTTTAATTAGAAGTTTATGAACCTCTGAAAATTCTCCGAAGGGCTTA  41975
              || |||  | ||| ||||||| ||||||||||||  ||||||||  |||| ||||| |||
Sbjct  85196  AACATACAGGAAATTTTAATTGGAAGTTTATGAAAGTCTGAAAA-ACTCCAAAGGGTTTA  85254

Query  41976  TCCTTCAGGATGAACTT-CGACAAA--TAGTCAGCTGAAATATGCAGTGATATGCA-GCA  42031
              |||||    || |||||   |||||  |||  |||| | ||||    ||||||||| |||
Sbjct  85255  TCCTTAGAAATAAACTTAAAACAAAATTAG--AGCTAATATATTAGCTGATATGCAGGCA  85312

Query  42032  AT  42033
              ||
Sbjct  85313  AT  85314


 Score =   102 bits (112),  Expect = 7e-22
 Identities = 219/321 (68%), Gaps = 40/321 (12%)
 Strand=Plus/Plus

Query  42618  AATGCATTATGTACAGTCTGCACTGCTTAATAAATATGTTGTTCATTAAATAAGTATTCA  42677
              ||||||||||||||||||||   ||||||||||||   ||  |||| |||||   |||| 
Sbjct  85882  AATGCATTATGTACAGTCTG---TGCTTAATAAATGCATTACTCATAAAATATACATTCG  85938

Query  42678  TGTGGTCTCCCTCTTTATTTTTCCATATCAACAAAACAATCAGACCAACTATAATATTAT  42737
                 |  ||||          || | |||||| | | ||||||  |  |  |||| |||||
Sbjct  85939  CAAGTCCTCC----------TTTCTTATCAATAGAGCAATCAAGCTGAGCATAAGATTAT  85988

Query  42738  CCAGAAATTCTGCTTCTTTTTAT-CTGAAATATAATTATAGCAGTCCTCTCTTAAAATTA  42796
                ||||||||  || |||  ||| |  ||||||||| | |  | |||||| ||||||| |
Sbjct  85989  TGAGAAATTCAACTCCTTCCTATACAAAAATATAATCACATTA-TCCTCTTTTAAAATCA  86047

Query  42797  TGTTCACTAGGTGATGAAAGGAAA----ACATGATTACAGCTACTGCTAACATTCCATTG  42852
              || |||      ||||||| ||||    | ||| ||  ||||     ||| ||| |||||
Sbjct  86048  TGCTCA------GATGAAAAGAAACAAGAGATGGTT--AGCT-----TAATATTTCATTG  86094

Query  42853  TGTAAAGAATTTCATATTTAGCATACTAAAGACACAGTAAGCATTTGTTTTCTTTTATGT  42912
              |  |||| |||| ||||||| | |||| || | ||  ||  ||||| |||||||| ||||
Sbjct  86095  TAAAAAGCATTTTATATTTAACCTACTCAAAATAC--TA--CATTT-TTTTCTTTCATGT  86149

Query  42913  TTTCTGGTA---AAGAGAAAA  42930
              || |  | |   |||||||||
Sbjct  86150  TTCCCAGCATAGAAGAGAAAA  86170


 Score = 57.2 bits (62),  Expect = 3e-08
 Identities = 45/54 (83%), Gaps = 0/54 (0%)
 Strand=Plus/Plus

Query  21281  AATACACTCTGGAAAACTATTGTAGCCAGATGCCAAACAAATGAAAACAGATGG  21334
              |||| || |||  |||| |||||| | | || ||||||||||||||||||||||
Sbjct  40402  AATATACCCTGTGAAACAATTGTAACTAAATACCAAACAAATGAAAACAGATGG  40455


 Score = 51.8 bits (56),  Expect = 1e-06
 Identities = 173/264 (66%), Gaps = 14/264 (5%)
 Strand=Plus/Plus

Query  19077  CATTTCATAAGTTGGTATTATAGAATATGACTG-AATGTAAATGATATAATAGTAGCAAT  19135
              ||||  || || || || || ||||| | |  | |||| ||||||| |    |  |||||
Sbjct  36099  CATTAGATGAGCTGATACTACAGAATGTTAAAGGAATGCAAATGATCTGGCTGGTGCAAT  36158

Query  19136  AAGAAAAATAGCATAGTCACTGCGCATTGAGCA--GGTACTTATAATTTTGCCAATTAAT  19193
                ||||||||  ||| |  |   | | ||| ||  || |||  ||||| ||    |||||
Sbjct  36159  TGGAAAAATATTATAATTTCCATGAAATGAACAAAGGGACTC-TAATTATGTATGTTAAT  36217

Query  19194  AGTTCAAAAAGGCAGAATATTCTGTTTATGGCTGTTTTATAATATTGGTTTTGTAGTTGA  19253
              ||  |||| | ||| | | ||||| |||||   |||| || | ||  ||||| | |||||
Sbjct  36218  AGGACAAAGATGCACAGTGTTCTGCTTATGATGGTTTAATGACATCAGTTTTATGGTTGA  36277

Query  19254  TTTTTCATATA-ATCTTGTATCAG--------TTGTGTTATGCAATTAGTAAACTGATAC  19304
                ||||  | |  |||  ||||||        |  | ||| |||||||||||  |||| |
Sbjct  36278  GATTTCCAACATTTCTCATATCAGACTTTATATCATATTAGGCAATTAGTAACTTGATTC  36337

Query  19305  AATCTGCA-AATGTGGCTTTAAAA  19327
              |  ||||| |  |||| |||||||
Sbjct  36338  ACACTGCACAGGGTGGTTTTAAAA  36361

You’re welcome.


2 thoughts on “Where did that guy get his data from?”

  1. Dear Glen Williamson

    1) If I understand correctly, you BLASTed VIT1/VTG1 of Gallus gallus against the human genome.
    2) The GenBank file for VTG1 of Gallus gallus has the following:

    join(15..54,314..334,550..701,3153..3404,5590..5751,
    6462..6613,8894..9044,9140..9297,9411..9576,10653..10775,
    12381..12599,14619..14836,14922..15019,19332..19456,
    19542..19643,19733..19852,20114..20303,20392..20485,
    20602..20787,25811..25955,27554..27773,28618..28787,
    29451..30185,30269..30325,31098..31331,32325..32531,
    33461..33603,34122..34260,35456..35594,36312..36498,
    36912..37009,37588..37682,37788..37947,39140..39301,
    42323..42441)

    which I believe indicates the beginnings and ends of the exons.

    3) It appears that all of your human/VIT1 matches are in introns, except for the first,
    where the first 81 bp overlap the 550..701 exon.

    4) GenBank also says the length of VIT1 is 42639, which means only the first 22 bp of
    your last match overlap VIT1 at all, since you chose 43500 as the target length.

    5) Am I missing something?

    Cheers

    1. 1), 2) and 3) Correct!

      4) Sure. Ensembl, however, lists the coordinates as 8:17,537,065-17,580,564, which is where the 43,500 bp comes from.

      5) Nope, I think you’ve covered it pretty well! I’m not overly concerned whether the hits are inside exons or introns – we’re just looking for similarity in the region between the two homologous genes that flank the chicken VTG1.
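
For anyone who wants to check the exon/intron bookkeeping in this thread, here is a minimal Python sketch. It is not the method either commenter used. It assumes the GenBank join(...) coordinates and the BLAST query coordinates share the same 1-based origin, which the thread does not confirm (the GenBank record is 42639 long, the Ensembl region 43500). The names EXONS, HITS, overlap and exon_overlaps are made up for illustration; the numbers are copied from the post and the comment.

```python
# Exon intervals (1-based, inclusive) copied from the GenBank join(...) in the comment.
EXONS = [
    (15, 54), (314, 334), (550, 701), (3153, 3404), (5590, 5751),
    (6462, 6613), (8894, 9044), (9140, 9297), (9411, 9576), (10653, 10775),
    (12381, 12599), (14619, 14836), (14922, 15019), (19332, 19456),
    (19542, 19643), (19733, 19852), (20114, 20303), (20392, 20485),
    (20602, 20787), (25811, 25955), (27554, 27773), (28618, 28787),
    (29451, 30185), (30269, 30325), (31098, 31331), (32325, 32531),
    (33461, 33603), (34122, 34260), (35456, 35594), (36312, 36498),
    (36912, 37009), (37588, 37682), (37788, 37947), (39140, 39301),
    (42323, 42441),
]

# Query (chicken) start/end of the six BLAST hits, in the order shown in the post.
HITS = [
    (620, 1097), (24111, 24485), (41677, 42033),
    (42618, 42930), (21281, 21334), (19077, 19327),
]

GENBANK_GENE_LENGTH = 42639  # gene length quoted from GenBank in the comment

# ASSUMPTION: both coordinate sets start at the same base; this is not confirmed.


def overlap(a, b):
    """Number of positions shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def exon_overlaps(hit, exons=EXONS):
    """Return (exon, shared_bp) for every exon the hit touches."""
    return [(exon, overlap(hit, exon)) for exon in exons if overlap(hit, exon) > 0]


if __name__ == "__main__":
    # Length of the Ensembl region 8:17,537,065-17,580,564 (inclusive).
    print("Ensembl region length:", 17580564 - 17537065 + 1)
    for hit in HITS:
        inside_gene = overlap(hit, (1, GENBANK_GENE_LENGTH))
        print(hit, "exon overlaps:", exon_overlaps(hit),
              "bp inside GenBank gene:", inside_gene)
```

Under that assumption, only the first hit touches an exon (550..701), and only 22 bp of the hit starting at 42618 fall within the GenBank gene length. Counting both endpoints, positions 620..701 come to 82 bp rather than the 81 quoted above, which may simply reflect a different counting convention.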
