
                                  dotpath 
                                      
   
   
Function

   Displays a non-overlapping wordmatch dotplot of two sequences
   
Description

   A dotplot is a graphical representation of the regions of similarity
   between two sequences.
   
   The two sequences are placed on the axes of a rectangular image and
   wherever there is a similarity between the sequences a dot is placed
   on the image.
   
   Where the two sequences have substantial regions of similarity, many
   dots align to form diagonal lines. It is therefore possible to see at
   a glance where there are local regions of similarity.
   
   dotpath is very similar to the program dottup, which looks for places
   where words (tuples) of a specified length have an exact match in both
   sequences and draws a diagonal line over the position of these words.
   
   Using a longer word size displays less random noise and runs
   extremely quickly, but is less sensitive.
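   
   The word matching described above can be sketched in Python. This is
   an illustrative sketch only, not the EMBOSS source: the function name
   word_matches, the 0-based positions and the (start_a, start_b, length)
   tuples are assumptions made for this example.

```python
from collections import defaultdict


def word_matches(a, b, wordsize=4):
    """Return (start_a, start_b, length) for each maximal exact match
    of at least `wordsize` characters between strings a and b."""
    # Index every word of length `wordsize` in b by its start position.
    index = defaultdict(list)
    for j in range(len(b) - wordsize + 1):
        index[b[j:j + wordsize]].append(j)

    matches = []
    for i in range(len(a) - wordsize + 1):
        for j in index.get(a[i:i + wordsize], ()):
            # Skip hits that merely continue a match which started
            # one position earlier on the same diagonal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the match along the diagonal as far as it goes.
            length = wordsize
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return matches
```

   Each tuple corresponds to one diagonal line segment on the dotplot. A
   larger wordsize gives fewer index hits, which is why it is faster and
   shows less noise.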
   
   dotpath finds all matches of size -wordsize or greater between two
   sequences. It then reduces the matches found to the minimal set of
   long matches that do not overlap. This is a way of finding the
   (nearly) optimal path aligning two sequences. It is not the true
   optimal path as produced by the algorithms used in water or needle,
   but for very closely related sequences it will produce the same result
   and will work well with very long sequences.
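   
   One simple way to reduce a set of matches to a non-overlapping subset
   is a greedy, longest-first selection, sketched below in Python. This
   is an assumption made for illustration: the routine used by dotpath
   may differ, for example in how it treats partially overlapping
   matches. The name non_overlapping and the (start_a, start_b, length)
   tuples are hypothetical.

```python
def non_overlapping(matches):
    """Greedy longest-first selection: keep a match only if it overlaps
    no previously kept match in either sequence.

    `matches` is an iterable of (start_a, start_b, length) tuples.
    """
    kept = []
    for i, j, n in sorted(matches, key=lambda m: -m[2]):
        clash = any(
            (i < ki + kn and ki < i + n)      # overlap on sequence a
            or (j < kj + kn and kj < j + n)   # overlap on sequence b
            for ki, kj, kn in kept
        )
        if not clash:
            kept.append((i, j, n))
    # Return the surviving matches in order along sequence a.
    return sorted(kept)
```

   Because the longest matches are chosen first, short spurious matches
   that collide with a long match are discarded, which is what makes the
   result resemble an alignment path for closely related sequences.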
   
   If you wish to compare the path found by dotpath with the set of all
   matches found, use the qualifier -overlaps. All matches are then shown
   in red, except for the matches in the minimal path, which are shown in
   black as normal.
   
Usage

   Here is a sample session with dotpath
   

% dotpath tembl:AF129756 tembl:AP000504 -word 20 -graph cps -overlaps 
Displays a non-overlapping wordmatch dotplot of two sequences

Created dotpath.ps
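   
   The same session can be driven from a script. The Python sketch below
   only builds and runs the command line; it uses the qualifiers listed
   in this document (-wordsize, -graph, -overlaps and -auto) and assumes
   that dotpath is installed and on the PATH. The helper names
   dotpath_command and run_dotpath are hypothetical.

```python
import subprocess


def dotpath_command(aseq, bseq, wordsize=4, graph="cps", overlaps=False):
    """Build the argument list for a non-interactive dotpath run."""
    if wordsize < 2:
        # The documented allowed values are "Integer 2 or more".
        raise ValueError("wordsize must be 2 or more")
    cmd = ["dotpath", aseq, bseq,
           "-wordsize", str(wordsize),
           "-graph", graph,
           "-auto"]          # turn off prompts
    if overlaps:
        cmd.append("-overlaps")
    return cmd


def run_dotpath(*args, **kwargs):
    """Run dotpath; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(dotpath_command(*args, **kwargs), check=True)
```

   For example, run_dotpath("tembl:AF129756", "tembl:AP000504",
   wordsize=20, graph="cps", overlaps=True) corresponds to the sample
   session above.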
   
   
Command line arguments

   Standard (Mandatory) qualifiers:
  [-asequence]         sequence   Sequence USA
  [-bsequence]         sequence   Sequence USA
   -wordsize           integer    Word size
   -graph              graph      Graph type

   Additional (Optional) qualifiers:
   -overlaps           boolean    Displays the overlapping matches (in red) as
                                  well as the minimal set of non-overlapping
                                  matches
   -[no]boxit          boolean    Draw a box around dotplot

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1             integer    First base used
   -send1               integer    Last base used, def=seq length
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2             integer    First base used
   -send2               integer    Last base used, def=seq length
   -sreverse2           boolean    Reverse (if DNA)
   -sask2               boolean    Ask for begin/end/reverse
   -snucleotide2        boolean    Sequence is nucleotide
   -sprotein2           boolean    Sequence is protein
   -slower2             boolean    Make lower case
   -supper2             boolean    Make upper case
   -sformat2            string     Input sequence format
   -sdbname2            string     Database name
   -sid2                string     Entryname
   -ufo2                string     UFO features
   -fformat2            string     Features format
   -fopenfile2          string     Features file name

   "-graph" associated qualifiers
   -gprompt             boolean    Graph prompting
   -gtitle              string     Graph title
   -gsubtitle           string     Graph subtitle
   -gxtitle             string     Graph x axis title
   -gytitle             string     Graph y axis title
   -goutfile            string     Output file for non interactive displays
   -gdirectory          string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
   

   Qualifier      Description           Allowed values          Default

   Standard (Mandatory) qualifiers
   [-asequence]   Sequence USA          Readable sequence       Required
   (Parameter 1)
   [-bsequence]   Sequence USA          Readable sequence       Required
   (Parameter 2)
   -wordsize      Word size             Integer 2 or more       4
   -graph         Graph type            EMBOSS has a list of    EMBOSS_GRAPHICS
                                        known devices,          value, or x11
                                        including postscript,
                                        ps, hpgl, hp7470,
                                        hp7580, meta, colourps,
                                        cps, xwindows, x11,
                                        tektronics, tekt,
                                        tek4107t, tek, none,
                                        null, text, data,
                                        xterm, png

   Additional (Optional) qualifiers
   -overlaps      Displays the          Boolean value Yes/No    No
                  overlapping matches
                  (in red) as well as
                  the minimal set of
                  non-overlapping
                  matches
   -[no]boxit     Draw a box around     Boolean value Yes/No    Yes
                  dotplot

   Advanced (Unprompted) qualifiers
   (none)
   
Input file format

  Input files for usage example
  
   'tembl:AF129756' is a sequence entry in the example nucleic acid
   database 'tembl'
   
  Database entry: tembl:AF129756
  
ID   AF129756   standard; DNA; HUM; 184666 BP.
XX
AC   AF129756;
XX
SV   AF129756.1
XX
DT   12-MAR-1999 (Rel. 59, Created)
DT   29-OCT-1999 (Rel. 61, Last updated, Version 2)
XX
DE   Homo sapiens MSH55 gene, partial cds; and CLIC1, DDAH, G6b, G6c, G5b, G6d,
DE   G6e, G6f, BAT5, G5b, CSK2B, BAT4, G4, Apo M, BAT3, BAT2, AIF-1, 1C7, LST-1,
DE   LTB, TNF, and LTA genes, complete cds.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-184666
RA   Rowen L., Madan A., Qin S., Shaffer T., James R., Ratcliffe A., Abbasi N.,
RA   Dickhoff R., Loretz C., Madan A., Dors M., Young J., Lasky S., Hood L.;
RT   "Sequence of the human major histocompatibility complex class III region";
RL   Unpublished.
XX
RN   [2]
RP   1-184666
RA   Rowen L.;
RT   ;
RL   Submitted (22-FEB-1999) to the EMBL/GenBank/DDBJ databases.
RL   Department of Molecular Biotechnology, Box 357730 University of Washington,
RL   Seattle, WA 98195, USA
XX
RN   [3]
RP   1-184666
RA   Rowen L.;
RT   ;
RL   Submitted (28-OCT-1999) to the EMBL/GenBank/DDBJ databases.
RL   Multimegabase Sequencing Center, University of Washington, PO Box 357730,
RL   Seattle, WA 98195, USA
XX
DR   EPD; EP11158; HS_TNFA.
DR   EPD; EP11159; HS_TNFB.
DR   SPTREMBL; O00452; O00452.
DR   SPTREMBL; O14931; O14931.
DR   SPTREMBL; O95866; O95866.
DR   SPTREMBL; O95868; O95868.
DR   SPTREMBL; O95869; O95869.
DR   SPTREMBL; O95870; O95870.


  [Part of this file has been deleted for brevity]

     aaaccagttt accaccactc ctaacactaa acttaaatct gactctaaat gtaagtccaa    181740
     tctgagccac aagcctaaag ttgaacttta tcctgcttta tgaattattc atccattcct    181800
     ccatttagtg agtatctgcg tgcctaacac atgctgggca ttgtcctaag gcaggaggga    181860
     catggaggca aagggatcag agaaggtacc agcacctgtg gagcttgtat tccagtgagg    181920
     ccagacggaa aagaaagaaa ctgaagaaga aattggtact atgagaaaat aagacaggct    181980
     gatgttgtaa gagtggcagg gagctacttt taaatacagt agtcagcaaa atcctctttg    182040
     agtgtttggg tggcactgga gctgagaccc aaatgacaaa aaatagtgac caggtaaaag    182100
     tttgggagca aagcatttca ggtaaaggga gcagctactg caaaggctgg aaggcggaac    182160
     caagctgggg gtgttgacga caaacagaag gccagtgtgg ctggagcaga gagagagact    182220
     gggaggcggg tgggagatga ggtcagagag gagggcaggg gccaggtcat gcagggccat    182280
     gcaagaaggg taaagcctct agatttcatc cagccacagg aagcctttaa aggtcgtcag    182340
     agtgtgtggt gcgtgcgtgt gtgtgtgtgt gtgtgtgtgt gttgcagggg agagaggggg    182400
     agggagagag agagagagag agagaagagg gaggtgagca gaggtgattg gatttttttt    182460
     tcttttgaca tggtgtcttg ctctgtggcc taggctggag tgcagtggca ccatcatagc    182520
     ccactgcaac ctcaaaacca tgggctcaag tcatccttcc acctcagctt cccaagtatc    182580
     taggactaca ggtgtgtgcc actgtgcctg gctaatttta aaaaatattt taaaattttt    182640
     gttgagacag ggtctatgct gctcaggctg gtctcgaact cctggtttca agtgatctgc    182700
     ccatcttggc ctcccaaagt ttttttttgt tagtttgaga ggcggtttcg ctcgttgccc    182760
     aggctggagt gcaatgactg atctcatctc actgcaacct ctgcctcctg ggttcaagcg    182820
     attctcctgc ttcagcctcc caagtagctg ggattacagg tgcatgccac cattcccggc    182880
     taattttttg tatttagtag agatggggtt tcaccatgtt agtcaggctg atctcaaact    182940
     cctgacctca ggtgatccgc ctgcctcagc ctcccaaagt tttgggatta caggtgtgag    183000
     ccaccatgct gggccagcct cccaaagttt tgggattaca ggcatgagtc accacactgg    183060
     ccctggattt tttttctttc ttttttttgg agacggagtc tcactctgtt gcccaggctg    183120
     gagtgcaatg gcgtaatctc agctcactgc aacctctgct gcccgggttc aaacgattct    183180
     cctgtcttag cctcctgagt agctgggatt ataggtgcat gccaccatgc ctggctaatt    183240
     tttgtacttt tagtagagaa agtacaccat cttggccagg ctggtctcga actcctgacc    183300
     tcaggtgatc cacttgcgtc ggcctcccaa agtgctggga ttacaggcgt gagacaccgc    183360
     acccagcctt tttttttttt tttcttttaa gacagaatcg ctctgtcacc caggctggag    183420
     tgcagtggca caatctcggc tcactgcaac ctctgcctcc caggtttaag caatccacct    183480
     atgtcagtct cccaagtagc tgggattata ggtgcatgtc accatgcctg gctaattttt    183540
     gtacttttag tatagaaagt acaccatgtt ggccaggctg gtcttgaact cctgacctca    183600
     agtgatccgc ctgcctcagc ctcccgaagt gctggaatta cagacatgtg ccactgcacc    183660
     cggcctggtt ttttttttct aagagatgga gtctcacttt tctgcccagg ttggagtgca    183720
     atggcaccat catagctcac tgcagccttc aactcttggc ctcaggcaat ccttgcacct    183780
     tagcctcgca gtgttgggat tacaggcatg agccactgag ccttgcctgg actttttttt    183840
     ttttttgaga tggcgtctcg ctctgttgcc caggttggag tgctacggca tgatcttggc    183900
     tcactgcaac ttccacctcc caggttcaag cgattctctt gcctcggccc cccgagtagc    183960
     tgggattaca ggcatgcgcc accgtgcctg gctaattttg gtatttttag tagagatagg    184020
     gtttcatcat gttgggcagg ctggtcttga actcctgacc tcgtgatcca cccacctcgg    184080
     cctcccaaag tgctgggatt ataggcatag ccaacgcgcc cagcctggac ttgtttttaa    184140
     aagatcactg tggctcctgt gtttaggctg gctggtagga gacaggtggc agtggcattg    184200
     atggtgaaga gaaaatagtg gcagccatgg agatggagag aagtagacaa gtttgggata    184260
     tattatacat tccaggggta gaaacaacag gactagatga tggattgatg ggtgggagat    184320
     gtagatactg ggagagaagc aggattctga tggatggaaa aactaaaaaa ttctattttg    184380
     ggtgtggtaa gtctaagtct attagacatg caagtagaga tgtcactggg cagatacaca    184440
     tctggatttc aggggcaagg tccaagctag agaaagaaac ctgggcatgg tcagcatgag    184500
     gatggtgttt aaagccatgg aacttatctt gtgcatccct ataagacccc tttgaggcac    184560
     ttgtttcccc tcacaatgga tgcagtgcat cttccattct gaattccaga ggcaacaacc    184620
     tcctgctcct agaagctaaa ctctccagac ttagtcttct gaattc                   184666
//
   
  Database entry: tembl:AP000504
  
ID   AP000504   standard; DNA; HUM; 100000 BP.
XX
AC   AP000504; BA000025;
XX
SV   AP000504.1
XX
DT   28-SEP-1999 (Rel. 61, Created)
DT   22-AUG-2001 (Rel. 68, Last updated, Version 3)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section
DE   3/20.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-100000
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-SEP-1999) to the EMBL/GenBank/DDBJ databases.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   SWISS-PROT; O00299; CLI1_HUMAN.
DR   SWISS-PROT; O43196; MSH5_HUMAN.
DR   SWISS-PROT; O95445; APOM_HUMAN.
DR   SWISS-PROT; O95865; DDH2_HUMAN.
DR   SWISS-PROT; O95867; NG24_HUMAN.
DR   SWISS-PROT; P13862; KC2B_HUMAN.
XX
CC   This sequence is conducted by Tokai University as a JST sequencing
CC   Team.
CC   Principal Investigator: Hidetoshi Inoko Ph.D
CC   Phone:+81-463-93-1121, Fax:+81-463-94-8884,
CC   The sequence is submitted by Human Genome Sequencing in ALIS
CC   project of JST
CC   Japan Science and Technology Corporation (JST)
CC   5-3, Yonbancyo, Chiyoda-ku, Tokyo, 102-0081 Japan
CC   For further infomation about this sequences, please visit our
CC   sequence archive Web site (http://www-alis.tokyo.jst.go.jp/HGS/top.


  [Part of this file has been deleted for brevity]

     gggtggatca tgaggtcaag agatcgagac tatcctggct aacatgatga aaccccgtct     97080
     ctactaaaaa tacaaaaaat tagctgggca tggtggcggg cacctgtagt cccagctact     97140
     cgggaggctg agtcaggaga atggtgtgaa cccaggagac ggagcttgca gtgagctgag     97200
     gtcgcaccac tgcactccag cctgggtgat agagcgagac tctgtctcaa aaaaaaaaaa     97260
     aaaaaaaaaa aaaacaaaaa ttagccgggt gtggtggcag gcaacttaat cccagctact     97320
     tgggaggcag aggcaggaga atcgtttgaa cctgggaggc ggaggttgaa gagaatagaa     97380
     gctctgctgg tccagagaag gattgggcca gggctctggg agaccaggga gaaagagggc     97440
     acatgtggtc cctgttgact gtgagggtgg gaatctgagg aaggctttgg ctcattgccc     97500
     cttgggtttg tccacagcca tccttcccct gcggagtatg tcgaggtgct ccaggagcta     97560
     cagcggctgg agagtcgcct ccagcccttc ttgcagcgct actacgaggt tctgggtgct     97620
     gctgccacca cggactacaa taacaatgtg agccctttga tggccctgcc ctttctcctc     97680
     agccccagta ctcccaaaac agaacaggct gaaatacaga taactctttc cctccctgga     97740
     aaaacattgc aacagggcca ggtgcagtgg ctcacgcctg taatcccagc actttgggag     97800
     gccaaggtgg gcggatcatc tgagatcggg agtttgagac cagcctggcc aacatggtgc     97860
     aaccccatct ctactgaaaa tataaacatt agctggatgt agtggtgcac acctgtaatc     97920
     ccagctactc aggaggctga ggcaggagaa tcgctagaac tcgggaggag ggggttgcag     97980
     tgagccgaga ttgcactact gcactctagc ctgggtgaca gagcgagact gtctcaaaaa     98040
     acaaaacaaa acaaaaaaac acacattgca acaaaacaat ttctctctaa acctgtaagt     98100
     gattttgtcc tcccttacag agaaggtgat aatctttgct gtaagcactg tcctcgtatc     98160
     gtaccccttg tgcccctgaa tgaatttaga aaatgtaaag tacaggagat cagtatatga     98220
     tgacttactg attcatagta gtgttttaat aggatgttcc ttatgtgaat aagatataat     98280
     ttatttgcaa agatttggtc tacatgtaaa cttccaagga tataactgaa agttttggag     98340
     gacatggtat tctcagtagg cattattgct tttattagtg agatggactc cagcttgata     98400
     ttttctgcct ttttgtgttt ggctggttgt gcgcagcacg agggccggga ggaggatcag     98460
     cggttgatca acttggtagg ggagagcctg cgactgctgg gcaacacctt tgttgcactg     98520
     tctgacctgc gctgcaatct ggcctgcacg cccccacgac acctgcatgt ggtccggcct     98580
     atgtctcact acaccacccc catggtgctc cagcaggcag ccattcccat acaggtgggt     98640
     tagggggagt ctggcctgag ggagagtgag gggtgttgat agagtgaccc agggtagcta     98700
     ctgggcctga aggaggttag gaaaggagga gactggaaac atggtgatga aggctggaga     98760
     tactttagag gtttatcatg aggttttctt ggttaggctc ttgtattttt ctcacatctg     98820
     cctgtccatc tgtctttttc agatcaatgt gggaaccact gtgaccatga caggaaatgg     98880
     gactcggccc cccccaactc ccaatgcaga ggcacctccc cctggtcctg ggcaggcctc     98940
     atccgtggct ccgtcttcta ccaatgtcga gtcctcagct gagggggctc ccccgccagg     99000
     tccagctccc ccgccagcca ccagccaccc gagggtcatc cggatttccc accagagtgt     99060
     ggaacccgtg gtcatgatgc acatgaacat tcaaggtgag aatagttgct ggcgagaaga     99120
     gcaggatcag catgatgagg gaggttcatg ctgaggtgtg agggaacagg gtggggaagg     99180
     gagaggcaca tgctggtggt ggtagcctgg ggaccagagc agaagcttaa gtagacagat     99240
     gtggggggtg tgggggttgg tttgtctttg gaggtgtgtt tgtgtggtga agggagtacc     99300
     tctccctgtt tagatggagg gaaaggcagg ctttctgatt gggggattat gggcctgaag     99360
     tatgcctgat ctcagaagga tatagttagg ccttggccct acctacctca gggccactgt     99420
     ctctgtctcc ctgcccagat tctggcacac agcctggtgg tgttccgagt gctcccactg     99480
     gccccctggg accccctggt catggccaaa ccctgggtaa gagtgagggc atcagggcag     99540
     gctgagctct gggtagagaa agggaagggc tgagtgggtg ggttgaaggg gtccaggttc     99600
     aaggttacat cagacccgcc ccccaggctc caccctcatc cagctgccct ccctgccccc     99660
     tgagttcatg cacgccgtcg cccaccagat cactcatcag gccatggtgg cagctgttgc     99720
     ctccgcggcc gcaggtaatg acctggaagg ggaggcttgg gaggtagggc acagtccatg     99780
     gtggcagctg gctggcaagg gcctggccct cagccctctt cggtctgtct cttctgccac     99840
     ccacaggaca gcaggtgcca ggcttcccaa cagctccaac ccgggtggtg attgcccggc     99900
     ccactcctcc acaggctcgg ccttcccatc ctggagggcc cccagtctct gggacactgg     99960
     tgagcaaggg tcggggagtt ctagtgcgta acagtctagg                          100000
//
   
Output file format

   In normal operation, a dotplot image is displayed.
   
   With the data graph type (-graph data), a file of the positions of
   the matches in the minimal non-overlapping set of matches is output
   instead.
   
  Output files for usage example
  
  Graphics File: dotpath.ps
  
   [dotpath results]
   
Data files

Notes

References

Warnings

   If you give a small word size with very large sequences, the program
   may run out of memory. If this happens, try again with a larger word
   size.
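   
   A rough feel for why small word sizes are costly comes from counting
   the word matches expected by chance. The Python sketch below assumes
   random, independent, equally frequent bases, which real genomic
   sequence (with its repeats) does not satisfy, and it assumes that
   memory use grows with the number of matches stored. The function name
   expected_random_matches is hypothetical.

```python
def expected_random_matches(len_a, len_b, wordsize, alphabet=4):
    """Expected number of chance word matches between two random
    sequences with `alphabet` equally likely symbols."""
    # Number of (position in a, position in b) pairs that can be compared.
    positions = max(len_a - wordsize + 1, 0) * max(len_b - wordsize + 1, 0)
    # Each pair matches by chance with probability alphabet ** -wordsize.
    return positions / alphabet ** wordsize
```

   For the two sequences in the usage example (184666 and 100000 bases),
   this estimate is about 7 x 10^7 chance matches at word size 4, but
   well below one at word size 20.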
   
Diagnostic Error Messages

Exit status

Known bugs

See also

   Program name                       Description
   dotmatcher   Displays a thresholded dotplot of two sequences
   dottup       Displays a wordmatch dotplot of two sequences
   polydot      Displays all-against-all dotplots of a set of sequences
   
   This program is closely based on dottup, but by default it displays
   only the minimal set of non-overlapping matches.
   
   This program uses the same algorithm as diffseq for finding a minimal
   set of very good matches between two sequences. diffseq may be more
   convenient if you are looking at the differences between two nearly
   identical sequences.
   
Author(s)

   Gary Williams (gwilliam  hgmp.mrc.ac.uk)
   HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK
   
History

   Written 14 Aug 2000.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
