
                                  dottup 
                                      
   
   
Function

   Displays a wordmatch dotplot of two sequences
   
Description

   A dotplot is a graphical representation of the regions of similarity
   between two sequences.
   
   The two sequences are placed on the axes of a rectangular image and
   (in the simplest forms of dotplot) wherever there is a similarity
   between the sequences a dot is placed on the image.
   
   Where the two sequences have substantial regions of similarity, many
   dots align to form diagonal lines. It is therefore possible to see at
   a glance where there are local regions of similarity as these will
   have long diagonal lines. It is also easy to see other features such
   as repeats (which form parallel diagonal lines), and insertions or
   deletions (which form breaks or discontinuities in the diagonal
   lines).
   
   dottup looks for places where words (tuples) of a specified length
   have an exact match in both sequences and draws a diagonal line over
   the position of these words. This is a fast, but not especially
   sensitive way of creating dotplots. It is an acceptable method for
   displaying regions of substantial similarity between two sequences.
   
   Using a longer word (tuple) size displays less random noise, runs
   extremely quickly, but is less sensitive. Shorter word sizes are more
   sensitive to shorter or fragmentary regions of similarity, but also
   display more random points of similarity (noise) and runs slower.
   
Usage

   Here is a sample session with dottup
   

% dottup tembl:eclac tembl:eclaci -wordsize=6 -gtitle="eclaci vs eclac" -graph
cps 
Displays a wordmatch dotplot of two sequences

Created dottup.ps
   
   Go to the input files for this example
   Go to the output files for this example
   
Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-asequence]         sequence   Sequence USA
  [-bsequence]         sequence   Sequence USA
   -wordsize           integer    Word size
*  -graph              graph      Graph type
*  -xygraph            xygraph    Graph type

   Additional (Optional) qualifiers (* if not always prompted):
*  -[no]boxit          boolean    Draw a box around dotplot

   Advanced (Unprompted) qualifiers:
   -stretch            boolean    Use non-proportional axes

   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1             integer    First base used
   -send1               integer    Last base used, def=seq length
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2             integer    First base used
   -send2               integer    Last base used, def=seq length
   -sreverse2           boolean    Reverse (if DNA)
   -sask2               boolean    Ask for begin/end/reverse
   -snucleotide2        boolean    Sequence is nucleotide
   -sprotein2           boolean    Sequence is protein
   -slower2             boolean    Make lower case
   -supper2             boolean    Make upper case
   -sformat2            string     Input sequence format
   -sdbname2            string     Database name
   -sid2                string     Entryname
   -ufo2                string     UFO features
   -fformat2            string     Features format
   -fopenfile2          string     Features file name

   "-graph" associated qualifiers
   -gprompt             boolean    Graph prompting
   -gtitle              string     Graph title
   -gsubtitle           string     Graph subtitle
   -gxtitle             string     Graph x axis title
   -gytitle             string     Graph y axis title
   -goutfile            string     Output file for non interactive displays
   -gdirectory          string     Output directory

   "-xygraph" associated qualifiers
   -gprompt             boolean    Graph prompting
   -gtitle              string     Graph title
   -gsubtitle           string     Graph subtitle
   -gxtitle             string     Graph x axis title
   -gytitle             string     Graph y axis title
   -goutfile            string     Output file for non interactive displays
   -gdirectory          string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
   

   Standard (Mandatory) qualifiers Allowed values Default
   [-asequence]
   (Parameter 1) Sequence USA Readable sequence Required
   [-bsequence]
   (Parameter 2) Sequence USA Readable sequence Required
   -wordsize Word size Integer 2 or more 10
   -graph Graph type EMBOSS has a list of known devices, including
   postscript, ps, hpgl, hp7470, hp7580, meta, colourps, cps, xwindows,
   x11, tektronics, tekt, tek4107t, tek, none, null, text, data, xterm,
   png EMBOSS_GRAPHICS value, or x11
   -xygraph Graph type EMBOSS has a list of known devices, including
   postscript, ps, hpgl, hp7470, hp7580, meta, colourps, cps, xwindows,
   x11, tektronics, tekt, tek4107t, tek, none, null, text, data, xterm,
   png EMBOSS_GRAPHICS value, or x11
   Additional (Optional) qualifiers Allowed values Default
   -[no]boxit Draw a box around dotplot Boolean value Yes/No Yes
   Advanced (Unprompted) qualifiers Allowed values Default
   -stretch Use non-proportional axes Boolean value Yes/No No
   
Input file format

   Any two sequence USAs of the same type (DNA or protein).
   
  Input files for usage example
  
   'tembl:eclac' is a sequence entry in the example nucleic acid database
   'tembl'
   
  Database entry: tembl:eclac
  
ID   ECLAC      standard; DNA; PRO; 7477 BP.
XX
AC   J01636; J01637; K01483; K01793;
XX
SV   J01636.1
XX
DT   30-NOV-1990 (Rel. 26, Created)
DT   04-MAR-2000 (Rel. 63, Last updated, Version 7)
XX
DE   E.coli lactose operon with lacI, lacZ, lacY and lacA genes.
XX
KW   acetyltransferase; beta-D-galactosidase; galactosidase; lac operon;
KW   lac repressor protein; lacA gene; lacI gene; lactose permease; lacY gene;
KW   lacZ gene; mutagenesis; palindrome; promoter region;
KW   thiogalactoside acetyltransferase.
XX
OS   Escherichia coli
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
XX
RN   [1]
RP   1243-1266
RX   MEDLINE; 74055539.
RA   Gilbert W., Maxam A.;
RT   "The nucleotide sequence of the lac operator";
RL   Proc. Natl. Acad. Sci. U.S.A. 70:3581-3584(1973).
XX
RN   [2]
RP   1246-1308
RX   MEDLINE; 74055540.
RA   Maizels N.M.;
RT   "The nucleotide sequence of the lactose messenger ribonucleic acid
RT   transcribed from the UV5 promoter mutant of Escherichia coli";
RL   Proc. Natl. Acad. Sci. U.S.A. 70:3585-3589(1973).
XX
RN   [3]
RX   MEDLINE; 74174501.
RA   Gilbert W., Maizels N., Maxam A.;
RT   "Sequences of controlling regions of the lactose operon";
RL   Cold Spring Harb. Symp. Quant. Biol. 38:845-855(1974).
XX
RN   [4]
RA   Gilbert W., Gralla J., Majors A.J., Maxam A.;
RT   "Lactose operator sequences and the action of lac repressor";
RL   (in) Sund H., Blauer G. (eds.);
RL   PROTEIN-LIGAND INTERACTIONS:193-207;
RL   Walter de Gruyter, New York (1975)
XX
RN   [5]
RP   1146-1282


  [Part of this file has been deleted for brevity]

     cgatttggct acatgacatc aaccatatca gcaaaagtga tacgggtatt atttttgccg      456
0
     ctatttctct gttctcgcta ttattccaac cgctgtttgg tctgctttct gacaaactcg      462
0
     ggctgcgcaa atacctgctg tggattatta ccggcatgtt agtgatgttt gcgccgttct      468
0
     ttatttttat cttcgggcca ctgttacaat acaacatttt agtaggatcg attgttggtg      474
0
     gtatttatct aggcttttgt tttaacgccg gtgcgccagc agtagaggca tttattgaga      480
0
     aagtcagccg tcgcagtaat ttcgaatttg gtcgcgcgcg gatgtttggc tgtgttggct      486
0
     gggcgctgtg tgcctcgatt gtcggcatca tgttcaccat caataatcag tttgttttct      492
0
     ggctgggctc tggctgtgca ctcatcctcg ccgttttact ctttttcgcc aaaacggatg      498
0
     cgccctcttc tgccacggtt gccaatgcgg taggtgccaa ccattcggca tttagcctta      504
0
     agctggcact ggaactgttc agacagccaa aactgtggtt tttgtcactg tatgttattg      510
0
     gcgtttcctg cacctacgat gtttttgacc aacagtttgc taatttcttt acttcgttct      516
0
     ttgctaccgg tgaacagggt acgcgggtat ttggctacgt aacgacaatg ggcgaattac      522
0
     ttaacgcctc gattatgttc tttgcgccac tgatcattaa tcgcatcggt gggaaaaacg      528
0
     ccctgctgct ggctggcact attatgtctg tacgtattat tggctcatcg ttcgccacct      534
0
     cagcgctgga agtggttatt ctgaaaacgc tgcatatgtt tgaagtaccg ttcctgctgg      540
0
     tgggctgctt taaatatatt accagccagt ttgaagtgcg tttttcagcg acgatttatc      546
0
     tggtctgttt ctgcttcttt aagcaactgg cgatgatttt tatgtctgta ctggcgggca      552
0
     atatgtatga aagcatcggt ttccagggcg cttatctggt gctgggtctg gtggcgctgg      558
0
     gcttcacctt aatttccgtg ttcacgctta gcggccccgg cccgctttcc ctgctgcgtc      564
0
     gtcaggtgaa tgaagtcgct taagcaatca atgtcggatg cggcgcgacg cttatccgac      570
0
     caacatatca taacggagtg atcgcattga acatgccaat gaccgaaaga ataagagcag      576
0
     gcaagctatt taccgatatg tgcgaaggct taccggaaaa aagacttcgt gggaaaacgt      582
0
     taatgtatga gtttaatcac tcgcatccat cagaagttga aaaaagagaa agcctgatta      588
0
     aagaaatgtt tgccacggta ggggaaaacg cctgggtaga accgcctgtc tatttctctt      594
0
     acggttccaa catccatata ggccgcaatt tttatgcaaa tttcaattta accattgtcg      600
0
     atgactacac ggtaacaatc ggtgataacg tactgattgc acccaacgtt actctttccg      606
0
     ttacgggaca ccctgtacac catgaattga gaaaaaacgg cgagatgtac tcttttccga      612
0
     taacgattgg caataacgtc tggatcggaa gtcatgtggt tattaatcca ggcgtcacca      618
0
     tcggggataa ttctgttatt ggcgcgggta gtatcgtcac aaaagacatt ccaccaaacg      624
0
     tcgtggcggc tggcgttcct tgtcgggtta ttcgcgaaat aaacgaccgg gataagcact      630
0
     attatttcaa agattataaa gttgaatcgt cagtttaaat tataaaaatt gcctgatacg      636
0
     ctgcgcttat caggcctaca agttcagcga tctacattag ccgcatccgg catgaacaaa      642
0
     gcgcaggaac aagcgtcgca tcatgcctct ttgacccaca gctgcggaaa acgtactggt      648
0
     gcaaaacgca gggttatgat catcagccca acgacgcaca gcgcatgaaa tgcccagtcc      654
0
     atcaggtaat tgccgctgat actacgcagc acgccagaaa accacggggc aagcccggcg      660
0
     atgataaaac cgattccctg cataaacgcc accagcttgc cagcaatagc cggttgcaca      666
0
     gagtgatcga gcgccagcag caaacagagc ggaaacgcgc cgcccagacc taacccacac      672
0
     accatcgccc acaataccgg caattgcatc ggcagccaga taaagccgca gaaccccacc      678
0
     agttgtaaca ccagcgccag cattaacagt ttgcgccgat cctgatggcg agccatagca      684
0
     ggcatcagca aagctcctgc ggcttgccca agcgtcatca atgccagtaa ggaaccgctg      690
0
     tactgcgcgc tggcaccaat ctcaatatag aaagcgggta accaggcaat caggctggcg      696
0
     taaccgccgt taatcagacc gaagtaaaca cccagcgtcc acgcgcgggg agtgaatacc      702
0
     acgcgaaccg gagtggttgt tgtcttgtgg gaagaggcga cctcgcgggc gctttgccac      708
0
     caccaggcaa agagcgcaac aacggcaggc agcgccacca ggcgagtgtt tgataccagg      714
0
     tttcgctatg ttgaactaac cagggcgtta tggcggcacc aagcccaccg ccgcccatca      720
0
     gagccgcgga ccacagcccc atcaccagtg gcgtgcgctg ctgaaaccgc cgtttaatca      726
0
     ccgaagcatc accgcctgaa tgatgccgat ccccacccca ccaagcagtg cgctgctaag      732
0
     cagcagcgca ctttgcgggt aaagctcacg catcaatgca ccgacggcaa tcagcaacag      738
0
     actgatggcg acactgcgac gttcgctgac atgctgatga agccagcttc cggccagcgc      744
0
     cagcccgccc atggtaacca ccggcagagc ggtcgac                               747
7
//
   
  Database entry: tembl:eclaci
  
ID   ECLACI     standard; DNA; PRO; 1113 BP.
XX
AC   V00294;
XX
SV   V00294.1
XX
DT   09-JUN-1982 (Rel. 01, Created)
DT   10-FEB-1999 (Rel. 58, Last updated, Version 2)
XX
DE   E. coli laci gene (codes for the lac repressor).
XX
KW   DNA binding protein; repressor.
XX
OS   Escherichia coli
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
XX
RN   [1]
RP   1-1113
RX   MEDLINE; 78246991.
RA   Farabaugh P.J.;
RT   "Sequence of the lacI gene";
RL   Nature 274:765-769(1978).
XX
DR   SWISS-PROT; P03023; LACI_ECOLI.
XX
CC   KST ECO.LACI
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1113
FT                   /db_xref="taxon:562"
FT                   /organism="Escherichia coli"
FT   CDS             31..1113
FT                   /db_xref="SWISS-PROT:P03023"
FT                   /note="reading frame"
FT                   /transl_table=11
FT                   /protein_id="CAA23569.1"
FT                   /translation="MKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAE
L
FT                   NYIPNRVAQQLAGKQSLLIGVATSSLALHAPSQIVAAIKSRADQLGASVVVSMVERSG
V
FT                   EACKAAVHNLLAQRVSGLIINYPLDDQDAIAVEAACTNVPALFLDVSDQTPINSIIFS
H
FT                   EDGTRLGVEHLVALGHQQIALLAGPLSSVSARLRLAGWHKYLTRNQIQPIAEREGDWS
A
FT                   MSGFQQTMQMLNEGIVPTAMLVANDQMALGAMRAITESGLRVGADISVVGYDDTEDSS
C
FT                   YIPPSTTIKQDFRLLGQTSVDRLLQLSQGQAVKGNQLLPVSLVKRKTTLAPNTQTASP
R
FT                   ALADSLMQLARQVSRLESGQ"
XX
SQ   Sequence 1113 BP; 249 A; 304 C; 322 G; 238 T; 0 other;
     ccggaagaga gtcaattcag ggtggtgaat gtgaaaccag taacgttata cgatgtcgca        6
0
     gagtatgccg gtgtctctta tcagaccgtt tcccgcgtgg tgaaccaggc cagccacgtt       12
0
     tctgcgaaaa cgcgggaaaa agtggaagcg gcgatggcgg agctgaatta cattcccaac       18
0
     cgcgtggcac aacaactggc gggcaaacag tcgttgctga ttggcgttgc cacctccagt       24
0
     ctggccctgc acgcgccgtc gcaaattgtc gcggcgatta aatctcgcgc cgatcaactg       30
0
     ggtgccagcg tggtggtgtc gatggtagaa cgaagcggcg tcgaagcctg taaagcggcg       36
0
     gtgcacaatc ttctcgcgca acgcgtcagt gggctgatca ttaactatcc gctggatgac       42
0
     caggatgcca ttgctgtgga agctgcctgc actaatgttc cggcgttatt tcttgatgtc       48
0
     tctgaccaga cacccatcaa cagtattatt ttctcccatg aagacggtac gcgactgggc       54
0
     gtggagcatc tggtcgcatt gggtcaccag caaatcgcgc tgttagcggg cccattaagt       60
0
     tctgtctcgg cgcgtctgcg tctggctggc tggcataaat atctcactcg caatcaaatt       66
0
     cagccgatag cggaacggga aggcgactgg agtgccatgt ccggttttca acaaaccatg       72
0
     caaatgctga atgagggcat cgttcccact gcgatgctgg ttgccaacga tcagatggcg       78
0
     ctgggcgcaa tgcgcgccat taccgagtcc gggctgcgcg ttggtgcgga tatctcggta       84
0
     gtgggatacg acgataccga agacagctca tgttatatcc cgccgtcaac caccatcaaa       90
0
     caggattttc gcctgctggg gcaaaccagc gtggaccgct tgctgcaact ctctcagggc       96
0
     caggcggtga agggcaatca gctgttgccc gtctcactgg tgaaaagaaa aaccaccctg      102
0
     gcgcccaata cgcaaaccgc ctctccccgc gcgttggccg attcattaat gcagctggca      108
0
     cgacaggttt cccgactgga aagcgggcag tga                                   111
3
//
   
Output file format

   An image is output to the resquested output device.
   
  Output files for usage example
  
  Graphics File: dottup.ps
  
   [dottup results]
   
Data files

   None.
   
Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   0 upon successful completion.
   
Known bugs

   None.
   
See also

   Program name                          Description
   dotmatcher   Displays a thresholded dotplot of two sequences
   dotpath      Displays a non-overlapping wordmatch dotplot of two sequences
   polydot      Displays all-against-all dotplots of a set of sequences
   
   dotmatcher, by comparison, moves a window of specified length up each
   diagonal and displays a line over the window if the sum of the
   comparisons (using a substitution matrix) exceeds a threshold. It is
   slower but much more sensitive.
   
Author(s)

   Ian Longden (il  sanger.ac.uk)
   Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge,
   CB10 1SA, UK.
   
History

   Completed 24th March 1999.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
