
                                  merger 
                                      
   
   
Function

   Merge two overlapping nucleic acid sequences
   
Description

   merger joins two overlapping nucleic acid sequences into one merged
   sequence.
   
   It uses a global alignment algorithm (Needleman & Wunsch) to optimally
   align the sequences and then creates the merged sequence from the
   alignment. When there is a mismatch in the alignment between the two
   sequences, the base included in the merged sequence is taken from
   the sequence with the better local sequence quality score. The
   following heuristic is used to find the sequence quality score:
   
   If one of the bases is an 'N', then the other sequence's base is used.
   Otherwise:
   
   A window around the disputed base is used to find the local quality
   score. The window size is increased from 5 to 10 to 20 bases, or
   until there is a clear decision on the best choice. If there is no
   best choice after using a window of 20, then the base in the first
   sequence is used.
   
   To calculate the quality of a window of a sequence around a base:
     * quality = sequence value / length of the window on either side
       of the base
     * sequence value = sum of points in that window
     * unambiguous bases (ACGTU) score 2 points
     * ambiguous bases (MRWSYKVHDB) score 1 point
     * Ns score 0 points
     * positions off the end of the sequence score 0 points
       
   N.B. This heavily discriminates against the unreliable regions at the
   ends of sequence reads.
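   
   As a rough illustration, the heuristic above can be sketched in
   Python. This is not merger's source code: the function names are
   hypothetical, and the sketch assumes that a window of size w covers
   w positions on each side of the disputed base (excluding the base
   itself) and that any character other than the listed codes scores 0.

```python
UNAMBIGUOUS = set("ACGTU")       # 2 points each
AMBIGUOUS = set("MRWSYKVHDB")    # 1 point each


def base_points(base):
    """Points for one base; N (and anything unlisted) scores 0."""
    base = base.upper()
    if base in UNAMBIGUOUS:
        return 2
    if base in AMBIGUOUS:
        return 1
    return 0


def window_quality(seq, pos, window):
    """Sum of points within `window` positions either side of seq[pos],
    divided by the window length (2 * window). Positions that fall off
    the end of the sequence contribute 0 points."""
    total = 0
    for offset in range(1, window + 1):
        for i in (pos - offset, pos + offset):
            if 0 <= i < len(seq):
                total += base_points(seq[i])
    return total / (2 * window)


def choose_base(seq1, pos1, seq2, pos2, windows=(5, 10, 20)):
    """Pick the base to keep at a mismatch (positions are 0-based)."""
    b1, b2 = seq1[pos1], seq2[pos2]
    # An N always loses to a non-N base.
    if b1.upper() == "N" and b2.upper() != "N":
        return b2
    if b2.upper() == "N" and b1.upper() != "N":
        return b1
    # Widen the window until one sequence is clearly better.
    for window in windows:
        q1 = window_quality(seq1, pos1, window)
        q2 = window_quality(seq2, pos2, window)
        if q1 != q2:
            return b1 if q1 > q2 else b2
    return b1  # no clear decision: use the first sequence
```

   Because off-end positions score 0, a base near the end of a read
   gets a lower quality than the same base in the middle of a read.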
   
   This program was originally written to aid in the reconstruction of
   mRNA sequences which had been sequenced from both ends as a 5' and 3'
   EST (cDNA); for example, joining two reads produced by primer-walking
   sequencing.
   
   Care should be taken to reverse-complement one of the sequences (e.g.
   using the qualifier '-sreverse2') if this is required to get them both
   in the correct orientation.
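   
   For readers who want to check orientation by hand before running
   merger, the following Python sketch shows what reverse-complementing
   a nucleotide sequence means. The function name is hypothetical and
   the code is independent of EMBOSS; it uses the standard IUPAC
   complement pairs and preserves case.

```python
# IUPAC nucleotide codes and their complements (U is treated like T).
_COMPLEMENT = str.maketrans(
    "ACGTUMRWSYKVHDBNacgtumrwsykvhdbn",
    "TGCAAKYWSRMBDHVNtgcaakywsrmbdhvn",
)


def reverse_complement(seq):
    """Complement every base, then reverse the whole sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

   If the reverse complement of the second read overlaps the end of
   the first read, the second read needs to be reversed before merging.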
   
   Because it uses a Needleman & Wunsch alignment, the required memory
   may be greater than the available memory when attempting to merge
   large (cosmid-sized or greater) sequences.
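   
   The scale of the problem can be estimated with a little arithmetic.
   The Python sketch below assumes a textbook Needleman & Wunsch
   implementation that stores one full (m+1) x (n+1) matrix with 4
   bytes per cell; both figures are assumptions for illustration, and
   merger's real memory use may differ.

```python
def nw_matrix_bytes(len_a, len_b, bytes_per_cell=4):
    """Bytes needed for one full dynamic-programming matrix.

    bytes_per_cell is an assumed value, not taken from merger itself.
    """
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


if __name__ == "__main__":
    # Two sequences of 1500 and 1832 bases.
    print(nw_matrix_bytes(1500, 1832))
    # Two cosmid-sized sequences of about 40,000 bases each.
    print(nw_matrix_bytes(40000, 40000))
```

   Under these assumptions the requirement grows with the product of
   the two sequence lengths, so doubling both lengths roughly
   quadruples the memory needed.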
   
   The gap open and gap extension penalties have been set at a higher
   level than is usual (50 and 5, respectively). This was determined
   experimentally to give the best results with a set of poor quality
   EST test sequences.
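   
   To give a feel for how strict these defaults are, the Python sketch
   below uses a common affine gap convention in which the first gap
   position costs the gap open penalty and each further position costs
   the gap extension penalty. This convention is an assumption, and the
   exact formula used by EMBOSS may differ. The match score of 5 is
   consistent with the sample output in this document (159 identities,
   score 795.0).

```python
def gap_cost(length, gap_open=50.0, gap_extend=5.0):
    """Cost of one gap under the assumed affine convention."""
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


def matches_needed(length, match_score=5.0):
    """Number of matching bases whose score equals the gap cost."""
    return gap_cost(length) / match_score


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, gap_cost(n), matches_needed(n))
```

   Under this convention a single-base gap costs as much as ten
   matching bases score, which discourages the alignment from opening
   gaps to work around sequencing errors.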
   
Usage

   Here is a sample session with merger:
   

% merger 
Merge two overlapping nucleic acid sequences
Input sequence: tembl:eclacy
Second sequence: tembl:eclaca
Output sequence [eclacy.fasta]: 
Output alignment [eclacy.out2]: 
   
   Typically, one of the sequences will need to be reverse-complemented
   to put it into the correct orientation to make it join. For example:
   
% merger file1.seq file2.seq -sreverse2 -outseq merged.seq
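
   When merger is driven from a script, the same command line can be
   assembled programmatically. The Python sketch below is one possible
   wrapper, not part of EMBOSS: the function names are hypothetical,
   and it uses only qualifiers listed in this document (-outseq,
   -outfile, -sreverse2 and -auto). It assumes merger is on the PATH.

```python
import subprocess


def build_merger_command(seq_a, seq_b, outseq, outfile,
                         reverse_second=False):
    """Return the merger command line as a list of arguments."""
    cmd = ["merger", seq_a, seq_b,
           "-outseq", outseq,
           "-outfile", outfile,
           "-auto"]                      # turn off interactive prompts
    if reverse_second:
        cmd.append("-sreverse2")         # reverse the second sequence
    return cmd


def run_merger(*args, **kwargs):
    """Run merger; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_merger_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    print(" ".join(build_merger_command(
        "file1.seq", "file2.seq", "merged.seq", "merged.aln",
        reverse_second=True)))
```

   Passing the arguments as a list avoids shell quoting problems with
   file names that contain spaces.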

Command line arguments

   Standard (Mandatory) qualifiers:
  [-asequence]         sequence   Sequence USA
  [-bsequence]         sequence   Sequence USA
  [-outseq]            seqout     Output sequence USA
  [-outfile]           align      Output alignment file name

   Additional (Optional) qualifiers:
   -datafile           matrixf    This is the scoring matrix file used when
                                  comparing sequences. By default it is the
                                  file 'EBLOSUM62' (for proteins) or the file
                                  'EDNAFULL' (for nucleic sequences). These
                                  files are found in the 'data' directory of
                                  the EMBOSS installation.
   -gapopen            float      Gap opening penalty
   -gapextend          float      Gap extension penalty

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1             integer    First base used
   -send1               integer    Last base used, def=seq length
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2             integer    First base used
   -send2               integer    Last base used, def=seq length
   -sreverse2           boolean    Reverse (if DNA)
   -sask2               boolean    Ask for begin/end/reverse
   -snucleotide2        boolean    Sequence is nucleotide
   -sprotein2           boolean    Sequence is protein
   -slower2             boolean    Make lower case
   -supper2             boolean    Make upper case
   -sformat2            string     Input sequence format
   -sdbname2            string     Database name
   -sid2                string     Entryname
   -ufo2                string     UFO features
   -fformat2            string     Features format
   -fopenfile2          string     Features file name

   "-outseq" associated qualifiers
   -osformat3           string     Output seq format
   -osextension3        string     File name extension
   -osname3             string     Base file name
   -osdirectory3        string     Output directory
   -osdbname3           string     Database name to add
   -ossingle3           boolean    Separate file for each entry
   -oufo3               string     UFO features
   -offormat3           string     Features format
   -ofname3             string     Features file name
   -ofdirectory3        string     Output directory

   "-outfile" associated qualifiers
   -aformat4            string     Alignment format
   -aextension4         string     File name extension
   -adirectory4         string     Output directory
   -aname4              string     Base file name
   -awidth4             integer    Alignment width
   -aaccshow4           boolean    Show accession number in the header
   -adesshow4           boolean    Show description in the header
   -ausashow4           boolean    Show the full USA in the alignment
   -aglobal4            boolean    Show the full sequence in alignment

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
   

   Standard (Mandatory) qualifiers
   Qualifier                   Description                     Allowed values                 Default
   [-asequence] (Parameter 1)  Sequence USA                    Readable sequence              Required
   [-bsequence] (Parameter 2)  Sequence USA                    Readable sequence              Required
   [-outseq] (Parameter 3)     Output sequence USA             Writeable sequence             <sequence>.format
   [-outfile] (Parameter 4)    Output alignment file name      Alignment output file

   Additional (Optional) qualifiers
   Qualifier                   Description                     Allowed values                 Default
   -datafile                   This is the scoring matrix      Comparison matrix file in      EBLOSUM62 for protein
                               file used when comparing        EMBOSS data path               EDNAFULL for DNA
                               sequences. By default it is
                               the file 'EBLOSUM62' (for
                               proteins) or the file
                               'EDNAFULL' (for nucleic
                               sequences). These files are
                               found in the 'data' directory
                               of the EMBOSS installation.
   -gapopen                    Gap opening penalty             Number from 1.000 to 100.000   50.0
   -gapextend                  Gap extension penalty           Number from 0.100 to 10.000    5

   Advanced (Unprompted) qualifiers
   (none)
   
Input file format

   merger reads any two sequence USAs of the same type (protein or
   nucleic acid).
   
  Input files for usage example
  
   'tembl:eclacy' is a sequence entry in the example nucleic acid
   database 'tembl'.
   
  Database entry: tembl:eclacy
  
ID   ECLACY     standard; DNA; PRO; 1500 BP.
XX
AC   V00295;
XX
SV   V00295.1
XX
DT   09-JUN-1982 (Rel. 01, Created)
DT   07-JUL-1995 (Rel. 44, Last updated, Version 4)
XX
DE   E. coli lacY gene (codes for lactose permease).
XX
KW   membrane protein.
XX
OS   Escherichia coli
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
XX
RN   [1]
RP   1-1500
RX   MEDLINE; 80120651.
RA   Buechel D.E., Gronenborn B., Mueller-Hill B.;
RT   "Sequence of the lactose permease gene";
RL   Nature 283:541-545(1980).
XX
DR   SWISS-PROT; P00722; BGAL_ECOLI.
DR   SWISS-PROT; P02920; LACY_ECOLI.
DR   SWISS-PROT; P07464; THGA_ECOLI.
XX
CC   lacZ is a beta-galactosidase and lacA is transacetylase.
CC   KST ECO.LACY
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1500
FT                   /db_xref="taxon:562"
FT                   /organism="Escherichia coli"
FT   CDS             <1..54
FT                   /codon_start=1
FT                   /db_xref="SWISS-PROT:P00722"
FT                   /note="reading frame (lacZ)"
FT                   /transl_table=11
FT                   /protein_id="CAA23570.1"
FT                   /translation="FQLSAGRYHYQLVWCQK"
FT   CDS             106..1359
FT                   /db_xref="SWISS-PROT:P02920"
FT                   /note="reading frame (lacY)"
FT                   /transl_table=11
FT                   /protein_id="CAA23571.1"
FT                   /translation="MYYLKNTNFWMFGLFFFFYFFIMGAYFPFFPIWLHDINHISKSDT
FT                   GIIFAAISLFSLLFQPLFGLLSDKLGLRKYLLWIITGMLVMFAPFFIFIFGPLLQYNIL
FT                   VGSIVGGIYLGFCFNAGAPAVEAFIEKVSRRSNFEFGRARMFGCVGWALCASIVGIMFT
FT                   INNQFVFWLGSGCALILAVLLFFAKTDAPSSATVANAVGANHSAFSLKLALELFRQPKL
FT                   WFLSLYVIGVSCTYDVFDQQFANFFTSFFATGEQGTRVFGYVTTMGELLNASIMFFAPL
FT                   IINRIGGKNALLLAGTIMSVRIIGSSFATSALEVVILKTLHMFEVPFLLVGCFKYITSQ
FT                   FEVRFSATIYLVCFCFFKQLAMIFMSVLAGNMYESIGFQGAYLVLGLVALGFTLISVFT
FT                   LSGPGPLSLLRRQVNEVA"
FT   CDS             1423..>1500
FT                   /db_xref="SWISS-PROT:P07464"
FT                   /note="reading frame (lacA)"
FT                   /transl_table=11
FT                   /protein_id="CAA23572.1"
FT                   /translation="MNMPMTERIRAGKLFTDMCEGLPEKR"
XX
SQ   Sequence 1500 BP; 315 A; 342 C; 357 G; 486 T; 0 other;
     ttccagctga gcgccggtcg ctaccattac cagttggtct ggtgtcaaaa ataataataa        60
     ccgggcaggc catgtctgcc cgtatttcgc gtaaggaaat ccattatgta ctatttaaaa       120
     aacacaaact tttggatgtt cggtttattc tttttctttt acttttttat catgggagcc       180
     tacttcccgt ttttcccgat ttggctacat gacatcaacc atatcagcaa aagtgatacg       240
     ggtattattt ttgccgctat ttctctgttc tcgctattat tccaaccgct gtttggtctg       300
     ctttctgaca aactcgggct gcgcaaatac ctgctgtgga ttattaccgg catgttagtg       360
     atgtttgcgc cgttctttat ttttatcttc gggccactgt tacaatacaa cattttagta       420
     ggatcgattg ttggtggtat ttatctaggc ttttgtttta acgccggtgc gccagcagta       480
     gaggcattta ttgagaaagt cagccgtcgc agtaatttcg aatttggtcg cgcgcggatg       540
     tttggctgtg ttggctgggc gctgtgtgcc tcgattgtcg gcatcatgtt caccatcaat       600
     aatcagtttg ttttctggct gggctctggc tgtgcactca tcctcgccgt tttactcttt       660
     ttcgccaaaa cggatgcgcc ctcttctgcc acggttgcca atgcggtagg tgccaaccat       720
     tcggcattta gccttaagct ggcactggaa ctgttcagac agccaaaact gtggtttttg       780
     tcactgtatg ttattggcgt ttcctgcacc tacgatgttt ttgaccaaca gtttgctaat       840
     ttctttactt cgttctttgc taccggtgaa cagggtacgc gggtatttgg ctacgtaacg       900
     acaatgggcg aattacttaa cgcctcgatt atgttctttg cgccactgat cattaatcgc       960
     atcggtggga aaaacgccct gctgctggct ggcactatta tgtctgtacg tattattggc      1020
     tcatcgttcg ccacctcagc gctggaagtg gttattctga aaacgctgca tatgtttgaa      1080
     gtaccgttcc tgctggtggg ctgctttaaa tatattacca gccagtttga agtgcgtttt      1140
     tcagcgacga tttatctggt ctgtttctgc ttctttaagc aactggcgat gatttttatg      1200
     tctgtactgg cgggcaatat gtatgaaagc atcggtttcc agggcgctta tctggtgctg      1260
     ggtctggtgg cgctgggctt caccttaatt tccgtgttca cgcttagcgg ccccggcccg      1320
     ctttccctgc tgcgtcgtca ggtgaatgaa gtcgcttaag caatcaatgt cggatgcggc      1380
     gcgacgctta tccgaccaac atatcataac ggagtgatcg cattgaacat gccaatgacc      1440
     gaaagaataa gagcaggcaa gctatttacc gatatgtgcg aaggcttacc ggaaaaaaga      1500
//
   
  Database entry: tembl:eclaca
  
ID   ECLACA     standard; DNA; PRO; 1832 BP.
XX
AC   X51872;
XX
SV   X51872.1
XX
DT   17-APR-1990 (Rel. 23, Created)
DT   05-JUL-1999 (Rel. 60, Last updated, Version 5)
XX
DE   Escherichia coli lacA gene for thiogalactoside transacetylase
XX
KW   lac operon; lacA gene; lacY gene; thiogalactoside transacetylase.
XX
OS   Escherichia coli
OC   Bacteria; Proteobacteria; gamma subdivision; Enterobacteriaceae;
OC   Escherichia.
XX
RN   [1]
RC   (1-1832)
RP   1-1832
RX   MEDLINE; 86016712.
RA   Hediger M.A, Johnson D.F., Nierlich D.P., Zabin I.;
RT   "DNA sequence of the lactose operon: The lacA gene and the transcriptional
RT   termination region";
RL   Proc. Natl. Acad. Sci. U.S.A. 82:6414-6418(1985).
XX
DR   REMTREMBL; CAA36161; CAA36161.
DR   SWISS-PROT; P07464; THGA_ECOLI.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1832
FT                   /db_xref="taxon:562"
FT                   /organism="Escherichia coli"
FT   CDS             <1..18
FT                   /codon_start=1
FT                   /db_xref="REMTREMBL:CAA36161"
FT                   /transl_table=11
FT                   /product="lacY gene product"
FT                   /protein_id="CAA36161.1"
FT                   /translation="VNEVA"
FT   CDS             82..693
FT                   /db_xref="SWISS-PROT:P07464"
FT                   /transl_table=11
FT                   /product="thiogalactoside transacetylase"
FT                   /gene="lacA"
FT                   /protein_id="CAA36162.1"
FT                   /translation="MNMPMTERIRAGKLFTDMCEGLPEKRLRGKTLMYEFNHSHPSEVE
FT                   KRESLIKEMFATVGENAWVEPPVYFSYGSNIHIGRNFYANFNLTIVDDYTVTIGDNVLI
FT                   APNVTLSVTGHPVHHELRKNGEMYSFPITIGNNVWIGSHVVINPGVTIGDNSVIGAGSI
FT                   VTKDIPPNVVAAGVPCRVIREINDRDKHYYFKDYKVESSV"
XX
SQ   Sequence 1832 BP; 519 A; 510 C; 450 G; 353 T; 0 other;
     gtgaatgaag tcgcttaagc aatcaatgtc ggatgcggcg cgacgcttat ccgaccaaca        60
     tatcataacg gagtgatcgc attgaacatg ccaatgaccg aaagaataag agcaggcaag       120
     ctatttaccg atatgtgcga aggcttaccg gaaaaaagac ttcgtgggaa aacgttaatg       180
     tatgagttta atcactcgca tccatcagaa gttgaaaaaa gagaaagcct gattaaagaa       240
     atgtttgcca cggtagggga aaacgcctgg gtagaaccgc ctgtctattt ctcttacggt       300
     tccaacatcc atataggccg caatttttat gcaaatttca atttaaccat tgtcgatgac       360
     tacacggtaa caatcggtga taacgtactg attgcaccca acgttactct ttccgttacg       420
     ggacaccctg tacaccatga attgagaaaa aacggcgaga tgtactcttt tccgataacg       480
     attggcaata acgtctggat cggaagtcat gtggttatta atccaggcgt caccatcggg       540
     gataattctg ttattggcgc gggtagtatc gtcacaaaag acattccacc aaacgtcgtg       600
     gcggctggcg ttccttgtcg ggttattcgc gaaataaacg accgggataa gcactattat       660
     ttcaaagatt ataaagttga atcgtcagtt taaattataa aaattgcctg atacgctgcg       720
     cttatcaggc ctacaagttc agcgatctac attagccgca tccggcatga acaaagcgca       780
     ggaacaagcg tcgcatcatg cctctttgac ccacagctgc ggaaaacgta ctggtgcaaa       840
     acgcagggtt atgatcatca gcccaacgac gcacagcgca tgaaatgccc agtccatcag       900
     gtaattgccg ctgatactac gcagcacgcc agaaaaccac ggggcaagcc cggcgatgat       960
     aaaaccgatt ccctgcataa acgccaccag cttgccagca atagccggtt gcacagagtg      1020
     atcgagcgcc agcagcaaac agagcggaaa cgcgccgccc agacctaacc cacacaccat      1080
     cgcccacaat accggcaatt gcatcggcag ccagataaag ccgcagaacc ccaccagttg      1140
     taacaccagc gccagcatta acagtttgcg ccgatcctga tggcgagcca tagcaggcat      1200
     cagcaaagct cctgcggctt gcccaagcgt catcaatgcc agtaaggaac cgctgtactg      1260
     cgcgctggca ccaatctcaa tatagaaagc gggtaaccag gcaatcaggc tggcgtaacc      1320
     gccgttaatc agaccgaagt aaacacccag cgtccacgcg cggggagtga ataccacgcg      1380
     aaccggagtg gttgttgtct tgtgggaaga ggcgacctcg cgggcgcttt gccaccacca      1440
     ggcaaagagc gcaacaacgg caggcagcgc caccaggcga gtgtttgata ccaggtttcg      1500
     ctatgttgaa ctaaccaggg cgttatggcg gcaccaagcc caccgccgcc catcagagcc      1560
     gcggaccaca gccccatcac cagtggcgtg cgctgctgaa accgccgttt aatcaccgaa      1620
     gcatcaccgc ctgaatgatg ccgatcccca ccccaccaag cagtgcgctg ctaagcagca      1680
     gcgcactttg cgggtaaagc tcacgcatca atgcaccgac ggcaatcagc aacagactga      1740
     tggcgacact gcgacgttcg ctgacatgct gatgaagcca gcttccggcc agcgccagcc      1800
     cgcccatggt aaccaccggc agagcggtcg ac                                    1832
//
   
Output file format

   The output sequence file contains the joined sequence, by default in
   FASTA format. Where there is a mismatch in the alignment, the chosen
   base is written to the output sequence in uppercase.
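   
   Since resolved mismatches are marked by case, they can be located
   in the merged sequence with a few lines of Python. This sketch
   assumes the input sequences were entirely lowercase (as in the
   example entries in this document), so that every uppercase letter in
   the output marks a resolved mismatch. The function names are
   hypothetical.

```python
def read_fasta_sequence(text):
    """Join the sequence lines of a single-entry FASTA string."""
    lines = text.splitlines()
    return "".join(line.strip() for line in lines
                   if line and not line.startswith(">"))


def resolved_mismatch_positions(merged):
    """1-based positions of uppercase bases in the merged sequence."""
    return [i + 1 for i, base in enumerate(merged) if base.isupper()]


if __name__ == "__main__":
    example = ">merged example\nacgTac\nGtac\n"
    seq = read_fasta_sequence(example)
    print(resolved_mismatch_positions(seq))
```

   An empty list means that the two sequences agreed at every aligned
   position, as in the usage example, where the mismatch table at the
   end of the alignment file is also empty.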
   
   The output alignment file is a standard EMBOSS alignment file.
   
   The results can be output in one of several styles by using the
   command-line qualifier -aformat xxx, where 'xxx' is replaced by the
   name of the required format. Some of the alignment formats can cope
   with an unlimited number of sequences, while others are only for pairs
   of sequences.
   
   The available multiple alignment format names are: unknown, multiple,
   simple, fasta, msf, trace, srs
   
   The available pairwise alignment format names are: pair, markx0,
   markx1, markx2, markx3, markx10, srspair, score
   
   See: http://www.uk.embnet.org/Software/EMBOSS/Themes/AlignFormats.html
   for further information on alignment formats.
   
   The output report file contains descriptions of the positions where
   there is a mismatch in the alignment and shows the alignment. Where
   there is a mismatch in the alignment, the chosen base is written in
   uppercase.
   
  Output files for usage example
  
  File: eclacy.fasta
  
>ECLACY V00295.1 E. coli lacY gene (codes for lactose permease).
ttccagctgagcgccggtcgctaccattaccagttggtctggtgtcaaaaataataataa
ccgggcaggccatgtctgcccgtatttcgcgtaaggaaatccattatgtactatttaaaa
aacacaaacttttggatgttcggtttattctttttcttttacttttttatcatgggagcc
tacttcccgtttttcccgatttggctacatgacatcaaccatatcagcaaaagtgatacg
ggtattatttttgccgctatttctctgttctcgctattattccaaccgctgtttggtctg
ctttctgacaaactcgggctgcgcaaatacctgctgtggattattaccggcatgttagtg
atgtttgcgccgttctttatttttatcttcgggccactgttacaatacaacattttagta
ggatcgattgttggtggtatttatctaggcttttgttttaacgccggtgcgccagcagta
gaggcatttattgagaaagtcagccgtcgcagtaatttcgaatttggtcgcgcgcggatg
tttggctgtgttggctgggcgctgtgtgcctcgattgtcggcatcatgttcaccatcaat
aatcagtttgttttctggctgggctctggctgtgcactcatcctcgccgttttactcttt
ttcgccaaaacggatgcgccctcttctgccacggttgccaatgcggtaggtgccaaccat
tcggcatttagccttaagctggcactggaactgttcagacagccaaaactgtggtttttg
tcactgtatgttattggcgtttcctgcacctacgatgtttttgaccaacagtttgctaat
ttctttacttcgttctttgctaccggtgaacagggtacgcgggtatttggctacgtaacg
acaatgggcgaattacttaacgcctcgattatgttctttgcgccactgatcattaatcgc
atcggtgggaaaaacgccctgctgctggctggcactattatgtctgtacgtattattggc
tcatcgttcgccacctcagcgctggaagtggttattctgaaaacgctgcatatgtttgaa
gtaccgttcctgctggtgggctgctttaaatatattaccagccagtttgaagtgcgtttt
tcagcgacgatttatctggtctgtttctgcttctttaagcaactggcgatgatttttatg
tctgtactggcgggcaatatgtatgaaagcatcggtttccagggcgcttatctggtgctg
ggtctggtggcgctgggcttcaccttaatttccgtgttcacgcttagcggccccggcccg
ctttccctgctgcgtcgtcaggtgaatgaagtcgcttaagcaatcaatgtcggatgcggc
gcgacgcttatccgaccaacatatcataacggagtgatcgcattgaacatgccaatgacc
gaaagaataagagcaggcaagctatttaccgatatgtgcgaaggcttaccggaaaaaaga
cttcgtgggaaaacgttaatgtatgagtttaatcactcgcatccatcagaagttgaaaaa
agagaaagcctgattaaagaaatgtttgccacggtaggggaaaacgcctgggtagaaccg
cctgtctatttctcttacggttccaacatccatataggccgcaatttttatgcaaatttc
aatttaaccattgtcgatgactacacggtaacaatcggtgataacgtactgattgcaccc
aacgttactctttccgttacgggacaccctgtacaccatgaattgagaaaaaacggcgag
atgtactcttttccgataacgattggcaataacgtctggatcggaagtcatgtggttatt
aatccaggcgtcaccatcggggataattctgttattggcgcgggtagtatcgtcacaaaa
gacattccaccaaacgtcgtggcggctggcgttccttgtcgggttattcgcgaaataaac
gaccgggataagcactattatttcaaagattataaagttgaatcgtcagtttaaattata
aaaattgcctgatacgctgcgcttatcaggcctacaagttcagcgatctacattagccgc
atccggcatgaacaaagcgcaggaacaagcgtcgcatcatgcctctttgacccacagctg
cggaaaacgtactggtgcaaaacgcagggttatgatcatcagcccaacgacgcacagcgc
atgaaatgcccagtccatcaggtaattgccgctgatactacgcagcacgccagaaaacca
cggggcaagcccggcgatgataaaaccgattccctgcataaacgccaccagcttgccagc
aatagccggttgcacagagtgatcgagcgccagcagcaaacagagcggaaacgcgccgcc
cagacctaacccacacaccatcgcccacaataccggcaattgcatcggcagccagataaa
gccgcagaaccccaccagttgtaacaccagcgccagcattaacagtttgcgccgatcctg
atggcgagccatagcaggcatcagcaaagctcctgcggcttgcccaagcgtcatcaatgc
cagtaaggaaccgctgtactgcgcgctggcaccaatctcaatatagaaagcgggtaacca
ggcaatcaggctggcgtaaccgccgttaatcagaccgaagtaaacacccagcgtccacgc
gcggggagtgaataccacgcgaaccggagtggttgttgtcttgtgggaagaggcgacctc
gcgggcgctttgccaccaccaggcaaagagcgcaacaacggcaggcagcgccaccaggcg
agtgtttgataccaggtttcgctatgttgaactaaccagggcgttatggcggcaccaagc
ccaccgccgcccatcagagccgcggaccacagccccatcaccagtggcgtgcgctgctga
aaccgccgtttaatcaccgaagcatcaccgcctgaatgatgccgatccccaccccaccaa
gcagtgcgctgctaagcagcagcgcactttgcgggtaaagctcacgcatcaatgcaccga
cggcaatcagcaacagactgatggcgacactgcgacgttcgctgacatgctgatgaagcc
agcttccggccagcgccagcccgcccatggtaaccaccggcagagcggtcgac
   
  File: eclacy.out2
  
########################################
# Program:  merger
# Rundate:  Thu Nov 27 15:22:29 2003
# Align_format: simple
# Report_file: eclacy.out2
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: ECLACY
# 2: ECLACA
# Matrix: EDNAFULL
# Gap_penalty: 50.0
# Extend_penalty: 5.0
#
# Length: 3173
# Identity:     159/3173 ( 5.0%)
# Similarity:   159/3173 ( 5.0%)
# Gaps:        3014/3173 (95.0%)
# Score: 795.0
#
#
#=======================================

ECLACY             1 ttccagctgagcgccggtcgctaccattaccagttggtctggtgtcaaaa     50

ECLACA             1                                                         0

ECLACY            51 ataataataaccgggcaggccatgtctgcccgtatttcgcgtaaggaaat    100

ECLACA             1                                                         0

ECLACY           101 ccattatgtactatttaaaaaacacaaacttttggatgttcggtttattc    150

ECLACA             1                                                         0

ECLACY           151 tttttcttttacttttttatcatgggagcctacttcccgtttttcccgat    200

ECLACA             1                                                         0

ECLACY           201 ttggctacatgacatcaaccatatcagcaaaagtgatacgggtattattt    250

ECLACA             1                                                         0

ECLACY           251 ttgccgctatttctctgttctcgctattattccaaccgctgtttggtctg    300

ECLACA             1                                                         0

ECLACY           301 ctttctgacaaactcgggctgcgcaaatacctgctgtggattattaccgg    350



  [Part of this file has been deleted for brevity]

ECLACY          1501                                                      1500

ECLACA          1310 ctggcgtaaccgccgttaatcagaccgaagtaaacacccagcgtccacgc   1359

ECLACY          1501                                                      1500

ECLACA          1360 gcggggagtgaataccacgcgaaccggagtggttgttgtcttgtgggaag   1409

ECLACY          1501                                                      1500

ECLACA          1410 aggcgacctcgcgggcgctttgccaccaccaggcaaagagcgcaacaacg   1459

ECLACY          1501                                                      1500

ECLACA          1460 gcaggcagcgccaccaggcgagtgtttgataccaggtttcgctatgttga   1509

ECLACY          1501                                                      1500

ECLACA          1510 actaaccagggcgttatggcggcaccaagcccaccgccgcccatcagagc   1559

ECLACY          1501                                                      1500

ECLACA          1560 cgcggaccacagccccatcaccagtggcgtgcgctgctgaaaccgccgtt   1609

ECLACY          1501                                                      1500

ECLACA          1610 taatcaccgaagcatcaccgcctgaatgatgccgatccccaccccaccaa   1659

ECLACY          1501                                                      1500

ECLACA          1660 gcagtgcgctgctaagcagcagcgcactttgcgggtaaagctcacgcatc   1709

ECLACY          1501                                                      1500

ECLACA          1710 aatgcaccgacggcaatcagcaacagactgatggcgacactgcgacgttc   1759

ECLACY          1501                                                      1500

ECLACA          1760 gctgacatgctgatgaagccagcttccggccagcgccagcccgcccatgg   1809

ECLACY          1501                           1500

ECLACA          1810 taaccaccggcagagcggtcgac   1832


#---------------------------------------
#
# ECLACY position base          ECLACA position base    Using
#
#
#---------------------------------------
   
Data files

   merger reads the scoring matrix for the alignment from the standard
   EMBOSS 'data' directory. By default it is the file 'EBLOSUM62' (for
   proteins) or the file 'EDNAFULL' (for nucleic sequences).
   
Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It exits with a status of 0.
   
Known bugs

   None.
   
See also

   Program name                    Description
   cons         Creates a consensus from multiple alignments
   megamerger   Merge two large overlapping nucleic acid sequences
   
Author(s)

   Gary Williams (gwilliam  hgmp.mrc.ac.uk)
   HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK
   
History

   Written (Gary Williams) 1999
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
