
                                  infoseq 
                                      
   
   
Function

   Displays some simple information about sequences
   
Description

   This is a small utility to list the sequences' USA, name, accession
   number, type (nucleic or protein), length, percentage C+G, and/or
   description.
   
   Any combination of these types of information can be selected or
   deselected.
   
   By default, the output file starts each line with the USA of the
   sequence being described, so the output file is a list file that can
   be manually edited and read in by any other EMBOSS program that can
   read in one or more sequences to be analysed.
   
Usage

   Here is a sample session with infoseq
   
   Display information on a sequence:
   

% infoseq tembl:paamir 
Displays some simple information about sequences

# USA             Name        Accession Type Length      %GC   Description
tembl-id:PAAMIR   PAAMIR        X13776  N    2167        66.54 Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation
   
   The input file for this example is shown under "Input file format".
   
   Example 2
   
   Don't display the USA of a sequence:
   

% infoseq tembl:paamir -nousa 
Displays some simple information about sequences

# Name        Accession Type Length      %GC   Description
PAAMIR        X13776    N    2167        66.54 Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation
   
   Example 3
   
   Display only the name and length of a sequence:
   

% infoseq tembl:paamir -only -name -length 
Displays some simple information about sequences

PAAMIR        2167
   
   Example 4
   
   Display only the description of a sequence:
   

% infoseq tembl:paamir -only -desc 
Displays some simple information about sequences

Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation
   
   Example 5
   
   Display the type of a sequence:
   

% infoseq tembl:paamir -only -type 
Displays some simple information about sequences

N
   
   Example 6
   
   Display information formatted with HTML:
   

% infoseq tembl:paamir -html 
Displays some simple information about sequences


 USA Name Accession Type Length %GC Description

 tembl-id:PAAMIR PAAMIR X13776 N 2167 66.54  Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation



Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence database USA

   Additional (Optional) qualifiers:
   -outfile            outfile    If you enter the name of a file here then
                                  this program will write the sequence details
                                  into that file.
   -html               boolean    Format output as an HTML table

   Advanced (Unprompted) qualifiers:
   -only               boolean    This is a way of shortening the command line
                                  if you only want a few things to be
                                  displayed. Instead of specifying:
                                  '-nohead -noname -noacc -notype -nopgc
                                  -nodesc'
                                  to get only the length output, you can
                                  specify
                                  '-only -length'
   -heading            boolean    Display column headings
   -usa                boolean    Display the USA of the sequence
   -name               boolean    Display 'name' column
   -accession          boolean    Display 'accession' column
   -gi                 boolean    Display 'GI' column
   -version            boolean    Display 'version' column
   -type               boolean    Display 'type' column
   -length             boolean    Display 'length' column
   -pgc                boolean    Display 'percent GC content' column
   -description        boolean    Display 'description' column

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1             integer    First base used
   -send1               integer    Last base used, def=seq length
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-outfile" associated qualifiers
   -odirectory          string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
   

   Standard (Mandatory) qualifiers                     Allowed values         Default
   [-sequence]    Sequence database USA                Readable sequence(s)   Required
   (Parameter 1)

   Additional (Optional) qualifiers                    Allowed values         Default
   -outfile       If you enter the name of a file      Output file            stdout
                  here then this program will write
                  the sequence details into that
                  file.
   -html          Format output as an HTML table       Boolean value Yes/No   No

   Advanced (Unprompted) qualifiers                    Allowed values         Default
   -only          This is a way of shortening the      Boolean value Yes/No   No
                  command line if you only want a
                  few things to be displayed.
                  Instead of specifying: '-nohead
                  -noname -noacc -notype -nopgc
                  -nodesc' to get only the length
                  output, you can specify
                  '-only -length'
   -heading       Display column headings              Boolean value Yes/No   @(!$(only))
   -usa           Display the USA of the sequence      Boolean value Yes/No   @(!$(only))
   -name          Display 'name' column                Boolean value Yes/No   @(!$(only))
   -accession     Display 'accession' column           Boolean value Yes/No   @(!$(only))
   -gi            Display 'GI' column                  Boolean value Yes/No   No
   -version       Display 'version' column             Boolean value Yes/No   No
   -type          Display 'type' column                Boolean value Yes/No   @(!$(only))
   -length        Display 'length' column              Boolean value Yes/No   @(!$(only))
   -pgc           Display 'percent GC content' column  Boolean value Yes/No   @(!$(only))
   -description   Display 'description' column         Boolean value Yes/No   @(!$(only))
   
Input file format

   infoseq reads any sequence USAs.
   
  Input files for usage example
  
   'tembl:paamir' is a sequence entry in the example nucleic acid
   database 'tembl'
   
  Database entry: tembl:paamir
  
ID   PAAMIR     standard; DNA; PRO; 2167 BP.
XX
AC   X13776; M43175;
XX
SV   X13776.1
XX
DT   19-APR-1989 (Rel. 19, Created)
DT   17-FEB-1997 (Rel. 50, Last updated, Version 22)
XX
DE   Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation
XX
KW   aliphatic amidase regulator; amiC gene; amiR gene.
XX
OS   Pseudomonas aeruginosa
OC   Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae; Pseudomonas.
XX
RN   [1]
RP   1167-2167
RA   Rice P.M.;
RT   ;
RL   Submitted (16-DEC-1988) to the EMBL/GenBank/DDBJ databases.
RL   Rice P.M., EMBL, Postfach 10-2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG.
XX
RN   [2]
RP   1167-2167
RX   MEDLINE; 89211409.
RA   Lowe N., Rice P.M., Drew R.E.;
RT   "Nucleotide sequence of the aliphatic amidase regulator gene of Pseudomonas
RT   aeruginosa";
RL   FEBS Lett. 246:39-43(1989).
XX
RN   [3]
RP   1-1292
RX   MEDLINE; 91317707.
RA   Wilson S., Drew R.;
RT   "Cloning and DNA seqence of amiC, a new gene regulating expression of the
RT   Pseudomonas aeruginosa aliphatic amidase, and purification of the amiC
RT   product.";
RL   J. Bacteriol. 173:4914-4921(1991).
XX
RN   [4]
RP   1-2167
RA   Rice P.M.;
RT   ;
RL   Submitted (04-SEP-1991) to the EMBL/GenBank/DDBJ databases.
RL   Rice P.M., EMBL, Postfach 10-2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG.
XX
DR   SWISS-PROT; P10932; AMIR_PSEAE.
DR   SWISS-PROT; P27017; AMIC_PSEAE.
DR   SWISS-PROT; Q51417; AMIS_PSEAE.


  [Part of this file has been deleted for brevity]

FT                   phenotype"
FT                   /replace=""
FT                   /gene="amiC"
FT   misc_feature    1
FT                   /note="last base of an XhoI site"
FT   misc_feature    648..653
FT                   /note="end of 658bp XhoI fragment, deletion in  pSW3 causes
FT                   constitutive expression of amiE"
FT   conflict        1281
FT                   /replace="g"
FT                   /citation=[3]
XX
SQ   Sequence 2167 BP; 363 A; 712 C; 730 G; 362 T; 0 other;
     ggtaccgctg gccgagcatc tgctcgatca ccaccagccg ggcgacggga actgcacgat        60
     ctacctggcg agcctggagc acgagcgggt tcgcttcgta cggcgctgag cgacagtcac       120
     aggagaggaa acggatggga tcgcaccagg agcggccgct gatcggcctg ctgttctccg       180
     aaaccggcgt caccgccgat atcgagcgct cgcacgcgta tggcgcattg ctcgcggtcg       240
     agcaactgaa ccgcgagggc ggcgtcggcg gtcgcccgat cgaaacgctg tcccaggacc       300
     ccggcggcga cccggaccgc tatcggctgt gcgccgagga cttcattcgc aaccgggggg       360
     tacggttcct cgtgggctgc tacatgtcgc acacgcgcaa ggcggtgatg ccggtggtcg       420
     agcgcgccga cgcgctgctc tgctacccga ccccctacga gggcttcgag tattcgccga       480
     acatcgtcta cggcggtccg gcgccgaacc agaacagtgc gccgctggcg gcgtacctga       540
     ttcgccacta cggcgagcgg gtggtgttca tcggctcgga ctacatctat ccgcgggaaa       600
     gcaaccatgt gatgcgccac ctgtatcgcc agcacggcgg cacggtgctc gaggaaatct       660
     acattccgct gtatccctcc gacgacgact tgcagcgcgc cgtcgagcgc atctaccagg       720
     cgcgcgccga cgtggtcttc tccaccgtgg tgggcaccgg caccgccgag ctgtatcgcg       780
     ccatcgcccg tcgctacggc gacggcaggc ggccgccgat cgccagcctg accaccagcg       840
     aggcggaggt ggcgaagatg gagagtgacg tggcagaggg gcaggtggtg gtcgcgcctt       900
     acttctccag catcgatacg cccgccagcc gggccttcgt ccaggcctgc catggtttct       960
     tcccggagaa cgcgaccatc accgcctggg ccgaggcggc ctactggcag accttgttgc      1020
     tcggccgcgc cgcgcaggcc gcaggcaact ggcgggtgga agacgtgcag cggcacctgt      1080
     acgacatcga catcgacgcg ccacaggggc cggtccgggt ggagcgccag aacaaccaca      1140
     gccgcctgtc ttcgcgcatc gcggaaatcg atgcgcgcgg cgtgttccag gtccgctggc      1200
     agtcgcccga accgattcgc cccgaccctt atgtcgtcgt gcataacctc gacgactggt      1260
     ccgccagcat gggcggggga ccgctcccat gagcgccaac tcgctgctcg gcagcctgcg      1320
     cgagttgcag gtgctggtcc tcaacccgcc gggggaggtc agcgacgccc tggtcttgca      1380
     gctgatccgc atcggttgtt cggtgcgcca gtgctggccg ccgccggaag ccttcgacgt      1440
     gccggtggac gtggtcttca ccagcatttt ccagaatggc caccacgacg agatcgctgc      1500
     gctgctcgcc gccgggactc cgcgcactac cctggtggcg ctggtggagt acgaaagccc      1560
     cgcggtgctc tcgcagatca tcgagctgga gtgccacggc gtgatcaccc agccgctcga      1620
     tgcccaccgg gtgctgcctg tgctggtatc ggcgcggcgc atcagcgagg aaatggcgaa      1680
     gctgaagcag aagaccgagc agctccagga ccgcatcgcc ggccaggccc ggatcaacca      1740
     ggccaaggtg ttgctgatgc agcgccatgg ctgggacgag cgcgaggcgc accagcacct      1800
     gtcgcgggaa gcgatgaagc ggcgcgagcc gatcctgaag atcgctcagg agttgctggg      1860
     aaacgagccg tccgcctgag cgatccgggc cgaccagaac aataacaaga ggggtatcgt      1920
     catcatgctg ggactggttc tgctgtacgt tggcgcggtg ctgtttctca atgccgtctg      1980
     gttgctgggc aagatcagcg gtcgggaggt ggcggtgatc aacttcctgg tcggcgtgct      2040
     gagcgcctgc gtcgcgttct acctgatctt ttccgcagca gccgggcagg gctcgctgaa      2100
     ggccggagcg ctgaccctgc tattcgcttt tacctatctg tgggtggccg ccaaccagtt      2160
     cctcgag                                                                2167
//
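
   The base composition on the SQ line of this entry can be used to
   cross-check the %GC value of 66.54 shown in the usage examples. The
   Python sketch below assumes that %GC is simply the number of C and G
   bases divided by the sequence length; how infoseq treats ambiguity
   codes is not stated here, and this entry contains none.

```python
def percent_gc(counts):
    """Return %GC from a base-composition mapping, assuming (C + G) / total."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (counts.get("C", 0) + counts.get("G", 0)) / total


# Composition copied from the SQ line of the tembl:paamir entry.
paamir = {"A": 363, "C": 712, "G": 730, "T": 362, "other": 0}

print(sum(paamir.values()))          # sequence length
print(f"{percent_gc(paamir):.2f}")   # percentage C+G to two decimals
```

   With these counts the length is 2167 and (712 + 730) / 2167 gives
   66.54 to two decimal places, in agreement with the example output.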
   
Output file format

   The output is displayed on the screen (stdout) by default.
   
   The first non-blank line is the heading. This is followed by one line
   per sequence containing the following columns of data separated by one
   or more space or TAB characters:
     * The USA (Uniform Sequence Address) that EMBOSS can use to read in
       the sequence.
     * The name or ID of the sequence. If this is not known then '-' is
       output.
     * The accession number. If this is not known then '-' is output.
     * The type ('N' is nucleic, 'P' is protein).
     * The sequence length.
     * The percentage G+C content of the sequence.
     * The description line of the sequence. This may be blank.
       
   If qualifiers to inhibit various columns of information are used, then
   the remaining columns of information are output in the same order as
   shown above, so if '-nolength' is used, the order of output is: usa,
   name, accession, type, %GC, description.
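
   Because the columns are separated by white-space and the description
   is always the last column, a line can be split into a fixed number of
   fields with the remainder taken as the description. The Python sketch
   below illustrates this; the function and key names are illustrative,
   and it assumes that the heading line starts with '#' (as in the usage
   examples) and that the text holds only the table itself.

```python
DEFAULT_COLUMNS = ["usa", "name", "accession", "type",
                   "length", "pgc", "description"]


def parse_infoseq(text, columns=DEFAULT_COLUMNS):
    """Parse infoseq table text into a list of dicts keyed by column name.

    `columns` must list the columns that were actually output, in order.
    """
    records = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the heading
        # Limit the split so a description containing spaces stays whole.
        fields = line.split(None, len(columns) - 1)
        # A blank description yields one field too few; pad with "".
        fields += [""] * (len(columns) - len(fields))
        records.append(dict(zip(columns, fields)))
    return records


sample = (
    "# USA             Name        Accession Type Length      %GC   Description\n"
    "tembl-id:PAAMIR   PAAMIR        X13776  N    2167        66.54 "
    "Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation\n"
)
record = parse_infoseq(sample)[0]
print(record["name"], record["length"], record["pgc"])
```

   When columns are suppressed, pass the reduced column list, for example
   parse_infoseq(text, ["name", "length"]) for '-only -name -length'.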
   
   When the -html qualifier is specified, the output is wrapped in HTML
   table tags, ready for inclusion in a Web page. Note that tags such as
   <HTML> and <BODY> are not output by this program, as the table of
   sequence information is expected to form only part of the contents of
   a web page - the rest of the web page must be supplied by the user.
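
   As a minimal sketch of supplying the rest of the page, the Python
   function below wraps a fragment in a bare HTML document. The function
   name, the page title and the sample table markup are all hypothetical;
   the exact tags written by '-html' are not reproduced in this document.

```python
import html


def wrap_fragment(fragment, title="infoseq output"):
    """Embed an HTML fragment in a minimal page (illustrative helper)."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        "<body>\n"
        f"{fragment.strip()}\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical fragment standing in for real 'infoseq -html' output.
fragment = "<table><tr><td>PAAMIR</td><td>2167</td></tr></table>"
print(wrap_fragment(fragment, title="Sequences & lengths"))
```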
   
   The output lines are guaranteed not to have trailing white-space.
   
Data files

   None.
   
Notes

   This program was written to make it easier to get specific items of
   information on a sequence for use in small Perl scripts. This Perl
   code fragment to get the type of a sequence is typical:
$type = `$PATH_TO_EMBOSS/infoseq $sequence -auto -only -type`;
chomp $type;

   You may find other uses for it, of course.
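
   The same idea can be sketched in Python. The helper below is
   illustrative only: it assumes that infoseq is on the search path, and
   it takes the function used to run the command as a parameter so that
   the logic can be exercised without EMBOSS installed. Since the exit
   status is documented as always 0, the sketch validates the output
   rather than relying on the status.

```python
import subprocess


def sequence_type(usa, run=subprocess.check_output, program="infoseq"):
    """Return 'N' or 'P' for a sequence USA (hypothetical helper)."""
    cmd = [program, usa, "-auto", "-only", "-type"]
    out = run(cmd)
    if isinstance(out, bytes):
        out = out.decode()
    seq_type = out.strip()
    # The documented types are 'N' (nucleic) and 'P' (protein).
    if seq_type not in ("N", "P"):
        raise ValueError(f"unexpected infoseq output: {out!r}")
    return seq_type


# Stand-in runner so the example does not need EMBOSS; it mimics Example 5.
def fake_run(cmd):
    return b"N\n"


print(sequence_type("tembl:paamir", run=fake_run))
```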
   
   By default, the output file starts each line with the USA of the
   sequence being described, so the output file is a list file that can
   be manually edited and read in by other EMBOSS programs using the
   list-file specification of '@filename'.
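
   A list file can also be produced from captured output by a script.
   The Python sketch below is a conservative variation that keeps only
   the first column (the USA) of each line and returns the '@filename'
   form for use with another EMBOSS program. The function names and the
   file name are illustrative, and the heading line is assumed to start
   with '#'.

```python
def usas_from_output(text):
    """Collect the USA (first column) from each infoseq output line."""
    usas = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the heading
        usas.append(line.split()[0])
    return usas


def write_list_file(text, path):
    """Write one USA per line to `path` and return the '@path' reference."""
    with open(path, "w") as handle:
        for usa in usas_from_output(text):
            handle.write(usa + "\n")
    return "@" + str(path)


sample = (
    "# USA             Name        Accession Type Length      %GC   Description\n"
    "tembl-id:PAAMIR   PAAMIR        X13776  N    2167        66.54 Pseudomonas\n"
)
print(usas_from_output(sample))
```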
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with status 0.
   
Known bugs

   None noted.
   
See also

   Program name Description
   infoalign    Information on a multiple sequence alignment
   seealso      Finds programs sharing group names
   showdb       Displays information on the currently available databases
   textsearch   Search sequence documentation text. SRS and Entrez are faster!
   tfm          Displays a program's help documentation manual
   whichdb      Search all databases for an entry
   wossname     Finds programs by keywords in their one-line documentation
   
     * geecee - Calculates the fractional GC content of a nucleic acid
       sequence
       
Author(s)

   Gary Williams (gwilliam  hgmp.mrc.ac.uk)
   HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK
   
History

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
