
                                textsearch 
                                      
   
   
Function

   Search sequence documentation text. SRS and Entrez are faster!
   
Description

   This is a small utility that searches for words in the description
   text of a sequence and, for each match, lists the sequence's name
   and/or description. NB. It only searches the description line of the
   annotation, not the full annotation.
   
Usage

   Here is a sample session with textsearch
   
   Search for 'lactose':
   

% textsearch tsw:* 'lactose' 
Search sequence documentation text. SRS and Entrez are faster!
Output file [100k_rat.textsearch]: 
   
   
   Example 2
   
   Search for 'lactose' or 'permease' in E.coli proteins:
   

% textsearch tsw:*_ecoli 'lactose | permease' 
Search sequence documentation text. SRS and Entrez are faster!
Output file [laci_ecoli.textsearch]: 
   
   
   Example 3
   
   Output a search for 'lacz' formatted with HTML to a file:
   
   
% textsearch tembl:* 'lacz' -html -outfile embl.lacz.html 
Search sequence documentation text. SRS and Entrez are faster!

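   The sessions above are interactive, but textsearch can also be
   driven from a script. The following is a minimal Python sketch, not
   part of EMBOSS: the helper names build_textsearch_command and
   run_textsearch are hypothetical, and it assumes the textsearch
   executable is on the PATH. It uses only qualifiers documented on
   this page (-sequence, -pattern, -outfile, -auto, -html,
   -casesensitive).

```python
import shutil
import subprocess


def build_textsearch_command(usa, pattern, outfile, html=False, case_sensitive=False):
    """Assemble an argument list for a non-interactive textsearch run."""
    cmd = [
        "textsearch",
        "-sequence", usa,
        "-pattern", pattern,
        "-outfile", outfile,
        "-auto",  # turn off prompts, as listed under the general qualifiers
    ]
    if html:
        cmd.append("-html")
    if case_sensitive:
        cmd.append("-casesensitive")
    return cmd


def run_textsearch(usa, pattern, outfile, **options):
    """Run textsearch if it is installed; return True when it was run."""
    if shutil.which("textsearch") is None:
        return False  # EMBOSS is not on the PATH, so nothing is executed
    subprocess.run(build_textsearch_command(usa, pattern, outfile, **options), check=True)
    return True


if __name__ == "__main__":
    # Same arguments as example 3; a list avoids shell quoting problems
    # with characters such as '*' and '|'.
    print(build_textsearch_command("tembl:*", "lacz", "embl.lacz.html", html=True))
```

   Passing the arguments as a list rather than a single shell string
   means the USA wildcard and the '|' in a pattern reach textsearch
   unchanged.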
   
Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-pattern]           string     The search pattern is a regular expression.
                                  Use a | to indicate OR.
                                  For example:
                                  human|mouse
                                  will find text with either 'human' OR
                                  'mouse' in the text
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers:
   -casesensitive      boolean    Do a case-sensitive search
   -html               boolean    Format output as an HTML table

   Advanced (Unprompted) qualifiers:
   -only               boolean    This is a way of shortening the command line
                                  if you only want a few things to be
                                  displayed. Instead of specifying:
                                  '-nohead -noname -nousa -noacc -nodesc'
                                  to get only the name output, you can specify
                                  '-only -name'
   -heading            boolean    Display column headings
   -usa                boolean    Display the USA of the sequence
   -accession          boolean    Display 'accession' column
   -name               boolean    Display 'name' column
   -description        boolean    Display 'description' column

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1             integer    First base used
   -send1               integer    Last base used, def=seq length
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-outfile" associated qualifiers
   -odirectory3         string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
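
   To illustrate how the -pattern and -casesensitive qualifiers listed
   above interact, here is a minimal Python sketch. It is an analogy
   only: it uses Python's re module as a stand-in for the regular
   expression engine inside EMBOSS, whose syntax may differ in detail,
   and the function name matches is hypothetical. The description
   strings are taken from the example output on this page.

```python
import re


def matches(description, pattern, case_sensitive=False):
    """Return True if the pattern is found anywhere in the description."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, description, flags) is not None


descriptions = {
    "LACI_ECOLI": "LACTOSE OPERON REPRESSOR.",
    "LACY_ECOLI": "LACTOSE PERMEASE (LACTOSE-PROTON SYMPORT).",
}

# Default (case-insensitive) search: lower-case pattern, upper-case text.
hits = [name for name, text in descriptions.items() if matches(text, "lactose")]
print(hits)  # ['LACI_ECOLI', 'LACY_ECOLI']

# '|' means OR: either alternative is enough for a match.
print(matches("LACTOSE OPERON REPRESSOR.", "permease|repressor"))  # True

# With a case-sensitive search the lower-case pattern no longer matches.
print(matches("LACTOSE OPERON REPRESSOR.", "lactose", case_sensitive=True))  # False
```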
   

   Standard (Mandatory) qualifiers:
   [-sequence] (Parameter 1)
       Description:    Sequence database USA
       Allowed values: Readable sequence(s)
       Default:        Required
   [-pattern] (Parameter 2)
       Description:    The search pattern is a regular expression. Use a |
                       to indicate OR. For example: human|mouse will find
                       text with either 'human' OR 'mouse' in the text
       Allowed values: Any string is accepted
       Default:        An empty string is accepted
   [-outfile] (Parameter 3)
       Description:    Output file name
       Allowed values: Output file
       Default:        <sequence>.textsearch

   Additional (Optional) qualifiers:
   -casesensitive
       Description:    Do a case-sensitive search
       Allowed values: Boolean value Yes/No
       Default:        No
   -html
       Description:    Format output as an HTML table
       Allowed values: Boolean value Yes/No
       Default:        No

   Advanced (Unprompted) qualifiers:
   -only
       Description:    This is a way of shortening the command line if you
                       only want a few things to be displayed. Instead of
                       specifying: '-nohead -noname -nousa -noacc -nodesc'
                       to get only the name output, you can specify
                       '-only -name'
       Allowed values: Boolean value Yes/No
       Default:        No
   -heading
       Description:    Display column headings
       Allowed values: Boolean value Yes/No
       Default:        @(!$(only))
   -usa
       Description:    Display the USA of the sequence
       Allowed values: Boolean value Yes/No
       Default:        @(!$(only))
   -accession
       Description:    Display 'accession' column
       Allowed values: Boolean value Yes/No
       Default:        @(!$(only))
   -name
       Description:    Display 'name' column
       Allowed values: Boolean value Yes/No
       Default:        @(!$(only))
   -description
       Description:    Display 'description' column
       Allowed values: Boolean value Yes/No
       Default:        @(!$(only))
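
   The default @(!$(only)) means that each display qualifier is on
   unless -only is given, in which case it is off unless requested
   explicitly. The following Python sketch models that rule; it is an
   illustration of the documented defaults, not EMBOSS code, and the
   name resolve_display_flags is hypothetical.

```python
DISPLAY_FLAGS = ("heading", "usa", "accession", "name", "description")


def resolve_display_flags(only=False, **explicit):
    """Apply the documented default, @(!$(only)), to each display flag."""
    unknown = set(explicit) - set(DISPLAY_FLAGS)
    if unknown:
        raise ValueError(f"unknown display flag(s): {sorted(unknown)}")
    # Each flag defaults to the opposite of -only unless set explicitly.
    return {flag: explicit.get(flag, not only) for flag in DISPLAY_FLAGS}


# No qualifiers at all: every column and the heading are displayed.
print(resolve_display_flags())

# '-only -name' leaves just the name column switched on.
short_form = resolve_display_flags(only=True, name=True)
print(short_form)

# Switching the other four off one by one gives the same result.
long_form = resolve_display_flags(
    heading=False, usa=False, accession=False, description=False
)
print(short_form == long_form)  # True
```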
   
Input file format

   textsearch reads one or more normal sequence USAs.
   
  Input files for usage example
  
   'tsw:*' matches every sequence entry in the example protein database
   'tsw'.
   
  Input files for usage example 3
  
   'tembl:*' matches every sequence entry in the example nucleic acid
   database 'tembl'.
   
Output file format

  Output files for usage example
  
  File: 100k_rat.textsearch
  
# Search for: lactose
tsw-id:LACI_ECOLI LACI_ECOLI    P03023  LACTOSE OPERON REPRESSOR.
tsw-id:LACY_ECOLI LACY_ECOLI    P02920  LACTOSE PERMEASE (LACTOSE-PROTON SYMPORT).
   
  Output files for usage example 2
  
  File: laci_ecoli.textsearch
  
# Search for: lactose | permease
tsw-id:LACI_ECOLI LACI_ECOLI    P03023  LACTOSE OPERON REPRESSOR.
tsw-id:LACY_ECOLI LACY_ECOLI    P02920  LACTOSE PERMEASE (LACTOSE-PROTON SYMPORT).
   
  Output files for usage example 3
  
  File: embl.lacz.html
  
   Search for: lacz
   tembl-id:ECLAC ECLAC J01636 E.coli lactose operon with lacI, lacZ,
   lacY and lacA genes.
   tembl-id:ECLACZ ECLACZ V00296 E. coli gene lacZ coding for
   beta-galactosidase (EC 3.2.1.23).
   
   The first column is the USA of each sequence, followed by its name
   and accession number. The remaining text is the description line of
   the sequence.
   
   When the -html qualifier is specified, the output is wrapped in HTML
   tags, ready for inclusion in a Web page. Note that tags such as
   <HTML>, <BODY>, </BODY> and </HTML> are not output by this program,
   as the table of results is expected to form only part of the contents
   of a web page. The rest of the web page must be supplied by the user.
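
   Because the -html output is only a fragment, it has to be embedded
   in a page before it can be viewed on its own. A minimal Python
   sketch of one way to do this follows. The function name
   wrap_fragment is hypothetical, and the fragment shown is a stand-in:
   the exact markup written by textsearch is not reproduced here.

```python
import html


def wrap_fragment(fragment, title):
    """Embed an HTML fragment in a minimal, complete page."""
    return (
        "<HTML>\n"
        f"<HEAD><TITLE>{html.escape(title)}</TITLE></HEAD>\n"
        "<BODY>\n"
        f"{fragment.strip()}\n"
        "</BODY>\n"
        "</HTML>\n"
    )


# Stand-in for the contents of a file such as embl.lacz.html; the real
# markup written by textsearch may differ from this hypothetical table.
fragment = "<table><tr><td>ECLACZ</td><td>V00296</td></tr></table>"
page = wrap_fragment(fragment, "textsearch: lacz")
print(page)
```

   In practice the fragment would be read from the file named with
   -outfile and the finished page written to a new file.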
   
   The lines of output are guaranteed not to have trailing whitespace.
   So if '-nodesc' is used, there will not be any whitespace after the
   last column displayed.
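
   The plain-text output is easy to read back into a script. The
   following Python sketch splits each result line into its columns.
   It assumes the default settings, so that the USA, name, accession
   and description are all present, and that columns are separated by
   runs of whitespace; the function name parse_textsearch_output is
   hypothetical.

```python
def parse_textsearch_output(lines):
    """Yield one dict per result line of default (non-HTML) output."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# Search for:' header
        # Assumes all four default columns are present; the description is
        # kept whole because it may itself contain spaces.
        usa, name, accession, description = line.split(None, 3)
        yield {
            "usa": usa,
            "name": name,
            "accession": accession,
            "description": description,
        }


sample = [
    "# Search for: lactose\n",
    "tsw-id:LACI_ECOLI LACI_ECOLI    P03023  LACTOSE OPERON REPRESSOR.\n",
]
records = list(parse_textsearch_output(sample))
print(records[0]["name"], records[0]["accession"])  # LACI_ECOLI P03023
```

   If columns are switched off with qualifiers such as -nodesc, the
   number of fields per line changes and the unpacking above would
   need to be adjusted to match.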
   
Data files

   None.
   
Notes

   This is a rather slow way to search for text in databases. If you are
   searching for text in public databases, you should consider using
   either Entrez (http://www.ncbi.nlm.nih.gov/Entrez/) or SRS
   (http://srs.hgmp.mrc.ac.uk/ or http://www.sanger.ac.uk/srs6/ etc.)
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with status 0.
   
Known bugs

   None noted.
   
See also

   Program name                          Description
   abiview      Reads ABI file and display the trace
   cirdna       Draws circular maps of DNA constructs
   infoalign    Information on a multiple sequence alignment
   infoseq      Displays some simple information about sequences
   lindna       Draws linear maps of DNA constructs
   pepnet       Displays proteins as a helical net
   pepwheel     Shows protein sequences as helices
   prettyplot   Displays aligned sequences, with colouring and boxing
   prettyseq    Output sequence with translated ranges
   remap        Display a sequence with restriction cut sites, translation etc
   seealso      Finds programs sharing group names
   showalign    Displays a multiple sequence alignment
   showdb       Displays information on the currently available databases
   showfeat     Show features of a sequence
   showseq      Display a sequence with features, translation etc
   sixpack      Display a DNA sequence with 6-frame translation and ORFs
   tfm          Displays a program's help documentation manual
   whichdb      Search all databases for an entry
   wossname     Finds programs by keywords in their one-line documentation
   
Author(s)

   Gary Williams (gwilliam  hgmp.mrc.ac.uk)
   HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK
   
History

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
