
                                   chips 
                                      
   
   
Function

   Codon usage statistics
   
Description

   chips calculates Frank Wright's Nc statistic for the effective number
   of codons used (ref 1).
   
   This is a simple measure that quantifies how far the codon usage of a
   gene departs from equal usage of synonymous codons. This measure of
   synonymous codon usage bias, the 'effective number of codons used in a
   gene', Nc, can be easily calculated from codon usage data alone, and
   is independent of gene length and amino acid (aa) composition. Nc can
   take values from 20, in the case of extreme bias where one codon is
   exclusively used for each aa, to 61 when the use of alternative
   synonymous codons is equally likely. Nc thus provides an intuitively
   meaningful measure of the extent of codon preference in a gene.
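
   As a rough illustration of the calculation, the following Python sketch
   computes Nc from an in-frame coding sequence using the formula given in
   ref 1: Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon
   homozygosity of the amino acids that have k synonymous codons. It is not
   the chips source code. The function name, the use of the standard
   genetic code, the fallback for degeneracy classes with no usable data,
   and the cap at 61 are assumptions made for this example, so its results
   may differ from the values chips reports.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds):
    """Return Wright's Nc for an in-frame coding sequence (illustrative only)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Collect the synonymous codon counts of each amino acid (stops excluded).
    per_aa = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            per_aa.setdefault(aa, []).append(counts[codon])

    # Homozygosity F = (n * sum(p^2) - 1) / (n - 1) for each amino acid.
    f_values = {k: [] for k in CLASS_SIZES}
    for codon_counts in per_aa.values():
        k, n = len(codon_counts), sum(codon_counts)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F is undefined for n < 2
        sum_p2 = sum((c / n) ** 2 for c in codon_counts)
        f_values[k].append((n * sum_p2 - 1) / (n - 1))

    nc = 2.0  # Met and Trp each contribute exactly one codon
    for k, n_aa in CLASS_SIZES.items():
        fs = f_values[k]
        mean_f = sum(fs) / len(fs) if fs else 0.0
        # Assumption: a class with no data, or mean F <= 0, counts as unbiased.
        nc += n_aa / mean_f if mean_f > 0 else n_aa * k
    return min(nc, 61.0)  # assumption: cap at the number of sense codons
```

   A sequence that uses a single codon for every amino acid gives 20 with
   this sketch, and one that uses all synonymous codons equally gives 61.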
   
   The Nc statistic has problems with very short sequences (20 amino acids
   or fewer) that are yet to be fully resolved. They arise from the need
   to account for amino acids that are missing from the sequence.
   
   This calculation was originally in the EGCG package as "codfish"
   (codon usage for fission yeast). As Frank Wright is a vegan, we looked
   for a meat-free name for the EMBOSS version, "chips". The official
   explanation is "Codon Heterozygosity (Inverse of) in a Protein-coding
   Sequence".
   
   If the sequence extends beyond the coding region then the start and/or
   end positions of the CDS must be provided because chips analyses
   exclusively protein coding regions.
   
Usage

   Here is a sample session with chips
   

% chips -sbeg 135 -send 1292 
Codon usage statistics
Input sequence(s): tembl:paamir
Output file [paamir.chips]: 
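
   The same run can be scripted. The Python sketch below builds an
   equivalent non-interactive command line from the qualifiers listed under
   "Command line arguments" (-seqall, -outfile, -sbegin1, -send1, -auto).
   The helper names chips_command and run_chips are hypothetical, and
   run_chips assumes an EMBOSS installation with chips on the PATH.

```python
import subprocess


def chips_command(usa, outfile, begin=None, end=None):
    """Build an argument list for a non-interactive chips run."""
    # -auto turns off prompts so the command can run unattended.
    cmd = ["chips", "-seqall", usa, "-outfile", outfile, "-auto"]
    if begin is not None:
        cmd += ["-sbegin1", str(begin)]  # first base of the coding region
    if end is not None:
        cmd += ["-send1", str(end)]      # last base of the coding region
    return cmd


def run_chips(usa, outfile, begin=None, end=None):
    """Run chips and raise CalledProcessError if it reports failure."""
    subprocess.run(chips_command(usa, outfile, begin, end), check=True)
```

   For the session above, the call would be
   run_chips("tembl:paamir", "paamir.chips", 135, 1292).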
   
   
Command line arguments

   Standard (Mandatory) qualifiers:
  [-seqall]            seqall     Sequence database USA
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -cfile              codon      Codon usage table name
   -[no]sum            boolean    Sum codons over all sequences

   Associated qualifiers:

   "-seqall" associated qualifiers
   -sbegin1             integer    First base used
   -send1               integer    Last base used, def=seq length
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-outfile" associated qualifiers
   -odirectory2         string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
   

   Qualifier                 Description                    Allowed values                        Default

   Standard (Mandatory) qualifiers
   [-seqall] (Parameter 1)   Sequence database USA          Readable sequence(s)                  Required
   [-outfile] (Parameter 2)  Output file name               Output file                           <sequence>.chips

   Additional (Optional) qualifiers
   (none)

   Advanced (Unprompted) qualifiers
   -cfile                    Codon usage table name         Codon usage file in EMBOSS data path  Ehum.cut
   -[no]sum                  Sum codons over all sequences  Boolean value Yes/No                  Yes
   
Input file format

   chips reads one or more nucleic acid sequences, specified as a USA
   (Uniform Sequence Address).
   
  Input files for usage example
  
   'tembl:paamir' is a sequence entry in the example nucleic acid
   database 'tembl'
   
  Database entry: tembl:paamir
  
ID   PAAMIR     standard; DNA; PRO; 2167 BP.
XX
AC   X13776; M43175;
XX
SV   X13776.1
XX
DT   19-APR-1989 (Rel. 19, Created)
DT   17-FEB-1997 (Rel. 50, Last updated, Version 22)
XX
DE   Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation
XX
KW   aliphatic amidase regulator; amiC gene; amiR gene.
XX
OS   Pseudomonas aeruginosa
OC   Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae; Pseudomonas.
XX
RN   [1]
RP   1167-2167
RA   Rice P.M.;
RT   ;
RL   Submitted (16-DEC-1988) to the EMBL/GenBank/DDBJ databases.
RL   Rice P.M., EMBL, Postfach 10-2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG.
XX
RN   [2]
RP   1167-2167
RX   MEDLINE; 89211409.
RA   Lowe N., Rice P.M., Drew R.E.;
RT   "Nucleotide sequence of the aliphatic amidase regulator gene of Pseudomonas
RT   aeruginosa";
RL   FEBS Lett. 246:39-43(1989).
XX
RN   [3]
RP   1-1292
RX   MEDLINE; 91317707.
RA   Wilson S., Drew R.;
RT   "Cloning and DNA seqence of amiC, a new gene regulating expression of the
RT   Pseudomonas aeruginosa aliphatic amidase, and purification of the amiC
RT   product.";
RL   J. Bacteriol. 173:4914-4921(1991).
XX
RN   [4]
RP   1-2167
RA   Rice P.M.;
RT   ;
RL   Submitted (04-SEP-1991) to the EMBL/GenBank/DDBJ databases.
RL   Rice P.M., EMBL, Postfach 10-2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG.
XX
DR   SWISS-PROT; P10932; AMIR_PSEAE.
DR   SWISS-PROT; P27017; AMIC_PSEAE.
DR   SWISS-PROT; Q51417; AMIS_PSEAE.


  [Part of this file has been deleted for brevity]

FT                   phenotype"
FT                   /replace=""
FT                   /gene="amiC"
FT   misc_feature    1
FT                   /note="last base of an XhoI site"
FT   misc_feature    648..653
FT                   /note="end of 658bp XhoI fragment, deletion in  pSW3 causes
FT                   constitutive expression of amiE"
FT   conflict        1281
FT                   /replace="g"
FT                   /citation=[3]
XX
SQ   Sequence 2167 BP; 363 A; 712 C; 730 G; 362 T; 0 other;
     ggtaccgctg gccgagcatc tgctcgatca ccaccagccg ggcgacggga actgcacgat        60
     ctacctggcg agcctggagc acgagcgggt tcgcttcgta cggcgctgag cgacagtcac       120
     aggagaggaa acggatggga tcgcaccagg agcggccgct gatcggcctg ctgttctccg       180
     aaaccggcgt caccgccgat atcgagcgct cgcacgcgta tggcgcattg ctcgcggtcg       240
     agcaactgaa ccgcgagggc ggcgtcggcg gtcgcccgat cgaaacgctg tcccaggacc       300
     ccggcggcga cccggaccgc tatcggctgt gcgccgagga cttcattcgc aaccgggggg       360
     tacggttcct cgtgggctgc tacatgtcgc acacgcgcaa ggcggtgatg ccggtggtcg       420
     agcgcgccga cgcgctgctc tgctacccga ccccctacga gggcttcgag tattcgccga       480
     acatcgtcta cggcggtccg gcgccgaacc agaacagtgc gccgctggcg gcgtacctga       540
     ttcgccacta cggcgagcgg gtggtgttca tcggctcgga ctacatctat ccgcgggaaa       600
     gcaaccatgt gatgcgccac ctgtatcgcc agcacggcgg cacggtgctc gaggaaatct       660
     acattccgct gtatccctcc gacgacgact tgcagcgcgc cgtcgagcgc atctaccagg       720
     cgcgcgccga cgtggtcttc tccaccgtgg tgggcaccgg caccgccgag ctgtatcgcg       780
     ccatcgcccg tcgctacggc gacggcaggc ggccgccgat cgccagcctg accaccagcg       840
     aggcggaggt ggcgaagatg gagagtgacg tggcagaggg gcaggtggtg gtcgcgcctt       900
     acttctccag catcgatacg cccgccagcc gggccttcgt ccaggcctgc catggtttct       960
     tcccggagaa cgcgaccatc accgcctggg ccgaggcggc ctactggcag accttgttgc      1020
     tcggccgcgc cgcgcaggcc gcaggcaact ggcgggtgga agacgtgcag cggcacctgt      1080
     acgacatcga catcgacgcg ccacaggggc cggtccgggt ggagcgccag aacaaccaca      1140
     gccgcctgtc ttcgcgcatc gcggaaatcg atgcgcgcgg cgtgttccag gtccgctggc      1200
     agtcgcccga accgattcgc cccgaccctt atgtcgtcgt gcataacctc gacgactggt      1260
     ccgccagcat gggcggggga ccgctcccat gagcgccaac tcgctgctcg gcagcctgcg      1320
     cgagttgcag gtgctggtcc tcaacccgcc gggggaggtc agcgacgccc tggtcttgca      1380
     gctgatccgc atcggttgtt cggtgcgcca gtgctggccg ccgccggaag ccttcgacgt      1440
     gccggtggac gtggtcttca ccagcatttt ccagaatggc caccacgacg agatcgctgc      1500
     gctgctcgcc gccgggactc cgcgcactac cctggtggcg ctggtggagt acgaaagccc      1560
     cgcggtgctc tcgcagatca tcgagctgga gtgccacggc gtgatcaccc agccgctcga      1620
     tgcccaccgg gtgctgcctg tgctggtatc ggcgcggcgc atcagcgagg aaatggcgaa      1680
     gctgaagcag aagaccgagc agctccagga ccgcatcgcc ggccaggccc ggatcaacca      1740
     ggccaaggtg ttgctgatgc agcgccatgg ctgggacgag cgcgaggcgc accagcacct      1800
     gtcgcgggaa gcgatgaagc ggcgcgagcc gatcctgaag atcgctcagg agttgctggg      1860
     aaacgagccg tccgcctgag cgatccgggc cgaccagaac aataacaaga ggggtatcgt      1920
     catcatgctg ggactggttc tgctgtacgt tggcgcggtg ctgtttctca atgccgtctg      1980
     gttgctgggc aagatcagcg gtcgggaggt ggcggtgatc aacttcctgg tcggcgtgct      2040
     gagcgcctgc gtcgcgttct acctgatctt ttccgcagca gccgggcagg gctcgctgaa      2100
     ggccggagcg ctgaccctgc tattcgcttt tacctatctg tgggtggccg ccaaccagtt      2160
     cctcgag                                                                2167
//
   
Output file format

  Output files for usage example
  
  File: paamir.chips
  
# CHIPS codon usage statistics

Nc = 32.951
   
   If all synonymous codons are used equally, the Nc value will be 61. If
   only one codon is used for each amino acid, the Nc value will be 20.
   Low values therefore indicate a strong codon bias, and high values
   indicate a weak bias (and possibly a non-coding region).
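
   For use in scripts, the Nc value can be read back from the output file.
   The Python sketch below assumes only the layout shown in the example
   above (a line of the form "Nc = <number>"); the function name
   parse_chips_output is hypothetical, and any other lines in the file are
   ignored.

```python
import re

# Matches a line such as "Nc = 32.951" and captures the number.
NC_LINE = re.compile(r"^\s*Nc\s*=\s*([-+]?\d+(?:\.\d+)?)\s*$")


def parse_chips_output(text):
    """Return every Nc value found in the text of a chips output file."""
    values = []
    for line in text.splitlines():
        match = NC_LINE.match(line)
        if match:
            values.append(float(match.group(1)))
    return values
```

   The function returns a list so that a file containing more than one
   such line is still handled.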
   
Data files

   chips reads a codon usage file, but uses it only as a template: the
   codon usage data already in the file are ignored.
   
   The codon usage table is by default the file "CODONS/Ehum.cut" in the
   EMBOSS distribution directory.
   
   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.
   
   Users can provide their own data files in their own directories.
   Project specific files can be put in the current directory, or for
   tidier directory listings in a subdirectory called ".embossdata".
   Files for all EMBOSS runs can be put in the user's home directory, or
   again in a subdirectory called ".embossdata".
   
   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata
       
Notes

   None.
   
References

    1. Wright, F. (1990) Gene 87:23-29 "The 'effective number of codons'
       used in a gene."
       
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with a status of 0.
   
Known bugs

   None.
   
See also

   Program name                  Description
   cai          CAI codon adaptation index
   codcmp       Codon usage table comparison
   cusp         Create a codon usage table
   syco         Synonymous codon usage Gribskov statistic plot
   
Author(s)

   Alan Bleasby (ableasby@hgmp.mrc.ac.uk)
   HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK
   
History

   1999 - Written - Alan Bleasby.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
