
                                  charge 
                                      
   
   
Function

   Protein charge plot
   
Description

   charge reads a protein sequence and writes a file (or plots a graph)
   of the charges of the amino acids within a window of specified length
   as the window is moved along the sequence.
   
Algorithm

   charge uses the column "charge" from the data file Eamino.dat. It gives
   the residues 'D' and 'E' a charge of -1, 'K' and 'R' a charge of +1,
   and the residue 'H' a charge of +0.5. It then calculates the mean
   charge of the residues in each window; the window length is set with
   -window (default 5).
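
   The calculation described above can be sketched in Python. This is a
   minimal illustration, not the program's source: the names
   RESIDUE_CHARGE and window_charges are hypothetical, the charges are
   hard-coded from the text rather than read from Eamino.dat, and all
   other residues are assumed to be neutral.

```python
# Charges as described in the text; every other residue is assumed neutral.
# Assumption: the real program reads these from the "charge" column of
# Eamino.dat instead of hard-coding them.
RESIDUE_CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0, "H": 0.5}


def window_charges(sequence, window=5):
    """Return (start_position, mean_charge) for each full window (1-based)."""
    if window < 1:
        raise ValueError("window must be 1 or more")
    seq = sequence.upper()
    results = []
    # Only full windows are scored, so the last start is len(seq) - window + 1.
    for start in range(len(seq) - window + 1):
        residues = seq[start:start + window]
        total = sum(RESIDUE_CHARGE.get(res, 0.0) for res in residues)
        results.append((start + 1, total / window))
    return results


if __name__ == "__main__":
    # First residues of HBB_HUMAN, taken from the example entry.
    for position, charge in window_charges("VHLTPEEKSAVT"):
        print(f"{position}\t{charge:.3f}")
```

   For the first residues of HBB_HUMAN this gives 0.100, -0.100 and
   -0.400 for positions 1 to 3, which agrees with the sample output file
   shown later in this document.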
   
Usage

   Here is a sample session with charge:
   

% charge tsw:hbb_human 
Protein charge plot
Output file [hbb_human.charge]: 
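
   For scripted use, the same run could be driven from Python. The
   sketch below is an assumption-laden illustration: the helper names
   build_charge_command and run_charge are hypothetical, only qualifiers
   listed in this document (-window, -outfile, -auto) are used, and
   actually running it requires an EMBOSS installation with charge on
   the PATH.

```python
import subprocess


def build_charge_command(usa, window=5, outfile=None):
    """Build an argument list for charge from qualifiers listed in this document."""
    # -auto turns off prompts, so the default output file name is accepted.
    args = ["charge", usa, "-window", str(window), "-auto"]
    if outfile is not None:
        args += ["-outfile", outfile]
    return args


def run_charge(usa, **options):
    """Run charge; assumes EMBOSS is installed and 'charge' is on the PATH."""
    return subprocess.run(build_charge_command(usa, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so no EMBOSS install is needed.
    print(" ".join(build_charge_command("tsw:hbb_human", window=7)))
```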
   
   
Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-seqall]            seqall     Sequence database USA
*  -graph              xygraph    Graph type
*  -outfile            outfile    Output file name

   Additional (Optional) qualifiers:
   -window             integer    Window

   Advanced (Unprompted) qualifiers:
   -aadata             string     Amino acid property data file name
   -plot               boolean    Produce graphic

   Associated qualifiers:

   "-seqall" associated qualifiers
   -sbegin1             integer    First base used
   -send1               integer    Last base used, def=seq length
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-graph" associated qualifiers
   -gprompt             boolean    Graph prompting
   -gtitle              string     Graph title
   -gsubtitle           string     Graph subtitle
   -gxtitle             string     Graph x axis title
   -gytitle             string     Graph y axis title
   -goutfile            string     Output file for non interactive displays
   -gdirectory          string     Output directory

   "-outfile" associated qualifiers
   -odirectory          string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths
   

   Qualifier      Description           Allowed values              Default

   Standard (Mandatory) qualifiers
   [-seqall]      Sequence database USA Readable sequence(s)        Required
   (Parameter 1)
   -graph         Graph type            EMBOSS has a list of known  EMBOSS_GRAPHICS
                                        devices, including          value, or x11
                                        postscript, ps, hpgl,
                                        hp7470, hp7580, meta,
                                        colourps, cps, xwindows,
                                        x11, tektronics, tekt,
                                        tek4107t, tek, none, null,
                                        text, data, xterm, png
   -outfile       Output file name      Output file                 <sequence>.charge

   Additional (Optional) qualifiers
   -window        Window                Integer 1 or more           5

   Advanced (Unprompted) qualifiers
   -aadata        Amino acid property   Any string is accepted      Eamino.dat
                  data file name
   -plot          Produce graphic       Boolean value Yes/No        No
   
Input file format

   charge reads in a protein sequence.
   
  Input files for usage example
  
   'tsw:hbb_human' is a sequence entry in the example protein database
   'tsw'
   
  Database entry: tsw:hbb_human
  
ID   HBB_HUMAN      STANDARD;      PRT;   146 AA.
AC   P02023;
DT   21-JUL-1986 (Rel. 01, Created)
DT   21-JUL-1986 (Rel. 01, Last sequence update)
DT   15-JUL-1999 (Rel. 38, Last annotation update)
DE   HEMOGLOBIN BETA CHAIN.
GN   HBB.
OS   Homo sapiens (Human), Pan troglodytes (Chimpanzee), and
OS   Pan paniscus (Pygmy chimpanzee) (Bonobo).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
RN   [1]
RP   SEQUENCE.
RC   SPECIES=HUMAN;
RA   BRAUNITZER G., GEHRING-MULLER R., HILSCHMANN N., HILSE K., HOBOM G.,
RA   RUDLOFF V., WITTMANN-LIEBOLD B.;
RT   "The constitution of normal adult human haemoglobin.";
RL   Hoppe-Seyler's Z. Physiol. Chem. 325:283-286(1961).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   SPECIES=HUMAN;
RX   MEDLINE; 81064667.
RA   LAWN R.M., EFSTRATIADIS A., O'CONNELL C., MANIATIS T.;
RT   "The nucleotide sequence of the human beta-globin gene.";
RL   Cell 21:647-651(1980).
RN   [3]
RP   SEQUENCE OF 121-146 FROM N.A.
RC   SPECIES=HUMAN;
RX   MEDLINE; 85205333.
RA   LANG K.M., SPRITZ R.A.;
RT   "Cloning specific complete polyadenylylated 3'-terminal cDNA
RT   segments.";
RL   Gene 33:191-196(1985).
RN   [4]
RP   X-RAY CRYSTALLOGRAPHY (2.5 ANGSTROMS) OF DEOXYHEMOGLOBIN.
RC   SPECIES=HUMAN;
RX   MEDLINE; 76027820.
RA   FERMI G.;
RT   "Three-dimensional fourier synthesis of human deoxyhaemoglobin at
RT   2.5-A resolution: refinement of the atomic model.";
RL   J. Mol. Biol. 97:237-256(1975).
RN   [5]
RP   SEQUENCE.
RC   SPECIES=P.TROGLODYTES;
RX   MEDLINE; 66071496.
RA   RIFKIN D.B., KONIGSBERG W.;
RT   "The characterization of the tryptic peptides from the hemoglobin of
RT   the chimpanzee (Pan troglodytes).";
RL   Biochim. Biophys. Acta 104:457-461(1965).
RN   [6]


  [Part of this file has been deleted for brevity]

FT   VARIANT     140    140       A -> T (IN ST JACQUES: O2 AFFINITY UP).
FT                                /FTId=VAR_003081.
FT   VARIANT     140    140       A -> V (IN PUTTELANGE; POLYCYTHEMIA;
FT                                O2 AFFINITY UP).
FT                                /FTId=VAR_003082.
FT   VARIANT     141    141       L -> R (IN OLMSTED; UNSTABLE).
FT                                /FTId=VAR_003083.
FT   VARIANT     142    142       A -> D (IN OHIO; O2 AFFINITY UP).
FT                                /FTId=VAR_003084.
FT   VARIANT     143    143       H -> D (IN RANCHO MIRAGE).
FT                                /FTId=VAR_003085.
FT   VARIANT     143    143       H -> Q (IN LITTLE ROCK; O2 AFFINITY UP).
FT                                /FTId=VAR_003086.
FT   VARIANT     143    143       H -> P (IN SYRACUSE; O2 AFFINITY UP).
FT                                /FTId=VAR_003087.
FT   VARIANT     143    143       H -> R (IN ABRUZZO; O2 AFFINITY UP).
FT                                /FTId=VAR_003088.
FT   VARIANT     144    144       K -> E (IN MITO; O2 AFFINITY UP).
FT                                /FTId=VAR_003089.
FT   VARIANT     145    145       Y -> C (IN RAINIER; O2 AFFINITY UP).
FT                                /FTId=VAR_003090.
FT   VARIANT     145    145       Y -> H (IN BETHESDA; O2 AFFINITY UP).
FT                                /FTId=VAR_003091.
FT   VARIANT     146    146       H -> D (IN HIROSHIMA; O2 AFFINITY UP).
FT                                /FTId=VAR_003092.
FT   VARIANT     146    146       H -> L (IN COWTOWN; O2 AFFINITY UP).
FT                                /FTId=VAR_003093.
FT   VARIANT     146    146       H -> P (IN YORK; O2 AFFINITY UP).
FT                                /FTId=VAR_003094.
FT   VARIANT     146    146       H -> Q (IN KODAIRA; O2 AFFINITY UP).
FT                                /FTId=VAR_003095.
FT   HELIX         5     15
FT   TURN         16     17
FT   HELIX        20     34
FT   HELIX        36     41
FT   HELIX        43     45
FT   HELIX        51     55
FT   TURN         56     56
FT   HELIX        58     75
FT   TURN         76     77
FT   HELIX        78     94
FT   TURN         95     96
FT   TURN        100    100
FT   HELIX       101    121
FT   HELIX       124    142
FT   TURN        143    144
SQ   SEQUENCE   146 AA;  15867 MW;  EC9744C9 CRC32;
     VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST PDAVMGNPKV
     KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP ENFRLLGNVL VCVLAHHFGK
     EFTPPVQAAY QKVVAGVANA LAHKYH
//
   
Output file format

   If the option '-plot' is specified, charge displays a graph of the
   charge along the sequence; otherwise it writes a file containing the
   mean charge of the window starting at each position along the
   sequence.
   
   The output file contains two columns separated by space or TAB
   characters. The first column is the position of the start of the
   window. The second column is the mean charge of the amino acids within
   that window.
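
   A file in this layout can be read back with a few lines of Python.
   The function name parse_charge_output is hypothetical, and the sketch
   assumes that every data row has exactly two whitespace-separated
   fields starting with an integer position, as in the example output.

```python
def parse_charge_output(text):
    """Return a list of (position, charge) tuples from charge output text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have two fields and start with an integer position; the
        # title line, blank lines and the column header are skipped.
        if len(fields) == 2 and fields[0].isdigit():
            rows.append((int(fields[0]), float(fields[1])))
    return rows


if __name__ == "__main__":
    sample = (
        "CHARGE of HBB_HUMAN from 1 to 146: window 5\n"
        "\n"
        "Position        Charge\n"
        "1               0.100\n"
        "2               -0.100\n"
    )
    print(parse_charge_output(sample))
```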
   
  Output files for usage example
  
  File: hbb_human.charge
  
CHARGE of HBB_HUMAN from 1 to 146: window 5

Position        Charge
1               0.100
2               -0.100
3               -0.400
4               -0.200
5               -0.200
6               -0.200
7               0.000
8               0.200
9               0.000
10              0.000
11              0.000
12              0.000
13              0.200
14              0.200
15              0.200
16              0.200
17              0.000
18              -0.400
19              -0.400
20              -0.400
21              -0.400
22              -0.400
23              -0.200
24              -0.200
25              -0.200
26              0.000
27              0.200
28              0.200
29              0.200
30              0.200
31              0.000
32              0.000
33              0.000
34              0.000
35              0.000
36              0.200
37              0.200
38              0.200
39              0.000
40              0.000
41              -0.200
42              -0.200
43              -0.400
44              -0.200
45              -0.200
46              -0.200
47              -0.200


  [Part of this file has been deleted for brevity]

92              0.100
93              0.100
94              0.100
95              0.100
96              -0.100
97              -0.300
98              -0.400
99              -0.400
100             0.000
101             0.000
102             0.200
103             0.200
104             0.200
105             0.000
106             0.000
107             0.000
108             0.000
109             0.000
110             0.000
111             0.000
112             0.100
113             0.200
114             0.200
115             0.200
116             0.400
117             0.100
118             0.000
119             0.000
120             0.000
121             -0.200
122             0.000
123             0.000
124             0.000
125             0.000
126             0.000
127             0.000
128             0.200
129             0.200
130             0.200
131             0.200
132             0.200
133             0.000
134             0.000
135             0.000
136             0.000
137             0.000
138             0.000
139             0.100
140             0.300
141             0.300
142             0.400
   
Data files

   charge reads the data file 'Eamino.dat' to find the charge of the
   amino acids in the protein.
   
   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.
   
   To see the available EMBOSS data files, run:
   
% embossdata -showall

   To fetch one of the data files (for example 'Exxx.dat') into your
   current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

   Users can provide their own data files in their own directories.
   Project specific files can be put in the current directory, or for
   tidier directory listings in a subdirectory called ".embossdata".
   Files for all EMBOSS runs can be put in the user's home directory, or
   again in a subdirectory called ".embossdata".
   
   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata
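
   The search order listed above can be modelled as follows. This is
   only a sketch of the four user directories: the function name
   find_data_file is hypothetical, and the fallback to the standard
   EMBOSS data directory is not modelled (the function returns None
   instead).

```python
from pathlib import Path


def find_data_file(name, cwd=None, home=None):
    """Return the first match for 'name' in the documented search order."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    # Order taken from the list above: ., .embossdata, ~/, ~/.embossdata
    search_dirs = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    # Not found in any user directory; the standard EMBOSS data directory
    # would be consulted here, which this sketch does not attempt.
    return None


if __name__ == "__main__":
    print(find_data_file("Eamino.dat"))
```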
       
Notes

   None.
   
References

   None.
   
Warnings

   None.
   
Diagnostic Error Messages

   None.
   
Exit status

   It always exits with status 0.
   
Known bugs

   None.
   
See also

   Program name                          Description
   backtranseq  Back translate a protein sequence
   checktrans   Reports STOP codons and ORF statistics of a protein
   compseq      Counts the composition of dimer/trimer/etc words in a sequence
   emowse       Protein identification by mass spectrometry
   freak        Residue/base frequency table or plot
   iep          Calculates the isoelectric point of a protein
   mwcontam     Shows molwts that match across a set of files
   mwfilter     Filter noisy molwts from mass spec output
   octanol      Displays protein hydropathy
   pepinfo      Plots simple amino acid properties in parallel
   pepstats     Protein statistics
   pepwindow    Displays protein hydropathy
   pepwindowall Displays protein hydropathy of a set of sequences
   
Author(s)

   Alan Bleasby (ableasby  hgmp.mrc.ac.uk)
   HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK
   
History

   Written (March 2001) - Alan Bleasby.
   
Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
   
Comments
