History and bug fixes of PAML (Phylogenetic Analysis by Maximum Likelihood)
Ziheng Yang

Please report bugs to Ziheng Yang at z.yang@ucl.ac.uk.





Version 3.14beta2, November 2003

(1) baseml: Models TN93 and REV are wrong when used with Mgene = 3 or 4.
This seems correct in version 3.12 but went wrong in 3.13 and 3.14
since I inserted TN92 between HKY85 and TN93.  Thanks for Lee Bofkin
for reporting the error.

(2) codonml (codeml for codons): When codon models are used to reconstruct
ancestral sequences (with RateAncestor = 1), the program lists
synonymous and nonsynonymous changes at each site under the heading
"Changes at sites (syn nonsyn)."  This listing is incorrect due to a
bug in the program.  This bug affects versions 3.12, 3.13 and 3.14,
and version 3.11 seems correct.  Thanks to Joe Bielawski.



Version 3.14beta, 20 July 2003

(1) baseml/codeml: rewrote likelihood clock and local clock models.
   Implemented models for combined analysis of multiple genes using
   multiple calibration points (Yang and Yoder 2003).  The option
   variable clock in the control files is now used differently from
   before.  See documentation below and also the readme and example
   files in the folder paml/examples/MouseLemurs/.

(2) codeml: added branch-site models C and D (Bielawski and Yang
   submitted).

(3) baseml/codeml: The SE's for divergence times under the clock
   models are calculated incorrectly.  This happens when you use clock
   = 1 or 2, supply fossil date to calculate absolute times, and
   request standard errors for times.  The estimates of times
   themselves are correct, but standard errors for times are wrong.
   The SEs for times and for the rate under the TipDate model (clock =
   3) are wrong as well.  The programs print out the variances after
   the +-, instead of their square roots.  This error was introduced
   in version 3.13.  Versions prior to 3.13 are correct.  I posted an
   update 3.13d to fix this bug.

(4) mcmctree: this program is decommissioned in this release.  The old
   program died and the new program is still under construction.

(5) evolver: the simulation program can now accept species names in
   the tree.  Note that the data file formats for evolver (MCbase.dat,
   MCcodon.dat, MCaa.dat) have changed and you might have to change
   your own data files.  Look at the files included in the package.

(6) Improved the iteration algorithm a little.  No change to the
   interface.

(7) The documentation in paml v3.13 from August 2002 - 12 December
   2002 had a mistake about the critical values for the newer test
   using a modified M8.  The critical values for the test are 2.71 at
   the 5% significance level and 5.41 at the 1% level, rather than
   1.95 and 3.32 as in the documentation.

(8) Edited the manual pamlDOC.pdf.




15 May 2003
paml3.13d

baseml/codeml: The SE's for times under the clock models are
calculated incorrectly.  This happens when you use clock = 1 or 2,
supply fossil date to calculate absolute times, and request standard
errors for times.  The estimates of times themselves are correct, but
standard errors for times are wrong.  The SEs for times and for the
rate under the TipDate model (clock = 3) are wrong as well.  The
programs print out the variances after the +-, instead of their square
roots.  

This error was introduced in version 3.13.  Versions prior to 3.13 are
correct.


12 December 2002 correction of documentation

The paml documentation in paml v3.13 from August 2002 - 12 December 2002
has a mistake about the critical values for the newer test using a
modified M8.

The critical values for the test are 2.71 at the 5% significance level
and 5.41 at the 1% level, rather than 1.95 and 3.32 as in the
documentation.  I made a mistake and quickly discovered it.  I thought
I corrected this a few months ago, but apparently the document
included in the paml release had this mistake.  I am sorry.  I'll
correct it now.



Version 3.13, August 2002

(1) baseml: nhomo = 1 and model = 6 (for both the one-rate and gamma-rates
    models) in version 3.12 gives incorrect results.  The error was introduced 
    in version 3.12 and is now fixed.  Sorry for the trouble and thanks to
    Joshua Scott Rest.

(2) baseml: Tamura's (1992) model is inserted between HKY85 and TN93.  
    Unfortunately this means that the model code for TN93 and REV are shifted, 
    and you have to change your control file for those two models.  I also 
    added user-specified REV or UNREST models.
    0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85, 5:T92, 6:TN93,
    7:REV, 8:UNREST, 9:REVu; 10:UNRESTu

(4) I implemented the nonhomogeneous models of Galtier and Gouy
    (1998).  The options are model = 5 (T92) and nhomo = 3 (N1) or 4
    (N2) or 5 (user).  nhomo = 5 is added for the model of Yang and
    Roberts (1995) as well.  It means that the user specifies, in the
    tree, how many sets of base frequencies (or how many GC content
    parameters in the model of Galtier and Gouy 1998) should be used
    and which branches has which sets.

(5) codeml: For pairwise amino acid sequence comparison (seqtype = 2
    and runmode = -2), the program crashes.  This is fixed now.
    Thanks to Dennis Paul Wall.  

(6) codeml: The mechanistic models of amino acid substitution (model = 6 or 7)
    are broken for quite some time due to lack of maintenance.  I have
    now fixed them.  See the file examples/mtCDNA/readme.txt.

(7) codemlsites is now removed, which is for batch run of multiple NSsites
    models.  This is now done by specifying several NSsites models in
    the control file codeml.ctl for codeml, for example: NSsites = 0 1
    7 8.

(8) baseml & codeml used to ask the user what to do if the tree has branch 
    lengths.  I have now introduced a variable fix_blength in the control 
    file that does the same thing. 
    fix_blength = -1  * 0: ignore, -1: random, 1: initial, 2: fixed



Version 3.12, March 2002

(1) baseml and codeml, Mgene=2 or 4 (different pi's) and RateAncestor
    = 1 Ancestral reconstruction does not work properly although the
    ML iteration is fine.  The output likelihood values (in the form
    "Node 20: lnL = -1047.965624") are different from the optimum
    likelihood already calculated.  The option was fine in version
    3.0e and 3.1, and was broken in version 3.11.

(2) codeml stops with an error message "pi=0" when some amino acids are 
    missing in the sequences.  Earlier versions of the program are correct.

(3) yn00 now accepts multiple data sets (specified by ndata).


Version 3.11, September 2001

(1) baseml and codeml (clock = 1 or clock = 2) sometimes print out
    ridiculous (e.g., large negative) tree lengths and branch lengths.
    I thought this was fixed in version 3.1, but it was not.
    Hopefully it is now fixed.

(2) baseml and codeml: Malpha (a separate alpha for each site
    partition) does not work with Mgene = 0.  The output will give you
    the same alpha value for each gene as you specify in the control
    file.  This is now fixed.

(3) baseml: When the REV or UNREST models (model = 6 or 7) are used
    together with Mgene = 2, 3, 4, the program outputs one Q matrix,
    when each site partition should have a different Q matrix.  The ML
    parameter estimates are correct, but I did not bother to format
    the output correctly.  The single Q matrix in the output is the
    one for the first partition.  I have added the output for the
    other matrices now.

(4) baseml and codeml.  Ancestral reconstruction was disabled for
    large trees with more than 100 or 200 sequences in previous
    versions.  It is now allowed.  On large trees, the multiplication
    of transition probabilities across branches in the tree cause the
    product to become too small to be represented in the computer.
    This is known as underflows.  I implemented a scaling algorithm to
    avoid underflows some time ago but did not bother to do it with
    the ancestral reconstruction algorithm at the time.  The algorithm
    should now work, up to the upper limit set in the program, that
    is, 1000 or 2000 sequences.



Version 3.1, June 2001

Changed the way of counting base or amino acid frequencies so that
this version of baseml or codeml might generate different results from
previous versions under the following model specifications.  

   . baseml & aaml: if you have multiple genes and the model uses the same
   frequencies in the model for the different site partitions.  

   . codonml:
   if you have ambiguity characters at all.  Ambiguity characters were
   ignored when counting frequencies in previous versions but are used now.

There is no difference if your data do not contain any alignment gaps
or ambiguity characters or if you choose to remove them before any
analysis (cleandata = 1).  The early versions used two ad hoc methods
in dealing with such ambiguity characters in different places of the
programs.  The new version uses another ad hoc method and in all
places.  I think it will be fine which ever version you use, but do
not use different versions in the same analysis.

codonml (codeml with seqtype = 1) with Mgene = 1 does not work
correctly if the sequences have alignment gaps or ambiguity
characters.  Mgene = 1 means separate analysis.  The bug causes data
for the genes except the first to be read out of frames, so that most
likely you will see a lot of messages "stop codon XXX" before the
program aborts.  This is now fixed.


codonml (codeml with seqtype = 1) with fix_alpha=0: In the final
output for the codon + gamma model, parameter alpha is not correctly
identified.  If you have fix_kappa = 0, fix_omega =0, and fix_alpha=0,
the last three numbers should be kappa, omega (dn/ds), and alpha.  The
detailed output did not identify alpha correctly, although the
likelihood value and the outputted rates and frequencies (r and f) are
correct.  I believe this is now corrected.



Version 3.0e, 14 May 2001

codonml (codeml with seqtype = 1) with model = 1: The program crashes
when you have a small tree with <= 5 branches.  This is a new error
introduced in version 3.0c.  It is now fixed.

baseml and codeml (clock = 1 or clock = 2) sometimes print out ridiculous
tree lengths and branch lengths.  

yn00 in version 3.0c ignores icode and uses the universal code only.
It was correct in vesion 3.0b.  This is fixed in version 3.0d.


Versions 3.0c-3.0d, 20 February 2001

We bought a Pentium III cluster running redhat linux 6.2.  The gcc
compiler 2.61 seems to have bugs and codeml does not run for NSsites =
7 or 8.  The program might output "points out of order" or "Det goes
to 0" error messages.  3.0d seems to be safer for these models.

codonml (codeml with seqtype = 1) with NSsites = 3 might generate a
message "error: sortwM3." and then stop with no output after the
iteration.  This is now fixed.  During the iteration, there was no
restriction on the w values, so that w0 can be greater than w1 and so
on.  At the end of the iteration, I sort the three w values so that w0
< w1 < w2 before outputting.  3.0c has a bug at sorting.  If you don't
see the error message, everything is fine.


10 November 2000

The probabilities for Joint ancestral reconstruction are not correct.
Many values are larger than 1.  Fixed in version 3.0c.



Version 3.0c, 30 October 2000

(1) Changed parameterization of clock models (clock = 1, 2, or 3), so that
the output now lists node ages, and the SEs are for node ages as well.
Node ages are measured by the expected number of nucleotide or amino
acid substitutions per site (nucleotide, amino acid, or codon) from
the node to the present time, and are proportional to the divergence
times.  In earlier versions, the output list proportions.

(2) codonml: problem with NSsites model with multiple gene data
(Mgene=1) is now fixed(?).  In the earlier version, the likelihood
calculation was o.k., but identification of sites under positive
selection was incorrect as the program attempted to list sites in
another gene when it was analyzing data for one gene.  (1) Mgene
options for codons and amino acids:



Version 3.0b, 6 October 2000

The local clock models (clock = 2) are not implemented correctly in paml3.0 or 3.0a. 

Yoder, A. D., and Z. Yang.  2000.  Estimation of primate speciation
dates using local molecular clocks.  Molecular Biology and Evolution
17 (7): 1081-1090.

I introduced a serious error before I made the models available to the
public, in ver3.0 (May 2000).  Two symptoms are noted right now: (1)
crashes (which are good as they make you realise that something is
wrong immediately) and (2) the local clock model (clock=2) having a
worse likelihood than the global clock model (clock = 1).  The results
in the paper are correct except for a typo ("37.30 - 48.48" in table 3
for the cetacean-whale divergence using the C3 calibration should be
"57.30 - 48.48").  I now include the example data sets and results
from that paper to help you prepare your own data files.

Sorry for any damages done.

Many thanks to Toby Johnson and Philip Awadalla for spotting the error.



Version 3, 4 February 2000

The program evolver may not generate sequences correctly if branch
lengths are very large, say >10.  

Scaling by nodes to avoid underflow.  When there are many sequences in
the data (say > 200 or 300), the probability of observing data at a
site can become too small to represent in the machine.  This is
because the probabilities are multiplied along branches in the tree
and a large tree has many branches.  Since version 2.0, this underflow
is dealt with by scaling, and it works with both the one-rate and
gamma-rates models.  It did not work with the NSsites models in
codonml, but probably nobody has analysed large data sets to notice
this.  Anyway, this version hopefully fixes that problem, when I was
analysing the data set of Bush et al. (1999 MBE 16:1457-1465), which
has about 350 sequences.  If you use the program on very large data
sets, watch out for anything strange and let me know.  I suppose with
the option of using supplied branch lengths, data sets of such sizes
will become manageable, and this scaling will become important.



Version 2.0h, Oct 14, 1999

Fixed the following errors, all simple errors.

The RemoveIndel routine in 2.0e and 20.f has a bug that removes most
sites in the sequence except those with all T's and C's for baseml.
The same routine in 2.0g has a different bug that removes most sites
except those will all T's and C's in the sequence for aaml.  In either
case, the results will be grossly wrong.  The bugs affect the analysis
only if you have gaps or ambiguity characters in the data and use
runmode = 1 for aaml in 2.0e and 2.0f, and runmode = 1 for codonml for
2.0g.

The newly introduced option for amino acid reconstruction by baseml
(icode) does not work.  The program aborts with an error message.
This is now fixed.




Version 2.0g, Sept 21, 1999

Fixed a bug in 2.0f that causes crash in codeml when getSE = 1.  

If the tree has branch lengths, codeml and baseml in 2.0f will simply
use the branch lengths rather than estimating them by iteration.  I
have added a question for the user to choose to ignore the branch
lengths, use them as fixed or use them as initial values for the
iteration.



Version 2.0d, June 30, 1999
Fixed some convergence problems in ML pairwise estimation of dS and 
dN, in particular, in presence of data partitions (the G option).

Fixed a bug in evolver.  The bug means that no sensible data are
generated if the tree has branch lengths smaller than 1e-7.  At some
point, I did something "clever" (taking advantage of the fact that at
such small branch lengths, no evolution would occur), which destroyed
the calculation.  



Version 2.0c, May 12, 1999
Corrected a few problems with the evolver program, and with running 
multiple data sets using baseml.  Added REV in MCbase.dat.
Thanks to John Mercer.

Note--Uncomment ndata in baseml or codeml to analyse multiple
data sets generated from evolver.



April 16, 1999

Although posted only a few days ago, paml2.0 has an error in the
program codeml, which invalidates most analyses based on codon
substitution models.  The symptom is relatively easy to spot and is
indicated by one of the parameters (say omega) not being estimated at
all and in the output you will have exactly the same number as you
specified in the control file.

Amino acid-based analysis or pairwise comparison of codon sequences
are not affected.  Use paml2.0a to redo the analysis.  Apologies for
the trouble.



version 2.0, 31 March 1999
Added back the faster calculation when there is no missing data.
Enabled the programs/analysis that broke in the temporary versions, 
such as pamp, joint reconstruction.

Renamed listtree as evolver, and added simulation options.



PAML 1.4 Temporary versions
Dec98 & Jan99 

  pamlTMP.Jan99.notes
  ===================
  Ziheng Yang
  Fri Jan 15 10:01:42 GMT 1999

  .Type make to compile the programs.  If you got error messages, see
  whether you know how to change the file Makefile.  If it does not work
  either, look at the files paml.cc, paml.gcc, etc.

  .At the suggestion of David Posada, I added ML estimation of pairwise 
  distances between amino acid sequences in the program codeml.  

  The relavant variables in the control file codeml.ctl are 

	runmode = -2 for pairwise distance calculation
     aaRatefile = jones.dat * only used for aa seqs with model=empirical(_F)
			     * dayhoff.dat, jones.dat, mtmam.dat, or your own
	  model = 3 
	  alpha = 0

  model specifies the amino acid substitution model.  You should choose
  a value from 0, 1, 2, 3.  If you want to estimate the rate matrix from
  your data, you should do that using many sequences and then move the
  estimated substitution rate matrix (in the output file mlc) into a
  file and change the variable aaRatefile.  

  Choosing alpha>0 means using gamma distance with that specified alpha
  parameter.  If you see anything strange, please let me know.

  .I have included only three programs baseml, codeml, and listtree.
  I have not tested the others and so they are not included.



  pamlTMP.Dec98.notes
  ===================
  Ziheng Yang
  Wed Dec  9 10:44:37 GMT 1998


  This temporary version of paml has involved a number of changes, and
  is quite likely to contain errors.  Please let me know if you notice
  anything strange.  Some of the options do not work for now.  Details
  follow.  I will compile the Win32 and PowerMac versions after
  everything is fixed and tested.

  . Missing data: baseml and codeml now handle missing data.  Choose
  DelMissing = 1 in the control file to remove sites with ambiguity
  characters or alignment gaps, as did previous versions.

  . Clock: The clock is now fixed.  It may still crash, but the option
  is not worse than the no-clock options.  The definitions of the node
  times or branch lengths are now different from the documentation
  (pamlDOC.html), and so check the branch lengths in the output tree
  topology.  Thanks to Jeff Thorne.

  . Nexus file formats: Paml programs now read sequence files and tree
  structure files in the Nexus format used by paup and MacClade.  Only
  the data or tree are read, and everything else in the file is ignored.

  .  Ancestral reconstruction: Marginal ancestral reconstructions are
  now done under the gamma-rates model as well as the constant-rate
  model.  For protein-coding genes, reconstructions can now be done at
  the nucleotide level with baseml (in particular, by using models that
  account for differences at the three codon positions), at the amino
  acid level with aaml, and at the codon level with aaml.  According to
  Belinda Chang, those different reconstructions tend to be similar.
  Results for ancestral reconstruction are now listed by site, and not
  by site pattern as in earlier versions.  Missing data are now handled
  in those analysis.  See instructions in this doc about baseml.ctl for
  more options.  Thanks to Belinda Chang.  I would like to take this
  opportunity to point out that the reconstructed ancestral sequences
  are not the real observed sequences and may contain errors.

  . I have disabled the option for variable rates among sites in
  combination with variable dN/dS rate ratios among lineages in codonml.
  This option has never been implemented in the program but was not
  disabled.

  . Distance matrices from codonml.  The program codonml outputs
  matrices of synonymous and nonsynonymous rates in all pairwise
  comparisons using the method of Nei & Gojobori (1986) into two files
  named DistanceNG.dN & DistanceNG.dN.  The matrices are lower-diagonal
  and can be used with programs, such as neighbor and fitch, in phylip
  (and possibly paup* too) for tree reconstruction or branch length
  estimation by distance methods.  If you choose runmode = -2 for ML
  pairwise comparison, the program will also output two matrices of ML
  estimates into two files named DistanceML.dS & DistanceML.dN, for
  synonymous and nonsynonymous rates.  Since many users seem interested
  in looking at dN/dS rate ratios among lineages, examination of the
  tree shapes indicated by branch lengths calculated from the two rates
  may be interesting although the analysis is intuitive.  Obviously, you
  should NOT name your own data files with those names; otherwise they
  will get overwritten.  If your species name has more than 10
  characters, you will have to edit the files and cut the names short
  before you can use Phylip programs.  I have decided to leave this work
  to the user.  I plan to create some other distance matrix files for
  nucleotides and amino acids as well.

  . An minor prompt error when running codonml with model = 2 (variable
  dN/dS ratios among lineages) is fixed.  The example data set
  lysozymeSmall.nuc is included for the user to duplicate my results in
  Yang (1998 Molecular Biology and Evolution 15: 568-573).  The control
  file codemlLysozyme.ctl explains in more detail how to specify the
  options.  I would like to remind you that it is not correct to use the
  option model = 1 to estimate dN/dS ratios for branches to find out
  which ratios are greater than one, and then to use model = 2 to test
  whether that ratio is significantly greater than one.  The problem
  with such an analysis is that the hypothesis being tested is derived
  from the data which you use to test the hypothesis, and as a result,
  you tend to get significantly results too often.  Strictly speaking,
  the hypothesis should be formulated before the data are analysed.

  Things that do not work in this version include

  . All parsimony-based analyses are now broken.  The program pamp is
  not included.  Programs baseml and codeml used to calculate the MP
  score for each tree before doing an ML analysis on that tree.  Now a
  -1 is used to indicate this is not possible.



Version 1.4 (UCL)
July 1998 

     1. Most of the changes are in the program codeml (aaml for amino
  acid sequences and codonml for codon sequences). A few new models of
  codon substitution are implemented (see Yang and Nielsen 1998; Nielsen
  and Yang 1998; Yang 1998).
     2. Some problems relating to ancestral sequence reconstruction (in
  baseml and codeml) are fixed. The earlier algorithm does not work
  properly if the data contain more than several sequences. The
  algorithm makes a guess at the likely character states at the interior
  nodes of the tree, and then uses those to generate reconstructions
  (joint reconstructions, see eq. 2 in Yang et al. 1995) to be
  evaluated. Sometimes this strategy misses important reconstructions,
  and as a result, the probabilities for reconstructed characters at
  specific interior nodes (marginal reconstructions; see eq. 4 in Yang
  et al. 1995) are substantially underestimated if the number of
  sequences is not very small. I have now written separate codes to do
  the marginal reconstruction, evaluating all possible character states
  for each interior node. This works also under the gamma model of
  substitution rates among sites, a feature that was not implemented
  before. The joint reconstruction works with one rate for all sites
  only and has the old problem of possibly missing important
  reconstructions. However, the posterior probabilities for those joint
  reconstructions that do get evaluated are accurate. I have added an
  option (choose RateAncestor = 2 rather than 1) for the user to specify
  the reconstruction to be evaluated. The two approaches are expected to
  produce very similar results, and since the marginal reconstruction is
  always reliable, perhaps it can always be used for data analysis.
  (Thanks to Belinda Chang.)
     3. The output file for estimated rates for sites under the
  variable-rates models is now "Rates.out" instead of "rst".
     4. I have implemented ancestral sequence reconstructions based on
  codon-substitution models (codonml). This has not been tested
  carefully, and I would appreciate your comments if you use it.
     5. A number of minor changes have been made. I have fixed several
  floating exception errors that seem to occur on DEC Alpha machines
  only.
     6. The documentation is now changed into the html format. 



Version 1.3a,b,c (UCL)
June 1998 

  1. Basemlg was put back at the request of a user.

  2. Some changes are made to codeml concerning codon-substitution
  models (Yang and Nielsen 1998 JME; Yang 1998 MBE; Nielsen and Yang
  1998 Genetics)



PAML Version 1.3: (UC Berkeley)
July 1997 

  1. Starting with this version, the program basemlg will not be
  included in the package.  This program implements the (continuous)
  gamma model of substitution rate variation among nucleotide sites
  (Yang 1993).  It involves intensive computation and has been
  superseded by the discrete-gamma model implemented in the program
  baseml.  If you want to use this program, you should use previous
  versions of paml, which can be obtained from me if nowhere else.

  2. The simulation program mcml.c is removed from the package.  This is
  now superced by the program listtree.c, which does different things
  besides simulating data sets.

  3. I have implemented the stepwise addition algorithm in the programs
  baseml and codeml.  This algorithm is faster than the
  star-decomposition algorithm implemented before, but my implementation
  is crude and inefficient.  The NNI perturbation is implemented too.
  These work without the molecular clock.

  4. Many changes are made with the program codonml (codeml with
  seqtype=1).  A few new codon-based models are implemented.  One
  assumes different nonsynonymous/synonymous (dN/dS) rate ratios among
  branches in the phylogeny, and may be useful for detecting episodic
  positive selection.  Another allows the dN/dS ratio to vary among
  sites and may be useful for testing neutrality and detecting
  positively-selected sites in the sequence.  These are based on my
  collaborative work with Rasmus Nielsen.  I made some changes to the
  output of estimates of dS and dN in pairwise comparisons (codonml with
  runmode=-2), so that the results are easy to understand.  Another
  change with this program is in the codon-substitution model of Goldman
  and Yang (1994).  We used the amino acid distance matrix of Grantham
  (1974) to modify the rate of nonsynonymous substitutions, but
  unfortunately the model does not fit data well.  It does not always
  fit the data better than if we simply assume equal distance between
  any two amino acids.  So I have added the option of ignoring the amino
  acid differences.  Goldman and Yang's original formula is still
  available but not recommended for use.  As a result of these changes,
  the formulation of the Goldman and Yang model is now somewhat
  different from the paper and previous versions of the program.  See
  Section 6.2 for details.

  5. I have again made some changes with the user interface, based on my
  guess of what the "user" would like to see.



Version 1.1a-d: (UC Berkeley)
1995-1996 

1.1a No records of changes remain.

1.1b November, 1995:

The following line at the beginning of the file tools.h 
		#define __cplusplus
is removed.  I added this line to avoid some (unjustified) warning
messages from a compiler I used to use and forgot to remove it when I
was preparing paml1.1.  This line made some compilers unable to
compile.

A bug that caused baseml and basemlg to give wrong estimates of kappa
under the F84 model is now fixed.  Estimates of kappa under the F84
models listed in the paml 1.0 and 1.1 documents are all incorrect and
are a bit too small.  For example, the estimate of kappa under the F84
model for the mtDNA data of Brown et al. (1982) should be 4.344 while
the wrong estimate given in the large table of the document is 3.420.
Likelihood value, branch lengths and other parameters seem to have
been correctly calculated by paml 1.0 and 1.1.  Versions prior to 1.0
do not have this bug, and results in Yang (J. Mol. Evol. 1994,
39:306-314) and Yang, Lauder, and Lin (J. Mol. Evol. 1995,
41:587-596), where the F84 model was used, are correct.

1.1c March 1996 

Representation of tree topologies using branches (see the document)
are now allowed (restored) in the tree structure file.  This is for
coping with cases where some taxa are ancestors of others.  If this
representation is used, the line starts with a dollar sign.  So the
following line in the file trees.5s (the tree file for 5 species)

$ 5      6 3  6 4  6 5  3 1  3 2

represents the tree ((12)34), with 5 to be the ancestor of 1 and 2.

Alignment gaps are now allowed when option G is used (for multiple
genes).  The gaps are removed before analysis.

A small program listtree lists all the bifurcating trees for a given
(small) number of species.

codonml (codeml.c with seqtype=1) now has an option for calculating
the number of synonymous substitutions per synonymous site (Ks) and
the number of nonsynonymous substitutions per nonsynonymous site (Ka)
and their ratio for each pairwise comparison, using the method of
Goldman and Yang (1994).  This is implemented with
runmode=-2. Estimates of Ks and Ka are collected in the file rst and
estimates of model parameters in the file mlc.  This option produces
table 2 in the Goldman & Yang paper.

The interior nodes were incorrectly identified when the ancestral
sequences were listed in the file rst.  They were all one less than
their correct numbers.  Nodes 7, 8, 9, 10 for the stewart.aa data were
incorrectly identified as nodes 6, 7, 8, 9.  This is corrected now.

1.1d. July 1996

An unintended version of codeml.c was included in version 1.1c, which
causes the program to produce different results under the codon-based
model of Goldman and Yang (1994).  The small number in baseml.c was
set too small, making the program do unnecessary iterations; this is
fixed now.  I also changed the program pamp so that it will estimate
the substitution rate matrix without using a tree.

I have included a few other genetic code tables in this version of
codeml; see the file codeml.ctl.


PAML Version 1.2: November 1996 (UC Berkeley)

  1. I have done some changes with the user interface.  Name of files
(sequence data file, control file, tree structure file) are now
specified in the control files (The default are baseml.ctl,
codeml.ctl, pamp.ctl, mcmctree.ctl).  The command line for executing a
program is now ProgramName ControlFileName with ControlFileName
optional.  I have added some text in the output file of baseml so that
the results are eassier to understand.  Choose verbose=0 in the
control file if you do not want the detailed output.

  2. A new program mcmctree is included, which performs Bayesian
estimation of phylogenies using Markov chain Monte Carlo methods
(Rannala and Yang 1996; Yang and Rannala 1997).

  3. A number of minor changes are made, such as using species names
in the parenthesis tree representation.



July 1995 
Version 1.1: (IMEG PennState)

  Models for analyzing data of multiple genes (Yang 1996 JME) are
  implemented in baseml and codeml.  Two more programs are included:
  pamp for parsimony-based analysis implementing the methods of Yang and
  Kumar (1996 MBE) and mcml for Monte Carlo simulation (prepared for
  Arndt von Haeseler but apparently not used).


February 1995 
Version 1.0 (IMEG, PennState)

  This is the first official version of paml.  Three main programs are
  included: baseml, basemlg and codeml.  Control files are used for all
  of them.  The program codeml is formed by merging two programs codonml
  and aaml, for codon sequences and amino acid sequences, respectively.
  One rate, gamma rates, and auto-discrete-gamma rates for sites are
  implemented for nucleotide, amino acid, and codon-based models.  The
  document is provided as postscript files (paml1_3.ps).


October 1994 
Version 0.12, no name for package (BAU and IMEG PennState) 

  The bracket notation of tree topologies is introduced.  Tree
  search by star decomposition is implemented in baseml and codonml.
  The options for baseml are moved into an option file baseml.ctl.  The
  molecular clock seems to work with the automatic tree search model
  (runmode=1 or 2).  The old readme file is renamed manual.


March 1994 
Version 0.10, no name for package: (NHM, England)

  Programs tripml and comptree are renamed codonml and rell.  The readme
  file is expanded and the numerical optimmization routine, ming1, used
  by dnaml and codonml, is improved.  The discrete-gamma model of Yang
  (1994 JME) is implemented in both baseml and codonml.

  Preliminary version 0.11, no name for package: April 1994 (NHM,
  England) Programs dnaml and dnamls are renamed baseml and basemlg to
  avoid confusion with Joe Felsenstein's programs.  The
  auto-discrete-gamma model and the nonparametric models of rate
  variation and correlation mong sites of Yang (1995 Genetics) are added
  in baseml.  Nonhomogeneous-process models of Yang and Roberts (1995
  MBE) are implemented in baseml.


April 1993 
Version 0.01, no name for package (Cambridge).  Three programs are included in 
the package:

  1. dnamls implements the model of gamma rates among sites of Yang
  (1993) and works with arbitrary topologies and substitution models
  (JC69, K80, F81, HKY85).

  2. dnaml implements the model of a single rate for all sites. (The
  same name as Joe Felsenstein's program was used.)

  3. tripml implements the codon-based model of Goldman and Yang (1994).
  Trees are represented by the "branch notation".

// end of file
