XMLPPM 0.96 README 

James Cheney 5/10/2001

ABOUT XMLPPM

This directory contains version 0.96 of XMLPPM, an XML-specific compressor.
XMLPPM reads well-formed XML text from standard input, compresses
it, and sends the compressed bits to standard output.  The companion
decompressor, XMLUNPPM, restores the text version of the XML data from
the compressed bits.  The restored version may differ slightly from the
original; for example, some whitespace may be stripped.

XMLPPM is *experimental*.  I do *not* recommend that you use XMLPPM to
archive important files, as XMLPPM is not fully tested and future 
versions of XMLPPM may not be compatible with this initial version.
This version is being made available for research purposes.



COPYRIGHT and LICENSE TERMS

Portions of the XMLPPM source code are based on Alistair Moffat's
arithmetic coding sources, Bill Teahan's sources for the PPMD+ text
compressor, and Dmitri Shkarin's sources for PPMDG.  This code is used
and placed under the GPL with permission.  Those files are copyright
their respective authors as described in the source files.  The rest of
the source code is copyright James Cheney, November 2000.

This code is covered by the GNU General Public License (GPL).


COMPILING XMLPPM

This is the XMLPPM source code distribution, so to use XMLPPM you need
to compile the sources.  XMLPPM uses version 1.95 of the "expat" XML
parser, and so you need to get and install the development version of that
parser before you can compile XMLPPM.  In the future, if there is demand,
I may make statically linked binaries available for selected platforms.

Expat, along with its installation instructions, is available at:
http://expat.sourceforge.net/.  You need both the shared library
and the headers to compile XMLPPM.  You can also get these as RPMs from
http://www.rpmfind.net, by searching for "expat" and "expat-devel".

Once you have installed expat, go to the src subdirectory (or wherever you
installed the XMLPPM sources) and do:

make all

This should create two binary files, xmlppm and xmlunppm.

Because XMLPPM is still relatively untested, I don't recommend
performing further installation steps like putting xmlppm in /usr/bin, 
because then other users of your machine might think it's a "real"
(i.e. fully tested) utility.

USING XMLPPM

XMLPPM and its companion decompressor XMLUNPPM are command-line driven.
Also, XMLPPM only reads and compresses XML text files.  What counts as
an XML text file actually depends on the underlying XML parser, expat;
if expat does not know how to parse a document, XMLPPM will print expat's
error message and quit.  If XMLPPM reports an XML parsing error and
won't compress your (well-formed) document, the problem is more likely
in expat than in XMLPPM, so I may not be able to do anything about it.


Supposing you do have an XML file that expat likes, to compress it do:

./xmlppm  doc.xml  doc.xppm

You can of course call the compressed file anything you like, but I'm
planning on making xppm the default extension (xpm already being taken).

To expand the compressed document, do:

./xmlunppm  doc.xppm  doc.new.xml

(I don't recommend that you overwrite the original document).
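
Since XMLPPM is experimental, it can be worth checking a round trip
before relying on a compressed file.  The following is only a sketch,
not part of the distribution: the function name "roundtrip" is made up
for illustration, and it assumes both tools are invoked as
"tool input output", as in the commands above.  It writes the restored
text to a new file rather than overwriting the original.

```shell
# roundtrip COMPRESSOR DECOMPRESSOR FILE   (hypothetical helper)
# Compress FILE, expand it again into a new file, and report the result.
roundtrip() {
    compressor=$1
    decompressor=$2
    original=$3
    packed=${original%.xml}.xppm        # doc.xml -> doc.xppm
    restored=${original%.xml}.new.xml   # doc.xml -> doc.new.xml

    # Stop early if either tool reports an error (e.g. an expat parse error).
    "$compressor" "$original" "$packed" || return 1
    "$decompressor" "$packed" "$restored" || return 1

    # "wc -c < file" prints only the byte count.
    echo "original:   $(wc -c < "$original") bytes"
    echo "compressed: $(wc -c < "$packed") bytes"

    # A byte-for-byte mismatch is not necessarily an error, because
    # whitespace may be stripped; it only means the files need a closer look.
    if cmp -s "$original" "$restored"; then
        echo "restored file is identical"
    else
        echo "restored file differs (possibly only in whitespace)"
    fi
}
```

For example, from the directory containing the binaries:

roundtrip ./xmlppm ./xmlunppm doc.xml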

NEW IN VERSION 0.96

The original version, 0.95, is in the src/ subdirectory.  Expect this
to go away.  The new version, 0.96, is in the xmlppmdg/ subdirectory.
This version uses a much faster, more efficient, and more effective
implementation of the PPM algorithm, called PPMDG, by Dmitri Shkarin.
The resulting XML compressor is within a factor of 2 of the speed of
gzip, faster than bzip2, and compresses better than either (and also
better than 0.95 xmlppm).



CONTACT 

James Cheney, jcheney@cs.cornell.edu

