NAME
    XML::Fast - Simple and very fast XML to hash conversion

SYNOPSIS
      use XML::Fast;

      my $hash = xml2hash $xml;
      my $hash2 = xml2hash $xml, attr => '.', text => '~';

DESCRIPTION
    This module implements a simple, state-machine-based XML parser written in C.

    It can parse, and recover from, some kinds of broken XML. If you need an XML validator, use XML::LibXML.

RATIONALE
    Another similar module is XML::Bare. I used it for some time, but it has some failures:

    *   If your XML has a node named 'value', you'll get a segfault

    *   If your XML has a node with a TextNode, then a CDATANode, then a TextNode again, you'll get a broken value

    *   It doesn't support charsets

    *   It doesn't support any kind of entities

    So, after a number of attempts to fix XML::Bare, I decided to write a parser from scratch.

    It is about 40% faster than XML::Bare and about 120% faster than XML::LibXML.

    I got these results using the following test on a 35kb XML doc:

        cmpthese timethese -10, {
            libxml  => sub { XML::LibXML->new->parse_string($doc) },
            xmlfast => sub { XML::Fast::xml2hash($doc) },
            xmlbare => sub { XML::Bare->new(text => $doc)->parse },
        };

                  Rate  libxml xmlbare xmlfast
        libxml  1107/s      --    -38%    -56%
        xmlbare 1782/s     61%      --    -28%
        xmlfast 2490/s    125%     40%      --

    Of course, the results could be different for different XML files. With non-UTF encodings and with many entities it could be slower. This test was run on a sample RSS feed in UTF-8 mode with a small number of XML entities.

    Here are some features and principles:

    *   It uses a minimal number of memory allocations.

    *   All XML is parsed in one scan.

    *   All values are copied from the source XML only once (to destination keys/values).

    *   If some types of nodes (for example, comments) are ignored, no memory is allocated or copied for them.

EXPORT
  xml2hash $xml, [ %options ]

OPTIONS
    order [ = 0 ]
        Not implemented yet. Strictly keep the output order. When enabled, structures become more complex, but the XML can be completely reconstructed from them.

    attr [ = '-' ]
        Attribute prefix

            <node attr="test" />  =>  { node => { -attr => "test" } }

    text [ = '#text' ]
        Key name for storing text. When undef, text nodes will be ignored.

            <node>text<sub /></node>  =>  { node => { sub => '', '#text' => "text" } }

    join [ = '' ]
        Join separator for text nodes that are split by subnodes. Ignored when "order" is in effect.

            # default:
            xml2hash( '<item>Test1<sub />Test2</item>' )
            : { item => { sub => '', '~' => 'Test1Test2' } };

            xml2hash( '<item>Test1<sub />Test2</item>', join => '+' )
            : { item => { sub => '', '~' => 'Test1+Test2' } };

    trim [ = 1 ]
        Trim leading and trailing whitespace from text nodes

    cdata [ = undef ]
        When defined, CDATA sections will be stored under this key

            # cdata = undef
            <node><![CDATA[ test ]]></node>  =>  { node => 'test' }

            # cdata = '#'
            <node><![CDATA[ test ]]></node>  =>  { node => { '#' => 'test' } }

    comm [ = undef ]
        When defined, comments will be stored under this key. When undef, comments will be ignored.

            # comm = undef
            <node><!-- comm --><sub/></node>  =>  { node => { sub => '' } }

            # comm = '/'
            <node><!-- comm --><sub/></node>  =>  { node => { sub => '', '/' => 'comm' } }

    array => 1
        Force all nodes to be kept as arrays.

            # no array
            <node><sub/></node>  =>  { node => { sub => '' } }

            # array = 1
            <node><sub/></node>  =>  { node => [ { sub => [ '' ] } ] }

    array => [ 'node', 'names' ]
        Force nodes with the listed names to be stored as arrays

            # no array
            <node><sub/></node>  =>  { node => { sub => '' } }

            # array => ['sub']
            <node><sub/></node>  =>  { node => { sub => [ '' ] } }

SEE ALSO
    *   XML::Bare

        Another fast parser, but it has problems

    *   XML::LibXML

        The most powerful XML parser for perl. If you don't need to parse gigabytes of XML ;)

    *   XML::Hash::LX

        XML parser that uses XML::LibXML for parsing and then constructs a hash structure identical to the one generated by this module. (At least, it should ;)).
        But of course it is much slower than XML::Fast.

TODO
    *   Ordered mode (as implemented in XML::Hash::LX)

    *   Create hash2xml, identical to the one in XML::Hash::LX

    *   Partial content event-based parsing (I need this for reading XML streams)

    Patches, suggestions and bug reports are welcome ;)

AUTHOR
    Mons Anderson

COPYRIGHT AND LICENSE
    Copyright (C) 2010 Mons Anderson

    This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
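
EXAMPLE
    The options described under OPTIONS can be combined in one call. The following is a minimal sketch, not taken from the module's own tests: the sample documents and the variable names ($xml, $plain, $forced) are made up for illustration. It assumes only the default '-' attribute prefix and the array => [ names ] option described above, and prints the resulting structures with Data::Dumper instead of asserting their exact shape.

```perl
use strict;
use warnings;
use XML::Fast;
use Data::Dumper;

# Hypothetical sample document, invented for this illustration.
my $xml = '<feed version="1"><item>first</item><item>second</item></feed>';

# Default options: attributes are stored under keys prefixed with '-'.
my $plain = xml2hash $xml;

# Force 'item' nodes into an array reference even when only one is present,
# so calling code can always iterate over them the same way.
my $forced = xml2hash '<feed><item>only</item></feed>', array => ['item'];

print Dumper($plain, $forced);

# With array => ['item'] this loop works for one item as well as for many.
for my $item ( @{ $forced->{feed}{item} } ) {
    print "item: $item\n";
}
```

    Forcing arrays for node names that may repeat is mostly a convenience for the calling code: without it, the same key would presumably hold a plain value for one node and an array for several, and every consumer would have to check which one it got.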