Processing Large XML Documents with PHP 5 - Update

I just implemented a new feature in XMLReader that makes parsing the large XML document from my previous post faster. The new method is called next(), as opposed to read(). next() does "Skip to the node following the current one in document order while avoiding the subtree if any". This method is quite useful here, because we have a flat XML structure and are only interested in the "entry" elements (we look for the right ID there). We therefore only have to loop through the "entry" elements instead of every element and node, which means fewer PHP calls and more work done in C. In the "XMLReader Next" example, we iterate with read() to the first "entry" element and then loop through all the siblings of that entry with next(). This is approx. 4 times faster than the "read()-only" approach and even faster than DOM. Here's the updated chart for full document parsing.

Of course, we can only take advantage of this technique because the document is well structured and we know exactly what it contains down to this level. So again, your mileage may vary with differently structured documents ;)
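
The read()-then-next() loop described above might look roughly like the sketch below. Only the "entry" element name and the ID lookup come from the post; the sample document, the "title" attribute and the findEntryTitle() helper are assumptions made up for illustration.

```php
<?php
// Hypothetical sample data: a flat list of "entry" elements, each with a subtree.
$xml = <<<XML
<entries>
  <entry ID="1" title="First"><body><p>some text</p></body></entry>
  <entry ID="2" title="Second"><body><p>more text</p></body></entry>
  <entry ID="3" title="Third"><body><p>even more</p></body></entry>
</entries>
XML;

// Hypothetical helper: returns the title of the entry with the given ID, or null.
function findEntryTitle($xml, $wantedId)
{
    $reader = new XMLReader();
    $reader->XML($xml); // for a large file, $reader->open($path) streams from disk

    // Step 1: walk node by node with read() until the first "entry" element.
    $more = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'entry') {
            $more = true;
            break;
        }
    }

    // Step 2: hop from sibling to sibling with next(), never descending
    // into an entry's subtree.
    $title = null;
    while ($more) {
        // next() also stops on the whitespace between entries, so check the node.
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->name === 'entry'
            && $reader->getAttribute('ID') === $wantedId) {
            $title = $reader->getAttribute('title');
            break;
        }
        $more = $reader->next();
    }

    $reader->close();
    return $title;
}

echo findEntryTitle($xml, '2'), "\n"; // prints "Second"
```

The node type check in step 2 matters because next() moves to the following node of any kind, not only to elements. In current PHP versions next() also accepts an element name, so $reader->next('entry') should make that check unnecessary.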

I also checked with a 100 MB document. With DOM it took 38 sec of user time and 256 MB of memory; with "XMLReader next" it took 16 sec, and memory usage was still at 3.2 MB.
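
A comparison like this could be reproduced along the following lines. This is only a sketch under assumptions: the generated file, its size and all function names are invented here and are not the setup used for the numbers above. It measures user time with getrusage(); memory is left out, because allocations made inside libxml2 may not show up in PHP's own memory counters, so an external tool is probably the safer way to measure it.

```php
<?php
// Hypothetical test data: a flat file of "entry" elements (not the post's data).
function writeSampleFile($path, $entries)
{
    $fh = fopen($path, 'w');
    fwrite($fh, "<entries>\n");
    for ($i = 1; $i <= $entries; $i++) {
        fwrite($fh, "<entry ID=\"$i\"><title>Entry $i</title></entry>\n");
    }
    fwrite($fh, "</entries>\n");
    fclose($fh);
}

// User CPU time of this process in seconds.
function userSeconds()
{
    $usage = getrusage();
    return $usage['ru_utime.tv_sec'] + $usage['ru_utime.tv_usec'] / 1000000;
}

// DOM: loads the whole tree into memory, then counts the entries.
function countWithDom($path)
{
    $doc = new DOMDocument();
    $doc->load($path);
    return $doc->getElementsByTagName('entry')->length;
}

// XMLReader: read() to the first entry, then next() across its siblings.
function countWithReaderNext($path)
{
    $reader = new XMLReader();
    $reader->open($path);
    $count = 0;
    $more = false;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'entry') {
            $more = true;
            break;
        }
    }
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'entry') {
            $count++;
        }
        $more = $reader->next();
    }
    $reader->close();
    return $count;
}

$path = tempnam(sys_get_temp_dir(), 'xml');
writeSampleFile($path, 20000); // assumed size; increase it for a realistic test

foreach (array('countWithDom', 'countWithReaderNext') as $fn) {
    $start = userSeconds();
    $count = $fn($path);
    printf("%-20s %d entries, %.3f s user time\n", $fn, $count, userSeconds() - $start);
}
unlink($path);
```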

Comments

Harry Fuecks @ 11.05.2004 03:20 CEST
next() sounds good from an ease-of-use point of view as well. Parsing a structured document is probably a common case, and as Jeff points out at http://www.procata.com/blog/archives/2004/05/05/api-design/:

"Good API design makes common things simple while leaving uncommon things possible"
Full(o)bloG @ 17.05.2004 12:51 CEST (Trackback)
PHP faster than SAX
According to the tests (1 - 2) carried out by Christian, the new PHP 5.0 primitives for handling XML files appear to perform better than using SAX and DOM, and with less waste of resources.

Bye
