Processing Large XML Documents with PHP 5 – Update

I just added a new feature to XMLReader that makes parsing the large XML document mentioned in my previous post faster. The new method is called next(), as opposed to read(). What next() does is “Skip to the node following the current one in document order while avoiding the subtree if any”.
This new method is quite useful here, as we have a flat XML structure and are only interested in the “entry” elements (we look for the right ID there). We therefore only have to loop through the “entry” elements rather than through all elements and nodes, which means fewer PHP calls and more work done in C. In the “XMLReader Next” example, we iterate with read() to the first “entry” element and then just loop through all the siblings of this entry with next(). This is approx. 4 times faster than the “read()-only” approach and even faster than DOM. Here’s the updated chart for full document parsing.

Of course, we can only take advantage of this technique because the document is well-structured and we know exactly what’s in it down to this level. So again, your mileage may vary with differently structured documents ;)
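
For illustration, the read()-then-next() loop described above might look roughly like the following. This is only a minimal sketch, not the benchmark code: the sample document, the function name findEntryPosition() and the assumption that the ID is stored in an “ID” attribute of each “entry” element are all made up for this example.

```php
<?php
// Hypothetical sample data: a flat list of <entry> elements, each with a
// small subtree. The real document from the benchmark is not shown here.
$xml = <<<XML
<?xml version="1.0"?>
<entries>
  <entry ID="1"><title>First</title><body>aaa</body></entry>
  <entry ID="2"><title>Second</title><body>bbb</body></entry>
  <entry ID="3"><title>Third</title><body>ccc</body></entry>
</entries>
XML;

// Returns the 1-based position of the <entry> whose ID attribute equals
// $wantedId, or null if there is no such entry. (Illustrative helper only.)
function findEntryPosition($xml, $wantedId)
{
    $reader = new XMLReader();
    $reader->XML($xml);

    // Step 1: walk node by node with read() until the first <entry>.
    $more = false;
    while ($reader->read()) {
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'entry') {
            $more = true;
            break;
        }
    }

    // Step 2: hop from sibling to sibling with next(), which skips the
    // subtree (<title>, <body>, ...) of the current node.
    $position = 0;
    $result = null;
    while ($more) {
        // next() also stops on whitespace nodes between the entries and on
        // the closing tag of the parent, so we still have to filter.
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'entry') {
            $position++;
            if ($reader->getAttribute('ID') === $wantedId) {
                $result = $position;
                break;
            }
        }
        $more = $reader->next();
    }

    $reader->close();
    return $result;
}

var_dump(findEntryPosition($xml, '2')); // int(2)
```

Note that next() only skips children; the whitespace between the entries still shows up as sibling nodes, hence the nodeType check inside the second loop.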

I also checked with a 100 MB document. With DOM, it took 38 sec of user time and 256 MB of memory; with “XMLReader next”, it took 16 sec, and memory usage was still at 3.2 MB.

next() sounds good from an ease-of-use point of view as well. Parsing a structured document is probably a common case, and as Jeff points out at http://www.procata.com/blog/archives/2004/05/05/api-design/:

“Good API design makes common things simple while leaving uncommon things possible”