Pimp up your XSLT transformation

XSLT transformations are usually quite fast, and the libxslt/libxml2 combo used in PHP is one of the fastest around. But you may have noticed that a script spends quite a lot of its time importing the stylesheet, especially if your stylesheets are big. As the result of that import is basically always the same and could be reused, the question pops up from time to time why imported stylesheets can't be reused across requests. I'd say it would be technically feasible within PHP (with some shared memory, the way APC does it), but the work involved scares me (not to mention the many potential additional issues of such a setup).

But after having read Daniel Veillard's post about exactly that problem and why dumping compiled XSLT stylesheets does not make much sense in the libxslt environment, I came up with the idea of an XSLT daemon written in PHP, which runs permanently and keeps the compiled stylesheets in its memory. The actual PHP script then communicates with this daemon, which does the transformation and returns the result. And here's the proof of concept (meaning, do not use it in a production environment :) ) with some impressive results.

As the underlying library I used the nanoserv socket daemon library, which looks like it provides everything I need. I decided to use the HTTP daemon as this made the client side much easier, but one could of course use a generic socket daemon. The daemon waits for HTTP requests on its port, creates an XSLTProcessor object for the requested XSLT file if one doesn't already exist, then transforms the XML document and returns the output. In this scenario, only the filename of the XSLT file is sent, whereas the XML document is sent whole, in serialized form (with a POST request). Usually the XSLT files are quite static while the input XML document changes all the time, so this made the most sense to me. One could of course adjust the script to serve other setups.
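The core of what the daemon does, importing each stylesheet once and reusing the resulting processor, can be illustrated without the socket layer. What follows is only a minimal sketch under assumptions: the class name `StylesheetCache` and its methods are hypothetical, nanoserv is deliberately left out, and only the standard `DOMDocument` and `XSLTProcessor` classes of PHP's dom and xsl extensions are used.

```php
<?php
// Hypothetical sketch: keeps one compiled XSLTProcessor per stylesheet path
// for the lifetime of a long-running process.
final class StylesheetCache
{
    /** @var array<string, XSLTProcessor> compiled processors, keyed by file path */
    private array $processors = [];

    /** Returns the cached processor, importing the stylesheet only on first use. */
    public function get(string $xsltFile): XSLTProcessor
    {
        if (!isset($this->processors[$xsltFile])) {
            $stylesheet = new DOMDocument();
            if (!$stylesheet->load($xsltFile)) {
                throw new RuntimeException("Cannot load stylesheet: $xsltFile");
            }
            $processor = new XSLTProcessor();
            $processor->importStylesheet($stylesheet); // the expensive step
            $this->processors[$xsltFile] = $processor;
        }
        return $this->processors[$xsltFile];
    }

    /** Transforms a serialized XML document with the named stylesheet. */
    public function transform(string $xsltFile, string $xml): string
    {
        $input = new DOMDocument();
        if (!$input->loadXML($xml)) {
            throw new RuntimeException('Input is not well-formed XML');
        }
        $result = $this->get($xsltFile)->transformToXml($input);
        if (!is_string($result)) {
            throw new RuntimeException("Transformation failed for: $xsltFile");
        }
        return $result;
    }
}
```

In a daemon, a single `StylesheetCache` instance would live for as long as the process does, and each incoming request would end up in a call to `transform()`.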

Here's the whole code for the server part (you just start it on the command line).

On the client side we then do a curl request to the server with the needed parameters and in this example just print the output.
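Such a client request could look roughly like the sketch below. The URL, the port and the field names `xslt` and `xml` are assumptions made for illustration, as are the function names; the real values depend on how the daemon is set up. Only standard functions of PHP's curl extension are used.

```php
<?php
// Hypothetical endpoint; adjust to wherever the daemon actually listens.
const DAEMON_URL = 'http://127.0.0.1:8080/';

/**
 * Builds the cURL options for one transformation request: the stylesheet
 * filename plus the serialized XML document, sent as a POST body.
 * The field names 'xslt' and 'xml' are assumptions, not a fixed protocol.
 */
function buildTransformRequest(string $xsltFile, string $xml): array
{
    return [
        CURLOPT_URL            => DAEMON_URL,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query(['xslt' => $xsltFile, 'xml' => $xml]),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT        => 5,    // seconds; do not hang if the daemon is down
    ];
}

/** Sends the request and returns the transformed document as a string. */
function transformViaDaemon(string $xsltFile, string $xml): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildTransformRequest($xsltFile, $xml));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Daemon request failed: ' . curl_error($ch));
    }
    return $body;
}
```

A page script would then call something like `echo transformViaDaemon('article.xsl', $dom->saveXML());`, where `article.xsl` and `$dom` stand in for your own stylesheet and input document.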

Here's the short code for the client part.

That's it. Now to the interesting numbers, for which I used a rather extreme case: the XSLT is approx. 150 kB large, the input XML really short (some 100 bytes or so). Tested with Apache's ab:

With daemon: Requests per second: 272.21 [#/sec] (mean)

Old way: Requests per second: 9.88 [#/sec] (mean)

Meaning that instead of roughly 100 ms for the whole import and transformation each and every time, it took less than 4 ms for the whole HTTP communication (on localhost) plus the transformation. Quite impressive, IMHO. As said, this scenario is certainly not your typical setup, but even with smaller XSLT files, larger input XML documents and a more refined server script (there's absolutely no error checking right now, for example), you should see a significant performance boost.
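The per-request times follow directly from the ab throughput figures. A small worked example, assuming the benchmark ran with ab's default concurrency of 1, so that the mean time per request is simply 1000 ms divided by the requests per second (the function name is made up for illustration):

```php
<?php
/** Mean time per request in milliseconds, valid for a concurrency of 1. */
function msPerRequest(float $requestsPerSecond): float
{
    return 1000.0 / $requestsPerSecond;
}

$withDaemon = msPerRequest(272.21); // figure reported by ab for the daemon
$oldWay     = msPerRequest(9.88);   // figure reported by ab for the old way

printf("With daemon: %.2f ms per request\n", $withDaemon); // 3.67 ms
printf("Old way:     %.2f ms per request\n", $oldWay);     // 101.21 ms
printf("Speed-up:    %.2fx\n", $oldWay / $withDaemon);     // 27.55x
```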

Sounds too good to be true, and indeed there are some drawbacks. First and most importantly, if you have many different stylesheets, memory consumption may be high (as all of them are kept in memory). Furthermore, the current script doesn't check whether an XSLT stylesheet has changed (that'd be easy to implement). You're also in a completely different PHP context on the daemon side than on the client side; depending on how your stylesheets are written (if they call back into PHP functions, for example), that may or may not affect you. And last but not least, I have no idea how well the daemon scales with many concurrent requests (but running ab with -c 10 did not show any significant slowdown).
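The missing change check could be added by remembering each stylesheet's modification time and re-importing when it differs. This is a minimal sketch of one possible approach, not part of the proof of concept; the class name `FreshStylesheetCache` and the `$imports` counter are hypothetical.

```php
<?php
// Hypothetical sketch: re-imports a stylesheet when its file mtime changes.
final class FreshStylesheetCache
{
    /** @var array<string, array{mtime: int, processor: XSLTProcessor}> */
    private array $entries = [];

    /** Number of imports performed so far (only here to make the effect visible). */
    public int $imports = 0;

    public function get(string $xsltFile): XSLTProcessor
    {
        // PHP caches stat() results; a long-running process must clear them
        // or it would keep seeing the old modification time.
        clearstatcache(true, $xsltFile);
        $mtime = filemtime($xsltFile);
        if ($mtime === false) {
            throw new RuntimeException("Cannot stat stylesheet: $xsltFile");
        }

        $cached = $this->entries[$xsltFile] ?? null;
        if ($cached === null || $cached['mtime'] !== $mtime) {
            $stylesheet = new DOMDocument();
            if (!$stylesheet->load($xsltFile)) {
                throw new RuntimeException("Cannot load stylesheet: $xsltFile");
            }
            $processor = new XSLTProcessor();
            $processor->importStylesheet($stylesheet);
            $this->entries[$xsltFile] = ['mtime' => $mtime, 'processor' => $processor];
            $this->imports++;
        }
        return $this->entries[$xsltFile]['processor'];
    }
}
```

Note that this only watches the top-level file: changes in stylesheets pulled in via `xsl:import` or `xsl:include` would go unnoticed, and file modification times typically have a resolution of one second.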

To sum up: this little approach may significantly improve your XSLT transformation times, but the server script needs improvements (and testing) before it can actually be used in a production environment, and you should be aware of the consequences of the approach (potentially massive memory consumption). But hopefully it's at least a start in a new direction.
