About 3 weeks ago I got contacted by Will Drewry from the oCERT Team about a possible XSS attack vector in the popoon externalinput class, which was discovered by Alexios Fakos. It used a (again stupid) behaviour by browsers which treat a “/” like a space, so something like <body/onload=alert(/hello/)> was not cleaned up… The actual fix was quite easy and a no-brainer. But as I always said that externalinput.php alone should not be used as your own line of defense, I went for a more thorough solution.
Many issues with cleaning XSS attacks comes from the fact, that browsers accept a lot of really f***ed up markup like the example above and therefore before actually trying to clean unwanted stuff away one should make a well formed HTML out of the input. Tidy does that job pretty well, the loadHTML method of PHP’s DomDocument class also produces well-formed XHTML as output. That’s what I always told people to do and that’s what we do since always in Flux CMS. That’s why Flux CMS is not affected by this exploit at all – some of the advantages of XML based CMS systems.
To more enforce that approach, I wrote a new class called lx_externalinput_clean, which uses the same regexes but does by default filter first the input with tidy, and if that’s not available with DomDocument->loadHTML and if even that is not available, it does a striptags(), just to be sure. This can be overridden, so that if you do that cleaning already before, you don’t have to do that again. I hope this helps people just blindly copying code :)
Having said that, the class still tries to clean many nasty things without first having to tidy up the HTML to get proper markup (and the above attack is also handled without tidy et al.). But there are most certainly more attacks possible which come from WTF-a-browser-does-parse-that? markup and using those tools should get rid of them for once and all. You have been warned, if you don’t use them.
You can also test the “new” tool over at labs.liip.ch/xsscleaner and check, if you find more vulnerabilities.
Thanks a lot to the oCERT team to give me this much time ahead to fix it and to inform a lot of other projects in advance which are apparently using my code.
The full advisory can be found at http://www.ocert.org/advisories/ocert-2008-012.html
This case remains:
Grrr, the comment preview isn’t like the final one… here we go:
You’re right, that’s indeed exploitable with IE, if you don’t use tidy before.
As it gets too annoying and one in general doesn’t want people using style attributes anyway, the latest version now just strips away style tags, problem (hopefully) solved.
what about HTML Purifier?
There’s nothing wrong with HTML Purifier IMHO :)
if the blog filters mangle it… the idea here is that a few of your regular expression filters get confused when there’s a close tag located within an attribute’s value, allowing style expressions or on* event handlers to evade filtering.
$doc = new DOMDocument();
$doc->loadHTML(“<span style=”width: expression(alert(‘Ping!’));”></span>”);
still are not cleaned…
Sven? the dom extenion itself doesn’t do XSS protection, noone did claim that.
Ok, I am pretty new to using php classes, could you give me an example on how to use your script in php?
$cleandhtml = lx_externalinput_clean::basic($html);
The clean.php you get from https://svn.liip.ch/repos/public/ext/externalinput/trunk/lx/externalinput/clean.php
Thanks Chregu :)
do you have chinese course of XSS?
all is english it’s hard to study!…..