From the OWASP AntiSamy Project page’s “What is it” section:
So as far as I understand it, it is trying to prevent Cross Site Scripting (XSS). But to be fair, the user guide is a little bit more realistic:
Anyway, I found a XSS that worked in the case of the web application I was testing:
Version: antisamy-1.5.2.jar with all default configuration files there are.
Browsers tested (all working): Firefox 22.0 (on Mac OSX), Safari 6.0.5 (on Mac OSX), Internet Explorer 11.0.9200 (Windows 7) and Android Browser (Android 2.2).
July 16th, 2013: Wrote to Arshan (maintainer) about the issue
July 16th, 2013: Response, questions about version and browser compatiblity
July 16th, 2013: Clarification about versions/browser and that I only tried getNumberOfErrors(), informed that I’m planning to release this blog post end of August
July 23rd, 2013: Some more E-Mails about similar issue that was just resolved (not the same issue), including Kristian who comitted a fix
July 25th, 2013: Kristian sent a mail, he will have a look at the getNumberOfErrors() logic before releasing an update
July 31st, 2013: Asked if there are any updates on the issue, no response
Sept 09th, 2013: Asked if there are any updates on the issue, response that it should be fixed. Requested new .jar file
Oct 21st, 2013: Tested with the newest version available for download, antisamy 1.5.3. Problem still present. Public release.
Antisamy doesn’t give any error messages and getNumberOfErrors() is 0. Although the getCleanHTML() will give back sanitised code without the XSS, people relying only on the getNumberOfErrors() method to check if an input is valid or not will have a XSS.
Btw. the included configuration file names are somehow misleading (the names include company names like ebay). Those names are made up, doesn’t mean the companies use those conifg files at all. I don’t even know if they are using the antisamy project at all.
I can’t recommend relying on that project. Proper output encoding is important and is the real XSS prevention. And validating HTML with regex is hard. Very hard. Very, very hard. Don’t.