From the OWASP AntiSamy Project page’s “What is it” section:
It’s an API that helps you make sure that clients don’t supply malicious cargo code in the HTML they supply for their profile, comments, etc., that get persisted on the server. The term “malicious code” in regards to web applications usually mean “JavaScript.” Cascading Stylesheets are only considered malicious when they invoke the JavaScript engine. However, there are many situations where “normal” HTML and CSS can be used in a malicious manner. So we take care of that too.
So as far as I understand it, it is trying to prevent Cross Site Scripting (XSS). But to be fair, the user guide is a little bit more realistic:
AntiSamy does a very good job of removing malicious HTML, CSS and JavaScript, but in security, no solution is guaranteed. AntiSamy’s success depends on the strictness of your policy file and the predictability of browsers’ behavior. AntiSamy is a sanitization framework; it is up to the user how it does its sanitization. Using AntiSamy does not guarantee filtering of all malicious code. AntiSamy simply does what is described by the policy file.
Anyway, I found a XSS that worked in the case of the web application I was testing:
<a href="http://example.com"&/onclick=alert(8)>foo</a>
Version: antisamy-1.5.2.jar with all default configuration files there are.
Browsers tested (all working): Firefox 22.0 (on Mac OSX), Safari 6.0.5 (on Mac OSX), Internet Explorer 11.0.9200 (Windows 7) and Android Browser (Android 2.2).
Disclosure timeline:
July 16th, 2013: Wrote to Arshan (maintainer) about the issue
July 16th, 2013: Response, questions about version and browser compatiblity
July 16th, 2013: Clarification about versions/browser and that I only tried getNumberOfErrors(), informed that I’m planning to release this blog post end of August
July 23rd, 2013: Some more E-Mails about similar issue that was just resolved (not the same issue), including Kristian who comitted a fix
July 25th, 2013: Kristian sent a mail, he will have a look at the getNumberOfErrors() logic before releasing an update
July 31st, 2013: Asked if there are any updates on the issue, no response
Sept 09th, 2013: Asked if there are any updates on the issue, response that it should be fixed. Requested new .jar file
Oct 21st, 2013: Tested with the newest version available for download, antisamy 1.5.3. Problem still present. Public release.
Antisamy doesn’t give any error messages and getNumberOfErrors() is 0. Although the getCleanHTML() will give back sanitised code without the XSS, people relying only on the getNumberOfErrors() method to check if an input is valid or not will have a XSS.
Btw. the included configuration file names are somehow misleading (the names include company names like ebay). Those names are made up, doesn’t mean the companies use those conifg files at all. I don’t even know if they are using the antisamy project at all.
I can’t recommend relying on that project. Proper output encoding is important and is the real XSS prevention. And validating HTML with regex is hard. Very hard. Very, very hard. Don’t.