This is my first blog post, and any feedback is appreciated. In my daily work I use w3af, the Web Application Attack and Audit Framework, extensively. I’ve noticed that w3af sometimes struggles with the basic task of spidering a target web application, and if the application can’t be spidered properly, you won’t find its vulnerabilities. The reasons vary, but there is one obvious cause that other tools suffer from as well: JavaScript support in the spider. Most of the time, web spiders in open source security testing tools support JavaScript poorly or not at all, yet almost every web application relies on some kind of JavaScript/Ajax functionality these days. Shouldn’t JavaScript support be a necessity for spiders in security tools? I’m not talking about commercial products, which presumably have some kind of JavaScript support; it is the open source tools that lack this feature. There are some open source libraries available that I’d like to point out.
Mozilla Rhino, a JavaScript engine written in Java, forms the basis of the HtmlUnit project, which describes itself as a “GUI-Less browser for Java programs. It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc… just like you do in your normal browser.” Sounds interesting… and it is. I’ve written a basic web spider component using HtmlUnit for a larger project I am currently working on (more about this later). The great benefit, as you may have guessed, is the JavaScript support it provides. Let’s look at a basic HTML page with embedded JavaScript code (the example is a bit far-fetched):
<html>
  <head><title>Login Form</title></head>
  <body>
    <form action="checkLogin.php" name="loginForm">
      <script type="text/javascript">
        function writeInputElement(inputType, inputName) {
          document.write('<input type="' + inputType + '" name="' + inputName + '" />');
        }
        writeInputElement("text", "username");
        writeInputElement("password", "password");
        writeInputElement("submit", "loginButton");
      </script>
    </form>
  </body>
</html>
The preceding snippet uses JavaScript to create the input fields of an HTML form dynamically. Web application security tools such as w3af are not able to find and work with these dynamically created input fields, whereas HtmlUnit properly updates the HTML DOM and provides an easy-to-use API for accessing and modifying the generated fields. The following code shows how to parse and modify the HTML page:
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.DefaultCredentialsProvider;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlPasswordInput;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public void submittingForm() throws Exception {
    // Creates a new browser object using a proxy server
    // and simulating Mozilla Firefox version 3
    final WebClient webClient =
        new WebClient(BrowserVersion.FIREFOX_3, "http://myproxyserver", 8080);

    // Set proxy username and password
    final DefaultCredentialsProvider credentialsProvider =
        (DefaultCredentialsProvider) webClient.getCredentialsProvider();
    credentialsProvider.addProxyCredentials("proxyUsername", "myProxyPassword123");

    // Get the first page; HtmlUnit runs the embedded JavaScript while loading it
    final HtmlPage page1 = webClient.getPage("http://www.example.com/login.php");

    // Get the form that we are dealing with and within that form,
    // find the submit button and the fields that we want to change.
    final HtmlForm form = page1.getFormByName("loginForm");
    final HtmlSubmitInput button = form.getInputByName("loginButton");
    final HtmlTextInput textFieldUsername = form.getInputByName("username");
    final HtmlPasswordInput textFieldPassword = form.getInputByName("password");

    // Change the value of the text fields
    textFieldUsername.setText("john");
    textFieldPassword.setText("gaephah6MueD");

    // Now submit the form by clicking the button and get back the second page
    final HtmlPage page2 = button.click();
}
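To make the contrast with a spider that does not execute JavaScript more concrete, here is a rough sketch of a purely static scan. It is only an illustration and not how w3af actually works internally: the class name StaticInputScanner and its regex-based matching are my own assumptions, standing in for any parser that reads the markup without running the scripts. It uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of w3af or HtmlUnit. */
final class StaticInputScanner {

    // Matches a whole <script>...</script> block, including multi-line bodies.
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches an <input> tag and captures the value of its name attribute.
    private static final Pattern INPUT_NAME = Pattern.compile(
            "<input\\b[^>]*\\bname\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    /** Returns the names of all input fields visible without running JavaScript. */
    static List<String> findInputNames(String html) {
        // An HTML parser treats script bodies as opaque text, so drop them first.
        String markupOnly = SCRIPT_BLOCK.matcher(html).replaceAll("");
        List<String> names = new ArrayList<>();
        Matcher matcher = INPUT_NAME.matcher(markupOnly);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Same idea as the login form above: fields are written by JavaScript.
        String dynamicForm =
                "<form action=\"checkLogin.php\" name=\"loginForm\">"
              + "<script type=\"text/javascript\">"
              + "function writeInputElement(inputType, inputName) {"
              + " document.write('<input type=\"' + inputType"
              + " + '\" name=\"' + inputName + '\" />');"
              + "}"
              + "writeInputElement(\"text\", \"username\");"
              + "</script>"
              + "</form>";

        // The equivalent form written out as plain markup.
        String staticForm =
                "<form action=\"checkLogin.php\" name=\"loginForm\">"
              + "<input type=\"text\" name=\"username\" />"
              + "<input type=\"password\" name=\"password\" />"
              + "<input type=\"submit\" name=\"loginButton\" />"
              + "</form>";

        System.out.println(findInputNames(dynamicForm)); // prints []
        System.out.println(findInputNames(staticForm));  // prints [username, password, loginButton]
    }
}
```

The static scan sees no fields at all in the JavaScript-generated form, so a tool built this way has nothing to fill in or fuzz. HtmlUnit, in contrast, runs the script first and then exposes the generated fields through calls such as getInputByName.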
I’ve tested my spider with the WIVET benchmark project. From the WIVET project page: “WIVET is a benchmarking project that aims to statistically analyze web link extractors. In general, web application vulnerability scanners fall into this category.” My spider scored an 85% discovery rate, whereas w3af only scored between 16% and 50% depending on the version used. I hope that more JavaScript/DOM manipulation libraries will become available for other programming languages in the future.
Cheers, nini0