Malicious Website / Malicious Code of the Week
09.29.2006 - 10:07 AM
This week we are going to shift gears away from the topic of malicious executable analysis, and address another important battlefront in the malcode wars: obfuscated HTML and Javascript. We will first talk about some simple regular-expression based approaches to unwinding the simpler types of obfuscation, and then discuss a more complicated example that requires us to directly manipulate Javascript.
What is it?
Normally, when you are viewing a web page in your browser, you can "view source" on the page to see the HTML markup directly. Most of the time everything is human-readable, meaning the document's structure is clearly visible in the page text. For example:

One of the main areas of focus when performing security research on an HTML specimen is the URLs that appear as links to other sites or resources. There are several alternative ways to represent these URLs in a web page, and one of them is a process called "escaping", "URI-encoding" or "percent-encoding". The conversion takes a single character such as "." and converts its character code to a 2-digit hexadecimal representation such as "2E", then prefixes that with a percent sign. So a period in its URI-encoded form is the sequence "%2E". This mechanism exists to express reserved characters in a URI in a standardized way.
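As a minimal illustration of that conversion, here is a Javascript sketch that encodes a single character. It is written for Node.js rather than a browser, the function name percentEncodeChar is hypothetical, and it assumes the character's code fits in one byte (below 256):

```javascript
// Minimal sketch of percent-encoding one character.
// Assumption: the character code is below 256 (a single byte).
function percentEncodeChar(ch) {
  // Character code -> two upper-case hex digits, zero-padded if needed.
  const hex = ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0");
  // Prefix with "%" to form the escape sequence.
  return "%" + hex;
}

console.log(percentEncodeChar("."));  // prints %2E
console.log(percentEncodeChar("/"));  // prints %2F
```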
As you might expect, this formatting can be abused by people who wish to hide the content of an external link from casual view. Here is a fairly common example, which is to escape the entire URI:

The question is, how do we decode this? One way is to use a Perl regular expression:
$ cat document | perl -pe 's/%([0-9A-F]{2})/chr(hex($1))/ieg;'
The regex does a search-and-replace on every percent sign followed by two characters in the range 0-9 or A-F, i.e. two hexadecimal digits. The parentheses capture those two digits into the variable $1. Note that the {2} quantifier has to sit inside the parentheses; written as ([0-9A-F]){2}, the group would capture only the last digit. Each match is replaced with the character that the captured digits encode.
The conversion is done by the "e" flag on the substitution, which means "evaluate": the replacement side is run as Perl code instead of being treated as a literal string. First, the hex() function converts the two-digit hex string stored in $1 into a number, and then the chr() function converts that number into the corresponding character. The "i" flag makes the match case-insensitive (so %2e matches as well as %2E), and the "g" flag applies the replacement globally, to every match on each input line.
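For readers working in Javascript rather than Perl, the same substitution can be sketched with String.prototype.replace(), where a callback function plays the role of Perl's "e" flag. This is an illustrative equivalent, not part of the original toolchain; Node.js is assumed and the name percentDecode is hypothetical:

```javascript
// Sketch: Javascript equivalent of  s/%([0-9A-F]{2})/chr(hex($1))/ieg
function percentDecode(text) {
  // /g = replace every match, /i = accept a-f as well as A-F.
  // hexDigits is the captured group (Perl's $1).
  return text.replace(/%([0-9A-F]{2})/gi, (match, hexDigits) =>
    String.fromCharCode(parseInt(hexDigits, 16))  // hex() then chr()
  );
}

console.log(percentDecode("%68%74%74%70%3A%2F%2F%65%78%61%6D%70%6C%65%2E%63%6F%6D"));
// prints http://example.com
```

A percent sign that is not followed by two hex digits, such as the one in "100%", is left untouched, just as in the Perl version.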
Another way to do it is to use the Perl URI::Escape module, which ships with the URI distribution from CPAN and is present on many Perl installations:
$ cat document | perl -MURI::Escape -pe '$_=uri_unescape($_)'
URI::Escape is also useful because the inverse operation is just as easy. The following command takes the document and escapes it. The second argument to uri_escape(), "\0-\377", is the set of characters to escape; since it covers every byte value from 0 to 255, every character of the document is escaped:
$ cat document | perl -MURI::Escape -pe '$_=uri_escape($_,"\0-\377");'
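A rough Javascript counterpart of that "escape everything" call is sketched below. Node.js is assumed, escapeAll and unescapeAll are hypothetical names, and, like the Perl version, it assumes every character code falls in the \0-\377 range:

```javascript
// Sketch of "escape everything", mirroring uri_escape($_, "\0-\377"):
// every character with a code from 0 to 255 becomes %XX.
function escapeAll(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // Assumption stated above: nothing outside \0-\377.
    if (code > 0xff) throw new RangeError("character outside \\0-\\377");
    out += "%" + code.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// Inverse operation: decode every %XX sequence back to its character.
function unescapeAll(text) {
  return text.replace(/%([0-9A-F]{2})/gi, (m, h) =>
    String.fromCharCode(parseInt(h, 16)));
}

const link = '<a href="http://example.com/">';
const hidden = escapeAll(link);
console.log(hidden);                        // starts with %3C%61%20%68%72%65%66
console.log(unescapeAll(hidden) === link);  // prints true
```

A hand-written loop is used because Javascript's built-in escape() is not equivalent here: it leaves letters, digits and the characters @*_+-./ unescaped.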
Upping the Ante
The above example is effectively a simple search-and-replace on the document to unescape all trivially-escaped strings. What about more complex examples? Enter Javascript.
With Javascript it is possible to hide the entire document inside the script, including all markup tags and even other Javascript. Because the browser executes the Javascript after delivery but before display, and that execution can change the contents of the document, safely analyzing the exploit becomes more complicated: what arrives over the network is not what the browser ends up rendering.
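To make the idea concrete, the hypothetical sketch below (Node.js assumed, function name invented for illustration) shows the simplest form of this hiding: an HTML fragment is percent-escaped and wrapped in a script that rebuilds it with document.write(unescape(...)) when the page loads. Real specimens, like the one that follows, can add further layers such as a custom decoding function.

```javascript
// Hypothetical sketch: fold an HTML fragment into a single script that
// rebuilds it at load time, so no readable markup crosses the network.
function hideInScript(html) {
  let escaped = "";
  for (let i = 0; i < html.length; i++) {
    // Assumes every character code is below 256, as in the Perl example.
    escaped += "%" + html.charCodeAt(i).toString(16).toUpperCase().padStart(2, "0");
  }
  return '<script>document.write(unescape("' + escaped + '"));</script>';
}

console.log(hideInScript("<b>hi</b>"));
// prints <script>document.write(unescape("%3C%62%3E%68%69%3C%2F%62%3E"));</script>
```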
Here is an example of a malicious document. This is the entirety of the page, and is what comes across the network to the browser. Our goal is to unwind to the final de-obfuscated result, which contains the actual exploit.

There is no human-readable HTML markup in this page. At this point we have no idea of the structure of the page or the exploit it is trying to deliver. Using the URI-unescaping technique from the first section, we can decode the first part of this script, which is the string passed to the document.write(unescape(...)) call.
Doing that, we end up with the following snippet:

So, the part inside the first Javascript call is actually more Javascript, which is written to the document. We can see that it defines the dF() function which is used to decode the rest of the payload.
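The extraction step described above can also be done statically, without executing any of the page. The sketch below (Node.js assumed; extractWrittenText is a hypothetical name) finds each document.write(unescape("...")) call with a regular expression and decodes its argument using the standard unescape() function, the same one the page itself calls. It assumes the argument is a single double-quoted string literal with no escaped quotes inside it; anything more elaborate needs a real Javascript parser.

```javascript
// Sketch: statically locate document.write(unescape("...")) calls and
// decode their string arguments, without executing the page.
// Assumption: each argument is one double-quoted literal without escaped quotes.
function extractWrittenText(pageSource) {
  const pattern = /document\.write\(\s*unescape\(\s*"([^"]*)"\s*\)\s*\)/g;
  const decoded = [];
  let match;
  while ((match = pattern.exec(pageSource)) !== null) {
    // unescape() handles both %XX and %uXXXX sequences.
    decoded.push(unescape(match[1]));
  }
  return decoded;
}

const page =
  '<script>document.write(unescape("%3C%62%3E%68%69%3C%2F%62%3E"));</script>';
console.log(extractWrittenText(page)[0]);  // prints <b>hi</b>
```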
Now we have the decoding function in question and a call to that function. All that remains is to find a safe way to evaluate the Javascript code outside of the browser environment. To do this, we use the command-line "js" shell built from SpiderMonkey, the Mozilla project's C implementation of Javascript and the same engine that Firefox uses. This is the version used for this analysis:
JavaScript-C 1.5 pre-release 6a 2004-06-09
For this example, we copy the code into a separate file, clean it up, and call it example1.js. Be careful to include only Javascript code in this file, with no HTML markup:

Note that all browser-specific code is disabled and all extraneous HTML is removed. The call to document.write() is replaced with a call to print(), a function provided by the SpiderMonkey shell (it is not part of core Javascript), which lets us view the results. We then run the file through the interpreter:
$ js example1.js
With the following results:

A cursory examination of this shows exploits for several vulnerabilities, MS05-002 and MS04-013 among them, all of which are intended to deliver malicious code to the victim's system.
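Swapping document.write() for print() by hand works, but the same effect can be had by giving the code a stub document object that records whatever is written. The sketch below is one way to do that with Node.js and its built-in vm module; it is an alternative to, not a description of, the SpiderMonkey setup used above, and captureWrites is a hypothetical name. Node's own documentation warns that the vm module is not a security mechanism, so hostile code should still only be run on an isolated analysis machine.

```javascript
const vm = require("vm");

// Sketch: run extracted page script against a stub "document" object that
// records document.write()/writeln() output instead of rendering it.
// NOTE: Node's vm module is not a security sandbox; use an isolated machine.
function captureWrites(scriptSource) {
  const written = [];
  const sandbox = {
    document: {
      write: (...parts) => { written.push(parts.join("")); },
      writeln: (...parts) => { written.push(parts.join("") + "\n"); },
    },
  };
  vm.createContext(sandbox);  // sandbox becomes the script's global object
  // The timeout (milliseconds) stops runaway loops in hostile code.
  vm.runInContext(scriptSource, sandbox, { timeout: 1000 });
  // Written text that itself contains script is returned, not executed.
  return written.join("");
}

const sample =
  'document.write(unescape("%3C%62%3E")); document.write("hi</b>");';
console.log(captureWrites(sample));  // prints <b>hi</b>
```

If the captured output contains another script layer, such as the dF() definition and call seen above, that script has to be extracted and passed through the function again.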
Conclusion
This document shows how to perform basic de-obfuscation on HTML and Javascript payloads. The techniques outlined here can be adapted to other hiding strategies. Because the code is not encrypted but merely transformed by another function, and the browser must be able to reverse that transformation in order to run it, we will always be able to decode the original data and access the delivered payload.
Many automation strategies could be applied to this process to make it easier to use and, especially, to shorten analysis time on new exploits. This is a ripe topic for further research.
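As one small example of such automation, the hypothetical sketch below (Node.js assumed) repeatedly applies unescape() until the text stops changing, which unwinds payloads that were escaped several times over. It only handles plain escaping; layers built on a custom decoder such as dF() still have to be executed as shown earlier, and a page that legitimately contains literal %XX text would be over-decoded.

```javascript
// Hypothetical automation sketch: keep unescaping until the text stops
// changing, e.g. "%25252E" -> "%252E" -> "%2E" -> "."
function unwindEscapes(text, maxRounds = 10) {
  let current = text;
  for (let round = 0; round < maxRounds; round++) {
    const next = unescape(current);
    if (next === current) break;  // fixed point: nothing left to decode
    current = next;
  }
  return current;
}

console.log(unwindEscapes("%25252E"));  // prints .
```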
Researcher: NJ Verenini, Websense Security Labs