This post provides a basic methodology for identifying misconfigured web applications, whether internal or external. Custom web applications are a common source of leaked data and pose a significant risk when not properly monitored. That risk has grown with the shift to the cloud over the last few years. Many web applications and their associated APIs are improperly configured to listen on port 443 while not actually encrypting the data as expected.
When we discuss web applications and web-based traffic, the two primary TCP ports to consider are 80 (HTTP) and 443 (HTTPS). These are the most common ports used for web traffic, but it is also common to see variations such as 8000, 8080, and 8443. At a high level, the primary difference between HTTP and HTTPS is how the data is transmitted across the wire: HTTP transmits in cleartext, while HTTPS, when configured properly, transmits encrypted data. Today, most web traffic should use HTTPS.
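To make the distinction concrete, here is a minimal sketch using Python's standard socket and ssl modules, with example.com as a placeholder host. The same request bytes are sent to port 443 in both cases; only the TLS wrap determines whether anything on the wire is actually encrypted.

```python
import socket
import ssl

HOST = "example.com"  # placeholder host
REQUEST = b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"

# HTTPS: the socket is wrapped in TLS before the request is sent,
# so only ciphertext crosses the wire.
with socket.create_connection((HOST, 443)) as raw_sock:
    context = ssl.create_default_context()
    with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
        tls_sock.sendall(REQUEST)

# Misconfigured pattern: the same bytes sent to port 443 without a TLS wrap
# travel in cleartext, which is exactly the traffic this hunt surfaces.
with socket.create_connection((HOST, 443)) as raw_sock:
    raw_sock.sendall(REQUEST)
```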
Below is a high-level overview of what we expect HTTPS to look like. When configured properly we should not be able to see the contents of the payload as it transmits over the wire. For the purposes of this blog post, we will focus on typical encrypted HTTPS traffic.
Now, what if we were transmitting over port 443 but we could see the contents of the payload? In that case, we have some form of improperly configured web server. Listening on TCP 443 does not automatically mean that the data is properly encrypted in transit.
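On the server side, the misconfiguration can be as simple as a web framework bound to port 443 without a certificate. The sketch below assumes Flask and hypothetical endpoint and certificate file names; it illustrates the pattern, not any particular application from this investigation.

```python
from flask import Flask  # assumes Flask is installed

app = Flask(__name__)

@app.route("/client/<path:name>")
def client_file(name):
    # Hypothetical endpoint serving client data
    return f"contents of {name}"

if __name__ == "__main__":
    # Misconfigured: bound to the "HTTPS" port with no TLS, so every request
    # and response is cleartext HTTP on TCP 443 (binding to 443 also requires root).
    # app.run(host="0.0.0.0", port=443)

    # Properly configured: the same port actually negotiates TLS.
    app.run(host="0.0.0.0", port=443, ssl_context=("cert.pem", "key.pem"))
```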
A service/protocol mismatch exists when the destination port of the traffic does not match the protocol identified by the NetWitness parsers. When we identify a service/protocol mismatch, the traffic must be investigated to ensure the data in transit does not represent a business risk.
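The same idea can be expressed outside of NetWitness. The sketch below, assuming the scapy library and a capture file named traffic.pcap, flags packets whose destination port is 443 but whose payload begins with a cleartext HTTP method, which is the kind of mismatch this hunt is built around.

```python
from scapy.all import IP, Raw, TCP, rdpcap  # assumes scapy is installed

HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ")

for pkt in rdpcap("traffic.pcap"):  # hypothetical capture file
    if pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw):
        if pkt[TCP].dport != 443:
            continue
        payload = bytes(pkt[Raw].load)
        # TLS records never start with an ASCII HTTP method, so this is a strong
        # indicator that the "HTTPS" port is carrying unencrypted HTTP.
        if payload.startswith(HTTP_METHODS):
            request_line = payload.splitlines()[0]
            print(f"Mismatch: {pkt[IP].src} -> {pkt[IP].dst}:443 | {request_line!r}")
```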
Identifying misconfigured web applications is simple within NetWitness. From a hunting perspective, we may have a few different hypotheses to test in order to explain why the traffic exists and, most importantly, what risk may be present. More often than not, this type of traffic is the result of a known web application that is not functioning as expected. The two hypotheses below are examples an analyst could create when observing this behavior. More could certainly be formulated, but for the purposes of this blog post, we'll stick with these two.
A web application is not functioning as expected, which may create risk for the business if data is being leaked in the clear. This scenario would be considered an 'Enabler of Compromise (EOC)'. An EOC is any artifact that is not malicious in nature but could be abused by a threat actor. An EOC is a business risk that should be reviewed.
A bad actor could be exfiltrating data. It would not be the first time that external threat actor infrastructure has been set up hastily and misconfigured. When hunting for a threat actor, it is important to remember that no breach is perfect. Threat actors are prone to mistakes and we can identify those mistakes when hunting.
An example in the UI is shown below. In the example we have a simple query:
medium = 1 AND tcp.dstport = 443 AND service = 80
The above query can easily be modified to fit your hunting needs. In your environment, try identifying meta keys that align with the hypothesis you are investigating. For example, if you wanted to focus solely on leaked files, you could append the 'filename.all' key to the search query as follows:
medium = 1 AND tcp.dstport = 443 AND service = 80 AND filename.all exists
The keys used to start this hunt were:
medium: Being an XDR platform, NetWitness has many customers who ingest data from all three major data planes (EDR, NDR, SIEM). The medium meta key is used to carve out specific types of data. In this case, medium = 1 will return only packet data.
tcp.dstport: As the name implies, this key contains the destination port of TCP connections. The value is parsed directly out of the packet at ingestion and is not inferred.
service: This key is populated with the protocol identified by the parsers, expressed as that protocol's standard port number. For example, if the HTTP parser identifies traffic as HTTP, it tags that traffic as service = 80 even if the actual TCP port was 8080, because the value is based solely on the identified protocol rather than on the port used in the traffic itself. This behavior allows us to easily identify service/protocol mismatches, as illustrated in the sketch below.
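To make the relationship between these keys explicit, here is a minimal sketch over hypothetical session records shaped like the meta described above (this is an illustration of the logic, not an actual NetWitness export format).

```python
# Hypothetical session metadata records mirroring the meta keys described above.
sessions = [
    {"medium": 1, "tcp.dstport": 443, "service": 80},    # HTTP identified on the HTTPS port
    {"medium": 1, "tcp.dstport": 443, "service": 443},   # TLS where we expect it
    {"medium": 1, "tcp.dstport": 8080, "service": 80},   # HTTP on an alternate port
]

def is_cleartext_on_443(session):
    # Mirrors the hunt query: medium = 1 AND tcp.dstport = 443 AND service = 80
    return (
        session.get("medium") == 1             # packet data only
        and session.get("tcp.dstport") == 443  # traffic aimed at the HTTPS port
        and session.get("service") == 80       # ...but the parsers identified HTTP
    )

for s in sessions:
    if is_cleartext_on_443(s):
        print("Investigate:", s)
```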
When traffic like what is shown above is identified, we can review other common artifacts such as domain names, source and destination IP addresses, and destination countries. By layering in more artifacts, we can effectively carve the data down until we are presented with a manageable number of events to investigate. The artifacts you use will vary by environment and deployment. We highly recommend incorporating business intelligence into your environment to provide meaningful context to analysts as they conduct investigations.
In this scenario, we'll review the alias.host meta key next. The alias.host key contains domain information, including subdomains. When we review this meta key, there are a couple of paths we can take.
First, look for values that are known, such as the subdomain to a known web portal. If you identify a known web application, does the context of this investigation present a business risk? In other words, is it OK for the traffic to/from the web application to be in the clear? If the answer is no, you have identified a risk and should proceed according to your internal procedures.
Next, look for values that are unknown. In the screenshot below, there are multiple subdomains belonging to giphy[.]com. That traffic stems from a GIF keyboard on a mobile device and is normal. The gorosei[.]net domain, however, is not a typical domain and stands out. If we identify an abnormal artifact such as an unknown domain, we can investigate it by simply clicking on the value.
We want to carve the data until we are left with a reasonable number of events to investigate. We can work through the table shown in the Events view in a couple of ways. The first is to review the meta values without fully reconstructing the packet. This allows us to move through the table of events quickly, taking the time to reconstruct only the events that are interesting to us. An example is shown below.
Once we have identified an event to reconstruct, we need to click the 'Show/Hide' event details button as shown below. Selecting this option is persistent, meaning that once you show or hide event details, you do not have to tediously click the button again for each new event.
By analyzing the meta and raw payload in this view, we can investigate exactly what occurred in this session and identify red flags within the traffic. In the screenshot below, we can clearly see that the traffic was indeed destined for port 443, so whoever set this up most likely intended for the traffic to be encrypted. However, the traffic is plainly standard HTTP, and we can see alarming artifacts such as a 'client directory' in the GET and Referer fields along with 'Client_Data.csv' tagged in the filename meta key.
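As a companion to the manual review, the sketch below shows how the same red flags could be pulled out of a cleartext request programmatically. The request bytes are hypothetical and simply mirror the kind of artifacts described above.

```python
import re

# Hypothetical cleartext HTTP request mirroring the artifacts seen in this hunt.
raw_request = (
    b"GET /client/Client_Data.csv HTTP/1.1\r\n"
    b"Host: internal-app.example.com\r\n"
    b"Referer: http://internal-app.example.com/client/\r\n"
    b"\r\n"
)

SENSITIVE_EXTENSIONS = (".csv", ".xlsx", ".sql", ".bak")

request_line = raw_request.split(b"\r\n", 1)[0].decode(errors="replace")
path = request_line.split(" ")[1]
referer = re.search(rb"Referer:\s*(\S+)", raw_request)

if path.lower().endswith(SENSITIVE_EXTENSIONS):
    print(f"Possible cleartext file transfer: {path}")
if referer:
    print(f"Referer: {referer.group(1).decode(errors='replace')}")
```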
This use case is not always about a leaked sensitive file. It is very common for leaked data such as credentials to be observable within the requests and/or responses as well. If the web application is known, identifying the business purpose behind the application is crucial, as that context can help the analyst investigate the activity more effectively.
We can continue to collect artifacts until we have created a timeline that depicts what transpired and how that activity translates to business risk. This is the most important aspect of threat hunting: any identified risky activity needs to be translated effectively to the business so that the appropriate measures are taken in response.
Now, if we inspect the raw packet, we can move forward with a deeper analysis. In the Packet view, we have the ability to inspect individual characteristics of the traffic, including the raw payload, and we can alter this view to show only the payload if we wish. A CSV file transferred in the clear would generally show some, if not all, of its data in this view. After scrolling through the data, we can determine the next course of action.
As the situation will change with each investigation, the next steps will also change. In the case of a potentially malicious file, you may be inclined to download the file(s) from the File tab for further analysis and/or detonation (NetWitness will make extracted files available for download). In this case, we are able to see the contents of the CSV file in the payload.
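Once a carved file has been downloaded, a couple of quick follow-up steps can feed the rest of the investigation. The sketch below assumes the extracted file has been saved locally as Client_Data.csv; it hashes the file for lookups or detonation workflows and previews the first few rows to gauge how sensitive the leaked data actually is.

```python
import csv
import hashlib
from pathlib import Path

extracted = Path("Client_Data.csv")  # hypothetical local path to the carved file

# Hash the file so it can be referenced in tickets, intel lookups, or sandboxing.
sha256 = hashlib.sha256(extracted.read_bytes()).hexdigest()
print(f"SHA-256: {sha256}")

# Preview the first few rows to understand what was actually exposed.
with extracted.open(newline="") as fh:
    for i, row in enumerate(csv.reader(fh)):
        print(row)
        if i >= 4:
            break
```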
If we want to streamline this investigation for next time, we can convert our query into a Springboard. This provides a quick-pivot into an investigation straight from the analyst's landing page.
A green success banner will appear when the conversion completes, and you can click the NetWitness logo in the top left-hand corner to go to the Springboard view.
You can create new Springboard views as you see fit. Once the visual is populated you can click into the table, and you will automatically pivot into an investigation where the value you clicked on is appended to the original query.
Misconfigured web applications are common and easy to identify within NetWitness. As more companies migrate infrastructure to the cloud, this valuable hunt will aid in identifying a potentially dangerous EOC. The ability to reconstruct the raw payload gives an analyst the full picture so that no artifacts are missed. We walked through the steps to begin a hunt for web application problems and reviewed how a query may be altered to change the scope of the hunt as required. Hunting is generally an ad hoc, manual task, but with features such as converting queries into a Springboard, we can make our investigations more efficient and reduce the overall time-to-detection (TTD).