Automating Threat Hunting in Web Proxy Logs
LogicHub is the leading security automation platform, offering security teams a powerful threat detection engine combined with a flexible workflow engine so that organizations can tailor automated security solutions to their exact needs. Proven to deliver 10x the performance of traditional Security Orchestration, Automation, and Response (SOAR) platforms, the LogicHub SOAR+ platform delivers analysis and decision-making automation that dramatically improves alert triage, incident response, and threat hunting.
One of the many use cases that LogicHub customers benefit from is automating threat hunting in web proxy logs. By running the LogicHub Playbook for Web Proxy Threat Hunting, Security Operations Centers (SOCs) can discover surreptitious communications with remote Command and Control (C2) applications that play an integral part in security attacks.
Web Proxy Threat Hunting Challenges:
- Hundreds of millions of log entries are produced in mid- to large-sized companies each day, requiring timely analysis by SOC teams
- Security teams are understaffed, and human analysts are overworked
- Searching through a day’s worth of data requires multiple days of detailed work
- SOCs lack an easy way to retain learned threat intelligence and improve institutional knowledge
Automated threat hunting of proxy logs with LogicHub gives your threat hunting campaigns a powerful, easy start by focusing on a smaller subset of important events. LogicHub reduces the “noise” of high-volume alerts by identifying a smaller subset of riskier entries. The LogicHub Playbook for Web Proxy Threat Hunting combines automated steps for data collection, data enrichment, threat analysis, and threat remediation in a fast, efficient, easy-to-customize format.
Data Collection and Investigation
To reduce the volume of log entries deserving scrutiny, the playbook searches the data for specific indicators:
- User agents (software that acts on behalf of a user) can be a telling indicator when assessing proxy traffic. Generally, they are a great way to catch an unsophisticated or sloppy attack.
  - Rare user agents, i.e., those used by a very small number of users, can be a good indicator of malicious actions
  - Blank/unknown user agents may be a good starting place
  - Blacklisted agents from known malware, penetration testing tools, and even native tools
- Rare domains and destination IP addresses
  - As a best practice for discovering early-stage Advanced Persistent Threats (APTs), the playbook looks for rare domains and IPs with a Zscaler risk score above 0.
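As a rough illustration, the user-agent checks above can be sketched in a few lines of Python. The log schema, the blacklist contents, and the rarity threshold are all assumptions made for the example, not details of LogicHub's implementation:

```python
from collections import Counter

# Hypothetical blacklist; real lists come from threat intelligence feeds.
BLACKLISTED_AGENTS = {"sqlmap", "nikto", "python-requests/2.6.0"}

def flag_user_agents(log_entries, rarity_threshold=3):
    """Flag proxy log entries whose user agent is rare, blank, or blacklisted.

    log_entries: list of dicts with a "user_agent" key (illustrative schema).
    rarity_threshold: agents seen this many times or fewer count as "rare".
    """
    counts = Counter(e.get("user_agent") or "" for e in log_entries)
    flagged = []
    for entry in log_entries:
        ua = entry.get("user_agent") or ""
        reasons = []
        if not ua:
            reasons.append("blank user agent")
        elif ua.lower() in BLACKLISTED_AGENTS:
            reasons.append("blacklisted user agent")
        if ua and counts[ua] <= rarity_threshold:
            reasons.append("rare user agent")
        if reasons:
            flagged.append({**entry, "reasons": reasons})
    return flagged
```

The same counting approach applies to rare domains and destination IPs: tally occurrences, then surface the tail of the distribution.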
Bad Neighborhood Check
- Working from the premise that multiple malicious servers will often be hosted on the same /24 subnet, the playbook combines several threat intelligence sources to identify “bad neighborhoods” or subnets in which malicious IPs are known to be located. The playbook then correlates the curated list to the proxy traffic being analyzed to identify traffic in these “bad neighborhoods.”
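A minimal sketch of this correlation step, using Python's standard ipaddress module; the subnets listed are placeholder documentation addresses, not real threat intelligence:

```python
import ipaddress

# Hypothetical "bad neighborhoods": in practice this set is built by
# collapsing known-malicious IPs from threat intel feeds to their /24.
BAD_NEIGHBORHOODS = {
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
}

def malicious_ip_to_neighborhood(ip):
    """Collapse a known-malicious IP to its /24 so neighboring hosts match too."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)

def in_bad_neighborhood(dest_ip):
    """Return True if a proxy log destination IP falls in a known-bad /24."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in BAD_NEIGHBORHOODS)
```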
Base64 Vm0wd Check
- If a string is base64-encoded enough times, its first 5 characters become “Vm0wd”. There are few, if any, legitimate reasons for this pattern to appear, so any traffic containing it is flagged for further investigation.
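The convergence is easy to demonstrate in Python: for a non-empty input, the prefix stabilizes after enough rounds of encoding (for the seed below, within a dozen), so a simple substring check catches repeatedly encoded payloads:

```python
import base64

def b64_times(data: bytes, n: int) -> bytes:
    """Base64-encode data n times over."""
    for _ in range(n):
        data = base64.b64encode(data)
    return data

# The leading characters converge to "Vm0wd" and then stay fixed,
# because encoding a string that starts "Vm0w" yields one that starts "Vm0wd".
print(b64_times(b"A", 12)[:5])  # b'Vm0wd'

def looks_repeatedly_encoded(field: str) -> bool:
    """Flag a log field (e.g. a URL or POST body) carrying the telltale prefix."""
    return "Vm0wd" in field
```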
Traffic Spike Check
- Sudden spikes in a particular host’s traffic, especially traffic going to a small range of destination IPs, can sometimes indicate a malware infection. Identifying users with a recent spike in traffic can be a good place to start a hunt.
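One simple way to surface such spikes, assuming per-host daily request counts have already been aggregated, is a z-score check against each host's own history. This is a sketch under those assumptions, not the playbook's actual logic:

```python
from statistics import mean, stdev

def find_traffic_spikes(daily_counts, z_threshold=3.0):
    """Flag hosts whose latest daily request count spikes above their history.

    daily_counts: dict mapping host -> list of daily counts, oldest first
                  (illustrative schema).
    Returns hosts whose latest count exceeds the historical mean by more
    than z_threshold standard deviations.
    """
    spiking = []
    for host, counts in daily_counts.items():
        history, latest = counts[:-1], counts[-1]
        if len(history) < 2:
            continue  # not enough history to establish a baseline
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            sigma = 1e-9  # avoid division by zero on perfectly flat history
        if (latest - mu) / sigma > z_threshold:
            spiking.append(host)
    return spiking
```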
Once suspicious events are identified, the LogicHub playbook begins its analysis phase, identifying malware beacons and conducting additional URL-based analysis on the logs.
- Botnets consist of a Command and Control (C2) server and the infected bots. To avoid generating noisy, easily detected traffic, many bots send out a “beacon” at a predetermined interval, such as every 60 minutes. The C2 server responds with any commands it wants the bot to execute at that time; otherwise, the bot sleeps for another 60 minutes.
- Because these beacons are cyclical in nature, patterns emerge when the datasets are analyzed in certain ways. One such method works as follows:
- Identify the time between each request, segmented by the destination IP or URL (we’ll call this the “time delta”).
- Take the average and standard deviation for each segment
- If the standard deviation equals 0, you have found a potential beacon, since this signifies communication at a fixed, regular interval
- In the screenshot below, this uniformity can be seen in the last 4 columns in the red box, where the “stddev_timedelta” column (the standard deviation for the dataset) is 0 and the “timedelta” column is equal to the “avg_timedelta” column (the average of the dataset):
- Advanced attackers have adapted to make the hunt more difficult by “jittering” the beacon. The jitter is a predetermined number, typically expressed as a percentage, used to randomly adjust each beacon’s call-back time. As a result, the standard deviation of the dataset rises above 0, making it more difficult for security teams to detect beacons.
- In the screenshot below, notice the standard deviation column is no longer 0 and the time delta and average columns are no longer equal. However, in the red box, notice that although this data was configured with a 10% jitter, it varies by only 3.7% across 10 beacons.
- To adjust to this adaptation, LogicHub can calculate the coefficient of variation (CV), the ratio of the standard deviation to the mean, for each segment. The CV measures the relative degree of variation in the dataset. Because jitter causes a bounded variation in the time delta, analyzing the CV can help identify beaconing traffic that jitters. To minimize the “noise” of false positives, the playbook next filters the data to track only potential beacons with a 30% jitter or less. Over time, as false positives are identified and removed, this filter can be adjusted.
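The steps above, including the CV filter, can be sketched as follows. The event schema, the minimum request count, and the thresholds are illustrative assumptions, not the playbook's exact parameters:

```python
from collections import defaultdict
from statistics import mean, pstdev

def find_beacons(events, max_cv=0.30, min_requests=5):
    """Look for beacon-like traffic in proxy events.

    events: list of (timestamp_seconds, source_host, destination) tuples
            (illustrative schema; real logs would be parsed into this form).
    Returns per-(source, destination) stats for segments whose coefficient
    of variation (stddev / mean of the time deltas) is at or below max_cv.
    A CV of 0 is a perfectly regular beacon; a small CV suggests jitter.
    """
    by_segment = defaultdict(list)
    for ts, src, dst in events:
        by_segment[(src, dst)].append(ts)

    candidates = {}
    for segment, timestamps in by_segment.items():
        timestamps.sort()
        # Time between consecutive requests ("time delta") per segment.
        deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
        if len(deltas) < min_requests:
            continue  # too few requests to establish a pattern
        avg, sd = mean(deltas), pstdev(deltas)
        if avg == 0:
            continue
        cv = sd / avg
        if cv <= max_cv:
            candidates[segment] = {"avg_timedelta": avg,
                                   "stddev_timedelta": sd,
                                   "cv": cv}
    return candidates
```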
To gain further insight into whether the traffic being processed is malicious or even potentially malicious, the playbook automatically collects and analyzes additional data points.
First, the playbook collects the following insights from a WHOIS lookup:
- Domain age: If a domain has been registered for less than 365 days, points are added to the total score
- Domain lifespan: If the number of days between the registration and expiration dates is less than 365, points are added to the total score
- Registering email: If the domain of the registering email address matches a list of known disposable email providers, points are added to the total score
In addition, the playbook uses a LogicHub operator to identify whether the domain has been seen in the last 30 days. Young domains, and domains registered for only a year, may indicate that an attacker has set up a domain for a temporary purpose: to attack. As many attackers are financially motivated, keeping costs low is key. If any of these flags are hit, the playbook adds to the score used for prioritizing the final results of the analysis.
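A minimal sketch of this scoring, assuming the WHOIS record has already been fetched and parsed into dates; the point values and the disposable-domain list are placeholders to be tuned, not LogicHub's actual weights:

```python
from datetime import date

# Hypothetical disposable-email domains; real lists come from curated feeds.
DISPOSABLE_EMAIL_DOMAINS = {"mailinator.com", "guerrillamail.com"}

def score_whois_record(registered, expires, registrant_email, today=None):
    """Score a parsed WHOIS record; higher scores mean more suspicious.

    registered / expires: datetime.date values from the WHOIS record.
    """
    today = today or date.today()
    score = 0
    if (today - registered).days < 365:
        score += 10  # young domain
    if (expires - registered).days < 365:
        score += 10  # registered for less than a year
    email_domain = registrant_email.rsplit("@", 1)[-1].lower()
    if email_domain in DISPOSABLE_EMAIL_DOMAINS:
        score += 10  # disposable registrant email
    return score
```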
Lastly, after the data has been filtered, the playbook aggregates a unique list of domain names. These domains can be run through any of several third-party tools, such as VirusTotal, to further enrich the data.
Correlation and Threat Ranking
Using the results from the data enrichment phase, the LogicHub platform ranks events by risk, prioritizing them so that threat hunters can focus on the riskiest items in the proxy data. Beaconing events that use young domains, for example, will be ranked higher than events whose enrichment data is benign.
Automating threat hunting in web proxy logs with LogicHub is powerful, easy, and invaluable for detecting malware and other threats that might otherwise be missed in a mountain of alert data. Using LogicHub, SOC teams are able to improve their productivity and response times while minimizing false positives and false negatives. For more information, visit www.logichub.com/contact.