Intelligent Security Automation Use Case

Automating Threat Hunting in Web Proxy Logs



LogicHub is the leading Intelligent Security Automation Platform that marries a powerful Decision Engine to a flexible Workflow Engine. Proven to deliver 10x the performance of traditional Security Orchestration, Automation, and Response (SOAR) solutions, it is the only platform of its kind to automate analysis and decision-making to improve alert triage, incident response, and threat hunting.

One of the many use cases that LogicHub customers have implemented and benefited from is that of automating threat hunting in web proxy logs.

Web Proxy Threat Hunting Challenges:

  • Hundreds of millions of log entries per day to analyze for mid- to large-sized companies
  • Human analysts are limited in both quantity and availability
  • It would take days to search through a day’s worth of data
  • No easy way to retain learned threat intelligence and improve institutional knowledge

LogicHub Solution:

Automated threat hunting of proxy logs with LogicHub is a powerful and easy way to start your threat hunting campaigns. LogicHub reduces the noise by identifying a smaller subset of riskier entries, so that hunters can focus on the events that matter.

Data Collection and Investigation

To further reduce logs, this use case searches the data for specific data points:

  • User agents can be a very telling indicator when assessing proxy traffic. Generally, it is a great way to catch an unsophisticated or sloppy attack.
    • Rare user agents, i.e. those used by a very small number of users, can be a good indicator of malicious actions.
    • Blank/Unknown agents may be a good starting place
    • Blacklisted agents from known malware, penetration testing tools, and even native tools
  • Rare domains and destination IP addresses
    • In line with the concept of discovering early-stage APT activity, we can focus on rare domains and IPs with a Zscaler risk score above 0.
  • Bad Neighborhood check
    • Using the premise that multiple malicious servers will often be hosted on the same /24 subnet, we can combine several threat intel sources to identify “bad neighborhoods” or subnets in which malicious IPs are known to be located. We can then correlate the curated list to the proxy traffic we are analyzing to identify traffic in these “bad neighborhoods.”
  • Base64 Vm0wd
    • If a string is base64 encoded enough times, the first 5 characters will become “Vm0wd”. There are few, if any, legitimate reasons for this. Hence any traffic with this pattern is flagged for further investigation.
  • Top talkers
    • Sudden spikes in a particular host’s traffic, especially when going to a small range of destination IPs, can sometimes indicate a malware infection. Identifying users with a recent spike in traffic can be a good place to start a hunt.
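
Three of the checks above can be sketched in a few lines of Python. This is a minimal illustration, not LogicHub's implementation: the log field names (`user`, `user_agent`), the `max_users` threshold, and all function names are assumptions made for the example.

```python
import base64
import ipaddress
from collections import defaultdict


def rare_user_agents(entries, max_users=1):
    """Return user agents seen from at most `max_users` distinct users."""
    users_by_agent = defaultdict(set)
    for entry in entries:
        users_by_agent[entry["user_agent"]].add(entry["user"])
    return {agent for agent, users in users_by_agent.items()
            if len(users) <= max_users}


def bad_neighborhoods(malicious_ips):
    """Collapse known-malicious IPv4 addresses into their /24 subnets."""
    return {ipaddress.ip_network(f"{ip}/24", strict=False)
            for ip in malicious_ips}


def in_bad_neighborhood(dest_ip, neighborhoods):
    """True if the destination IP falls inside any flagged /24 subnet."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in neighborhoods)


def repeated_base64(data, rounds):
    """Base64-encode `data` (bytes) `rounds` times in a row."""
    for _ in range(rounds):
        data = base64.b64encode(data)
    return data


def has_vm0wd(url):
    """Flag URLs that contain the repeated-Base64 marker."""
    return "Vm0wd" in url


# Hypothetical proxy log entries, for illustration only.
entries = [
    {"user": "alice", "user_agent": "Mozilla/5.0"},
    {"user": "bob", "user_agent": "Mozilla/5.0"},
    {"user": "carol", "user_agent": ""},
]
rare = rare_user_agents(entries)

# One known-bad IP (documentation range) flags its whole /24.
hoods = bad_neighborhoods(["203.0.113.15"])

# Encoding any non-empty input enough times converges on the "Vm0wd" prefix.
encoded = repeated_base64(b"hello", 20)
```

Here `rare` contains only the blank agent, since two users share the other one, and `in_bad_neighborhood("203.0.113.200", hoods)` is true because that address shares a /24 with the known-bad IP.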

Once these entries are identified, LogicHub moves into the analysis phase, where it identifies malware beacons and conducts additional analysis on the logs based on the URL.

Malware Beacons
  • Botnets are made up of a command and control (C2) server and the infected bots. Often, to avoid noisy traffic detection, the bots send out a “beacon” on a predetermined interval, such as every 60 minutes. The C2 server sends any commands it wants the bot to execute at that time; otherwise the bot sleeps again for 60 minutes.
How to find beacons:
  • Because these beacons are cyclical in nature, patterns emerge in the datasets when analyzed in certain ways. One such method works as follows:
    • Identify the time between each request, segmented by the destination IP or URL (we’ll call this the “time delta”).
    • Take the average and standard deviation of the time delta for each segment.
    • If the standard deviation is equal to 0, you have found a potential beacon, as this signifies communication at a regular interval.
    • In the screenshot below, this uniformity can be seen in the last 4 columns in the red box, where the “stddev_timedelta” column (the standard deviation for the dataset) is 0 and the “timedelta” column is equal to the “avg_timedelta” column (the average of the dataset):

  • Advanced attackers have adapted to make the hunt more difficult by “jittering” the beacon. The jitter is a predetermined number, typically expressed as a percentage, used to randomly adjust each beacon’s callback time. As a result, the standard deviation of the dataset will rise above 0, making this method of finding beacons less reliable.
    • In the screenshot below, notice the standard deviation column is no longer 0 and the time delta and average columns are no longer equal. However, in the red box, notice the data only varies by 3.7%. This data was originally set to jitter at 10%, but it varies by only 3.7% across 10 beacons.

  • In order to adjust to this adaptation, LogicHub can calculate the coefficient of variation (CV) for each segment. The CV measures the degree of variation in the dataset. As a jitter will cause a specified variation in the time delta, analyzing the CV can help identify beaconing traffic that is jittering. To keep the noise of false positives down, we can now filter our data to only keep potential beacons with a 30% jitter or less. Over time as false positives are identified and removed, this filter can be adjusted.
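
The beacon-hunting method described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LogicHub's implementation: events are taken to be `(timestamp_in_seconds, destination)` pairs, the population standard deviation is used, and the 30% filter is interpreted as a coefficient of variation (standard deviation divided by average) of 0.30 or less. The function names and sample destinations are hypothetical; the `avg_timedelta` and `stddev_timedelta` keys mirror the column names mentioned above.

```python
from collections import defaultdict
from statistics import mean, pstdev


def beacon_stats(events):
    """Compute time-delta statistics per destination.

    `events` is an iterable of (timestamp_seconds, destination) pairs.
    """
    times = defaultdict(list)
    for ts, dest in events:
        times[dest].append(ts)

    stats = {}
    for dest, stamps in times.items():
        stamps.sort()
        # Time between each consecutive request to this destination.
        deltas = [b - a for a, b in zip(stamps, stamps[1:])]
        if len(deltas) < 2:
            continue  # too few requests to say anything about regularity
        avg = mean(deltas)
        std = pstdev(deltas)
        # Coefficient of variation: spread relative to the average interval.
        cv = std / avg if avg > 0 else float("inf")
        stats[dest] = {"avg_timedelta": avg, "stddev_timedelta": std, "cv": cv}
    return stats


def potential_beacons(stats, max_cv=0.30):
    """Keep destinations whose CV is at or below the threshold."""
    return sorted(dest for dest, s in stats.items() if s["cv"] <= max_cv)


# Hypothetical traffic: a fixed beacon, a jittered beacon, and normal browsing.
events = (
    [(t, "c2-fixed.example") for t in (0, 3600, 7200, 10800)]
    + [(t, "c2-jitter.example") for t in (0, 3500, 7200, 10700, 14400)]
    + [(t, "news.example") for t in (0, 5, 900, 905, 4000)]
)
stats = beacon_stats(events)
flagged = potential_beacons(stats)
```

In this sample the fixed beacon has a standard deviation of exactly 0, the jittered beacon has a nonzero standard deviation but a CV under 3%, and the browsing traffic has a CV above 1 and is filtered out. The `max_cv` threshold is the value to tune as false positives are identified.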
Data Enrichment

In order to gain further insight on whether the traffic being processed is malicious, or even potentially malicious, we add additional data points in order to automate this process.

First, we gain the following insights from a WHOIS lookup:

  • Domain age: If the domain has been registered for less than 365 days, points are added to the total score
  • Domain lifespan: If the number of days between the registration and expiration dates is less than 365, points are added to the total score
  • Registering email: If the domain of the registering email appears on a list of known disposable email domains, points are added to the total score

We also use a LogicHub operator to identify whether the domain has been seen in the last 30 days. Young domains and domains reserved for only a year can indicate an attacker who set up a domain for a single purpose: to attack. As many attackers are financially motivated, keeping costs low is key. If any of these flags are hit, points are added to the score in order to prioritize the final list of results.
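
The WHOIS-based scoring can be sketched as below. This is an illustration only: the point value per flag, the function name, and the disposable-domain list are hypothetical, and the WHOIS fields are assumed to have been parsed into dates already. The source does not state the actual weights LogicHub uses.

```python
from datetime import date

# Hypothetical list; a real deployment would use a maintained feed.
DISPOSABLE_EMAIL_DOMAINS = {"throwaway.example"}


def whois_score(registered, expires, registrant_email, today, points=10):
    """Add `points` for each WHOIS flag that is hit (weights are assumed)."""
    score = 0
    # Domain age: registered for less than 365 days.
    if (today - registered).days < 365:
        score += points
    # Domain lifespan: less than 365 days between registration and expiration.
    if (expires - registered).days < 365:
        score += points
    # Registering email: its domain is on the disposable list.
    email_domain = registrant_email.rsplit("@", 1)[-1].lower()
    if email_domain in DISPOSABLE_EMAIL_DOMAINS:
        score += points
    return score


today = date(2024, 6, 1)
young = whois_score(date(2024, 5, 1), date(2024, 11, 1),
                    "owner@throwaway.example", today)
established = whois_score(date(2010, 1, 1), date(2030, 1, 1),
                          "admin@corp.example", today)
```

With these sample values, the young domain hits all three flags and the established domain hits none.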

Lastly, a unique list of domain names can be obtained from the data after the filtering stage. These domains can be run through any number of third-party tools, such as VirusTotal, in order to further enrich the data.

Correlation and Threat Ranking

Using the results from the data enrichment phase, LogicHub will risk-rank and thereby prioritize the events so that threat hunters can focus on the riskier items within the proxy data. Beaconing events that use young domains, for example, will be ranked higher than events whose enrichment data is benign.
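
As a rough sketch of this ranking step, the snippet below adds a beacon weight to an enrichment score and sorts the events from highest to lowest. The field names, the weight, and the sample events are all hypothetical; the source does not describe how LogicHub actually combines scores.

```python
BEACON_POINTS = 20  # hypothetical weight for a detected beacon


def total_score(event):
    """Combine the enrichment score with the beacon flag."""
    return event["enrichment_score"] + (BEACON_POINTS if event["is_beacon"] else 0)


def rank_events(events):
    """Return events ordered from riskiest to least risky."""
    return sorted(events, key=total_score, reverse=True)


events = [
    {"domain": "news.example", "is_beacon": False, "enrichment_score": 0},
    {"domain": "new-c2.example", "is_beacon": True, "enrichment_score": 30},
    {"domain": "updates.example", "is_beacon": True, "enrichment_score": 0},
]
ranked = rank_events(events)
```

The beaconing event on a young domain lands at the top of `ranked`, ahead of the beacon with benign enrichment data.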


Automating threat hunting in web proxy logs with LogicHub is powerful, easy, and can help you detect malware and threats otherwise easily missed in the mountain of data. SOC teams are able to improve their productivity and response times, while minimizing false positives and false negatives. For more information, visit

Request a Demo