October 25, 2018 Kumar Saurabh
Across the IT security industry, there’s a growing recognition that enterprises can no longer detect and stop security attacks without the help of automation.
Today’s security teams are dedicated, highly trained, and equipped with more advanced tools than ever before. But they’re facing a steady stream of attacks that are carefully crafted by determined and well-funded adversaries. Detecting, let alone stopping, attacks against a vast and varied IT infrastructure is increasingly difficult.
Automation can help enterprise Security Operations Centers (SOCs) with three critical activities: alert triage, threat hunting, and incident response.
The most difficult of these activities is threat hunting, which today is usually performed manually by security analysts as they pore through data, log into various systems, consult data feeds, and try to determine, as quickly as possible, whether a threat exists.
All three activities require decision making. To perform alert triage, analysts have to decide whether or not an alert indicates a real incident. Then, in the threat hunting stage, analysts have to decide whether an incident constitutes a real threat. Finally, in the incident response stage, analysts end up spending a good deal of time confirming that the incident is a real incident and deciding how best to respond to it.
When it comes to automating alert triage, threat hunting, and incident response, the hardest nut to crack turns out to be automating decision making. SIEMs use rules to perform some rudimentary automation, but they are not powerful or sophisticated enough to replicate a skilled analyst’s decision-making process. Just ask security teams who get alerts from SIEMs: a large percentage of these alerts turn out to be false positives.
Beyond the most rudimentary level, the logic required for decision making is too sophisticated and complex to automate with a scripting language. Fortunately, the gap between what an analyst can do and what can easily be scripted can be bridged by applying machine learning techniques. Machine learning is an artificial intelligence (AI) technique that uses statistical models to learn to recognize patterns without being explicitly programmed to do so.
To see just how quickly and easily machine learning can be applied in a SOC, let’s consider an example of a common type of triage performed in a SOC: determining whether suspicious emails are spam (annoying but probably harmless) or phishing attacks (definitely harmful to the organization).
In this example, we’ll take a collection of email messages forwarded to an enterprise SOC for analysis and use machine learning to determine which of these messages really contain phishing attempts.
The emails arrive at the SOC from users who suspect they may be phishing. The job for the SOC is to evaluate each email more rigorously to determine which are genuinely dangerous.
In this example, we’ll use the Flow Builder in the LogicHub Intelligent Security Automation platform to train an automated flow for analyzing email. We’ll train the system to classify email as spam or phishing using one set of labeled data, and then test it against another set of labeled data.
In the image below, you can see the flow we’ve built for automated analysis on the left. This flow was built using the drag-and-drop interface in the LogicHub platform. No Python scripting was required.
On the right, you can see examples of incoming emails tentatively classified as spam. There’s a similar set of emails suspected of being phishing. (We’ve obscured the name of the company that received this email.)
In the next step, we apply labels in the context of the flow: suspected spam is labeled ‘spam’ and suspected phishing is labeled ‘phishing.’
Next, we mix these two streams, compiling a single collection containing ‘All Emails.’ Why? Because we’re going to re-evaluate all of them to learn whether their original classification was correct.
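Expressed in code rather than in the Flow Builder, labeling the two incoming streams and merging them might look like the minimal Python sketch below. The `Email` record, its fields, the `label_stream` helper, and the sample messages are all hypothetical stand-ins; the LogicHub flow performs this step through its drag-and-drop interface, not through this code.

```python
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical record for a forwarded message; real messages also carry
    # headers, URLs, attachments, and so on.
    subject: str
    body: str
    label: str = ""  # preliminary label: 'spam' or 'phishing'


def label_stream(emails, label):
    """Attach the same preliminary label to every email in one stream."""
    for email in emails:
        email.label = label
    return emails


suspected_spam = [Email("Cheap watches", "Huge discounts on watches today")]
suspected_phishing = [Email("Password expiring", "Verify your account at this link")]

# Merge the two labeled streams into a single 'All Emails' collection.
all_emails = label_stream(suspected_spam, "spam") + label_stream(suspected_phishing, "phishing")

print([(e.subject, e.label) for e in all_emails])
```

Each message keeps its preliminary label after the merge, which is what later lets the flow compare that label with the model's own verdict.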
To perform our automated analysis, we need to create a new tag for sorting the email, so we assign a random tag to each email.
Next, we split the email again into two streams. This time, they’re sorted by their random tags, not by their preliminary identification as spam or phishing.
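The random-tag split described above can be sketched in a few lines of Python. The function name, the 50/50 `train_fraction`, and the fixed `seed` are assumptions made for illustration; the post does not say what proportion the flow uses.

```python
import random


def split_by_random_tag(items, train_fraction=0.5, seed=42):
    """Tag each item with a random number, then split into two streams by tag.

    The fixed seed makes the split reproducible; train_fraction is an assumed value.
    """
    rng = random.Random(seed)
    tagged = [(rng.random(), item) for item in items]
    training = [item for tag, item in tagged if tag < train_fraction]
    testing = [item for tag, item in tagged if tag >= train_fraction]
    return training, testing


emails = [f"email-{i}" for i in range(10)]
training, testing = split_by_random_tag(emails)
print(len(training), len(testing))
```

Because the tag is random rather than derived from the preliminary label, each stream is expected to contain a mix of spam and phishing, which is what the training step needs.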
Why are we creating two streams? We’re going to use one stream to train our machine learning algorithm to distinguish spam from phishing on the basis of the emails’ internal characteristics. Machine learning works by training an analytical “machine” to discover patterns, including similarities and differences, in a collection of training data. So, to build our analytical model, we’re going to use some of the email received by the SOC as training data.
Then we’ll apply the machine learning model to the test data, which comprises the other email messages we’ve received, and assess the accuracy of the original classification of the email.
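The post does not say which classifier the platform uses. As a stand-in, the sketch below trains a small multinomial naive Bayes classifier, a classic technique for text such as email, on a training stream and then applies it to a separate test stream. The tokenizer, function names, and sample messages are hypothetical; a production model would use far richer features than bare words.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability, using add-one smoothing."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior for this label
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training and test streams of (text, original label) pairs.
training = [
    ("huge discount on watches buy now", "spam"),
    ("cheap pills limited offer", "spam"),
    ("win a free cruise offer", "spam"),
    ("verify your password at this link", "phishing"),
    ("your account is locked verify now", "phishing"),
    ("urgent invoice login to your account", "phishing"),
]
testing = [
    ("limited offer on cheap watches", "spam"),
    ("please verify your account password", "phishing"),
]

model = train_naive_bayes(training)
for text, original_label in testing:
    print(original_label, predict(model, text))
```

Note that the model is built only from `training`; the `testing` messages are seen for the first time at prediction time.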
(Incidentally, it’s important that we not use the training data as test data. If we use the same data set for both, then we’re testing the machine on the same data we trained it on. Its results will look better than they really are, and they won’t indicate how accurately the model handles unfamiliar data.)
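To make the point about reusing training data concrete, here is a deliberately extreme, hypothetical "model" that simply memorizes every training message. It scores perfectly on the data it has already seen and poorly on new data, which is exactly the gap that a separate test set exposes. The function names and messages are illustrative only.

```python
def train_lookup(examples):
    """A deliberately naive 'model' that memorizes every training message."""
    return {text: label for text, label in examples}


def predict_lookup(model, text, default="spam"):
    # Unseen messages fall back to a fixed guess: nothing was actually learned.
    return model.get(text, default)


def accuracy(model, examples):
    correct = sum(predict_lookup(model, text) == label for text, label in examples)
    return correct / len(examples)


training = [
    ("huge discount on watches", "spam"),
    ("verify your password now", "phishing"),
    ("your account is locked", "phishing"),
]
testing = [
    ("cheap watches limited offer", "spam"),
    ("please verify your account", "phishing"),
]

model = train_lookup(training)
print(accuracy(model, training))  # 1.0: perfect on data it has already seen
print(accuracy(model, testing))   # 0.5: only the default guess happens to be right
```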
In the image below, you can see an example of our test data and how our machine learning model analyzed it. Each message is still tagged with its original label (‘spam’ or ‘phishing’). But now there’s an additional label (lhub_label) that presents the machine learning model’s analysis of the message, taking into account all the characteristics of the message.
For the first two emails shown below, the original label (spam) matches the lhub_label (spam). But for the third email, the machine learning analysis produced a different result: its lhub_label classifies the message as phishing.
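From there, our flow compares each message’s original label with its lhub_label and divides the messages into four categories: true positives, false positives, true negatives, and false negatives.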
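A minimal sketch of how each message could be sorted into one of these four categories, assuming that phishing is treated as the positive class and that the original label serves as ground truth for the comparison. The `outcome` helper and the sample label pairs are hypothetical.

```python
from collections import Counter

POSITIVE = "phishing"  # assumption: phishing is the positive class


def outcome(original_label, predicted_label):
    """Place one message in one of the four categories."""
    if predicted_label == POSITIVE:
        return "true positive" if original_label == POSITIVE else "false positive"
    return "true negative" if original_label != POSITIVE else "false negative"


# (original label, lhub_label) pairs; the values are made up for illustration.
pairs = [
    ("spam", "spam"),
    ("spam", "spam"),
    ("spam", "phishing"),
    ("phishing", "phishing"),
    ("phishing", "spam"),
]

counts = Counter(outcome(orig, pred) for orig, pred in pairs)
print(counts)
```

The counts of these four categories are all that is needed to compute summary metrics such as precision and accuracy.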
If we examine some of the messages in the True Positives stream (shown below), we can see that they really are phishing messages.
And if we view the Results of the complete flow, we can see how well our machine learning model performed.
Our machine learning model, which can easily be assembled using built-in classifiers of the LogicHub platform, found:
Overall, the model performed with a precision of 0.86 and an accuracy of 0.95. (Precision is the fraction of messages the model flagged as phishing that really were phishing. Accuracy is the fraction of all messages, spam and phishing alike, that the model classified correctly.)
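Both metrics follow directly from the four category counts. The sketch below uses hypothetical counts chosen only for illustration; they are not the post’s results, which are not broken down by category here.

```python
def precision(tp, fp):
    """Fraction of messages flagged as phishing that really were phishing."""
    return tp / (tp + fp)


def accuracy(tp, fp, tn, fn):
    """Fraction of all messages, of either class, that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical counts for illustration only; NOT the results reported above.
tp, fp, tn, fn = 9, 1, 35, 5
print(precision(tp, fp))
print(accuracy(tp, fp, tn, fn))
```

With these counts, precision is 0.9 and accuracy is 0.88, even though five phishing messages were missed. The share of phishing actually caught, tp / (tp + fn), is a separate metric known as recall.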
From our work with SOCs, we know that these results match or exceed the results achieved by security analysts. And they were achieved much faster: the analysis was completed in seconds rather than minutes or hours.
And if the model needs fine tuning? Analysts can use the LogicHub platform to correct the results of the analysis, retraining the algorithm and enabling the machine learning model to perform with even greater accuracy in the future.
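The post does not describe how the platform retrains its model. As a hypothetical illustration of the general idea, the sketch below folds an analyst’s correction back into the training data and uses a 1-nearest-neighbor classifier (by word overlap) as a stand-in, since for that kind of model adding a labeled example is itself the retraining step. All names and messages are illustrative.

```python
def words(text):
    return set(text.lower().split())


def predict_nearest(training, text):
    """1-nearest-neighbor by Jaccard word overlap (a stand-in classifier)."""
    def similarity(example):
        a, b = words(example[0]), words(text)
        return len(a & b) / len(a | b)
    return max(training, key=similarity)[1]


training = [
    ("huge discount on watches", "spam"),
    ("verify your password now", "phishing"),
]

message = "discount invoice verify on portal"
print(predict_nearest(training, message))  # the model's first verdict

# An analyst reviews the verdict and records the correct label, which
# becomes part of the training data for future predictions.
training.append((message, "phishing"))
print(predict_nearest(training, "discount invoice verify on portal today"))
```

Before the correction, the message is closest to the spam example; afterward, similar messages are matched to the analyst-corrected phishing example.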
I hope this example gives you a sense of the advantages of machine learning for automating the threat analysis performed every day in enterprise SOCs.
The LogicHub Intelligent Security Automation platform’s machine learning model was able to quickly distinguish harmful phishing messages from spam, enabling the SOC team to take appropriate corrective actions.
The model performed with 95% accuracy. To achieve these results, no Python programming was needed. Instead, using a drag-and-drop interface, an analyst was able to build an analytical flow, which was trained on live data in the SOC. Once built, this flow can be used repeatedly, copied, modified, and elaborated on as needed, enabling analysts to respond more quickly to genuine threats.
This shows the power of machine learning: fast, accurate results were achieved with minimal programming, and analysts were spared the work of evaluating each message by hand.
In any overworked SOC today, LogicHub’s machine learning approach can eliminate the vast majority of false positives received by the SOC and enable the SOC to perform 10X more threat hunting.
© 2017-2022 LogicHub®
All Rights Reserved
Privacy Policy
Terms of Use
Sitemap