Across the IT security industry, there’s a growing recognition that enterprises can no longer detect and stop security attacks without the help of automation.

Today’s security teams are dedicated, highly trained, and equipped with more advanced tools than ever before. But they’re facing a steady stream of attacks that are carefully crafted by determined and well-funded adversaries. Detecting, let alone stopping, attacks against a vast and varied IT infrastructure is increasingly difficult.

Automation can help enterprise Security Operations Centers (SOCs) with three critical activities:

  • Alert Triage (sorting through data that suggests the presence of a threat)
  • Threat Hunting (analyzing data carefully to confirm the existence of a threat and discover its characteristics)
  • Incident Response (responding to and, if necessary, recovering from a threat or incident)

The most difficult of these activities is threat hunting, which today is usually performed manually by security analysts as they pore through data, log into various systems, consult data feeds, and try to determine, as quickly as possible, whether a threat exists.

All three activities require decision making. To perform alert triage, analysts have to decide whether or not an alert indicates a real incident. Then, in the threat hunting stage, analysts have to decide whether an incident constitutes a real threat. Finally, in the incident response stage, analysts end up spending a good deal of time confirming that the incident is a real incident and deciding how best to respond to it.

When it comes to automating alert triage, threat hunting, and incident response, the hardest nut to crack turns out to be automating decision making. SIEMs use rules to perform some rudimentary automation, but they are not powerful or sophisticated enough to replicate a skilled analyst’s decision-making process. Just ask security teams who get alerts from SIEMs: a vast percentage of these alerts turn out to be false positives.

Beyond the most rudimentary level, the logic required for decision making is too sophisticated and complex to automate with a scripting language. Fortunately, the gap between what an analyst can do and what can be easily scripted can be bridged by applying machine learning techniques. Machine learning is an artificial intelligence (AI) technique that uses statistical models to learn to recognize patterns without being explicitly programmed to do so.

To see just how quickly and easily machine learning can be applied in a SOC, let’s consider an example of a common type of triage performed in a SOC: determining whether suspicious emails are spam (annoying but probably harmless) or phishing attacks (definitely harmful to the organization).

Putting Machine Learning to Work for Phishing Triage

In this example, we’ll take a collection of email messages forwarded to an enterprise SOC for analysis and use machine learning to determine which of these messages really contain phishing attempts.

The emails arrive at the SOC from users who suspect them of being phishing. The SOC’s job is to evaluate each email more rigorously to determine which are genuinely dangerous.

Building a Machine Learning Workflow with LogicHub

In this example, we’ll use the Flow Builder in the LogicHub Intelligent Security Automation platform to train an automated flow for analyzing email. We’ll train the system to characterize email as spam or phishing using one set of labeled data, and then test it against another set of labeled data.

In the image below, you can see the flow we’ve built for an automated analysis on the left. This flow was built using the drag-and-drop interface in the LogicHub platform. No Python scripting was required.

On the right, you can see examples of incoming emails tentatively classified as spam. There’s a similar set of emails suspected of being phishing. (We’ve obscured the name of the company that received these emails.)

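[Image: the automated analysis flow built in the LogicHub Flow Builder (left) and sample emails tentatively classified as spam (right)]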

In the next step, we apply labels in the context of the flow: suspected spam is labeled ‘spam’ and suspected phishing is labeled ‘phishing.’

Next, we mix these two streams, compiling a single collection containing ‘All Emails.’ Why? Because we’re going to re-evaluate all of them to learn whether or not their original classification was correct.

To perform our automated analysis, we need to create a new tag for sorting the email, so we assign a random tag to each email.
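The labeling, mixing, and random-tagging steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the LogicHub flow is implemented: the sample messages, the field names `body`, `label`, and `random_tag`, and the helper `label_and_tag` are all assumptions made for this sketch.

```python
import random

# Hypothetical sample data; real input would be the emails forwarded to the SOC.
suspected_spam = [
    "Cheap watches, 80% off today only",
    "You have won a free cruise",
]
suspected_phishing = [
    "Your mailbox is full, verify your password here",
]

def label_and_tag(spam, phishing, seed=42):
    """Label each stream, merge both into one list, and add a random tag in [0, 1)."""
    rng = random.Random(seed)  # fixed seed so the later split is reproducible
    all_emails = [{"body": body, "label": "spam"} for body in spam]
    all_emails += [{"body": body, "label": "phishing"} for body in phishing]
    for email in all_emails:
        email["random_tag"] = rng.random()
    return all_emails

all_emails = label_and_tag(suspected_spam, suspected_phishing)
for email in all_emails:
    print(email["label"], round(email["random_tag"], 3))
```

The random tag carries no information about the email itself, which is exactly the point: it gives us a way to divide the collection that is independent of the preliminary spam or phishing label.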

Separating Training Data from Test Data

Next, we split the email again into two streams. This time, they’re sorted by their random tags, not by their preliminary identification as spam or phishing.

Why are we creating two streams? We’re going to use one stream to train our machine learning algorithm to distinguish spam from phishing on the basis of internal characteristics of the email. Machine learning works by training an analytical “machine” to discover patterns, including similarities and differences, from a collection of training data. So, to build our analytical model, we’re going to use some of the email received by the SOC as training data for our machine learning model.

Then we’ll apply the machine learning model to the test data, which comprises the other email messages we’ve received, and assess the accuracy of the original classification of the email.

(Incidentally, it’s important that we not use the training data as test data. If we use the same data set for both, then we’re testing the machine on the same data we trained it on. Its results will look better than they should, and they won’t tell us how accurate the model is when dealing with unfamiliar data.)
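As a rough sketch of the split described above, the random tag can be compared against a threshold so that every email lands in exactly one stream. The 70/30 ratio, the field names, and the helper `split_by_random_tag` are assumptions for illustration; the original flow does not state what proportion it uses.

```python
import random

def split_by_random_tag(emails, train_fraction=0.7):
    """Send emails with a tag below the threshold to training, the rest to test."""
    train = [e for e in emails if e["random_tag"] < train_fraction]
    test = [e for e in emails if e["random_tag"] >= train_fraction]
    return train, test

# Hypothetical stand-in for the merged 'All Emails' collection.
rng = random.Random(0)
emails = [{"id": i, "random_tag": rng.random()} for i in range(100)]

train, test = split_by_random_tag(emails)

# No email appears in both streams, so the model is never tested on data it saw.
train_ids = {e["id"] for e in train}
test_ids = {e["id"] for e in test}
print(len(train), len(test), train_ids.isdisjoint(test_ids))
```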

In the image below, you can see an example of our test data and how our machine learning model analyzed it. Each message is still tagged with its original label (‘spam’ or ‘phishing’). But now there’s an additional label (lhub_label) that presents the machine learning model’s analysis of the message, taking into account all the characteristics of the message.
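To make the train-then-predict step concrete, here is a minimal sketch using a tiny naive Bayes text classifier written in plain Python. This is a stand-in technique chosen for illustration: the article does not say which built-in classifier the LogicHub flow uses or which email characteristics it considers. The sample emails and the helper names are hypothetical; only the `lhub_label` column name comes from the example above.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_naive_bayes(emails):
    """Count words per label and emails per label from the training stream."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for email in emails:
        label_counts[email["label"]] += 1
        word_counts[email["label"]].update(tokenize(email["body"]))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def predict(model, body):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_emails)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(body):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training and test streams.
train_emails = [
    {"body": "Huge discount on watches, buy now", "label": "spam"},
    {"body": "Win a free cruise, limited offer", "label": "spam"},
    {"body": "Verify your password to keep your account", "label": "phishing"},
    {"body": "Your account is locked, confirm your login", "label": "phishing"},
]
test_emails = [
    {"body": "Free discount offer on watches", "label": "spam"},
    {"body": "Please confirm your account password", "label": "phishing"},
]

model = train_naive_bayes(train_emails)
for email in test_emails:
    email["lhub_label"] = predict(model, email["body"])  # model's own verdict
    print(email["label"], "->", email["lhub_label"])
```

Each test message keeps its original label and gains a second one from the model, which is what makes the comparison in the next section possible.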

Accurate, Real-Time Analysis for Phishing Triage

For the first two emails shown below, the original label (spam) matches the lhub_label (spam). But for the third email, the machine learning analysis produced a different result, classifying the email as phishing.

[Image: test data showing each message’s original label alongside the model’s lhub_label]

From there, our flow divides the messages into four categories:

  • True positives (emails that were genuinely phishing threats)
  • True negatives (emails that were genuinely spam; that is, not phishing threats)
  • False positives (emails that were spam but that had been misidentified as phishing)
  • False negatives (emails that were phishing but that had been misidentified as spam)
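The four categories above can be expressed as a small function. In this sketch we assume "phishing" is the positive class, and that each message carries one label treated as the reference and one treated as the classification being judged. The article does not spell out which column plays which role, so the pairing below (original `label` as reference, `lhub_label` as the verdict) is an assumption, as are the sample rows.

```python
from collections import Counter

def categorize(actual, predicted, positive="phishing"):
    """Map a (reference label, predicted label) pair to one of four categories."""
    if predicted == positive:
        return "true_positive" if actual == positive else "false_positive"
    return "false_negative" if actual == positive else "true_negative"

# Hypothetical rows covering each of the four cases once.
results = [
    {"label": "spam", "lhub_label": "spam"},          # true negative
    {"label": "spam", "lhub_label": "phishing"},      # false positive
    {"label": "phishing", "lhub_label": "phishing"},  # true positive
    {"label": "phishing", "lhub_label": "spam"},      # false negative
]

counts = Counter(categorize(r["label"], r["lhub_label"]) for r in results)
print(dict(counts))
```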

If we examine some of the messages in the True Positives stream (shown below), we can see that they really are phishing messages.

[Image: sample messages from the True Positives stream]

And if we view the Results of the complete flow, we can see how well our machine learning model performed.

[Image: Results of the complete flow]

Our machine learning model, which can easily be assembled using built-in classifiers of the LogicHub platform, found:

  • 6 phishing threats
  • 31 spam messages
  • 1 false positive classification (spam incorrectly flagged as phishing)
  • 1 false negative classification (phishing incorrectly flagged as spam)

Overall, the model performed with a precision of 0.86 and an accuracy of 0.95. (Precision is the share of emails flagged as phishing that really were phishing: 6 of 7. Accuracy is the share of all emails that were classified correctly: 37 of 39.)
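The two scores can be reproduced from the four counts listed above. This sketch assumes the 6 phishing threats are the true positives and the 31 spam messages are the true negatives, a reading that matches the reported figures.

```python
# Counts taken from the results list above.
tp = 6   # phishing correctly identified
tn = 31  # spam correctly identified
fp = 1   # spam incorrectly flagged as phishing
fn = 1   # phishing incorrectly flagged as spam

# Precision: of everything flagged as phishing, how much really was phishing?
precision = tp / (tp + fp)                  # 6 / 7

# Accuracy: of all emails, how many were classified correctly?
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 37 / 39

print(f"precision={precision:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.86 accuracy=0.95
```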

From our work with SOCs, we know that these results match or exceed the results achieved by security analysts. And they were achieved with the benefit of speed: the analysis was completed in seconds, not minutes or hours.

And if the model needs fine tuning? Analysts can use the LogicHub platform to correct the results of the analysis, retraining the algorithm and enabling the machine learning model to perform with even greater accuracy in the future.

The Importance of Machine Learning for Security Automation

I hope this example gives you a sense of the advantages of machine learning for automating the threat analysis performed every day in enterprise SOCs.

The LogicHub Intelligent Security Platform’s machine learning model was able to quickly distinguish harmful phishing messages from spam, enabling the SOC team to take appropriate corrective actions.

The model performed with 95% accuracy. To achieve these results, no Python programming was needed. Instead, using a drag-and-drop interface, an analyst was able to build an analytical flow, which was trained on live data in the SOC. Once built, this flow can be used repeatedly, copied, modified, and elaborated on as needed, enabling analysts to respond more quickly to genuine threats.

This shows the power of machine learning: fast, accurate results were achieved with minimal programming, and analysts were spared the need to evaluate every message by hand.

In any overworked SOC today, LogicHub’s machine learning approach can eliminate the vast majority of false positives received by the SOC and enable the SOC to perform 10X more threat hunting.
