The emergence of artificial intelligence platforms such as OpenAI has shown some extremely promising results. We recently gave a presentation at NVISO showing that we can achieve an 89.2% accuracy rate for classifying emails. I am not convinced you can rely solely on AI for such determinations; however, it can add a tremendous amount of efficiency to the SOC in other areas. You can use OpenAI to curate metrics that enhance certain portions of the incident life-cycle. In this blog post we will go over the following topics:
OpenAI Phishing and Email Classification
OpenAI Host Classification
OpenAI User Classification
OpenAI Phishing and Email Classification
If you haven't seen it yet, I advise you to take a look at our presentation from December 16th, where we walk through our results. However, I will explain in some detail below what we discovered and how we plan to use OpenAI.
In the aforementioned talk we spoke about an experiment to test OpenAI's ability to classify emails and to assess their risk scores. I highly recommend watching the talk before moving forward, since I won't repeat it entirely here.
We had some quite impressive results, though not impressive enough to totally eliminate manual effort. A quick overview of the results:
True Detections: 89.2%
58.6% were True Negative
41.4% were True Positive
False Detections: 10.8%
4.5% were False Negative
6.25% were False Positive
OpenAI can be bypassed due to its ethical constraints
OpenAI can be used as a tool to enrich events but not completely automate them
Below is the prompt we used for testing. Feel free to copy it to run your own experiments, or even to use it in production.
Please give me a precise score from 0 to 100 on if this email is risky, with 100 being high risk. Additionally, please explain your analysis. Please give me a precise score from 0 to 100 on your confidence in the analysis. Lastly, I would like you to classify this email as only one of the items from the following list, as precisely as possible: CEO Fraud, Phishing, Junk, Other or Social Engineering. Please do not use a category which does not exist in the aforementioned list. I would like this to be in valid JSON format. The JSON format should be as follows: { "Confidence": Value, "Explanation": "Value", "Category": "Value", "Risk Score": "Value" }
Here is the email:
Hello (REDACTED)
I am just walking into a closed-door meeting, and I want
you to handle a short but urgent task. Reply with your
mobile number and wait for my text message. Thanks
Best wishes.
{ "Confidence": 80, "Explanation": "This email appears to be a social engineering attempt as it is asking for personal information and does not provide any context or explanation for why the information is needed. The email is also not signed by anyone, which is suspicious. The email does not contain any links or attachments, so there is no risk of malicious code being downloaded.", "Category": "Social Engineering", "Risk Score": 90 }
The goal is to generate email classifications as additional metrics. This is quite important for user-reported phishing, because users normally send a tremendous amount of junk to the SOC. Most of it seems like a sincere attempt to identify a phishing email. However, being able to quickly generate a report on what types of emails users are sending, such as surveys, promotional or marketing emails and others, gives us a better chance at educating the user base. During the talk mentioned before we showcased several classifications:
Social Engineering
Phishing
CEO Fraud
Junk
Other
However, in future tests we will add Survey and Promotional as classifications. This is not an attempt at fully analysing phishing emails but, again, at generating metrics on which to take additional decisions. Lastly, we will display risk scores and other results to our SOC analysts as an additional piece of context. At this moment, however, the risk of using AI to take decisions is far too great, since the AI can be extremely confident while making terrible assessments.
In the next sections we will talk about using AI to classify host types and user types, as well as the possibility of altering severity levels and refocusing on specific closed incidents to cover any possible security gaps.
OpenAI Host Classification
Since analysing emails gave us some pretty intriguing results, I decided to go ahead and test OpenAI in other places for classification and context generation. The first and most obvious place to me was attempting to distinguish between a workstation and a critical asset such as a server or domain controller. Here is an example of the OpenAI prompt and the input we gave it:
We are in a security operation center and receive thousands of alerts. We need you to attempt to categorize a host name as only one of the following as precisely as possible: Workstation, Server or Domain Controller. The host name was provided in the context of a SIEM alert in the SOC.
Please return the information in this format:
Explanation: Thorough Explanation
Type: Classification
Here is the hostname:azseawebsmart01.companyA.local
Explanation: The hostname provided, azseawebsmart01.companyA.local, is a server. This is because the hostname contains the word "web" which indicates that it is a web server. Additionally, the hostname ends with the domain ".local" which is a common domain for servers.
Type: Server
As you can see, the AI was spot on with its analysis. You can also opt to return the data in JSON format, as we will, so that you can immediately store the metric "Server" and/or the "Explanation".
There are some cases, however, where it is not able to provide an accurate analysis and it defaults to "Unknown."
We are in a security operation center and receive thousands of alerts. We need you to categorize a host name as only one of the following as precisely as possible: Workstation, Server or Domain Controller. The host name was provided in the context of a SIEM alert in the SOC.
Please return the information in this format:
Explanation: Thorough Explanation
Type: Classification
Here is the hostname:DAGJCG5
Explanation: DAGJCG5 is not a recognizable hostname, so it is impossible to accurately classify it as a workstation, server, or domain controller.
Type: Unknown
From what I have seen in the SOC, hostnames which belong to servers and domain controllers are more easily identifiable for both humans and the AI. This metric will not always be useful; however, it can be very interesting, as you'll be able to run a query in XSOAR, or whichever SOC platform you use, to filter for incidents which relate to a domain controller or server. You may even be able to take some "incident management" decisions, such as altering severity levels, in the future. Another interesting idea is to make this data visible to the analyst, so that they are able to ingest a new piece of information about the incident.
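Before you can filter on these classifications in XSOAR or any other platform, the model's free-text answer has to be parsed back into structured fields. Below is a minimal sketch of how that might look; the classify_entity helper, model name and parsing logic are illustrative assumptions, not the exact code behind our tests:

import openai

openai.api_key = "sk-..."  # replace with your own API key

PROMPT = (
    "We are in a security operation center and receive thousands of alerts. "
    "We need you to categorize a {kind} as only one of the following as "
    "precisely as possible: {categories}. The {kind} was provided in the "
    "context of a SIEM alert in the SOC.\n"
    "Please return the information in this format:\n"
    "Explanation: Thorough Explanation\n"
    "Type: Classification\n"
    "Here is the {kind}:{value}"
)

def classify_entity(kind: str, categories: str, value: str) -> dict:
    # Ask the model for a classification and parse the two expected lines
    response = openai.Completion.create(
        model="text-davinci-003",  # assumed model
        prompt=PROMPT.format(kind=kind, categories=categories, value=value),
        max_tokens=200,
        temperature=0,
    )
    result = {"Explanation": "", "Type": "Unknown"}  # default when parsing fails
    for line in response["choices"][0]["text"].strip().splitlines():
        if line.startswith("Explanation:"):
            result["Explanation"] = line.split(":", 1)[1].strip()
        elif line.startswith("Type:"):
            result["Type"] = line.split(":", 1)[1].strip()
    return result

print(classify_entity("host name", "Workstation, Server or Domain Controller",
                      "azseawebsmart01.companyA.local"))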
OpenAI User Classification
While host classification can lend us interesting insights and new metrics to utilize in the SOC, OpenAI is also able to make some determinations on user names.
Here is an example prompt for the user analysis:
We are in a security operation center and receive thousands of alerts. We need you to categorize a user name as only one of the following as precisely as possible: User, Administrator or Service Account. The user name was provided in the context of a SIEM alert in the SOC.
Please return the information in this format:
Explanation: Thorough Explanation
Type: Classification
Here is the user:SVCWEBINTELSQL
Explanation: This user name appears to be a service account, as it is composed of three distinct words that are typically associated with service accounts. The first word, "SVC," is a common abbreviation for "service," the second word, "WEBINTEL," is likely a combination of "web" and "intelligence," and the third word, "SQL," is a common abbreviation for Structured Query Language.
Type: Service Account
In the above example OpenAI was able to correctly classify the username, although this particular user name is relatively easy for an analyst to figure out in a few seconds. The point is metric generation: displaying specific results in your dashboards, or running certain queries to hunt for potentially missed or incorrectly handled events.
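If you use a generic helper such as the illustrative classify_entity sketch from the host section, the user prompt only differs in the entity type and the category list:

# Reuses the illustrative classify_entity helper from the host classification sketch
result = classify_entity("user name", "User, Administrator or Service Account",
                         "SVCWEBINTELSQL")
print(result)  # expected shape: {"Explanation": "...", "Type": "Service Account"}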
After implementation you can choose to expand the classifications you want to use. However, I would advise testing any change before moving it to production: depending on your prompt, OpenAI slightly shifts its perspective and alters the results.
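One lightweight way to catch such drift is to re-run every prompt change against a small labelled sample set before deploying it. A minimal sketch, with made-up sample data and reusing the classify_entity helper from above:

# Hypothetical labelled hostnames: (value, expected classification)
SAMPLES = [
    ("azseawebsmart01.companyA.local", "Server"),
    ("DAGJCG5", "Unknown"),
]

def regression_test(kind: str, categories: str, samples) -> float:
    # Returns the fraction of samples the current prompt still classifies correctly
    hits = 0
    for value, expected in samples:
        result = classify_entity(kind, categories, value)
        if result["Type"] == expected:
            hits += 1
        else:
            print(f"Mismatch for {value}: got {result['Type']}, expected {expected}")
    return hits / len(samples)

accuracy = regression_test("host name", "Workstation, Server or Domain Controller", SAMPLES)
print(f"Prompt accuracy on the sample set: {accuracy:.0%}")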
With the appropriate metrics in place you will enhance your decision-making abilities, and you won't always need to rely on your SOC to build those metrics for you. Asking the SOC to curate that extra data manually will always frustrate the team and cause additional labour.