Explore ATT&CK Data Sources
Contents
Explore ATT&CK Data Sources#
Goals:#
Access ATT&CK data sources in STIX format via a public TAXII server
Learn to interact with ATT&CK data all at once
Explore and idenfity patterns in the data retrieved
Learn more about ATT&CK data sources
Import ATT&CK API Client#
from attackcti import attack_client
Import Extra Libraries#
from pandas import *
import numpy as np
import json
import altair as alt
alt.renderers.enable('default')
import itertools
import logging
logging.getLogger('taxii2client').setLevel(logging.CRITICAL)
Initialize ATT&CK Client Class#
lift = attack_client()
Get All Techniques#
all_techniques = lift.get_techniques()
Convert Techniques to Dataframe and Update Techniques Objects#
Normalizing semi-structured JSON data into a flat table via pandas.io.json.json_normalize
Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html
temp_list = []
for t in all_techniques:
matrix_name = t['external_references'][0]['source_name']
technique_number = t['external_references'][0]['external_id']
if 'x_mitre_data_sources' in t.keys():
data_sources = list(set([ds.split(':')[0] for ds in t['x_mitre_data_sources']]))
t = t.new_version(x_mitre_data_sources = data_sources)
t = t.new_version(matrix = matrix_name)
t = t.new_version(technique_id = technique_number)
temp_list.append(json.loads(t.serialize()))
techniques = pandas.json_normalize(temp_list)
techniques.rename(columns = {'x_mitre_platforms':'platform', 'kill_chain_phases':'tactic', 'name':'technique', 'x_mitre_data_sources':'data_sources'}, inplace = True)
techniques = techniques.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)
techniques.head()
matrix | platform | tactic | technique | technique_id | data_sources | |
---|---|---|---|---|---|---|
0 | mitre-attack | [macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Resource Forking | T1564.009 | [Command, Process, File] |
1 | mitre-attack | [Windows, Linux, macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Downgrade Attack | T1562.010 | [Process, Command] |
2 | mitre-attack | [macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Login Items | T1547.015 | [Process, File] |
3 | mitre-attack | [macOS, Linux, Windows] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Reflective Code Loading | T1620 | [Module, Process, Script] |
4 | mitre-attack | [IaaS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Cloud Storage Object Discovery | T1619 | [Cloud Storage] |
print('A total of ',len(techniques),' techniques')
A total of 736 techniques
Techniques Per Matrix#
Using altair python library we can start showing a few charts stacking the number of techniques with or without data sources. Reference: https://altair-viz.github.io/
data = techniques
data_2 = data.groupby(['matrix'])['technique'].count()
data_3 = data_2.to_frame().reset_index()
data_3
matrix | technique | |
---|---|---|
0 | mitre-attack | 566 |
1 | mitre-ics-attack | 78 |
2 | mitre-mobile-attack | 92 |
alt.Chart(data_3).mark_bar().encode(x='technique', y='matrix', color='matrix').properties(height = 200)
Techniques With and Without Data Sources#
data_source_distribution = pandas.DataFrame({
'Techniques': ['Without DS','With DS'],
'Count of Techniques': [techniques['data_sources'].isna().sum(),techniques['data_sources'].notna().sum()]})
bars = alt.Chart(data_source_distribution).mark_bar().encode(x='Techniques',y='Count of Techniques',color='Techniques').properties(width=200,height=300)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text
What is the distribution of techniques based on ATT&CK Matrix?
data = techniques
data['Count_DS'] = data['data_sources'].str.len()
data['Ind_DS'] = np.where(data['Count_DS']>0,'With DS','Without DS')
data_2 = data.groupby(['matrix','Ind_DS'])['technique'].count()
data_3 = data_2.to_frame().reset_index()
data_3
matrix | Ind_DS | technique | |
---|---|---|---|
0 | mitre-attack | With DS | 520 |
1 | mitre-attack | Without DS | 46 |
2 | mitre-ics-attack | With DS | 63 |
3 | mitre-ics-attack | Without DS | 15 |
4 | mitre-mobile-attack | Without DS | 92 |
alt.Chart(data_3).mark_bar().encode(x='technique', y='Ind_DS', color='matrix').properties(height = 200)
What are those mitre-attack techniques without data sources?
data[(data['matrix']=='mitre-attack') & (data['Ind_DS']=='Without DS')][0:5]
matrix | platform | tactic | technique | technique_id | data_sources | Count_DS | Ind_DS | |
---|---|---|---|---|---|---|---|---|
58 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Vulnerabilities | T1588.006 | NaN | NaN | Without DS |
66 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Purchase Technical Data | T1597.002 | NaN | NaN | Without DS |
67 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Threat Intel Vendors | T1597.001 | NaN | NaN | Without DS |
68 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Search Closed Sources | T1597 | NaN | NaN | Without DS |
69 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Scan Databases | T1596.005 | NaN | NaN | Without DS |
Techniques without data sources#
techniques_without_data_sources=techniques[techniques.data_sources.isnull()].reset_index(drop=True)
techniques_without_data_sources.head()
matrix | platform | tactic | technique | technique_id | data_sources | Count_DS | Ind_DS | |
---|---|---|---|---|---|---|---|---|
0 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Vulnerabilities | T1588.006 | NaN | NaN | Without DS |
1 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Purchase Technical Data | T1597.002 | NaN | NaN | Without DS |
2 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Threat Intel Vendors | T1597.001 | NaN | NaN | Without DS |
3 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Search Closed Sources | T1597 | NaN | NaN | Without DS |
4 | mitre-attack | [PRE] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Scan Databases | T1596.005 | NaN | NaN | Without DS |
print('There are ',techniques['data_sources'].isna().sum(),' techniques without data sources (',"{0:.0%}".format(techniques['data_sources'].isna().sum()/len(techniques)),' of ',len(techniques),' techniques)')
There are 153 techniques without data sources ( 21% of 736 techniques)
Techniques With Data Sources#
techniques_with_data_sources=techniques[techniques.data_sources.notnull()].reset_index(drop=True)
techniques_with_data_sources.head()
matrix | platform | tactic | technique | technique_id | data_sources | Count_DS | Ind_DS | |
---|---|---|---|---|---|---|---|---|
0 | mitre-attack | [macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Resource Forking | T1564.009 | [Command, Process, File] | 3.0 | With DS |
1 | mitre-attack | [Windows, Linux, macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Downgrade Attack | T1562.010 | [Process, Command] | 2.0 | With DS |
2 | mitre-attack | [macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Login Items | T1547.015 | [Process, File] | 2.0 | With DS |
3 | mitre-attack | [macOS, Linux, Windows] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Reflective Code Loading | T1620 | [Module, Process, Script] | 3.0 | With DS |
4 | mitre-attack | [IaaS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Cloud Storage Object Discovery | T1619 | [Cloud Storage] | 1.0 | With DS |
print('There are ',techniques['data_sources'].notna().sum(),' techniques with data sources (',"{0:.0%}".format(techniques['data_sources'].notna().sum()/len(techniques)),' of ',len(techniques),' techniques)')
There are 583 techniques with data sources ( 79% of 736 techniques)
Grouping Techniques With Data Sources By Matrix#
Let’s create a graph to represent the number of techniques per matrix:
matrix_distribution = pandas.DataFrame({
'Matrix': list(techniques_with_data_sources.groupby(['matrix'])['matrix'].count().keys()),
'Count of Techniques': techniques_with_data_sources.groupby(['matrix'])['matrix'].count().tolist()})
bars = alt.Chart(matrix_distribution).mark_bar().encode(y='Matrix',x='Count of Techniques').properties(width=300,height=100)
text = bars.mark_text(align='center',baseline='middle',dx=10,dy=0).encode(text='Count of Techniques')
bars + text
All the techniques belong to mitre-attack matrix which is the main Enterprise matrix. Reference: https://attack.mitre.org/wiki/Main_Page
Grouping Techniques With Data Sources by Platform#
First, we need to split the platform column values because a technique might be mapped to more than one platform
techniques_platform=techniques_with_data_sources
attributes_1 = ['platform'] # In attributes we are going to indicate the name of the columns that we need to split
for a in attributes_1:
s = techniques_platform.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
# "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
s.name = a
# We name "s" with the same name of "a".
techniques_platform=techniques_platform.drop(a, axis=1).join(s).reset_index(drop=True)
# We drop the column "a" from "techniques_platform", and then join "techniques_platform" with "s"
# Let's re-arrange the columns from general to specific
techniques_platform_2=techniques_platform.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)
We can now show techniques with data sources mapped to one platform at the time
techniques_platform_2.head()
matrix | platform | tactic | technique | technique_id | data_sources | |
---|---|---|---|---|---|---|
0 | mitre-attack | macOS | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Resource Forking | T1564.009 | [Command, Process, File] |
1 | mitre-attack | Windows | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Downgrade Attack | T1562.010 | [Process, Command] |
2 | mitre-attack | Linux | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Downgrade Attack | T1562.010 | [Process, Command] |
3 | mitre-attack | macOS | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Downgrade Attack | T1562.010 | [Process, Command] |
4 | mitre-attack | macOS | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Login Items | T1547.015 | [Process, File] |
Let’s create a visualization to show the number of techniques grouped by platform:
platform_distribution = pandas.DataFrame({
'Platform': list(techniques_platform_2.groupby(['platform'])['platform'].count().keys()),
'Count of Techniques': techniques_platform_2.groupby(['platform'])['platform'].count().tolist()})
bars = alt.Chart(platform_distribution,height=300).mark_bar().encode(x ='Platform',y='Count of Techniques',color='Platform').properties(width=200)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text
In the bar chart above we can see that there are more techniques with data sources mapped to the Windows platform.
Defende-evasion and Persistence are tactics with the highest nummber of techniques with data sources
Grouping Techniques With Data Sources by Data Source#
We need to split the data source column values because a technique might be mapped to more than one data source:
techniques_data_source=techniques_with_data_sources
attributes_3 = ['data_sources'] # In attributes we are going to indicate the name of the columns that we need to split
for a in attributes_3:
s = techniques_data_source.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
# "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
s.name = a
# We name "s" with the same name of "a".
techniques_data_source = techniques_data_source.drop(a, axis=1).join(s).reset_index(drop=True)
# We drop the column "a" from "techniques_data_source", and then join "techniques_data_source" with "s"
# Let's re-arrange the columns from general to specific
techniques_data_source_2 = techniques_data_source.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)
# We are going to edit some names inside the dataframe to improve the consistency:
techniques_data_source_3 = techniques_data_source_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])
We can now show techniques with data sources mapped to one data source at the time
techniques_data_source_3.head()
matrix | platform | tactic | technique | technique_id | data_sources | |
---|---|---|---|---|---|---|
0 | mitre-attack | [macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Resource Forking | T1564.009 | Command |
1 | mitre-attack | [macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Resource Forking | T1564.009 | Process |
2 | mitre-attack | [macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Resource Forking | T1564.009 | File |
3 | mitre-attack | [Windows, Linux, macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Downgrade Attack | T1562.010 | Process |
4 | mitre-attack | [Windows, Linux, macOS] | [{'kill_chain_name': 'mitre-attack', 'phase_na... | Downgrade Attack | T1562.010 | Command |
Let’s create a visualization to show the number of techniques grouped by data sources:
data_source_distribution = pandas.DataFrame({
'Data Source': list(techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().keys()),
'Count of Techniques': techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().tolist()})
bars = alt.Chart(data_source_distribution,width=800,height=300).mark_bar().encode(x ='Data Source',y='Count of Techniques',color='Data Source').properties(width=1200)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text
A few interesting things from the bar chart above:
Process Monitoring, File Monitoring, and Process Command-line parameters are the Data Sources with the highest number of techniques
There are some data source names that include string references to Windows such as PowerShell, Windows and wmi
Most Relevant Groups Of Data Sources Per Technique#
Number Of Data Sources Per Technique#
Although identifying the data sources with the highest number of techniques is a good start, they usually do not work alone. You might be collecting Process Monitoring already but you might be still missing a lot of context from a data perspective.
data_source_distribution_2 = pandas.DataFrame({
'Techniques': list(techniques_data_source_3.groupby(['technique'])['technique'].count().keys()),
'Count of Data Sources': techniques_data_source_3.groupby(['technique'])['technique'].count().tolist()})
data_source_distribution_3 = pandas.DataFrame({
'Number of Data Sources': list(data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().keys()),
'Count of Techniques': data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().tolist()})
bars = alt.Chart(data_source_distribution_3).mark_bar().encode(x ='Number of Data Sources',y='Count of Techniques').properties(width=500)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text
The image above shows you the number data sources needed per techniques according to ATT&CK:
There are 71 techniques that require 3 data sources as enough context to validate the detection of them according to ATT&CK
Only one technique has 12 data sources
One data source only applies to 19 techniques
Let’s create subsets of data sources with the data source column defining and using a python function:
# https://stackoverflow.com/questions/26332412/python-recursive-function-to-display-all-subsets-of-given-set
def subs(l):
res = []
for i in range(1, len(l) + 1):
for combo in itertools.combinations(l, i):
res.append(list(combo))
return res
Before applying the function, we need to use lowercase data sources names and sort data sources names to improve consistency:
df = techniques_with_data_sources[['data_sources']]
for index, row in df.iterrows():
row["data_sources"]=[x.lower() for x in row["data_sources"]]
row["data_sources"].sort()
df.head()
data_sources | |
---|---|
0 | [command, file, process] |
1 | [command, process] |
2 | [file, process] |
3 | [module, process, script] |
4 | [cloud storage] |
Let’s apply the function and split the subsets column:
df['subsets']=df['data_sources'].apply(subs)
<ipython-input-32-9765a9dc0b2f>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['subsets']=df['data_sources'].apply(subs)
df.head()
data_sources | subsets | |
---|---|---|
0 | [command, file, process] | [[command], [file], [process], [command, file]... |
1 | [command, process] | [[command], [process], [command, process]] |
2 | [file, process] | [[file], [process], [file, process]] |
3 | [module, process, script] | [[module], [process], [script], [module, proce... |
4 | [cloud storage] | [[cloud storage]] |
We need to split the subsets column values:
techniques_with_data_sources_preview = df
attributes_4 = ['subsets']
for a in attributes_4:
s = techniques_with_data_sources_preview.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
s.name = a
techniques_with_data_sources_preview = techniques_with_data_sources_preview.drop(a, axis=1).join(s).reset_index(drop=True)
techniques_with_data_sources_subsets = techniques_with_data_sources_preview.reindex(['data_sources','subsets'], axis=1)
techniques_with_data_sources_subsets.head()
data_sources | subsets | |
---|---|---|
0 | [command, file, process] | [command] |
1 | [command, file, process] | [file] |
2 | [command, file, process] | [process] |
3 | [command, file, process] | [command, file] |
4 | [command, file, process] | [command, process] |
Let’s add three columns to analyse the dataframe: subsets_name (Changing Lists to Strings), subsets_number_elements ( Number of data sources per subset) and number_data_sources_per_technique
techniques_with_data_sources_subsets['subsets_name']=techniques_with_data_sources_subsets['subsets'].apply(lambda x: ','.join(map(str, x)))
techniques_with_data_sources_subsets['subsets_number_elements']=techniques_with_data_sources_subsets['subsets'].str.len()
techniques_with_data_sources_subsets['number_data_sources_per_technique']=techniques_with_data_sources_subsets['data_sources'].str.len()
techniques_with_data_sources_subsets.head()
data_sources | subsets | subsets_name | subsets_number_elements | number_data_sources_per_technique | |
---|---|---|---|---|---|
0 | [command, file, process] | [command] | command | 1 | 3 |
1 | [command, file, process] | [file] | file | 1 | 3 |
2 | [command, file, process] | [process] | process | 1 | 3 |
3 | [command, file, process] | [command, file] | command,file | 2 | 3 |
4 | [command, file, process] | [command, process] | command,process | 2 | 3 |
As it was described above, we need to find grups pf data sources, so we are going to filter out all the subsets with only one data source:
subsets = techniques_with_data_sources_subsets
subsets_ok=subsets[subsets.subsets_number_elements != 1]
subsets_ok.head()
data_sources | subsets | subsets_name | subsets_number_elements | number_data_sources_per_technique | |
---|---|---|---|---|---|
3 | [command, file, process] | [command, file] | command,file | 2 | 3 |
4 | [command, file, process] | [command, process] | command,process | 2 | 3 |
5 | [command, file, process] | [file, process] | file,process | 2 | 3 |
6 | [command, file, process] | [command, file, process] | command,file,process | 3 | 3 |
9 | [command, process] | [command, process] | command,process | 2 | 2 |
Finally, we calculate the most relevant groups of data sources (Top 15):
subsets_graph = subsets_ok.groupby(['subsets_name'])['subsets_name'].count().to_frame(name='subsets_count').sort_values(by='subsets_count',ascending=False)[0:15]
subsets_graph
subsets_count | |
---|---|
subsets_name | |
command,process | 206 |
command,file | 131 |
file,process | 118 |
command,file,process | 90 |
command,windows registry | 60 |
process,windows registry | 59 |
command,process,windows registry | 53 |
application log,network traffic | 48 |
command,network traffic | 45 |
network traffic,process | 44 |
module,process | 44 |
file,network traffic | 39 |
file,windows registry | 37 |
file,process,windows registry | 33 |
command,module | 31 |
subsets_graph_2 = pandas.DataFrame({
'Data Sources': list(subsets_graph.index),
'Count of Techniques': subsets_graph['subsets_count'].tolist()})
bars = alt.Chart(subsets_graph_2).mark_bar().encode(x ='Data Sources', y ='Count of Techniques', color='Data Sources').properties(width=500)
text = bars.mark_text(align='center',baseline='middle',dx= 0,dy=-5).encode(text='Count of Techniques')
bars + text
Group (Process Monitoring - Process Command-line parameters) is the is the group of data sources with the highest number of techniques. This group of data sources are suggested to hunt 78 techniques
Let’s Split all the Information About Techniques With Data Sources Defined: Matrix, Platform, Tactic and Data Source#
Let’s split all the relevant columns of the dataframe:
techniques_data = techniques_with_data_sources
attributes = ['platform','tactic','data_sources'] # In attributes we are going to indicate the name of the columns that we need to split
for a in attributes:
s = techniques_data.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
# "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
s.name = a
# We name "s" with the same name of "a".
techniques_data=techniques_data.drop(a, axis=1).join(s).reset_index(drop=True)
# We drop the column "a" from "techniques_data", and then join "techniques_data" with "s"
# Let's re-arrange the columns from general to specific
techniques_data_2=techniques_data.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)
# We are going to edit some names inside the dataframe to improve the consistency:
techniques_data_3 = techniques_data_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])
techniques_data_3.head()
matrix | platform | tactic | technique | technique_id | data_sources | |
---|---|---|---|---|---|---|
0 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Resource Forking | T1564.009 | Command |
1 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Resource Forking | T1564.009 | Process |
2 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Resource Forking | T1564.009 | File |
3 | mitre-attack | Windows | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Downgrade Attack | T1562.010 | Process |
4 | mitre-attack | Windows | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Downgrade Attack | T1562.010 | Command |
Do you remember data sources names with a reference to Windows? After splitting the dataframe by platforms, tactics and data sources, are there any macOC or linux techniques that consider windows data sources? Let’s identify those rows:
# After splitting the rows of the dataframe, there are some values that relate windows data sources with platforms like linux and masOS.
# We need to identify those rows
conditions = [(techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),
(techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),
(techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),
(techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),
(techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True),
(techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True)]
# In conditions we indicate a logical test
choices = ['NO OK','NO OK','NO OK','NO OK','NO OK','NO OK']
# In choices, we indicate the result when the logical test is true
techniques_data_3['Validation'] = np.select(conditions,choices,default='OK')
# We add a column "Validation" to "techniques_data_3" with the result of the logical test. The default value is going to be "OK"
What is the inconsistent data?
techniques_analysis_data_no_ok = techniques_data_3[techniques_data_3.Validation == 'NO OK']
# Finally, we are filtering all the values with NO OK
techniques_analysis_data_no_ok.head()
matrix | platform | tactic | technique | technique_id | data_sources | Validation | |
---|---|---|---|---|---|---|---|
31 | mitre-attack | Linux | {'kill_chain_name': 'mitre-attack', 'phase_nam... | System Language Discovery | T1614.001 | Windows Registry | NO OK |
34 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | System Language Discovery | T1614.001 | Windows Registry | NO OK |
68 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Code Signing Policy Modification | T1553.006 | Windows Registry | NO OK |
335 | mitre-attack | Linux | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Run Virtual Instance | T1564.006 | Windows Registry | NO OK |
340 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Run Virtual Instance | T1564.006 | Windows Registry | NO OK |
print('There are ',len(techniques_analysis_data_no_ok),' rows with inconsistent data')
There are 85 rows with inconsistent data
What is the impact of this inconsistent data from a platform and data sources perspective?
df = techniques_with_data_sources
attributes = ['platform','data_sources']
for a in attributes:
s = df.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
s.name = a
df=df.drop(a, axis=1).join(s).reset_index(drop=True)
df_2=df.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)
df_3 = df_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])
conditions = [(df_3['data_sources'].str.contains('windows',case=False)== True),
(df_3['data_sources'].str.contains('powershell',case=False)== True),
(df_3['data_sources'].str.contains('wmi',case=False)== True)]
choices = ['Windows','Windows','Windows']
df_3['Validation'] = np.select(conditions,choices,default='Other')
df_3['Num_Tech'] = 1
df_4 = df_3[df_3.Validation == 'Windows']
df_5 = df_4.groupby(['data_sources','platform'])['technique'].nunique()
df_6 = df_5.to_frame().reset_index()
alt.Chart(df_6).mark_bar().encode(x=alt.X('technique', stack="normalize"), y='data_sources', color='platform').properties(height=200)
There are techniques that consider Windows Error Reporting, Windows Registry, and Windows event logs as data sources and they also consider platforms like Linux and masOS. We do not need to consider this rows because those data sources can only be managed at a Windows environment. These are the techniques that we should not consider in our data base:
techniques_analysis_data_no_ok[['technique','data_sources']].drop_duplicates().sort_values(by='data_sources',ascending=True)
technique | data_sources | |
---|---|---|
2558 | Event Triggered Execution | WMI |
31 | System Language Discovery | Windows Registry |
3911 | Input Capture | Windows Registry |
3818 | Indicator Removal on Host | Windows Registry |
3543 | Two-Factor Authentication Interception | Windows Registry |
3374 | Browser Extensions | Windows Registry |
3123 | Service Stop | Windows Registry |
3108 | Inhibit System Recovery | Windows Registry |
2700 | Create or Modify System Process | Windows Registry |
2560 | Event Triggered Execution | Windows Registry |
2519 | Boot or Logon Autostart Execution | Windows Registry |
2241 | Abuse Elevation Control Mechanism | Windows Registry |
2084 | Unsecured Credentials | Windows Registry |
2007 | Subvert Trust Controls | Windows Registry |
4103 | Boot or Logon Initialization Scripts | Windows Registry |
1815 | Keylogging | Windows Registry |
1692 | Adversary-in-the-Middle | Windows Registry |
1396 | Impair Defenses | Windows Registry |
1350 | Disable or Modify Tools | Windows Registry |
1322 | Disable or Modify System Firewall | Windows Registry |
1313 | Install Root Certificate | Windows Registry |
1177 | Hide Artifacts | Windows Registry |
951 | System Services | Windows Registry |
831 | Hijack Execution Flow | Windows Registry |
730 | Hidden Users | Windows Registry |
415 | Indicator Blocking | Windows Registry |
348 | Hidden File System | Windows Registry |
335 | Run Virtual Instance | Windows Registry |
68 | Code Signing Policy Modification | Windows Registry |
1740 | Modify Authentication Process | Windows Registry |
4286 | OS Credential Dumping | Windows Registry |
Without considering this inconsistent data, the final dataframe is:
techniques_analysis_data_ok = techniques_data_3[techniques_data_3.Validation == 'OK']
techniques_analysis_data_ok.head()
matrix | platform | tactic | technique | technique_id | data_sources | Validation | |
---|---|---|---|---|---|---|---|
0 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Resource Forking | T1564.009 | Command | OK |
1 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Resource Forking | T1564.009 | Process | OK |
2 | mitre-attack | macOS | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Resource Forking | T1564.009 | File | OK |
3 | mitre-attack | Windows | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Downgrade Attack | T1562.010 | Process | OK |
4 | mitre-attack | Windows | {'kill_chain_name': 'mitre-attack', 'phase_nam... | Downgrade Attack | T1562.010 | Command | OK |
print('There are ',len(techniques_analysis_data_ok),' rows of data that you can play with')
There are 4703 rows of data that you can play with
Getting Techniques by Data Sources#
This function gets techniques’ information that includes specific data sources
data_source = 'PROCESS'
results = lift.get_techniques_by_data_sources(data_source)
len(results)
280
type(results)
list
results2 = lift.get_techniques_by_data_sources('pRoceSS','commAnd')
len(results2)
344
results2[1]
AttackPattern(type='attack-pattern', id='attack-pattern--824add00-99a1-4b15-9a2d-6c5683b7b497', created_by_ref='identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5', created='2021-10-08T14:06:28.212Z', modified='2021-10-15T00:48:06.723Z', name='Downgrade Attack', description='Adversaries may downgrade or use a version of system features that may be outdated, vulnerable, and/or does not support updated security controls such as logging. For example, [PowerShell](https://attack.mitre.org/techniques/T1059/001) versions 5+ includes Script Block Logging (SBL) which can record executed script content. However, adversaries may attempt to execute a previous version of PowerShell that does not support SBL with the intent to [Impair Defenses](https://attack.mitre.org/techniques/T1562) while running malicious scripts that may have otherwise been detected.(Citation: CrowdStrike BGH Ransomware 2021)(Citation: Mandiant BYOL 2018)\n\nAdversaries may downgrade and use less-secure versions of various features of a system, such as [Command and Scripting Interpreter](https://attack.mitre.org/techniques/T1059)s or even network protocols that can be abused to enable [Adversary-in-the-Middle](https://attack.mitre.org/techniques/T1557).(Citation: Praetorian TLS Downgrade Attack 2014)', kill_chain_phases=[KillChainPhase(kill_chain_name='mitre-attack', phase_name='defense-evasion')], revoked=False, external_references=[ExternalReference(source_name='mitre-attack', url='https://attack.mitre.org/techniques/T1562/010', external_id='T1562.010'), ExternalReference(source_name='CrowdStrike BGH Ransomware 2021', description='Falcon Complete Team. (2021, May 11). Response When Minutes Matter: Rising Up Against Ransomware. Retrieved October 8, 2021.', url='https://www.crowdstrike.com/blog/how-falcon-complete-stopped-a-big-game-hunting-ransomware-attack/'), ExternalReference(source_name='Mandiant BYOL 2018', description='Kirk, N. (2018, June 18). Bring Your Own Land (BYOL) – A Novel Red Teaming Technique. Retrieved October 8, 2021.', url='https://www.mandiant.com/resources/bring-your-own-land-novel-red-teaming-technique'), ExternalReference(source_name='Praetorian TLS Downgrade Attack 2014', description='Praetorian. (2014, August 19). Man-in-the-Middle TLS Protocol Downgrade Attack. Retrieved October 8, 2021.', url='https://www.praetorian.com/blog/man-in-the-middle-tls-ssl-protocol-downgrade-attack/')], object_marking_refs=['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'], x_mitre_data_sources=['Command: Command Execution', 'Process: Process Metadata', 'Process: Process Creation'], x_mitre_detection='Monitor for commands or other activity that may be indicative of attempts to abuse older or deprecated technologies (ex: <code>powershell –v 2</code>). Also monitor for other abnormal events, such as execution of and/or processes spawning from a version of a tool that is not expected in the environment.', x_mitre_is_subtechnique=True, x_mitre_permissions_required=['User'], x_mitre_platforms=['Windows', 'Linux', 'macOS'], x_mitre_version='1.0')