Explore ATT&CK Data Sources


Goals:

  • Access ATT&CK data sources in STIX format via a public TAXII server

  • Learn to interact with ATT&CK data all at once

  • Explore and identify patterns in the data retrieved

  • Learn more about ATT&CK data sources

1. ATT&CK Python Client Installation

You can install the client with pip: pip install attackcti

2. Import ATT&CK API Client

from attackcti import attack_client

3. Import Extra Libraries

import pandas
import numpy as np

import altair as alt
alt.renderers.enable('default')

import itertools

4. Initialize ATT&CK Client Class

lift = attack_client()

5. Getting Information About Techniques

Getting ALL ATT&CK Techniques

all_techniques = lift.get_techniques(stix_format=False)

Showing the first technique in our list

all_techniques[0]
{'external_references': [{'source_name': 'mitre-attack',
   'external_id': 'T1553.006',
   'url': 'https://attack.mitre.org/techniques/T1553/006'},
  {'source_name': 'Microsoft DSE June 2017',
   'url': 'https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn653559(v=vs.85)?redirectedfrom=MSDN',
   'description': 'Microsoft. (2017, June 1). Digital Signatures for Kernel Modules on Windows. Retrieved April 22, 2021.'},
  {'source_name': 'Apple Disable SIP',
   'url': 'https://developer.apple.com/documentation/security/disabling_and_enabling_system_integrity_protection',
   'description': 'Apple. (n.d.). Disabling and Enabling System Integrity Protection. Retrieved April 22, 2021.'},
  {'source_name': 'Microsoft Unsigned Driver Apr 2017',
   'url': 'https://docs.microsoft.com/en-us/windows-hardware/drivers/install/installing-an-unsigned-driver-during-development-and-test',
   'description': 'Microsoft. (2017, April 20). Installing an Unsigned Driver during Development and Test. Retrieved April 22, 2021.'},
  {'source_name': 'Microsoft TESTSIGNING Feb 2021',
   'url': 'https://docs.microsoft.com/en-us/windows-hardware/drivers/install/the-testsigning-boot-configuration-option',
   'description': 'Microsoft. (2021, February 15). Enable Loading of Test Signed Drivers. Retrieved April 22, 2021.'},
  {'source_name': 'FireEye HIKIT Rootkit Part 2',
   'url': 'https://www.fireeye.com/blog/threat-research/2012/08/hikit-rootkit-advanced-persistent-attack-techniques-part-2.html',
   'description': 'Glyer, C., Kazanciyan, R. (2012, August 22). The “Hikit” Rootkit: Advanced and Persistent Attack Techniques (Part 2). Retrieved May 4, 2020.'},
  {'source_name': 'GitHub Turla Driver Loader',
   'url': 'https://github.com/hfiref0x/TDL',
   'description': 'TDL Project. (2016, February 4). TDL (Turla Driver Loader). Retrieved April 22, 2021.'},
  {'url': 'https://blog-assets.f-secure.com/wp-content/uploads/2019/10/15163408/BlackEnergy_Quedagh.pdf',
   'description': 'F-Secure Labs. (2014). BlackEnergy & Quedagh: The convergence of crimeware and APT attacks. Retrieved March 24, 2016.',
   'source_name': 'F-Secure BlackEnergy 2014'},
  {'source_name': 'Unit42 AcidBox June 2020',
   'url': 'https://unit42.paloaltonetworks.com/acidbox-rare-malware/',
   'description': 'Reichel, D. and Idrizovic, E. (2020, June 17). AcidBox: Rare Malware Repurposing Turla Group Exploit Targeted Russian Organizations. Retrieved March 16, 2021.'}],
 'kill_chain_phases': [{'kill_chain_name': 'mitre-attack',
   'phase_name': 'defense-evasion'}],
 'x_mitre_version': '1.0',
 'x_mitre_is_subtechnique': True,
 'url': 'https://attack.mitre.org/techniques/T1553/006',
 'matrix': 'mitre-attack',
 'technique_id': 'T1553.006',
 'object_marking_refs': ['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'],
 'created_by_ref': 'identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5',
 'technique': 'Code Signing Policy Modification',
 'technique_description': 'Adversaries may modify code signing policies to enable execution of unsigned or self-signed code. Code signing provides a level of authenticity on a program from a developer and a guarantee that the program has not been tampered with. Security controls can include enforcement mechanisms to ensure that only valid, signed code can be run on an operating system. \n\nSome of these security controls may be enabled by default, such as Driver Signature Enforcement (DSE) on Windows or System Integrity Protection (SIP) on macOS.(Citation: Microsoft DSE June 2017)(Citation: Apple Disable SIP) Other such controls may be disabled by default but are configurable through application controls, such as only allowing signed Dynamic-Link Libraries (DLLs) to execute on a system. Since it can be useful for developers to modify default signature enforcement policies during the development and testing of applications, disabling of these features may be possible with elevated permissions.(Citation: Microsoft Unsigned Driver Apr 2017)(Citation: Apple Disable SIP)\n\nAdversaries may modify code signing policies in a number of ways, including through use of command-line or GUI utilities, [Modify Registry](https://attack.mitre.org/techniques/T1112), rebooting the computer in a debug/recovery mode, or by altering the value of variables in kernel memory.(Citation: Microsoft TESTSIGNING Feb 2021)(Citation: Apple Disable SIP)(Citation: FireEye HIKIT Rootkit Part 2)(Citation: GitHub Turla Driver Loader) Examples of commands that can modify the code signing policy of a system include <code>bcdedit.exe -set TESTSIGNING ON</code> on Windows and <code>csrutil disable</code> on macOS.(Citation: Microsoft TESTSIGNING Feb 2021)(Citation: Apple Disable SIP) Depending on the implementation, successful modification of a signing policy may require reboot of the compromised system. 
Additionally, some implementations can introduce visible artifacts for the user (ex: a watermark in the corner of the screen stating the system is in Test Mode). Adversaries may attempt to remove such artifacts.(Citation: F-Secure BlackEnergy 2014)\n\nTo gain access to kernel memory to modify variables related to signature checks, such as modifying <code>g_CiOptions</code> to disable Driver Signature Enforcement, adversaries may conduct [Exploitation for Privilege Escalation](https://attack.mitre.org/techniques/T1068) using a signed, but vulnerable driver.(Citation: Unit42 AcidBox June 2020)(Citation: GitHub Turla Driver Loader)',
 'id': 'attack-pattern--565275d5-fcc3-4b66-b4e7-928e4cac6b8c',
 'type': 'attack-pattern',
 'tactic': ['defense-evasion'],
 'modified': '2021-04-26T15:41:39.155Z',
 'created': '2021-04-23T01:04:57.161Z',
 'technique_detection': 'Monitor processes and command-line arguments for actions that could be taken to modify the code signing policy of a system, such as <code>bcdedit.exe -set TESTSIGNING ON</code>.(Citation: Microsoft TESTSIGNING Feb 2021) Consider monitoring for modifications made to Registry keys associated with code signing policies, such as <code>HKCU\\Software\\Policies\\Microsoft\\Windows NT\\Driver Signing</code>. Modifications to the code signing policy of a system are likely to be rare.',
 'data_sources': ['Windows Registry: Windows Registry Key Modification',
  'Command: Command Execution',
  'Process: Process Creation'],
 'contributors': ['Abel Morales, Exabeam'],
 'defense_bypassed': ['Application control',
  'User Mode Signature Validation',
  'Digital Certificate Validation'],
 'permissions_required': ['Administrator'],
 'platform': ['Windows', 'macOS']}
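
With stix_format=False, each technique comes back as a plain Python dictionary, so its fields can be read directly. The sketch below uses a trimmed copy of the record printed above. The attack_id helper is hypothetical and is not part of attackcti.

```python
# Trimmed copy of the record printed above (only a few keys kept).
technique = {
    'external_references': [
        {'source_name': 'mitre-attack',
         'external_id': 'T1553.006',
         'url': 'https://attack.mitre.org/techniques/T1553/006'},
        {'source_name': 'GitHub Turla Driver Loader',
         'url': 'https://github.com/hfiref0x/TDL'},
    ],
    'technique': 'Code Signing Policy Modification',
    'tactic': ['defense-evasion'],
    'platform': ['Windows', 'macOS'],
    'data_sources': ['Windows Registry: Windows Registry Key Modification',
                     'Command: Command Execution',
                     'Process: Process Creation'],
}

def attack_id(record):
    """Return the ID from the 'mitre-attack' external reference, or None.

    Hypothetical helper for illustration only.
    """
    for ref in record.get('external_references', []):
        if ref.get('source_name') == 'mitre-attack':
            return ref.get('external_id')
    return None

print(attack_id(technique))                     # T1553.006
print(len(technique.get('data_sources', [])))   # 3
```

Using .get() with a default keeps the lookup safe for techniques that lack a field such as data_sources.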

Normalizing semi-structured JSON data into a flat table via pandas.json_normalize

techniques_normalized = pandas.json_normalize(all_techniques)
techniques_normalized[0:1]
external_references kill_chain_phases x_mitre_version x_mitre_is_subtechnique url matrix technique_id object_marking_refs created_by_ref technique ... effective_permissions impact_type revoked x_mitre_deprecated x_mitre_old_attack_id difficulty_explanation difficulty_for_adversary detectable_explanation detectable_by_common_defenses tactic_type
0 [{'source_name': 'mitre-attack', 'external_id'... [{'kill_chain_name': 'mitre-attack', 'phase_na... 1.0 True https://attack.mitre.org/techniques/T1553/006 mitre-attack T1553.006 [marking-definition--fa42a846-8d90-4e51-bc29-7... identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5 Code Signing Policy Modification ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

1 rows × 37 columns
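
To see why so many cells above are NaN, it helps to run the same call on a tiny input. A minimal sketch with two made-up records (the IDs and values are illustrative, not ATT&CK data):

```python
import pandas

# Toy records standing in for ATT&CK techniques (values are illustrative).
records = [
    {'technique_id': 'T0001', 'platform': ['Windows', 'macOS'],
     'data_sources': ['Command: Command Execution']},
    {'technique_id': 'T0002', 'platform': ['PRE']},   # no 'data_sources' key
]

flat = pandas.json_normalize(records)

# One column per key seen in any record; a list stays a list inside one cell.
print(sorted(flat.columns))
# A key that is missing from a record becomes NaN in that row.
print(flat['data_sources'].isna().tolist())
```

The 37 columns in the real output are therefore the union of the keys found across all techniques, and each technique is NaN wherever it does not define a key.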

6. Re-indexing Dataframe

techniques = techniques_normalized.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)
techniques.head()
matrix platform tactic technique technique_id data_sources
0 mitre-attack [Windows, macOS] [defense-evasion] Code Signing Policy Modification T1553.006 [Windows Registry: Windows Registry Key Modifi...
1 mitre-attack [Windows, Linux, macOS, IaaS] [discovery] System Location Discovery T1614 [Instance: Instance Metadata, Process: Process...
2 mitre-attack [Containers] [discovery] Container and Resource Discovery T1613 [Cluster: Cluster Metadata, Container: Contain...
3 mitre-attack [Containers] [credential-access] Container API T1552.007 [Command: Command Execution, File: File Access...
4 mitre-attack [Containers] [defense-evasion] Build Image on Host T1612 [Image: Image Creation, Network Traffic: Netwo...
print('A total of ',len(techniques),' techniques')
A total of  1062  techniques

7. Removing Revoked Techniques

all_techniques_no_revoked = lift.remove_revoked(all_techniques)
print('A total of ',len(all_techniques_no_revoked),' techniques')
A total of  916  techniques

8. Extracting Revoked Techniques

all_techniques_revoked = lift.extract_revoked(all_techniques)
print('A total of ',len(all_techniques_revoked),' techniques that have been revoked')
A total of  146  techniques that have been revoked

The revoked techniques are the following ones:

for t in all_techniques_revoked:
    print(t['technique'])
Web Session Cookie
Emond
Cloud Instance Metadata API
Revert Cloud Instance
Application Access Token
Elevated Execution with Prompt
Credentials from Web Browsers
PowerShell Profile
Parent PID Spoofing
Compile After Delivery
Systemd Service
Runtime Data Manipulation
Transmitted Data Manipulation
Stored Data Manipulation
Disk Content Wipe
Disk Structure Wipe
Domain Generation Algorithms
Compiled HTML File
SIP and Trust Provider Hijacking
Time Providers
CMSTP
Credentials in Registry
Control Panel Items
Kernel Modules and Extensions
Spearphishing Link
Sudo Caching
Spearphishing Attachment
Kerberoasting
Spearphishing via Service
LSASS Driver
Password Filter DLL
Screensaver
AppCert DLLs
Domain Fronting
Mshta
Dynamic Data Exchange
Hooking
Image File Execution Options Injection
Extra Window Memory Injection
LLMNR/NBT-NS Poisoning and Relay
Multi-hop Proxy
SSH Hijacking
Process Doppelgänging
SID-History Injection
Application Shimming
Hidden Window
Login Item
Plist Modification
Re-opened Applications
Setuid and Setgid
AppleScript
Bash History
Clear Command History
Hidden Users
Input Prompt
Launchctl
LC_LOAD_DYLIB Addition
Launch Agent
Local Job Scheduling
Malicious Shell Modification
Rc.common
Space after Filename
Hidden Files and Directories
Keychain
Launch Daemon
Dylib Hijacking
Gatekeeper Bypass
HISTCONTROL
Trap
Private Keys
Securityd Memory
Startup Items
Sudo
Authentication Package
Install Root Certificate
Netsh Helper DLL
Network Share Connection Removal
Component Object Model Hijacking
Regsvcs/Regasm
InstallUtil
Regsvr32
Code Signing
Component Firmware
File Deletion
AppInit DLLs
Security Support Provider
Web Shell
Timestomp
Pass the Ticket
NTFS File Attributes
Custom Command and Control Protocol
Process Hollowing
Disabling Security Tools
Bypass User Account Control
PowerShell
Rundll32
Windows Management Instrumentation Event Subscription
Credentials in Files
Multilayer Encryption
Windows Admin Shares
Remote Desktop Protocol
Pass the Hash
DLL Side-Loading
Bootkit
Indicator Removal from Tools
Uncommonly Used Port
Security Software Discovery
Registry Run Keys / Startup Folder
Service Registry Permissions Weakness
Indicator Blocking
New Service
Software Packing
File System Permissions Weakness
Change Default File Association
DLL Search Order Hijacking
Service Execution
Standard Cryptographic Protocol
Modify Existing Service
Windows Remote Management
Custom Cryptographic Protocol
Shortcut Modification
Data Encrypted
System Firmware
Application Deployment Software
Accessibility Features
Port Monitors
Binary Padding
Winlogon Helper DLL
Data Compressed
Remotely Install Application
Insecure Third-Party Libraries
Fake Developer Accounts
Device Type Discovery
Detect App Analysis Environment
Malicious Software Development Tools
Biometric Spoofing
Device Unlock Code Guessing or Brute Force
Malicious Media Content
URL Scheme Hijacking
Abuse of iOS Enterprise App Signing Key
App Delivered via Web Download
App Delivered via Email Attachment
Malicious or Vulnerable Built-in Device Functionality
Malicious SMS Message
Exploit Baseband Vulnerability
Stolen Developer Credentials or Signing Keys
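
In STIX, a revoked object carries a revoked property set to true, and the normalized table above has a matching revoked column. The sketch below performs the same split on plain dictionaries, assuming that flag is what separates the two groups. The split_revoked helper is hypothetical and does not claim to mirror how attackcti implements remove_revoked and extract_revoked.

```python
def split_revoked(objects):
    """Split a list of dictionaries into (active, revoked) by the 'revoked' flag.

    Hypothetical helper for illustration only.
    """
    active, revoked = [], []
    for obj in objects:
        # A missing flag is treated as "not revoked".
        (revoked if obj.get('revoked') is True else active).append(obj)
    return active, revoked

sample = [
    {'technique': 'Code Signing Policy Modification'},
    {'technique': 'Web Session Cookie', 'revoked': True},
    {'technique': 'System Location Discovery', 'revoked': False},
]

active, revoked = split_revoked(sample)
print(len(active), len(revoked))           # 2 1
print([t['technique'] for t in revoked])   # ['Web Session Cookie']
```

The two counts always add up to the size of the input, which matches the output above: 916 + 146 = 1062.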

9. Updating our Dataframe

techniques_normalized = pandas.json_normalize(all_techniques_no_revoked)
techniques = techniques_normalized.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

10. Techniques Per Matrix

Using the Altair Python library, we can start charting the number of techniques, with or without data sources. Reference: https://altair-viz.github.io/

data = techniques
data_2 = data.groupby(['matrix'])['technique'].count()
data_3 = data_2.to_frame().reset_index()
data_3
matrix technique
0 mitre-attack 563
1 mitre-ics-attack 89
2 mitre-mobile-attack 90
3 mitre-pre-attack 174
alt.Chart(data_3).mark_bar().encode(x='technique', y='matrix', color='matrix').properties(height = 200)

11. Techniques With and Without Data Sources

data_source_distribution = pandas.DataFrame({
    'Techniques': ['Without DS','With DS'],
    'Count of Techniques': [techniques['data_sources'].isna().sum(),techniques['data_sources'].notna().sum()]})
bars = alt.Chart(data_source_distribution).mark_bar().encode(x='Techniques',y='Count of Techniques',color='Techniques').properties(width=200,height=300)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

What is the distribution of techniques based on ATT&CK Matrix?

data = techniques
data['Count_DS'] = data['data_sources'].str.len()
data['Ind_DS'] = np.where(data['Count_DS']>0,'With DS','Without DS')
data_2 = data.groupby(['matrix','Ind_DS'])['technique'].count()
data_3 = data_2.to_frame().reset_index()
data_3
matrix Ind_DS technique
0 mitre-attack With DS 472
1 mitre-attack Without DS 91
2 mitre-ics-attack With DS 72
3 mitre-ics-attack Without DS 17
4 mitre-mobile-attack Without DS 90
5 mitre-pre-attack Without DS 174
alt.renderers.enable('default') 
RendererRegistry.enable('default')
alt.Chart(data_3).mark_bar().encode(x='technique', y='Ind_DS', color='matrix').properties(height = 200)
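
The Ind_DS column relies on two behaviours: .str.len() returns the list length for list cells and NaN for missing ones, and NaN > 0 evaluates to False. A small sketch on toy rows (the technique names are placeholders):

```python
import numpy as np
import pandas

toy = pandas.DataFrame({
    'technique': ['A', 'B', 'C'],
    'data_sources': [['Process: Process Creation', 'Command: Command Execution'],
                     np.nan,
                     ['File: File Access']],
})

# List cells give their length; the missing cell stays NaN.
toy['Count_DS'] = toy['data_sources'].str.len()
# NaN > 0 is False, so techniques without data sources become 'Without DS'.
toy['Ind_DS'] = np.where(toy['Count_DS'] > 0, 'With DS', 'Without DS')

print(toy['Count_DS'].tolist())   # [2.0, nan, 1.0]
print(toy['Ind_DS'].tolist())     # ['With DS', 'Without DS', 'With DS']
```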

What are those mitre-attack techniques without data sources?

data[(data['matrix']=='mitre-attack') & (data['Ind_DS']=='Without DS')]
matrix platform tactic technique technique_id data_sources Count_DS Ind_DS
10 mitre-attack [PRE] [resource-development] Link Target T1608.005 NaN NaN Without DS
11 mitre-attack [PRE] [resource-development] Drive-by Target T1608.004 NaN NaN Without DS
12 mitre-attack [PRE] [resource-development] Install Digital Certificate T1608.003 NaN NaN Without DS
13 mitre-attack [PRE] [resource-development] Upload Tool T1608.002 NaN NaN Without DS
14 mitre-attack [PRE] [resource-development] Upload Malware T1608.001 NaN NaN Without DS
... ... ... ... ... ... ... ... ...
524 mitre-attack [Linux, macOS, Windows] [execution] Graphical User Interface T1061 NaN NaN Without DS
531 mitre-attack [Windows] [lateral-movement] Shared Webroot T1051 NaN NaN Without DS
536 mitre-attack [Linux, macOS, Windows] [command-and-control] Commonly Used Port T1043 NaN NaN Without DS
542 mitre-attack [Windows] [persistence, privilege-escalation] Path Interception T1034 NaN NaN Without DS
547 mitre-attack [Linux, macOS, Windows] [command-and-control] Multiband Communication T1026 NaN NaN Without DS

91 rows × 8 columns

Techniques without data sources

techniques_without_data_sources=techniques[techniques.data_sources.isnull()].reset_index(drop=True)
techniques_without_data_sources.head()
matrix platform tactic technique technique_id data_sources Count_DS Ind_DS
0 mitre-attack [PRE] [resource-development] Link Target T1608.005 NaN NaN Without DS
1 mitre-attack [PRE] [resource-development] Drive-by Target T1608.004 NaN NaN Without DS
2 mitre-attack [PRE] [resource-development] Install Digital Certificate T1608.003 NaN NaN Without DS
3 mitre-attack [PRE] [resource-development] Upload Tool T1608.002 NaN NaN Without DS
4 mitre-attack [PRE] [resource-development] Upload Malware T1608.001 NaN NaN Without DS
print('There are ',techniques['data_sources'].isna().sum(),' techniques without data sources (',"{0:.0%}".format(techniques['data_sources'].isna().sum()/len(techniques)),' of ',len(techniques),' techniques)')
There are  372  techniques without data sources ( 41%  of  916  techniques)

Techniques With Data Sources

techniques_with_data_sources=techniques[techniques.data_sources.notnull()].reset_index(drop=True)
techniques_with_data_sources.head()
matrix platform tactic technique technique_id data_sources Count_DS Ind_DS
0 mitre-attack [Windows, macOS] [defense-evasion] Code Signing Policy Modification T1553.006 [Windows Registry: Windows Registry Key Modifi... 3.0 With DS
1 mitre-attack [Windows, Linux, macOS, IaaS] [discovery] System Location Discovery T1614 [Instance: Instance Metadata, Process: Process... 4.0 With DS
2 mitre-attack [Containers] [discovery] Container and Resource Discovery T1613 [Cluster: Cluster Metadata, Container: Contain... 6.0 With DS
3 mitre-attack [Containers] [credential-access] Container API T1552.007 [Command: Command Execution, File: File Access... 3.0 With DS
4 mitre-attack [Containers] [defense-evasion] Build Image on Host T1612 [Image: Image Creation, Network Traffic: Netwo... 4.0 With DS
print('There are ',techniques['data_sources'].notna().sum(),' techniques with data sources (',"{0:.0%}".format(techniques['data_sources'].notna().sum()/len(techniques)),' of ',len(techniques),' techniques)')
There are  544  techniques with data sources ( 59%  of  916  techniques)

12. Grouping Techniques With Data Sources By Matrix

Let’s create a graph to represent the number of techniques per matrix:

matrix_distribution = pandas.DataFrame({
    'Matrix': list(techniques_with_data_sources.groupby(['matrix'])['matrix'].count().keys()),
    'Count of Techniques': techniques_with_data_sources.groupby(['matrix'])['matrix'].count().tolist()})
bars = alt.Chart(matrix_distribution).mark_bar().encode(y='Matrix',x='Count of Techniques').properties(width=300,height=100)
text = bars.mark_text(align='center',baseline='middle',dx=10,dy=0).encode(text='Count of Techniques')
bars + text

Only the mitre-attack matrix (the main Enterprise matrix) and the mitre-ics-attack matrix contain techniques with data sources; none of the mobile or PRE-ATT&CK techniques has any. Reference: https://attack.mitre.org/wiki/Main_Page

13. Grouping Techniques With Data Sources by Platform

First, we need to split the platform column values because a technique might be mapped to more than one platform

techniques_platform=techniques_with_data_sources

attributes_1 = ['platform'] # In attributes we are going to indicate the name of the columns that we need to split

for a in attributes_1:
    s = techniques_platform.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    # "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
    s.name = a
    # We name "s" with the same name of "a".
    techniques_platform=techniques_platform.drop(a, axis=1).join(s).reset_index(drop=True)
    # We drop the column "a" from "techniques_platform", and then join "techniques_platform" with "s"

# Let's re-arrange the columns from general to specific
techniques_platform_2=techniques_platform.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

We can now show techniques with data sources mapped to one platform at a time

techniques_platform_2.head()
matrix platform tactic technique technique_id data_sources
0 mitre-attack Windows [defense-evasion] Code Signing Policy Modification T1553.006 [Windows Registry: Windows Registry Key Modifi...
1 mitre-attack macOS [defense-evasion] Code Signing Policy Modification T1553.006 [Windows Registry: Windows Registry Key Modifi...
2 mitre-attack Windows [discovery] System Location Discovery T1614 [Instance: Instance Metadata, Process: Process...
3 mitre-attack Linux [discovery] System Location Discovery T1614 [Instance: Instance Metadata, Process: Process...
4 mitre-attack macOS [discovery] System Location Discovery T1614 [Instance: Instance Metadata, Process: Process...
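
The apply/stack/join loop above yields one row per list element. On pandas 0.25 or later, DataFrame.explode produces the same shape more directly. The sketch below is an alternative shown on toy rows, not the method used in this notebook:

```python
import pandas

toy = pandas.DataFrame({
    'technique_id': ['T1553.006', 'T1613'],
    'platform': [['Windows', 'macOS'], ['Containers']],
})

# explode() repeats the other columns once per element of the list column.
exploded = toy.explode('platform').reset_index(drop=True)
print(exploded)

# Counting rows per platform then gives the number of techniques per platform.
print(exploded['platform'].value_counts().to_dict())
```

The same call would apply to the tactic and data_sources columns, since each of them also holds a list per technique.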

Let’s create a visualization to show the number of techniques grouped by platform:

platform_distribution = pandas.DataFrame({
    'Platform': list(techniques_platform_2.groupby(['platform'])['platform'].count().keys()),
    'Count of Techniques': techniques_platform_2.groupby(['platform'])['platform'].count().tolist()})
bars = alt.Chart(platform_distribution,height=300).mark_bar().encode(x ='Platform',y='Count of Techniques',color='Platform').properties(width=200)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

In the bar chart above we can see that more techniques with data sources are mapped to the Windows platform than to any other platform.

14. Grouping Techniques With Data Sources by Tactic

Again, first we need to split the tactic column values because a technique might be mapped to more than one tactic:

techniques_tactic=techniques_with_data_sources

attributes_2 = ['tactic'] # In attributes we are going to indicate the name of the columns that we need to split

for a in attributes_2:
    s = techniques_tactic.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    # "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
    s.name = a
    # We name "s" with the same name of "a".
    techniques_tactic = techniques_tactic.drop(a, axis=1).join(s).reset_index(drop=True)
    # We drop the column "a" from "techniques_tactic", and then join "techniques_tactic" with "s"

# Let's re-arrange the columns from general to specific
techniques_tactic_2=techniques_tactic.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

We can now show techniques with data sources mapped to one tactic at a time

techniques_tactic_2.head()
matrix platform tactic technique technique_id data_sources
0 mitre-attack [Windows, macOS] defense-evasion Code Signing Policy Modification T1553.006 [Windows Registry: Windows Registry Key Modifi...
1 mitre-attack [Windows, Linux, macOS, IaaS] discovery System Location Discovery T1614 [Instance: Instance Metadata, Process: Process...
2 mitre-attack [Containers] discovery Container and Resource Discovery T1613 [Cluster: Cluster Metadata, Container: Contain...
3 mitre-attack [Containers] credential-access Container API T1552.007 [Command: Command Execution, File: File Access...
4 mitre-attack [Containers] defense-evasion Build Image on Host T1612 [Image: Image Creation, Network Traffic: Netwo...

Let’s create a visualization to show the number of techniques grouped by tactic:

tactic_distribution = pandas.DataFrame({
    'Tactic': list(techniques_tactic_2.groupby(['tactic'])['tactic'].count().keys()),
    'Count of Techniques': techniques_tactic_2.groupby(['tactic'])['tactic'].count().tolist()}).sort_values(by='Count of Techniques',ascending=True)
bars = alt.Chart(tactic_distribution,width=800,height=300).mark_bar().encode(x ='Tactic',y='Count of Techniques',color='Tactic').properties(width=400)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

Defense-evasion and Persistence are the tactics with the highest number of techniques with data sources

15. Grouping Techniques With Data Sources by Data Source

We need to split the data source column values because a technique might be mapped to more than one data source:

techniques_data_source=techniques_with_data_sources

attributes_3 = ['data_sources'] # In attributes we are going to indicate the name of the columns that we need to split

for a in attributes_3:
    s = techniques_data_source.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    # "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
    s.name = a
    # We name "s" with the same name of "a".
    techniques_data_source = techniques_data_source.drop(a, axis=1).join(s).reset_index(drop=True)
    # We drop the column "a" from "techniques_data_source", and then join "techniques_data_source" with "s"

# Let's re-arrange the columns from general to specific
techniques_data_source_2 = techniques_data_source.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

# We are going to edit some names inside the dataframe to improve the consistency:
techniques_data_source_3 = techniques_data_source_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])

We can now show techniques with data sources mapped to one data source at a time

techniques_data_source_3.head()
matrix platform tactic technique technique_id data_sources
0 mitre-attack [Windows, macOS] [defense-evasion] Code Signing Policy Modification T1553.006 Windows Registry: Windows Registry Key Modific...
1 mitre-attack [Windows, macOS] [defense-evasion] Code Signing Policy Modification T1553.006 Command: Command Execution
2 mitre-attack [Windows, macOS] [defense-evasion] Code Signing Policy Modification T1553.006 Process: Process Creation
3 mitre-attack [Windows, Linux, macOS, IaaS] [discovery] System Location Discovery T1614 Instance: Instance Metadata
4 mitre-attack [Windows, Linux, macOS, IaaS] [discovery] System Location Discovery T1614 Process: Process Creation

Let’s create a visualization to show the number of techniques grouped by data sources:

data_source_distribution = pandas.DataFrame({
    'Data Source': list(techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().keys()),
    'Count of Techniques': techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().tolist()})
bars = alt.Chart(data_source_distribution,width=800,height=300).mark_bar().encode(x ='Data Source',y='Count of Techniques',color='Data Source').properties(width=1200)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

A few interesting things from the bar chart above:

  • Process Monitoring, File Monitoring, and Process Command-line parameters are the Data Sources with the highest number of techniques

  • There are some data source names that include string references to Windows such as PowerShell, Windows and wmi

16. Most Relevant Groups Of Data Sources Per Technique

Number Of Data Sources Per Technique

Although identifying the data sources with the highest number of techniques is a good start, data sources usually do not work alone. You might already be collecting Process Monitoring and still be missing a lot of context from a data perspective.

data_source_distribution_2 = pandas.DataFrame({
    'Techniques': list(techniques_data_source_3.groupby(['technique'])['technique'].count().keys()),
    'Count of Data Sources': techniques_data_source_3.groupby(['technique'])['technique'].count().tolist()})

data_source_distribution_3 = pandas.DataFrame({
    'Number of Data Sources': list(data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().keys()),
    'Count of Techniques': data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().tolist()})

bars = alt.Chart(data_source_distribution_3).mark_bar().encode(x ='Number of Data Sources',y='Count of Techniques').properties(width=500)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

The chart above shows the number of data sources needed per technique according to ATT&CK:

  • 71 techniques map to 3 data sources, which ATT&CK considers enough context to validate their detection

  • Only one technique has 12 data sources

  • Only 19 techniques map to a single data source
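
The chart is a histogram of list lengths: first count the data sources of each technique, then count how many techniques share each length. A plain-Python sketch of those two steps on toy data (the technique names and lists are illustrative and do not reproduce the ATT&CK counts):

```python
from collections import Counter

# Illustrative mapping of technique -> data sources.
toy = {
    'technique A': ['Command: Command Execution', 'Process: Process Creation'],
    'technique B': ['File: File Access'],
    'technique C': ['Command: Command Execution', 'File: File Access'],
}

# Step 1: number of data sources per technique.
per_technique = {name: len(sources) for name, sources in toy.items()}
# Step 2: number of techniques for each data source count.
distribution = Counter(per_technique.values())

print(sorted(distribution.items()))   # [(1, 1), (2, 2)]
```

Each pair reads as (number of data sources, count of techniques), which corresponds to the x and y axes of the chart.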

Let’s build subsets of each technique's data sources by defining and applying a Python function:

# https://stackoverflow.com/questions/26332412/python-recursive-function-to-display-all-subsets-of-given-set
def subs(l):
    res = []
    for i in range(1, len(l) + 1):
        for combo in itertools.combinations(l, i):
            res.append(list(combo))
    return res
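
For a list of n data sources the function returns every non-empty combination, 2^n - 1 in total, so the output grows quickly for techniques with many data sources. A quick check on the three data sources of T1553.006, with the function repeated so the example runs on its own:

```python
import itertools

def subs(l):
    """Return all non-empty combinations of the items in l, shortest first."""
    res = []
    for i in range(1, len(l) + 1):
        for combo in itertools.combinations(l, i):
            res.append(list(combo))
    return res

sources = ['command: command execution',
           'process: process creation',
           'windows registry: windows registry key modification']

groups = subs(sources)
print(len(groups))                   # 7, i.e. 2**3 - 1
print(len(subs(list(range(12)))))    # 4095 for a technique with 12 data sources
print(groups[3])
# ['command: command execution', 'process: process creation']
```

Because itertools.combinations preserves input order, sorting the data source names first makes equal groups from different techniques compare as equal.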

Before applying the function, we lowercase and sort the data source names to improve consistency:

# Work on an explicit copy so that adding columns later does not write to a slice of the original frame
df = techniques_with_data_sources[['data_sources']].copy()
# Lowercase and sort the data source names of each technique
df['data_sources'] = df['data_sources'].apply(lambda sources: sorted(x.lower() for x in sources))
df.head()
data_sources
0 [command: command execution, process: process ...
1 [command: command execution, instance: instanc...
2 [application log: application log content, clu...
3 [command: command execution, file: file access...
4 [image: image creation, network traffic: netwo...

Let’s apply the function and split the subsets column:

df['subsets']=df['data_sources'].apply(subs)
df.head()
data_sources subsets
0 [command: command execution, process: process ... [[command: command execution], [process: proce...
1 [command: command execution, instance: instanc... [[command: command execution], [instance: inst...
2 [application log: application log content, clu... [[application log: application log content], [...
3 [command: command execution, file: file access... [[command: command execution], [file: file acc...
4 [image: image creation, network traffic: netwo... [[image: image creation], [network traffic: ne...

We need to split the subsets column values:

techniques_with_data_sources_preview = df
attributes_4 = ['subsets']

for a in attributes_4:
    s = techniques_with_data_sources_preview.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    s.name = a
    techniques_with_data_sources_preview = techniques_with_data_sources_preview.drop(a, axis=1).join(s).reset_index(drop=True)
    
techniques_with_data_sources_subsets = techniques_with_data_sources_preview.reindex(['data_sources','subsets'], axis=1)
techniques_with_data_sources_subsets.head()
data_sources subsets
0 [command: command execution, process: process ... [command: command execution]
1 [command: command execution, process: process ... [process: process creation]
2 [command: command execution, process: process ... [windows registry: windows registry key modifi...
3 [command: command execution, process: process ... [command: command execution, process: process ...
4 [command: command execution, process: process ... [command: command execution, windows registry:...

Let’s add three columns to analyse the dataframe: subsets_name (the subset list joined into a string), subsets_number_elements (the number of data sources per subset), and number_data_sources_per_technique (the number of data sources per technique):

techniques_with_data_sources_subsets['subsets_name']=techniques_with_data_sources_subsets['subsets'].apply(lambda x: ','.join(map(str, x)))
techniques_with_data_sources_subsets['subsets_number_elements']=techniques_with_data_sources_subsets['subsets'].str.len()
techniques_with_data_sources_subsets['number_data_sources_per_technique']=techniques_with_data_sources_subsets['data_sources'].str.len()
techniques_with_data_sources_subsets.head()
data_sources subsets subsets_name subsets_number_elements number_data_sources_per_technique
0 [command: command execution, process: process ... [command: command execution] command: command execution 1 3
1 [command: command execution, process: process ... [process: process creation] process: process creation 1 3
2 [command: command execution, process: process ... [windows registry: windows registry key modifi... windows registry: windows registry key modific... 1 3
3 [command: command execution, process: process ... [command: command execution, process: process ... command: command execution,process: process cr... 2 3
4 [command: command execution, process: process ... [command: command execution, windows registry:... command: command execution,windows registry: w... 2 3

As described above, we need to find groups of data sources, so we are going to filter out all the subsets with only one data source:

subsets = techniques_with_data_sources_subsets

subsets_ok=subsets[subsets.subsets_number_elements != 1]
subsets_ok.head()
data_sources subsets subsets_name subsets_number_elements number_data_sources_per_technique
3 [command: command execution, process: process ... [command: command execution, process: process ... command: command execution,process: process cr... 2 3
4 [command: command execution, process: process ... [command: command execution, windows registry:... command: command execution,windows registry: w... 2 3
5 [command: command execution, process: process ... [process: process creation, windows registry: ... process: process creation,windows registry: wi... 2 3
6 [command: command execution, process: process ... [command: command execution, process: process ... command: command execution,process: process cr... 3 3
11 [command: command execution, instance: instanc... [command: command execution, instance: instanc... command: command execution,instance: instance ... 2 4

Finally, we calculate the most relevant groups of data sources (Top 15):

subsets_graph = subsets_ok.groupby(['subsets_name'])['subsets_name'].count().to_frame(name='subsets_count').sort_values(by='subsets_count',ascending=False)[0:15]
subsets_graph
subsets_count
subsets_name
command: command execution,process: process creation 167
network traffic: network traffic content,network traffic: network traffic flow 62
file: file modification,process: process creation 60
file: file creation,process: process creation 57
command: command execution,file: file modification 54
file: file creation,file: file modification 51
command: command execution,file: file modification,process: process creation 47
command: command execution,process: os api execution 47
command: command execution,file: file creation 47
command: command execution,windows registry: windows registry key modification 45
command: command execution,file: file creation,process: process creation 44
command: command execution,file: file access 42
process: process creation,windows registry: windows registry key modification 40
file: file creation,file: file modification,process: process creation 40
command: command execution,process: process creation,windows registry: windows registry key modification 39
subsets_graph_2 = pandas.DataFrame({
    'Data Sources': list(subsets_graph.index),
    'Count of Techniques': subsets_graph['subsets_count'].tolist()})

bars = alt.Chart(subsets_graph_2).mark_bar().encode(x ='Data Sources', y ='Count of Techniques', color='Data Sources').properties(width=500)
text = bars.mark_text(align='center',baseline='middle',dx= 0,dy=-5).encode(text='Count of Techniques')
bars + text

The group (command: command execution, process: process creation) is the group of data sources with the highest number of techniques. This group of data sources is suggested to hunt 167 techniques.
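The idea behind this ranking can be sketched without pandas: build every subset of two or more data sources for each technique, then count how many techniques share each subset. The technique names and data sources below are hypothetical, and this is a minimal sketch rather than the notebook's own `subs` function:

```python
from collections import Counter
from itertools import combinations

# Hypothetical techniques mapped to their data sources (illustrative values).
toy_techniques = {
    "T-A": ["command", "process", "registry"],
    "T-B": ["command", "process"],
    "T-C": ["file", "process"],
}

group_counts = Counter()
for sources in toy_techniques.values():
    ordered = sorted(sources)  # sort so the same group always gets the same name
    # Every subset with at least two data sources counts as one group.
    for size in range(2, len(ordered) + 1):
        for group in combinations(ordered, size):
            group_counts[",".join(group)] += 1

# The most frequent groups come first.
top_groups = group_counts.most_common(3)
print(top_groups)
```

Sorting before combining matters: without it, the same pair could be counted under two different names depending on the order in which a technique lists its data sources.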

17. Let’s Split all the Information About Techniques With Data Sources Defined: Matrix, Platform, Tactic and Data Source

Let’s split all the relevant columns of the dataframe:

techniques_data = techniques_with_data_sources

attributes = ['platform','tactic','data_sources'] # In attributes we are going to indicate the name of the columns that we need to split

for a in attributes:
    s = techniques_data.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    # "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
    s.name = a
    # We name "s" with the same name of "a".
    techniques_data=techniques_data.drop(a, axis=1).join(s).reset_index(drop=True)
    # We drop the column "a" from "techniques_data", and then join "techniques_data" with "s"

# Let's re-arrange the columns from general to specific
techniques_data_2=techniques_data.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

# We are going to edit some names inside the dataframe to improve the consistency:
techniques_data_3 = techniques_data_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])

techniques_data_3.head()
matrix platform tactic technique technique_id data_sources
0 mitre-attack Windows defense-evasion Code Signing Policy Modification T1553.006 Windows Registry: Windows Registry Key Modific...
1 mitre-attack Windows defense-evasion Code Signing Policy Modification T1553.006 Command: Command Execution
2 mitre-attack Windows defense-evasion Code Signing Policy Modification T1553.006 Process: Process Creation
3 mitre-attack macOS defense-evasion Code Signing Policy Modification T1553.006 Windows Registry: Windows Registry Key Modific...
4 mitre-attack macOS defense-evasion Code Signing Policy Modification T1553.006 Command: Command Execution

Do you remember the data source names with a reference to Windows? After splitting the dataframe by platforms, tactics and data sources, are there any macOS or Linux techniques that consider Windows data sources? Let’s identify those rows:

# After splitting the rows of the dataframe, some rows relate Windows data sources to platforms like Linux and macOS.
# We need to identify those rows
conditions = [(techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),
             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),
             (techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),
             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),
             (techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True),
             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True)]
# In conditions we indicate a logical test

choices = ['NO OK','NO OK','NO OK','NO OK','NO OK','NO OK']
# In choices, we indicate the result when the logical test is true

techniques_data_3['Validation'] = np.select(conditions,choices,default='OK')
# We add a column "Validation" to "techniques_data_3" with the result of the logical test. The default value is going to be "OK"
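The six conditions above all test the same thing: a non-Windows platform paired with a Windows-only data source. A more compact sketch, assuming the same column names and using a toy frame with illustrative rows, combines them into two boolean masks:

```python
import numpy as np
import pandas as pd

# Toy rows with the same column names as techniques_data_3 (values are illustrative).
toy = pd.DataFrame({
    "platform": ["Windows", "macOS", "Linux", "Linux"],
    "data_sources": [
        "Windows Registry: Windows Registry Key Modification",
        "Windows Registry: Windows Registry Key Modification",
        "WMI: WMI Creation",
        "Command: Command Execution",
    ],
})

# Mask 1: the row belongs to a non-Windows platform.
non_windows = toy["platform"].isin(["Linux", "macOS"])
# Mask 2: the data source name mentions a Windows-only source (case-insensitive regex).
windows_only = toy["data_sources"].str.contains(
    "windows|powershell|wmi", case=False, regex=True, na=False
)

# Rows matching both masks are flagged, everything else is "OK".
toy["Validation"] = np.where(non_windows & windows_only, "NO OK", "OK")
print(toy[["platform", "Validation"]])
```

The `na=False` argument treats missing data source values as non-matching, so they fall through to "OK" instead of producing NaN in the mask.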

Which rows contain inconsistent data?

techniques_analysis_data_no_ok = techniques_data_3[techniques_data_3.Validation == 'NO OK']
# Finally, we are filtering all the values with NO OK

techniques_analysis_data_no_ok.head()
matrix platform tactic technique technique_id data_sources Validation
3 mitre-attack macOS defense-evasion Code Signing Policy Modification T1553.006 Windows Registry: Windows Registry Key Modific... NO OK
307 mitre-attack Linux defense-evasion Run Virtual Instance T1564.006 Windows Registry: Windows Registry Key Modific... NO OK
312 mitre-attack macOS defense-evasion Run Virtual Instance T1564.006 Windows Registry: Windows Registry Key Modific... NO OK
318 mitre-attack Linux defense-evasion Hidden File System T1564.005 Windows Registry: Windows Registry Key Modific... NO OK
321 mitre-attack macOS defense-evasion Hidden File System T1564.005 Windows Registry: Windows Registry Key Modific... NO OK
print('There are ',len(techniques_analysis_data_no_ok),' rows with inconsistent data')
There are  100  rows with inconsistent data

What is the impact of this inconsistent data from a platform and data sources perspective?

df = techniques_with_data_sources

attributes = ['platform','data_sources']

for a in attributes:
    s = df.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    s.name = a
    df=df.drop(a, axis=1).join(s).reset_index(drop=True)
    
df_2=df.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)
df_3 = df_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])

conditions = [(df_3['data_sources'].str.contains('windows',case=False)== True),
              (df_3['data_sources'].str.contains('powershell',case=False)== True),
              (df_3['data_sources'].str.contains('wmi',case=False)== True)]

choices = ['Windows','Windows','Windows']

df_3['Validation'] = np.select(conditions,choices,default='Other')
df_3['Num_Tech'] = 1
df_4 = df_3[df_3.Validation == 'Windows']
df_5 = df_4.groupby(['data_sources','platform'])['technique'].nunique()
df_6 = df_5.to_frame().reset_index()
alt.Chart(df_6).mark_bar().encode(x=alt.X('technique', stack="normalize"),    y='data_sources',    color='platform').properties(height=200)
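The `groupby(...)['technique'].nunique()` step that feeds the chart counts distinct techniques per data source and platform pair, so a technique that appears in several rows is counted once. A minimal sketch on made-up rows:

```python
import pandas as pd

# Toy rows (illustrative): technique T-A appears twice for Windows.
toy = pd.DataFrame({
    "platform": ["Windows", "Windows", "Linux", "Linux"],
    "technique": ["T-A", "T-A", "T-A", "T-B"],
    "data_sources": ["Windows Registry"] * 4,
})

# nunique() counts distinct technique names inside each (data_sources, platform) group.
per_platform = (
    toy.groupby(["data_sources", "platform"])["technique"]
    .nunique()
    .reset_index()
)
print(per_platform)
```

Here Linux gets a count of 2 and Windows a count of 1, even though both platforms have two rows each.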

Some techniques list Windows-specific data sources, such as Windows Registry and WMI, while also listing Linux and macOS as platforms. We do not need to consider these rows, because those data sources are only available in a Windows environment. These are the technique and data source pairs that we should leave out of our database:

techniques_analysis_data_no_ok[['technique','data_sources']].drop_duplicates().sort_values(by='data_sources',ascending=True)
technique data_sources
2956 Event Triggered Execution WMI: WMI Creation
5094 OS Credential Dumping Windows Registry: Windows Registry Key Access
2415 Unsecured Credentials Windows Registry: Windows Registry Key Access
4024 Browser Extensions Windows Registry: Windows Registry Key Creation
3155 Create or Modify System Process Windows Registry: Windows Registry Key Creation
2892 Boot or Logon Autostart Execution Windows Registry: Windows Registry Key Creation
4851 Boot or Logon Initialization Scripts Windows Registry: Windows Registry Key Creation
2326 Subvert Trust Controls Windows Registry: Windows Registry Key Creation
1533 Install Root Certificate Windows Registry: Windows Registry Key Creation
4499 Indicator Removal on Host Windows Registry: Windows Registry Key Deletion
1573 Disable or Modify Tools Windows Registry: Windows Registry Key Deletion
1630 Impair Defenses Windows Registry: Windows Registry Key Deletion
1534 Install Root Certificate Windows Registry: Windows Registry Key Modific...
4604 Input Capture Windows Registry: Windows Registry Key Modific...
4498 Indicator Removal on Host Windows Registry: Windows Registry Key Modific...
4196 Two-Factor Authentication Interception Windows Registry: Windows Registry Key Modific...
307 Run Virtual Instance Windows Registry: Windows Registry Key Modific...
3701 Service Stop Windows Registry: Windows Registry Key Modific...
3683 Inhibit System Recovery Windows Registry: Windows Registry Key Modific...
3156 Create or Modify System Process Windows Registry: Windows Registry Key Modific...
318 Hidden File System Windows Registry: Windows Registry Key Modific...
388 Indicator Blocking Windows Registry: Windows Registry Key Modific...
2952 Event Triggered Execution Windows Registry: Windows Registry Key Modific...
912 Hijack Execution Flow Windows Registry: Windows Registry Key Modific...
2577 Abuse Elevation Control Mechanism Windows Registry: Windows Registry Key Modific...
1052 System Services Windows Registry: Windows Registry Key Modific...
2327 Subvert Trust Controls Windows Registry: Windows Registry Key Modific...
1365 Hide Artifacts Windows Registry: Windows Registry Key Modific...
2109 Keylogging Windows Registry: Windows Registry Key Modific...
2032 Modify Authentication Process Windows Registry: Windows Registry Key Modific...
1971 Man-in-the-Middle Windows Registry: Windows Registry Key Modific...
1629 Impair Defenses Windows Registry: Windows Registry Key Modific...
1572 Disable or Modify Tools Windows Registry: Windows Registry Key Modific...
1546 Disable or Modify System Firewall Windows Registry: Windows Registry Key Modific...
2893 Boot or Logon Autostart Execution Windows Registry: Windows Registry Key Modific...
3 Code Signing Policy Modification Windows Registry: Windows Registry Key Modific...

Without considering this inconsistent data, the final dataframe is:

techniques_analysis_data_ok = techniques_data_3[techniques_data_3.Validation == 'OK']
techniques_analysis_data_ok.head()
matrix platform tactic technique technique_id data_sources Validation
0 mitre-attack Windows defense-evasion Code Signing Policy Modification T1553.006 Windows Registry: Windows Registry Key Modific... OK
1 mitre-attack Windows defense-evasion Code Signing Policy Modification T1553.006 Command: Command Execution OK
2 mitre-attack Windows defense-evasion Code Signing Policy Modification T1553.006 Process: Process Creation OK
4 mitre-attack macOS defense-evasion Code Signing Policy Modification T1553.006 Command: Command Execution OK
5 mitre-attack macOS defense-evasion Code Signing Policy Modification T1553.006 Process: Process Creation OK
print('There are ',len(techniques_analysis_data_ok),' rows of data that you can play with')
There are  5693  rows of data that you can play with

18. Getting Techniques by Data Sources

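This function returns the techniques whose data sources include the ones you specify. You can pass one or more data source names, and the mixed-case arguments below suggest that the match is not case-sensitive: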

data_source = 'PROCESS MONITORING'
results = lift.get_techniques_by_datasources(data_source)
len(results)
21
type(results)
list
results2 = lift.get_techniques_by_datasources('pRoceSS MoniTorinG','process commAnd-linE parameters')
len(results2)
22
results2[1]
AttackPattern(type='attack-pattern', id='attack-pattern--d67adac8-e3b9-44f9-9e6d-6c2a7d69dbe4', created_by_ref='identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5', created='2020-05-21T17:43:26.506Z', modified='2020-05-21T17:43:26.506Z', name='Connection Proxy', description='Adversaries may use a connection proxy to direct network traffic between systems or act as an intermediary for network communications.\n\nThe definition of a proxy can also be expanded to encompass trust relationships between networks in peer-to-peer, mesh, or trusted connections between networks consisting of hosts or systems that regularly communicate with each other.\n\nThe network may be within a single organization or across multiple organizations with trust relationships. Adversaries could use these types of relationships to manage command and control communications, to reduce the number of simultaneous outbound network connections, to provide resiliency in the face of connection loss, or to ride over existing trusted communications paths between victims to avoid suspicion. (Citation: EAttack Connection Proxy)\n\nDetection: Processes utilizing the network that do not normally have network communication or have never been seen before are suspicious. Network activities disassociated from user-driven actions from processes that normally require user direction are suspicious.\n\nAnalyze network data for uncommon data flows (e.g., a client sending significantly more data than it receives from a server or between clients that should not or often do not communicate with one another). Processes utilizing the network that do not normally have network communication or have never been seen before are suspicious. Analyze packet contents to detect communications that do not follow the expected protocol behavior for the port that is being used. 
(Citation: University of Birmingham C2)', kill_chain_phases=[KillChainPhase(kill_chain_name='mitre-ics-attack', phase_name='command-and-control-ics')], external_references=[ExternalReference(source_name='mitre-ics-attack', url='https://collaborate.mitre.org/attackics/index.php/Technique/T0884', external_id='T0884'), ExternalReference(source_name='EAttack Connection Proxy', description='Enterprise ATT&CK. (2018, January 11). Connection Proxy. Retrieved May 17, 2018.', url='https://attack.mitre.org/wiki/Technique/T1090'), ExternalReference(source_name='University of Birmingham C2', description='Gardiner, J.,  Cova, M., Nagaraja, S. (2014, February). Command & Control Understanding, Denying and Detecting. Retrieved April 20, 2016.', url='https://www.cpni.gov.uk/Documents/Publications/2014/2014-04-23-c2-report-birmingham.pdf')], object_marking_refs=['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'], x_mitre_data_sources=['Process use of network', 'Process monitoring', 'Packet capture', 'Netflow/Enclave netflow', 'Network protocol analysis'], x_mitre_platforms=['Windows'])
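To make the behaviour of this lookup easier to picture, here is a minimal sketch of a case-insensitive filter over plain dictionaries. It is not attackcti's implementation: `filter_by_data_sources` and the toy techniques are hypothetical, and the exact-name, any-of matching is an assumption inferred from the counts above (21 results for one data source, 22 for two).

```python
def filter_by_data_sources(techniques, *data_sources):
    """Return techniques that list at least one of the given data sources.

    Hypothetical helper: names are compared in lower case, and a technique
    matches if ANY requested data source appears in its list (assumption).
    """
    wanted = {name.lower() for name in data_sources}
    matches = []
    for technique in techniques:
        # Techniques without the field are treated as having no data sources.
        listed = {ds.lower() for ds in technique.get("x_mitre_data_sources", [])}
        if wanted & listed:
            matches.append(technique)
    return matches


# Toy techniques (illustrative values, not real ATT&CK objects).
toy_techniques = [
    {"name": "T-A", "x_mitre_data_sources": ["Process monitoring", "Packet capture"]},
    {"name": "T-B", "x_mitre_data_sources": ["Process command-line parameters"]},
    {"name": "T-C", "x_mitre_data_sources": ["File monitoring"]},
    {"name": "T-D"},  # no data sources defined
]

one = filter_by_data_sources(toy_techniques, "PROCESS MONITORING")
two = filter_by_data_sources(
    toy_techniques, "pRoceSS MoniTorinG", "process commAnd-linE parameters"
)
print(len(one), len(two))
```

As in the calls above, adding a second data source can only keep or grow the result set, because a technique needs to match just one of the requested names.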