Threat intelligence (TI) transforms raw data from security incidents, breach disclosures, and adversary activity into actionable knowledge that organizations can use to defend themselves proactively. OSINT-based threat intelligence leverages publicly available sources — including breach databases, dark web forums, paste sites, and adversary infrastructure — to identify risks, track threat actors, and monitor for indicators of compromise. This article covers the professional threat intelligence lifecycle and the OSINT techniques that power it.
The Threat Intelligence Lifecycle
Professional threat intelligence follows a structured cycle to ensure collected data becomes actionable intelligence:
- Planning and Direction: Define intelligence requirements. What threats are you monitoring? Which threat actors target your sector? What assets need protection?
- Collection: Gather raw data from OSINT sources, dark web, technical feeds, and human intelligence
- Processing: Normalize, deduplicate, and structure collected data for analysis
- Analysis: Apply context, correlation, and expert judgment to produce intelligence from data
- Dissemination: Share intelligence in appropriate formats (STIX/TAXII, PDF reports, SIEM rules)
- Feedback: Evaluate intelligence quality and refine collection requirements
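The Processing step above (normalize, deduplicate, structure) can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the function names are hypothetical, and lowercasing everything is a simplification that suits IPs, domains, and hashes but can alter case-sensitive URL paths.

```python
import re

def normalize_ioc(value: str) -> str:
    """Trim whitespace, undo common 'defanging', and lowercase an indicator."""
    cleaned = value.strip()
    # Analysts often write evil[.]example[.]com or hxxp:// to avoid accidental clicks
    cleaned = cleaned.replace("[.]", ".").replace("(.)", ".")
    cleaned = re.sub(r"^hxxp", "http", cleaned, flags=re.IGNORECASE)
    return cleaned.lower()

def dedupe_iocs(raw_values):
    """Return normalized indicators in first-seen order, without duplicates."""
    seen = set()
    unique = []
    for raw in raw_values:
        ioc = normalize_ioc(raw)
        if ioc and ioc not in seen:
            seen.add(ioc)
            unique.append(ioc)
    return unique

raw_feed = [
    "198.51.100.1",
    " 198.51.100.1 ",
    "Evil[.]example[.]com",
    "evil.example.com",
    "hxxp://evil.example.com/a",
]
print(dedupe_iocs(raw_feed))
```

Keeping first-seen order makes it easier to trace a deduplicated indicator back to the feed that reported it first.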
Intelligence Types
| Type | Description | Consumers | Example |
|---|---|---|---|
| Strategic | Long-term trends, threat actor campaigns | C-suite, board | Nation-state APT targeting your sector |
| Operational | Upcoming or ongoing attacks | Security management | Ransomware group targeting critical infrastructure |
| Tactical | TTPs, MITRE ATT&CK techniques | SOC, IR teams | LockBit uses specific persistence mechanism |
| Technical | Specific IOCs: IPs, hashes, domains | SIEM, EDR, firewall | C2 IP: 198.51.100.1, Hash: abc123... |
Breach Data Sources and Analysis
HaveIBeenPwned API
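# HaveIBeenPwned (HIBP) API: check whether accounts appear in known breaches
# An API key is required for all account searches: https://haveibeenpwned.com/API/Key
# Every request must also send a user-agent header (the value below is a placeholder)
# Check if an email was breached (URL-encode the address; user@example.com is a placeholder)
# truncateResponse=false returns full breach details instead of names only
curl -H "hibp-api-key: YOUR_API_KEY" -H "user-agent: ti-article-example" \
"https://haveibeenpwned.com/api/v3/breachedaccount/user%40example.com?truncateResponse=false"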
# Response includes breach names, dates, data classes
# Example response:
# [{"Name":"Adobe","Title":"Adobe","Domain":"adobe.com","BreachDate":"2013-10-04",
# "DataClasses":["Email addresses","Password hints","Passwords","Usernames"]}]
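# List every breach in the system (this endpoint does not need an API key)
curl -H "user-agent: ti-article-example" \
"https://haveibeenpwned.com/api/v3/breaches"
# Check pastes (Pastebin, GitHub, etc.); user@example.com is a placeholder
curl -H "hibp-api-key: YOUR_API_KEY" -H "user-agent: ti-article-example" \
"https://haveibeenpwned.com/api/v3/pasteaccount/user%40example.com"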
# k-Anonymity password check (privacy-preserving)
# Hash the password with SHA1, send first 5 chars
echo -n "password123" | sha1sum | head -c 5 | tr 'a-z' 'A-Z'
# → CBFDA
curl "https://api.pwnedpasswords.com/range/CBFDA"
# Response contains all hashes starting with CBFDA and breach counts
# Match against your full hash to check if password was breached
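# Python implementation
import hashlib
import requests

def is_password_pwned(password):
    """Return True if the password's SHA-1 suffix appears in the range response."""
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    response = requests.get(f"https://api.pwnedpasswords.com/range/{prefix}", timeout=10)
    response.raise_for_status()
    # Each line is "SUFFIX:COUNT"; compare suffixes exactly rather than by substring
    return any(line.split(":")[0] == suffix for line in response.text.splitlines())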
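The matching step of the k-anonymity check can also be exercised offline, which is useful for unit tests. The sketch below assumes the range response is plain text with one `SUFFIX:COUNT` pair per line. The response body is a made-up sample, not real API output, and `split_sha1` and `pwned_count` are hypothetical helper names.

```python
import hashlib

def split_sha1(password: str):
    """Return the 5-character prefix sent to the API and the 35-character suffix kept locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def pwned_count(range_body: str, suffix: str) -> int:
    """Return the breach count for `suffix`, or 0 if it is not listed in the response body."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0

prefix, suffix = split_sha1("password123")
# Fabricated sample body for illustration only; counts are not real
sample_body = f"0018A45C4D1DEF81644B54AB7F969B88D65:1\n{suffix}:42\n"
print(prefix, pwned_count(sample_body, suffix))
```

Only the prefix ever leaves the machine; the suffix comparison happens locally, which is what makes the check privacy-preserving.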
Dehashed — Leaked Credential Search
# Dehashed aggregates breach data with search by email, username, domain, IP
# API: https://www.dehashed.com/docs
# Authentication is HTTP Basic with your account email and API key; curl -u builds the header
# (endpoints and auth may differ between API versions, so check the docs linked above)
# Search by email (user@example.com is a placeholder)
curl -u "EMAIL:API_KEY" -H "Accept: application/json" \
"https://api.dehashed.com/search?query=email:user@example.com"
# Search by domain (finds all credentials with @target.com emails)
curl -u "EMAIL:API_KEY" -H "Accept: application/json" \
"https://api.dehashed.com/search?query=domain:target.com"
# Search by username
curl -u "EMAIL:API_KEY" -H "Accept: application/json" \
"https://api.dehashed.com/search?query=username:johndoe"
# Search by IP address (find other accounts from same IP)
curl -u "EMAIL:API_KEY" -H "Accept: application/json" \
"https://api.dehashed.com/search?query=ip_address:198.51.100.1"
# Response fields:
# id, email, username, password, hashed_password, name,
# vin, address, phone, database_name, leaked_date
Analyzing Leaked Credential Dumps
# After obtaining breach data (through authorized CTI work):
# Extract domain-specific credentials
grep "@target.com" breach_data.txt | cut -d':' -f1,2 > target_creds.txt
# Rank the most common passwords for the domain
grep "@target.com" breach_data.txt | awk -F: '{print $2}' | sort | uniq -c | sort -rn
# Separate hashes from plaintext (single quotes keep the shell from interpreting $)
grep -E ':[a-fA-F0-9]{32}$' target_creds.txt # MD5-length hex (could also be NTLM)
grep -E ':[a-fA-F0-9]{40}$' target_creds.txt # SHA1 hashes
grep -E ':\$2[aby]\$' target_creds.txt # bcrypt hashes
grep -E ':[^:$]{6,20}$' target_creds.txt # likely plaintext (6-20 chars)
# Identify password patterns for threat intelligence
# Common patterns hint at organizational password habits and policies
# e.g., CompanyName2023! = company name + year + symbol; Summer2023! = seasonal pattern
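The same triage can be done in Python when the dump needs to be summarized rather than just filtered. This sketch assumes each line has the form `email:secret`; real dumps vary widely in layout. The function names and sample records are hypothetical.

```python
import re
from collections import Counter

# Checked in order; bcrypt strings are 60 characters: "$2b$", a 2-digit cost, "$", then 53 characters
PATTERNS = [
    ("bcrypt", re.compile(r"^\$2[aby]\$\d{2}\$.{53}$")),
    ("sha1", re.compile(r"^[a-fA-F0-9]{40}$")),
    ("md5", re.compile(r"^[a-fA-F0-9]{32}$")),
]

def classify_secret(secret: str) -> str:
    """Guess the storage format of a leaked secret from its shape alone."""
    for label, pattern in PATTERNS:
        if pattern.match(secret):
            return label
    return "plaintext-or-unknown"

def summarize(lines, domain: str) -> Counter:
    """Count secret formats for accounts whose email ends with @domain."""
    counts = Counter()
    for line in lines:
        email, sep, secret = line.strip().partition(":")
        if sep and email.lower().endswith("@" + domain.lower()):
            counts[classify_secret(secret)] += 1
    return counts

sample = [
    "alice@target.com:5f4dcc3b5aa765d61d8327deb882cf99",  # MD5 of "password"
    "bob@target.com:Winter2023!",
    "carol@other.org:hunter2",
    "dave@target.com:$2b$12$" + "a" * 53,  # synthetic bcrypt-shaped string
]
print(summarize(sample, "target.com"))
```

Shape-based classification is only a heuristic: a 32-character hex string may be MD5 or NTLM, and a user may choose a password that happens to look like a hash.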
Dark Web Monitoring
Tor Network Fundamentals
The Tor network provides anonymized access to .onion services. These hidden services host both legitimate privacy tools and criminal marketplaces, forums, and data leak sites.
Tor OPSEC for Investigators
# Essential OPSEC before accessing dark web:
# 1. Use a dedicated machine or VM that is isolated from production systems
# 2. Never access the dark web from a work or home network
# 3. Use Tails OS (amnesic OS, leaves no traces)
# 4. Never log in to personal accounts while on Tor
# 5. Use Tor Browser; never route a regular browser through Tor
# 6. Disable JavaScript where possible (Security Level: Safest)
# 7. Use a bridge if you are in a country that censors Tor
# 8. Document everything for legal compliance
# Tails OS setup:
# Download from https://tails.net (formerly tails.boum.org) and verify the OpenPGP signature
# Boot from USB — runs in RAM, leaves no disk traces
# All traffic automatically routed through Tor
# Tor Browser configuration for investigators:
# Security Level: Safest (NoScript blocks JS by default)
# New Identity when switching investigations (Ctrl+Shift+U)
# Never maximize browser window (prevents screen size fingerprinting)
Dark Web Search Engines and Indexes
# Dark web search engines (accessible via Tor Browser):
# Ahmia: https://ahmia.fi (clearnet), also available as an .onion service
# Torch: .onion service (addresses change; confirm the current one before use)
# Haystak: .onion service that indexes millions of .onion pages
# DarkSearch: https://darksearch.io (clearnet)
# Ahmia from command line
curl "https://ahmia.fi/search/?q=target+company+breach"
# Dark web threat intelligence sources:
# Archives of RaidForums (seized by law enforcement in 2022)
# BreachForums.is (clearnet mirror often available)
# Ransomware group leak sites (LockBit, ALPHV/BlackCat, etc.)
# Stolen credentials markets
# Underground hacking forums
# Ransomware leak sites monitoring:
# Most ransomware groups maintain public-facing .onion sites listing victims
# Monitor for your organization or clients: DDoSecrets, ransomwatch.telemetry.ltd
Monitoring Ransomware Leak Sites
# RansomWatch — automated ransomware group monitoring
# https://ransomwatch.telemetry.ltd (clearnet)
# Aggregates posts from 60+ ransomware group sites
# Manual monitoring workflow:
# 1. Check known ransomware leak sites weekly
# 2. Search for target organization name
# 3. Alert: Any mention before public disclosure = early warning
# 4. Document: Screenshot, timestamp, URL
# Known ransomware .onion sites (change frequently — verify current status)
# LockBit, ALPHV/BlackCat, Cl0p, RansomHub, Akira, etc.
# Vx-underground and Krebs on Security track new groups
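Steps 2 to 4 of the manual workflow above (search, alert, document) lend themselves to a small script. The sketch below works on a list of already-collected posts and does no scraping itself. The `group_name` and `post_title` field names are assumptions about how a collector might structure its output, and the sample posts are invented.

```python
from datetime import datetime, timezone

def find_mentions(posts, watch_terms):
    """Return posts whose title contains any watch term, stamped with the observation time."""
    terms = [t.lower() for t in watch_terms if t.strip()]
    hits = []
    for post in posts:
        title = post.get("post_title", "")
        if any(term in title.lower() for term in terms):
            hits.append({
                "group": post.get("group_name", "unknown"),
                "title": title,
                # Record when the mention was observed, for the evidence trail
                "observed_at": datetime.now(timezone.utc).isoformat(),
            })
    return hits

# Invented sample data for illustration
posts = [
    {"group_name": "examplegroup", "post_title": "Target Corp internal files"},
    {"group_name": "othergroup", "post_title": "Unrelated Ltd"},
]
for hit in find_mentions(posts, ["target corp", "target.com"]):
    print(hit["observed_at"], hit["group"], hit["title"])
```

Substring matching on a company name produces false positives for short or generic names, so hits are best treated as leads for an analyst to confirm.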
Paste Site Monitoring
# Paste sites are frequently used to distribute leaked credentials and data
# PasteHunter — automated paste site monitoring
# https://github.com/kevthehermit/PasteHunter
pip install pastehunter
pastehunter --config config.json # searches pastes for keywords
# psbdmp.ws — archive of removed Pastebin pastes
curl "https://psbdmp.ws/api/search/TARGET_KEYWORD"
# Pastebin monitoring (requires account for some features)
# https://pastebin.com/api — search API for Pro accounts
# Sites to monitor:
# pastebin.com, paste.ee, ghostbin.com
# hastebin.com, privatebin.net
# rentry.co, paste2.org
# Manual search strings for each target:
"@target.com" password
"target.com" "confidential"
"TARGET_COMPANY" "leaked"
"api_key" "target.com"
TARGET_DOMAIN credentials
# Commercial solutions for automated monitoring:
# Intel471, Recorded Future, Digital Shadows, Flashpoint
# BreachIQ, SpyCloud (focuses on employee credentials)
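Once paste text has been retrieved by whatever collector is in use, the search strings listed above translate into a simple scanner. This is a minimal sketch: `scan_paste` is a hypothetical helper, and the keyword list is taken from the manual search strings rather than from any tool's defaults.

```python
import re

KEYWORDS = ("password", "confidential", "leaked", "api_key", "credentials")

def scan_paste(text: str, domain: str) -> dict:
    """Report target-domain email addresses and watch keywords found in a paste."""
    # The lookahead rejects look-alikes such as user@target.community or user@target.com.evil
    email_re = re.compile(
        r"[A-Za-z0-9._%+-]+@" + re.escape(domain) + r"(?!\.?[A-Za-z0-9-])",
        re.IGNORECASE,
    )
    lowered = text.lower()
    return {
        "emails": sorted({m.lower() for m in email_re.findall(text)}),
        "keywords": [k for k in KEYWORDS if k in lowered],
        "mentions_domain": domain.lower() in lowered,
    }

paste = "dump: alice@target.com:Password1\nbob@target.community:x\napi_key=abc"
print(scan_paste(paste, "target.com"))
```

A paste that contains both target-domain addresses and credential keywords is a stronger signal than either alone, which is one way to prioritize alerts.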
OSINT Intelligence Frameworks
MISP (Malware Information Sharing Platform)
# MISP is an open-source threat intelligence platform
# Install MISP: https://www.misp-project.org/download/
# Docker: https://github.com/MISP/misp-docker
# Key MISP concepts:
# Events: Container for threat intelligence (one incident/campaign)
# Attributes: Individual IOCs (IP, domain, hash, email, URL)
# Tags: Taxonomy labels (TLP, PAP, threat actor, sector)
# Galaxies: Structured knowledge (MITRE ATT&CK, threat actors)
# Feeds: External threat intelligence feeds (CIRCL, Abuse.ch, etc.)
# MISP feeds to enable (Settings > Feeds):
# CIRCL OSINT Feed
# Abuse.ch URLhaus
# Abuse.ch MalwareBazaar
# PhishTank
# Emerging Threats
# Python API usage
from pymisp import PyMISP, MISPEvent
misp = PyMISP('https://your-misp-instance.com', 'YOUR_API_KEY')
# Build the event locally
event = MISPEvent()
event.info = "Target Corp phishing campaign 2024-03"
event.distribution = 1 # This community only
event.threat_level_id = 2 # Medium
# Add attributes with add_attribute(type, value, **kwargs)
event.add_attribute('ip-dst', '198.51.100.42', comment='C2 server')
# Add domain
event.add_attribute('domain', 'malicious-domain.example.com')
# Add hash (placeholder value: this is the MD5 of empty input)
event.add_attribute('md5', 'd41d8cd98f00b204e9800998ecf8427e')
# Push the event to the MISP instance
result = misp.add_event(event)
OpenCTI — Open Cyber Threat Intelligence Platform
# OpenCTI uses STIX2 as its data model
# Install via Docker: https://github.com/OpenCTI-Platform/docker
# OpenCTI connectors for data ingestion:
# MITRE ATT&CK (built-in)
# VirusTotal
# Shodan
# OpenCTI datasets
# AlienVault OTX
# MISP
# Python client
from pycti import OpenCTIApiClient
api = OpenCTIApiClient('https://your-opencti.com', 'YOUR_API_KEY')
# Create threat actor
threat_actor = api.threat_actor.create(
    name="APT Group Name",
    description="Nation-state threat actor",
    first_seen="2020-01-01T00:00:00.000Z",
    last_seen="2024-01-01T00:00:00.000Z",
    sophistication="advanced",
    resource_level="government",
    primary_motivation="national-security"
)
# Add indicator (pycti expects the observable type as x_opencti_main_observable_type)
indicator = api.indicator.create(
    name="Malicious IP",
    pattern="[ipv4-addr:value = '198.51.100.42']",
    pattern_type="stix",
    valid_from="2024-01-01T00:00:00.000Z",
    x_opencti_main_observable_type="IPv4-Addr"
)
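Because OpenCTI models data as STIX 2, it can help to see what an indicator looks like as a plain STIX 2.1 object, independent of any client library. The sketch below uses only the standard library. The helper names and the template table are hypothetical, and it covers only a handful of observable types.

```python
import json
import uuid
from datetime import datetime, timezone

# STIX patterning expressions for a few common observable types
PATTERN_TEMPLATES = {
    "ipv4": "[ipv4-addr:value = '{v}']",
    "domain": "[domain-name:value = '{v}']",
    "md5": "[file:hashes.MD5 = '{v}']",
    "sha256": "[file:hashes.'SHA-256' = '{v}']",
}

def stix_pattern(kind: str, value: str) -> str:
    """Build a STIX pattern string, escaping backslashes and single quotes in the value."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return PATTERN_TEMPLATES[kind].format(v=escaped)

def stix_indicator(kind: str, value: str, name: str) -> dict:
    """Return a minimal STIX 2.1 Indicator object as a dictionary."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "pattern": stix_pattern(kind, value),
        "pattern_type": "stix",
        "valid_from": now,
    }

print(json.dumps(stix_indicator("ipv4", "198.51.100.42", "Malicious IP"), indent=2))
```

An object of this shape can be wrapped in a STIX bundle for exchange over TAXII; platform-specific fields such as OpenCTI's observable type are added by the platform or its client.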
Attribution Techniques
Attribution — determining who is responsible for a cyberattack — is one of the most complex and contested activities in threat intelligence. It requires correlating multiple data points across technical and non-technical dimensions.
Technical Attribution Indicators
# Infrastructure overlap
# Check if known malicious IPs/domains share registration data
# Tools: DomainTools Iris, Maltego, PassiveTotal (RiskIQ, now part of Microsoft Defender Threat Intelligence)
# Passive DNS: Find other domains that resolved to same IP
# https://api.passivetotal.org/v2/dns/passive
curl -u "user:apikey" "https://api.passivetotal.org/v2/dns/passive?query=198.51.100.42"
# Malware code similarity
# Find code reuse between known malware families and new samples
# Tools: BinDiff, TLSH, ssdeep fuzzy hashing
# Fuzzy hash comparison
ssdeep -b sample1.exe > hash1.txt
ssdeep -bm hash1.txt sample2.exe # compare sample2 against hash1
# TLSH (Trend Micro Locality Sensitive Hash)
tlsh -f sample1.exe
tlsh -c sample1.exe -f sample2.exe # compare; a lower distance means more similar
# TTP correlation (MITRE ATT&CK)
# If a new campaign uses same techniques as known APT, correlate in ATT&CK Navigator
# https://mitre-attack.github.io/attack-navigator/
# Language artifacts
# Error messages, debug strings, PDB paths, code comments in specific language
# Keyboard layout fingerprinting (charset, encoding)
# Time zone of compilation timestamps / activity
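TTP correlation can be made slightly more systematic by scoring the overlap between a campaign's observed ATT&CK techniques and each known group's profile. The sketch below uses Jaccard similarity, which is one of several reasonable measures. The group names and their technique sets are invented for illustration and do not describe real actors.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_groups(observed, group_profiles):
    """Rank known groups by technique overlap with the observed campaign, highest first."""
    scores = [(name, jaccard(observed, techniques))
              for name, techniques in group_profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Techniques seen in the new campaign (ATT&CK technique IDs)
observed = {"T1566", "T1059", "T1547"}
# Hypothetical profiles; real ones would come from a source such as ATT&CK Groups
profiles = {
    "Group A": {"T1566", "T1059", "T1547", "T1003"},
    "Group B": {"T1486", "T1021", "T1059"},
}
for name, score in rank_groups(observed, profiles):
    print(f"{name}: {score:.2f}")
```

A high score is a lead, not a conclusion: many techniques are shared across unrelated actors, so overlap needs to be weighed alongside infrastructure, code, and language evidence.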
Open Source Attribution Resources
# MITRE ATT&CK Groups
# https://attack.mitre.org/groups/
# Comprehensive TTP profiles for 100+ tracked threat actors
# Malpedia (malware family encyclopedia)
# https://malpedia.caad.fkie.fraunhofer.de
# Free API access with registration
# Vx-underground (malware samples and reports)
# https://vx-underground.org
# Malware sample repository and threat actor papers
# Mandiant Advantage Threat Intelligence
# Public reports: https://www.mandiant.com/resources/blog
# CISA Known Exploited Vulnerabilities
curl "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json" | \
jq '.vulnerabilities[] | select(.vendorProject == "Microsoft") | .cveID, .vulnerabilityName'
Integrating Threat Intelligence into Defense
# STIX/TAXII — standardized sharing format
# Import CTI feeds into your SIEM via TAXII
# Sigma rules from CTI (https://github.com/SigmaHQ/sigma)
# Convert threat actor TTPs to detection rules
# Sigma → Splunk, Elastic, Microsoft Sentinel
# IOC enrichment workflow:
# 1. Receive alert with suspicious IP/domain/hash
# 2. Query MISP/OpenCTI for existing intelligence
# 3. Query VirusTotal, Shodan for additional context
# 4. Query HIBP if user credentials potentially compromised
# 5. Add new intelligence back to platform for future correlation
# Automated enrichment with Python (the API key strings are placeholders)
import requests

def enrich_ip(ip):
    """Query VirusTotal, Shodan, and AbuseIPDB for context on one IP address."""
    results = {}
    # VirusTotal
    vt = requests.get(f"https://www.virustotal.com/api/v3/ip_addresses/{ip}",
                      headers={"x-apikey": "VT_API_KEY"}, timeout=15)
    results['virustotal'] = vt.json()
    # Shodan
    shodan = requests.get(f"https://api.shodan.io/shodan/host/{ip}",
                          params={"key": "SHODAN_KEY"}, timeout=15)
    results['shodan'] = shodan.json()
    # AbuseIPDB
    abuse = requests.get("https://api.abuseipdb.com/api/v2/check",
                         headers={"Key": "ABUSEIPDB_KEY", "Accept": "application/json"},
                         params={"ipAddress": ip, "maxAgeInDays": "90"}, timeout=15)
    results['abuseipdb'] = abuse.json()
    return results
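Step 1 of the enrichment workflow hands over an IP, domain, or hash, so an automated pipeline first has to decide which kind of indicator it received before choosing which services to query. The sketch below is one way to route on indicator type. `classify_ioc` is a hypothetical helper, and the domain check is a deliberately simple approximation.

```python
import ipaddress
import re

HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
# Dot-separated labels followed by an alphabetic TLD; not a full RFC-grade validator
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$",
    re.IGNORECASE,
)

def classify_ioc(value: str) -> str:
    """Return 'ip', 'md5', 'sha1', 'sha256', 'domain', or 'unknown' for an indicator string."""
    candidate = value.strip()
    try:
        ipaddress.ip_address(candidate)  # accepts both IPv4 and IPv6
        return "ip"
    except ValueError:
        pass
    if re.fullmatch(r"[a-fA-F0-9]+", candidate) and len(candidate) in HASH_LENGTHS:
        return HASH_LENGTHS[len(candidate)]
    if DOMAIN_RE.match(candidate):
        return "domain"
    return "unknown"

for ioc in ["198.51.100.42", "malicious-domain.example.com",
            "d41d8cd98f00b204e9800998ecf8427e", "not an ioc"]:
    print(ioc, "->", classify_ioc(ioc))
```

The returned label can then select the enrichment function to call, for example sending only `ip` values to an IP-specific function and hashes to a file-reputation lookup.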