Open Source Intelligence (OSINT) is the process of collecting and analyzing information from publicly available sources to produce actionable intelligence. In the context of penetration testing and red teaming, OSINT forms the reconnaissance phase — the foundation upon which all subsequent attack phases are built. Quality OSINT reduces attack surface uncertainty, identifies high-value targets, uncovers forgotten assets, and dramatically increases the probability of finding exploitable vulnerabilities. This guide covers the complete OSINT methodology used in professional engagements.
## Passive vs Active Reconnaissance
| Type | Definition | Target Visibility | Examples |
|---|---|---|---|
| Passive | Gathering information without directly contacting the target | Target cannot detect | WHOIS, Google dorking, Shodan, certificate transparency |
| Semi-passive | Normal-looking traffic to target infrastructure | May appear in logs as normal user | Visiting the target website, DNS resolution |
| Active | Direct interaction that could trigger alerts | Target likely logs | Port scanning, web crawling, DNS brute force |
## Domain and Infrastructure Enumeration
### WHOIS Analysis

```bash
# Domain registration information
whois target.com
whois -h whois.arin.net 8.8.8.8   # IP WHOIS via ARIN

# Online WHOIS with history (historical WHOIS reveals past registrant data):
# https://lookup.icann.org
# https://www.domaintools.com
# https://viewdns.info

# Extract useful data:
# - Registrant email (pivot to other domains registered with the same email)
# - Name servers (identify the DNS hosting provider)
# - Registrant organization (verify company scope)
# - Registration/expiry dates (recently registered = possibly new infrastructure)

# Reverse WHOIS: find all domains registered with the same email/org
# (DomainTools, ViewDNS Reverse Whois)

whois -h whois.arin.net "o Target Organization"   # ARIN organization lookup ("o" = org records)
```
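The pivot-worthy fields can be pulled out of saved WHOIS output with a short helper. A minimal sketch (the function name is ours, and field labels vary by registrar, so the patterns are best-effort):

```shell
# Hypothetical helper: extract the fields worth pivoting on from a saved
# WHOIS dump. Registrars label fields differently, so patterns are loose.
extract_whois_fields() {
    local file="$1"
    grep -iE '^[[:space:]]*(Registrant Email|Registrant Organization|Name Server|Registrar):' "$file" |
        sed 's/^[[:space:]]*//' | sort -u
}
```

Usage: `whois target.com > whois.txt && extract_whois_fields whois.txt`.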
### DNS Enumeration

```bash
# Basic DNS queries
dig target.com A      # IPv4 address
dig target.com AAAA   # IPv6 address
dig target.com MX     # Mail servers
dig target.com NS     # Name servers
dig target.com TXT    # TXT records (SPF, DKIM, verification tokens)
dig target.com SOA    # Start of Authority (admin email, serial)
dig target.com CNAME  # Canonical name
dig target.com ANY    # All record types (many servers now return a minimal answer per RFC 8482)

# Zone transfer attempt (usually disabled, but worth trying against every name server)
dig axfr @ns1.target.com target.com
host -l target.com ns1.target.com
fierce --domain target.com

# Reverse DNS lookups
dig -x 192.168.1.100
for ip in $(seq 1 254); do host 192.168.1.$ip; done | grep -v "not found"

# DNS cache snooping (semi-passive — it queries the target's resolver directly)
dig @TARGET_DNS_SERVER target.com A +norecurse
```
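The per-record `dig` calls above are easy to bundle into one sweep. A sketch (the wrapper is our own); with `DRY_RUN=1` it only prints the commands, so it can be reviewed or tested without touching a live resolver:

```shell
# Hypothetical wrapper around the per-record-type dig queries.
# DRY_RUN=1 prints the commands instead of executing them.
dns_sweep() {
    local domain="$1"
    local rtype
    for rtype in A AAAA MX NS TXT SOA CNAME; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "dig $domain $rtype +short"
        else
            dig "$domain" "$rtype" +short
        fi
    done
}
```

Usage: `dns_sweep target.com` (or `DRY_RUN=1 dns_sweep target.com` to preview).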
### Subdomain Enumeration

```bash
# Subfinder — passive subdomain discovery
subfinder -d target.com
subfinder -d target.com -o subdomains.txt -all -recursive

# Amass — comprehensive subdomain enumeration
amass enum -passive -d target.com
amass enum -active -d target.com -brute -w /usr/share/wordlists/subdomains.txt
amass enum -d target.com -o amass_output.txt -json amass.json

# dnsx — DNS resolution and validation
# (resolves each subdomain and returns its IP addresses)
subfinder -d target.com -silent | dnsx -silent -a -resp

# ffuf / gobuster for DNS brute force
ffuf -u http://FUZZ.target.com -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt
gobuster dns -d target.com -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt

# Combine all tools
subfinder -d target.com -silent > subs.txt
amass enum -passive -d target.com -o amass.txt
cat subs.txt amass.txt | sort -u | dnsx -silent > live_subdomains.txt
```
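Tool outputs mix letter case and often include wildcard entries, so a small normalization pass before deduplication keeps the combined list clean. A minimal sketch (the function name is ours):

```shell
# Hypothetical cleanup pass: lowercase everything, strip "*." wildcard
# prefixes, drop empty lines, and deduplicate before resolution.
normalize_subs() {
    tr 'A-Z' 'a-z' | sed 's/^\*\.//' | grep -v '^$' | sort -u
}
```

Usage: `cat subs.txt amass.txt | normalize_subs | dnsx -silent > live_subdomains.txt`.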
## Google Dorks
Google dorks use advanced search operators to find specific types of information indexed by search engines. These are among the most powerful passive OSINT techniques.
### Essential Dork Syntax

| Operator | Function | Example |
|---|---|---|
| site: | Limit to domain | site:target.com |
| inurl: | String in URL | inurl:admin |
| intitle: | String in page title | intitle:"index of" |
| intext: | String in body | intext:"password" |
| filetype: | File extension | filetype:pdf |
| ext: | Same as filetype | ext:sql |
| cache: | Cached version (retired by Google; use the Wayback Machine) | cache:target.com |
| link: | Pages linking to URL (deprecated by Google) | link:target.com |
| - | Exclude | site:target.com -www |
| "exact" | Exact phrase match | "internal use only" |
### High-Value Dork Collection

```text
# Directory listings (open file indexes)
site:target.com intitle:"index of"
site:target.com intitle:"index of /" "parent directory"

# Configuration and sensitive files
site:target.com ext:xml OR ext:conf OR ext:cnf OR ext:reg OR ext:inf OR ext:rdp OR ext:cfg OR ext:txt OR ext:ora
site:target.com ext:env
site:target.com filetype:sql
site:target.com ext:log

# Authentication pages
site:target.com inurl:login
site:target.com inurl:admin
site:target.com inurl:panel
site:target.com inurl:portal
site:target.com inurl:wp-admin

# Exposed credentials / API keys
site:target.com "api_key" OR "apikey" OR "api key"
site:target.com "secret_key" OR "private_key"
site:target.com "password" filetype:txt
site:github.com target.com "password" OR "api_key" OR "secret"

# Database exposure
site:target.com ext:sql "INSERT INTO"
site:target.com "phpMyAdmin" inurl:/phpmyadmin/
site:target.com inurl:db.php OR inurl:database.php

# Backup and temporary files
site:target.com ext:bak OR ext:backup OR ext:old OR ext:orig OR ext:temp OR ext:tmp
site:target.com intitle:"backup" filetype:zip OR filetype:tar.gz

# Error messages revealing info
site:target.com "Warning: mysql_connect()"
site:target.com "ORA-" intext:"Oracle error"
site:target.com "Microsoft OLE DB Provider for SQL Server"

# Exposed documents
site:target.com filetype:pdf "confidential" OR "internal"
site:target.com filetype:xlsx OR filetype:xls
site:target.com filetype:pptx employee names

# Version control exposure
site:target.com inurl:.git
site:target.com inurl:.svn

# Webcam / SCADA / IoT
inurl:/view/index.shtml camera
inurl:ViewerFrame?Mode= webcam
intitle:"Network Camera" inurl:/view/index.shtml

# LinkedIn employee enumeration
site:linkedin.com "target company" engineer
site:linkedin.com/in "target.com" developer

# Paste sites
site:pastebin.com target.com
site:paste.ee target.com
site:hastebin.com target.com
```
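The site:-scoped dorks above are easy to template per engagement. A sketch of a generator (our own helper, not a standard tool) that prints a starter set for any domain:

```shell
# Hypothetical dork generator: prints a starter set of site:-scoped dorks
# for a given domain, ready to paste into a search engine.
gen_dorks() {
    local d="$1"
    printf 'site:%s intitle:"index of"\n' "$d"
    printf 'site:%s ext:sql OR ext:env OR ext:log\n' "$d"
    printf 'site:%s inurl:login OR inurl:admin\n' "$d"
    printf 'site:%s ext:bak OR ext:old OR ext:backup\n' "$d"
    printf 'site:%s filetype:pdf "confidential" OR "internal"\n' "$d"
}
```

Usage: `gen_dorks target.com`.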
## theHarvester — Email and Infrastructure Discovery

```bash
# theHarvester gathers emails, subdomains, hosts, names, and IPs
# from multiple data sources (Google, Bing, LinkedIn, Shodan, etc.).
# Available sources vary by version — run `theHarvester -h` for the current list.

# Basic usage
theHarvester -d target.com -b google
theHarvester -d target.com -b bing
theHarvester -d target.com -b linkedin

# Combine multiple sources
theHarvester -d target.com -b google,bing,yahoo,linkedin,twitter

# All sources
theHarvester -d target.com -b all

# With proxies (for OPSEC) — reads the proxy list from proxies.yaml
theHarvester -d target.com -b google -p

# Save results
theHarvester -d target.com -b all -f output

# Limit results
theHarvester -d target.com -b google -l 500

# Output: email addresses, subdomains, virtual hosts, IP addresses, employee names
```
## Shodan and Censys
Shodan and Censys continuously scan the internet and index banner/certificate information from open ports — providing passive reconnaissance of any internet-facing asset.
### Shodan CLI and Search Operators

```bash
# Install the Shodan CLI
pip install shodan
shodan init YOUR_API_KEY

# Search Shodan
shodan search "target.com"
shodan search 'hostname:target.com'
shodan search 'org:"Target Corporation"'
shodan search 'ssl:"target.com"'

# Host information
shodan host 8.8.8.8

# Count results
shodan count 'org:"Target Corp" http.title:"Login"'

# Download all results
shodan download results.json 'org:"Target Corp"'
shodan parse results.json --fields ip_str,port,org

# Shodan web search operators:
#   hostname:target.com            -- search by hostname/domain
#   org:"Company Name"             -- by organization
#   net:192.168.0.0/24             -- by CIDR range
#   port:8080                      -- specific port
#   http.title:"Router Admin"      -- by HTTP title
#   ssl.cert.subject.cn:target.com -- SSL certificate CN
#   product:"Apache httpd"         -- by software product
#   version:"2.4.49"               -- specific version (find vulnerable builds)
#   vuln:CVE-2021-41773            -- known vulnerable hosts
#   "default password"             -- devices with default passwords
#   country:US city:"New York"     -- geographic filtering
#   before:2024-01-01              -- indexed before date
#   after:2023-01-01               -- indexed after date
```
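Shodan filters compose as space-separated `key:value` pairs, with quoting needed whenever a value contains spaces. A small builder (our own helper, not part of the Shodan CLI) keeps the quoting consistent when constructing queries in scripts:

```shell
# Hypothetical query builder: takes key=value arguments and emits a Shodan
# filter string, quoting any value that contains spaces.
shodan_query() {
    local out="" pair key val
    for pair in "$@"; do
        key="${pair%%=*}"
        val="${pair#*=}"
        case "$val" in
            *" "*) out="$out $key:\"$val\"" ;;
            *)     out="$out $key:$val" ;;
        esac
    done
    echo "${out# }"
}
```

Usage: `shodan search "$(shodan_query org="Target Corp" port=8080)"`.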
### Censys Search

```bash
# Censys CLI
pip install censys
censys config   # set API credentials

# Search hosts
censys search "target.com" --index-type hosts
censys view 8.8.8.8 --index-type hosts

# Search certificates
censys search "target.com" --index-type certificates

# Censys web operators:
#   parsed.names:target.com      -- certificate name
#   parsed.subject.organization: -- organization in cert
#   services.port:               -- open port
#   services.transport_protocol: -- TCP/UDP
#   ip:192.168.1.100             -- specific IP
```
## Certificate Transparency Logs

Certificate Transparency (CT) logs are public records of every TLS certificate issued. They are one of the most reliable sources for discovering subdomains — any hostname that has been issued its own certificate appears in the logs (though wildcard certificates hide the labels beneath them).
```bash
# crt.sh — web interface and API
# https://crt.sh/?q=%.target.com

# API query
curl -s "https://crt.sh/?q=%.target.com&output=json" | jq -r '.[].name_value' | sort -u

# More structured query
curl -s "https://crt.sh/?q=%.target.com&output=json" | \
  jq -r '.[].name_value' | \
  sed 's/\*\.//g' | \
  sort -u | \
  grep -v "^target\.com$"   # exclude only the apex domain (an unanchored pattern would drop every subdomain too)

# Censys CT search
censys search "parsed.names:target.com" --index-type certificates \
  --fields parsed.names,parsed.subject.common_name,metadata.added_at

# Facebook Certificate Transparency API
curl -s "https://graph.facebook.com/certificates?query=target.com&fields=cert_hash_sha256,domains,not_before,not_after&limit=100&access_token=ACCESS_TOKEN"
```
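crt.sh packs multiple hostnames into a single `name_value` field separated by `\n` escapes. When jq is unavailable, a rough grep/awk pass over the raw JSON recovers the names — a sketch only (it assumes crt.sh's current field naming and simple, unescaped hostname strings):

```shell
# Hypothetical jq-free fallback: pull name_value fields out of raw crt.sh
# JSON and split the embedded "\n"-separated hostnames onto their own lines.
crtsh_names() {
    grep -o '"name_value":"[^"]*"' |
        sed -e 's/^"name_value":"//' -e 's/"$//' |
        awk '{ gsub(/\\n/, "\n"); print }' |
        sed 's/^\*\.//' | sort -u
}
```

Usage: `curl -s "https://crt.sh/?q=%.target.com&output=json" | crtsh_names`.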
## LinkedIn and Social Media Recon

```bash
# LinkedIn employee enumeration
# Useful for:
# - Building username wordlists
# - Identifying technology stack (job descriptions)
# - Finding employees for phishing
# - Discovering organizational structure

# Manual LinkedIn search:
#   Search: "target company" security engineer
#   Filter by: Company > Target Company > Current > Employee

# Cross-reference email format:
#   Find a few known emails (email hunter, VT metadata)
#   Extrapolate the format: first.last@target.com or flast@target.com

# hunter.io — email format discovery
curl "https://api.hunter.io/v2/domain-search?domain=target.com&api_key=YOUR_KEY" | jq .

# Email verification
curl "https://api.hunter.io/v2/email-verifier?email=john.doe@target.com&api_key=YOUR_KEY"

# Social media recon targets:
# - Twitter/X: Employee posts mentioning company tech
# - GitHub: Employees with company email, public repos
# - Stack Overflow: Technical questions revealing stack
# - Job postings: Technology stack, security tools used
```
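Once the email format is known, candidate addresses can be generated mechanically from a names list. A minimal sketch (the helper and the four formats it emits are our own choices):

```shell
# Hypothetical generator: reads "First Last" pairs on stdin and emits
# common corporate address formats for the given domain.
gen_emails() {
    domain="$1"
    while read -r first last; do
        if [ -n "$first" ] && [ -n "$last" ]; then
            first=$(echo "$first" | tr 'A-Z' 'a-z')
            last=$(echo "$last" | tr 'A-Z' 'a-z')
            initial=$(echo "$first" | cut -c1)
            echo "${first}.${last}@${domain}"   # first.last
            echo "${initial}${last}@${domain}"  # flast
            echo "${first}${last}@${domain}"    # firstlast
            echo "${first}@${domain}"           # first
        fi
    done
}
```

Usage: `gen_emails target.com < employees.txt > candidates.txt`.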
## Wayback Machine and Web Archive Analysis

```bash
# Wayback Machine: https://web.archive.org
# Reveals historical content, removed pages, old endpoints

# waybackurls — extract all archived URLs
waybackurls target.com | sort -u > wayback_urls.txt

# Filter for interesting paths
cat wayback_urls.txt | grep -E "\.(php|asp|aspx|jsp|cgi)$"
cat wayback_urls.txt | grep -E "(admin|login|upload|api|debug)"
cat wayback_urls.txt | grep -E "\.(bak|sql|env|conf|zip|tar|gz)$"

# gau (Get All URLs) — combines multiple archive sources
gau target.com --subs | sort -u > gau_urls.txt

# Combine tools (anchored filter so e.g. .json is not dropped along with .js)
(waybackurls target.com; gau target.com) | sort -u | grep -vE "\.(png|jpg|gif|ico|css|js)$"
```
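The grep filters above can be bundled into one triage pass that tags each archived URL with why it is interesting. A sketch (the helper and its category labels are our own):

```shell
# Hypothetical triage pass: tag each URL on stdin with a rough category
# (backup artifact, auth-related path, dynamic endpoint, or other).
triage_urls() {
    while read -r url; do
        case "$url" in
            *.bak|*.sql|*.env|*.zip|*.tar|*.gz) echo "BACKUP  $url" ;;
            *admin*|*login*|*upload*|*debug*)   echo "AUTH    $url" ;;
            *.php|*.asp|*.aspx|*.jsp|*.cgi)     echo "DYNAMIC $url" ;;
            *)                                  echo "OTHER   $url" ;;
        esac
    done
}
```

Usage: `cat wayback_urls.txt | triage_urls | sort`.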
## Building the Attack Surface Map

```bash
# Comprehensive OSINT workflow

# Step 1: Seed information
TARGET="target.com"
ORG="Target Corporation"

# Step 2: Domain/subdomain discovery
subfinder -d $TARGET -o subs_subfinder.txt
amass enum -passive -d $TARGET -o subs_amass.txt
curl -s "https://crt.sh/?q=%.${TARGET}&output=json" | jq -r '.[].name_value' | sort -u > subs_crt.txt
cat subs_*.txt | sort -u > all_subs.txt

# Step 3: Live host detection
cat all_subs.txt | dnsx -silent -a -resp -o live_subs.txt         # host + resolved IP
cat all_subs.txt | dnsx -silent -a -ro | sort -u > live_ips.txt   # IPs only, feeds nmap below
httpx -l all_subs.txt -silent -o live_urls.txt                    # plain URLs, feed crawler/screenshots
httpx -l all_subs.txt -status-code -title -tech-detect -o live_web.txt  # annotated inventory

# Step 4: Port scanning
nmap -sV -T4 --top-ports 1000 -iL live_ips.txt -oA nmap_results

# Step 5: Web crawling
gospider -S live_urls.txt -o crawl_output -c 10 -d 3

# Step 6: Screenshots for visual review
gowitness file -f live_urls.txt --screenshot-path screenshots/

# Final output: complete asset inventory
# - All domains and subdomains
# - Open ports and services
# - Technology stack per host
# - Screenshots for visual triage
# - Historical URLs for additional attack surface
```