OSINT Intelligence Gathering: Building a Complete Target Profile

Master OSINT reconnaissance — theHarvester, Maltego, Shodan, subdomain enumeration, 20+ Google dorks, social media recon, certificate transparency and attack surface mapping.

lazyhackers
Mar 27, 2026 · 17 min read

Open Source Intelligence (OSINT) is the process of collecting and analyzing information from publicly available sources to produce actionable intelligence. In the context of penetration testing and red teaming, OSINT forms the reconnaissance phase — the foundation upon which all subsequent attack phases are built. Quality OSINT reduces attack surface uncertainty, identifies high-value targets, uncovers forgotten assets, and dramatically increases the probability of finding exploitable vulnerabilities. This guide covers the complete OSINT methodology used in professional engagements.

Passive vs Active Reconnaissance

| Type | Definition | Target Visibility | Examples |
|---|---|---|---|
| Passive | Gathering information without directly contacting the target | Target cannot detect | WHOIS, Google dorking, Shodan, certificate transparency |
| Semi-passive | Normal-looking traffic to target infrastructure | May appear in logs as a normal user | Visiting the target website, DNS resolution |
| Active | Direct interaction that could trigger alerts | Target likely logs | Port scanning, web crawling, DNS brute force |
Always confirm your rules of engagement before conducting active reconnaissance. Passive OSINT from public sources is generally legal (verify in your jurisdiction), but active reconnaissance against systems you don't own may violate the Computer Fraud and Abuse Act (CFAA) or equivalent laws.

Domain and Infrastructure Enumeration

WHOIS Analysis

# Domain registration information
whois target.com
whois -h whois.arin.net 8.8.8.8    # IP WHOIS

# Online WHOIS with history (historical WHOIS reveals past registrant data)
# https://lookup.icann.org
# https://www.domaintools.com
# https://viewdns.info

# Extract useful data:
# - Registrant email (pivot to other domains registered with same email)
# - Name servers (identify DNS hosting provider)
# - Registrant organization (verify company scope)
# - Registration/expiry dates (recently registered = possibly new infrastructure)

# Reverse WHOIS: Find all domains registered with same email/org
# DomainTools, ViewDNS Reverse Whois
whois -h whois.arin.net "o Target Organization"   # ARIN org lookup ("o" = organization records)
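The pivot fields listed above can be pulled out of raw whois output with standard text tools. A minimal sketch; the sample record below is fabricated for illustration and stands in for real `whois target.com` output:

```shell
# Fabricated whois output standing in for `whois target.com`
whois_out=$(cat <<'EOF'
Registrant Organization: Target Corporation
Registrant Email: dnsadmin@target.com
Name Server: NS1.EXAMPLEDNS.COM
Name Server: NS2.EXAMPLEDNS.COM
Creation Date: 2003-05-14T04:00:00Z
EOF
)

# Extract pivot points: emails (for reverse WHOIS) and name servers (DNS hosting)
emails=$(printf '%s\n' "$whois_out" | grep -Eio '[a-z0-9._%+-]+@[a-z0-9.-]+' | sort -u)
nameservers=$(printf '%s\n' "$whois_out" | awk -F': ' 'tolower($1) ~ /name server/ {print tolower($2)}')

printf 'Emails:\n%s\nName servers:\n%s\n' "$emails" "$nameservers"
```

Each extracted email becomes a reverse-WHOIS pivot; each name server hints at the DNS provider.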

DNS Enumeration

# Basic DNS queries
dig target.com A          # IPv4 address
dig target.com AAAA       # IPv6 address
dig target.com MX         # Mail servers
dig target.com NS         # Name servers
dig target.com TXT        # TXT records (SPF, DKIM, verification tokens)
dig target.com SOA        # Start of Authority (admin email, serial)
dig target.com CNAME      # Canonical name
dig target.com ANY        # All record types (many resolvers refuse ANY per RFC 8482)

# Zone transfer attempt (often disabled but worth trying)
dig axfr @ns1.target.com target.com
host -l target.com ns1.target.com
fierce --domain target.com

# Reverse DNS lookups
dig -x 192.168.1.100
for ip in $(seq 1 254); do host 192.168.1.$ip; done | grep -v "not found"

# DNS cache snooping (passive)
dig @TARGET_DNS_SERVER target.com A +norecurse

Subdomain Enumeration

# Subfinder — passive subdomain discovery
subfinder -d target.com
subfinder -d target.com -o subdomains.txt -all -recursive

# Amass — comprehensive subdomain enumeration
amass enum -passive -d target.com
amass enum -active -d target.com -brute -w /usr/share/wordlists/subdomains.txt
amass enum -d target.com -o amass_output.txt -json amass.json

# dnsx — DNS resolution and validation
subfinder -d target.com -silent | dnsx -silent -a -resp
# Resolves each subdomain and returns IP addresses

# ffuf / gobuster for DNS brute force
ffuf -u http://FUZZ.target.com -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt
gobuster dns -d target.com -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt

# Combine all tools
subfinder -d target.com -silent > subs.txt
amass enum -passive -d target.com -o amass.txt
cat subs.txt amass.txt | sort -u | dnsx -silent > live_subdomains.txt
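Tool outputs differ in case and often include wildcard entries and trailing dots, so a small normalization pass before resolution avoids duplicate work. A sketch over fabricated sample output:

```shell
# Fabricated raw output standing in for subs.txt / amass.txt / crt.sh results
cat > raw_subs.txt <<'EOF'
WWW.Target.com
*.dev.target.com
api.target.com
www.target.com
mail.target.com.
EOF

# Normalize: lowercase, strip leading wildcards and trailing dots, dedupe
tr 'A-Z' 'a-z' < raw_subs.txt \
  | sed -e 's/^\*\.//' -e 's/\.$//' \
  | sort -u > normalized_subs.txt

cat normalized_subs.txt
```

Feed `normalized_subs.txt` to dnsx; the duplicates and wildcard artifacts are gone.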

Google Dorks

Google dorks use advanced search operators to find specific types of information indexed by search engines. These are among the most powerful passive OSINT techniques.

Essential Dork Syntax

| Operator | Function | Example |
|---|---|---|
| site: | Limit to domain | site:target.com |
| inurl: | String in URL | inurl:admin |
| intitle: | String in page title | intitle:"index of" |
| intext: | String in body | intext:"password" |
| filetype: | File extension | filetype:pdf |
| ext: | Same as filetype | ext:sql |
| cache: | Cached version (retired by Google; use the Wayback Machine instead) | cache:target.com |
| link: | Pages linking to URL (long deprecated, unreliable) | link:target.com |
| - | Exclude | site:target.com -www |
| "exact" | Exact phrase match | "internal use only" |

High-Value Dork Collection

# Directory listings (open file indexes)
site:target.com intitle:"index of"
site:target.com intitle:"index of /" "parent directory"

# Configuration and sensitive files
site:target.com ext:xml OR ext:conf OR ext:cnf OR ext:reg OR ext:inf OR ext:rdp OR ext:cfg OR ext:txt OR ext:ora
site:target.com ext:env
site:target.com filetype:sql
site:target.com ext:log

# Authentication pages
site:target.com inurl:login
site:target.com inurl:admin
site:target.com inurl:panel
site:target.com inurl:portal
site:target.com inurl:wp-admin

# Exposed credentials / API keys
site:target.com "api_key" OR "apikey" OR "api key"
site:target.com "secret_key" OR "private_key"
site:target.com "password" filetype:txt
site:github.com target.com "password" OR "api_key" OR "secret"

# Database exposure
site:target.com ext:sql "INSERT INTO"
site:target.com "phpMyAdmin" inurl:/phpmyadmin/
site:target.com inurl:db.php OR inurl:database.php

# Backup and temporary files
site:target.com ext:bak OR ext:backup OR ext:old OR ext:orig OR ext:temp OR ext:tmp
site:target.com intitle:"backup" filetype:zip OR filetype:tar.gz

# Error messages revealing info
site:target.com "Warning: mysql_connect()"
site:target.com "ORA-" intext:"Oracle error"
site:target.com "Microsoft OLE DB Provider for SQL Server"

# Exposed documents
site:target.com filetype:pdf "confidential" OR "internal"
site:target.com filetype:xlsx OR filetype:xls
site:target.com filetype:pptx employee names

# Version control exposure
site:target.com inurl:.git
site:target.com inurl:.svn

# Webcam / SCADA / IoT
inurl:/view/index.shtml camera
inurl:ViewerFrame?Mode= webcam
intitle:"Network Camera" inurl:/view/index.shtml

# LinkedIn employee enumeration
site:linkedin.com "target company" engineer
site:linkedin.com/in "target.com" developer

# Paste sites
site:pastebin.com target.com
site:paste.ee target.com
site:hastebin.com target.com
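Rather than retyping these per engagement, the collection can be templated against the target domain. A minimal generator sketch; the template list is a small subset of the dorks above, and `gen_dorks` is an illustrative helper, not a standard tool:

```shell
# Emit a per-target dork list from a template; the domain is the only input
gen_dorks() {
  # {DOMAIN} is a placeholder substituted with the real target
  sed "s/{DOMAIN}/$1/g" <<'EOF'
site:{DOMAIN} intitle:"index of"
site:{DOMAIN} ext:env
site:{DOMAIN} inurl:admin
site:{DOMAIN} ext:bak OR ext:old
site:github.com {DOMAIN} "api_key"
EOF
}

gen_dorks "target.com" > dorks.txt
cat dorks.txt
```

The resulting file can be pasted into a search engine manually or fed to a dorking tool, subject to the engine's terms of service.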

theHarvester — Email and Infrastructure Discovery

# theHarvester gathers emails, subdomains, hosts, names, and IPs
# Uses multiple data sources: Google, Bing, LinkedIn, Shodan, etc.

# Basic usage
theHarvester -d target.com -b google
theHarvester -d target.com -b bing
theHarvester -d target.com -b linkedin

# Combine multiple sources
theHarvester -d target.com -b google,bing,yahoo,linkedin,twitter

# All sources
theHarvester -d target.com -b all

# Through proxies (for OPSEC) — reads proxy list from proxies.yaml
theHarvester -d target.com -b google -p

# Save results
theHarvester -d target.com -b all -f output

# Limit results
theHarvester -d target.com -b google -l 500

# Output: email addresses, subdomains, virtual hosts, IP addresses, employee names

Shodan and Censys

Shodan and Censys continuously scan the internet and index banner/certificate information from open ports — providing passive reconnaissance of any internet-facing asset.

Shodan CLI and Search Operators

# Install Shodan CLI
pip install shodan
shodan init YOUR_API_KEY

# Search Shodan
shodan search "target.com"
shodan search 'hostname:target.com'
shodan search 'org:"Target Corporation"'
shodan search 'ssl:"target.com"'

# Host information
shodan host 8.8.8.8

# Count results
shodan count 'org:"Target Corp" http.title:"Login"'

# Download all results
shodan download results 'org:"Target Corp"'    # writes results.json.gz
shodan parse results.json.gz --fields ip_str,port,org

# Shodan web search operators:
# hostname:target.com           -- search by hostname/domain
# org:"Company Name"            -- by organization
# net:192.168.0.0/24            -- by CIDR range
# port:8080                     -- specific port
# http.title:"Router Admin"     -- by HTTP title
# ssl.cert.subject.cn:target.com -- SSL certificate CN
# product:"Apache httpd"        -- by software product
# version:"2.4.49"              -- specific version (find vulnerable)
# vuln:CVE-2021-41773           -- known vulnerable hosts
# "default password"            -- devices with default passwords
# country:US city:"New York"    -- geographic filtering
# before:2024-01-01             -- indexed before date
# after:2023-01-01              -- indexed after date
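Shodan exports are newline-delimited JSON, one banner record per line, which makes them easy to mine even without jq. A sketch over fabricated records; the sed pattern assumes `ip_str` appears before `port` on each line, which holds for raw Shodan export lines but is not guaranteed by JSON in general:

```shell
# Fabricated Shodan-style banner records (one JSON object per line)
cat > results.jsonl <<'EOF'
{"ip_str": "198.51.100.10", "port": 443, "org": "Target Corp", "product": "nginx"}
{"ip_str": "198.51.100.11", "port": 8080, "org": "Target Corp", "product": "Apache httpd"}
EOF

# Pull ip:port pairs; relies on ip_str preceding port within each record
sed -n 's/.*"ip_str": *"\([^"]*\)".*"port": *\([0-9]*\).*/\1:\2/p' results.jsonl > targets.txt
cat targets.txt
```

The resulting `targets.txt` feeds directly into nmap or httpx for validation, scope permitting.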

Censys Search

# Censys CLI
pip install censys
censys config   # Set API credentials

# Search hosts
censys search "target.com" --index-type hosts
censys view 8.8.8.8 --index-type hosts

# Search certificates
censys search "target.com" --index-type certificates

# Censys web operators:
# parsed.names:target.com        -- certificate name
# parsed.subject.organization:   -- organization in cert
# services.port:                 -- open port
# services.transport_protocol:   -- TCP/UDP
# ip:192.168.1.100               -- specific IP

Certificate Transparency Logs

Certificate Transparency (CT) logs are public records of every TLS certificate issued. They are one of the most reliable sources for discovering subdomains: any subdomain issued its own HTTPS certificate appears in the logs, though wildcard certificates reveal only the base domain.

# crt.sh — web interface and API
# https://crt.sh/?q=%.target.com

# API query
curl -s "https://crt.sh/?q=%.target.com&output=json" | jq -r '.[].name_value' | sort -u

# More structured query
curl -s "https://crt.sh/?q=%.target.com&output=json" | \
  jq -r '.[].name_value' | \
  sed 's/\*\.//g' | \
  sort -u | \
  grep -vx "target.com"    # exclude only the bare apex domain

# Censys CT search
censys search "parsed.names:target.com" --index-type certificates \
  --fields parsed.names,parsed.subject.common_name,metadata.added_at

# Facebook Certificate Transparency API
curl -s "https://graph.facebook.com/certificates?query=target.com&fields=cert_hash_sha256,domains,not_before,not_after&limit=100&access_token=ACCESS_TOKEN"

LinkedIn and Social Media Recon

# LinkedIn employee enumeration
# Useful for:
# - Building username wordlists
# - Identifying technology stack (job descriptions)
# - Finding employees for phishing
# - Discovering organizational structure

# Manual LinkedIn search:
# Search: "target company" security engineer
# Filter by: Company > Target Company > Current > Employee

# Cross-reference email format:
# Find a few known emails (email hunter, VT metadata)
# Extrapolate format, e.g. first.last@target.com or flast@target.com

# hunter.io — email format discovery
curl "https://api.hunter.io/v2/domain-search?domain=target.com&api_key=YOUR_KEY" | jq .

# Email format verification
curl "https://api.hunter.io/v2/email-verifier?email=first.last@target.com&api_key=YOUR_KEY"

# Social media recon targets:
# Twitter/X: Employee posts mentioning company tech
# GitHub: Employees with company email, public repos
# Stack Overflow: Technical questions revealing stack
# Job postings: Technology stack, security tools used
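Once the format is known, names harvested from LinkedIn can be expanded into a candidate email list for verification. A sketch generating the most common corporate formats; the names and domain are fabricated:

```shell
# names.txt: one "First Last" per line, e.g. scraped from LinkedIn results
cat > names.txt <<'EOF'
Jane Doe
John Q Smith
EOF

# Emit common corporate email formats for each name (middle names ignored)
awk -v d="target.com" '{
  first = tolower($1); last = tolower($NF)
  printf "%s.%s@%s\n", first, last, d              # first.last
  printf "%s%s@%s\n",  substr(first,1,1), last, d  # flast
  printf "%s@%s\n",    first, d                    # first
}' names.txt | sort -u > candidate_emails.txt

cat candidate_emails.txt
```

Run the candidates through an email verifier (e.g. hunter.io) rather than assuming they exist; only the verified format is worth keeping.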

Wayback Machine and Web Archive Analysis

# Wayback Machine: https://web.archive.org
# Reveals historical content, removed pages, old endpoints

# Waybackurls — extract all archived URLs
waybackurls target.com | sort -u > wayback_urls.txt

# Filter for interesting paths
cat wayback_urls.txt | grep -E "\.(php|asp|aspx|jsp|cgi)$"
cat wayback_urls.txt | grep -E "(admin|login|upload|api|debug)"
cat wayback_urls.txt | grep -E "\.(bak|sql|env|conf|zip|tar|gz)$"

# GAU (Get All URLs) — combines multiple archive sources
gau target.com --subs | sort -u > gau_urls.txt

# Combine tools
(waybackurls target.com; gau target.com) | sort -u | grep -v "png\|jpg\|gif\|ico\|css\|js"
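The greps above can be folded into one triage pass that buckets archived URLs by interest instead of re-reading the full list each time. A sketch over fabricated wayback output:

```shell
# Fabricated archived URLs standing in for wayback_urls.txt
cat > wayback_urls.txt <<'EOF'
https://target.com/index.html
https://target.com/admin/login.php
https://target.com/backup/site.sql
https://target.com/img/logo.png
https://target.com/api/v1/users
EOF

# Bucket by interest; static assets are filtered from the review list
grep -E '\.(bak|sql|env|conf|zip|tar|gz)$' wayback_urls.txt > urls_sensitive.txt
grep -E '(admin|login|upload|api|debug)'   wayback_urls.txt > urls_dynamic.txt
grep -Ev '\.(png|jpe?g|gif|ico|css|js|woff2?)$' wayback_urls.txt > urls_review.txt

wc -l urls_sensitive.txt urls_dynamic.txt urls_review.txt
```

Check the sensitive bucket first: archived backup and config files frequently remain live or are served by a misconfigured successor host.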

Building the Attack Surface Map

# Comprehensive OSINT workflow

# Step 1: Seed information
TARGET="target.com"
ORG="Target Corporation"

# Step 2: Domain/subdomain discovery
subfinder -d $TARGET -o subs_subfinder.txt
amass enum -passive -d $TARGET -o subs_amass.txt
curl -s "https://crt.sh/?q=%.${TARGET}&output=json" | jq -r '.[].name_value' | sort -u > subs_crt.txt
cat subs_*.txt | sort -u > all_subs.txt

# Step 3: Live host detection
dnsx -l all_subs.txt -silent -o live_subs.txt
dnsx -l all_subs.txt -silent -a -resp-only | sort -u > live_ips.txt   # IPs only, for port scanning
httpx -l live_subs.txt -status-code -title -tech-detect -o live_web.txt

# Step 4: Port scanning
nmap -sV -T4 --top-ports 1000 -iL live_ips.txt -oA nmap_results

# Step 5: Web crawling
gospider -S live_web.txt -o crawl_output -c 10 -d 3

# Step 6: Screenshot for visual review
gowitness file -f live_web.txt --screenshot-path screenshots/

# Final output: Complete asset inventory
# - All domains and subdomains
# - Open ports and services
# - Technology stack per host
# - Screenshots for visual triage
# - Historical URLs for additional attack surface
Amass and Subfinder are complementary — run both. Amass is more thorough but slower; Subfinder is faster for passive sources. Always validate discovered subdomains with dnsx or httpx to confirm they're live before reporting. Dead subdomains may be candidates for subdomain takeover if their DNS records point to deprovisioned cloud resources.
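The pipeline's separate outputs can be stitched into a single inventory table for the report. A sketch joining fabricated dnsx-style and httpx-style output into a host/IP/URL TSV; the file formats shown are simplified stand-ins for real tool output:

```shell
# Fabricated outputs standing in for live_subs.txt (dnsx) and live_web.txt (httpx)
cat > live_subs.txt <<'EOF'
api.target.com [198.51.100.10]
www.target.com [198.51.100.11]
EOF
cat > live_web.txt <<'EOF'
https://api.target.com [200] [API Gateway]
https://www.target.com [200] [Target Corp Home]
EOF

# Join on hostname into a host<TAB>ip<TAB>url inventory
awk '
  NR==FNR { ip[$1] = $2; gsub(/[][]/, "", ip[$1]); next }  # first file: host -> ip
  {
    url = $1
    host = url; sub(/^https?:\/\//, "", host)              # strip scheme to get hostname
    printf "%s\t%s\t%s\n", host, ip[host], url
  }
' live_subs.txt live_web.txt > inventory.tsv

cat inventory.tsv
```

A TSV imports cleanly into a spreadsheet or reporting tool, giving one row per asset for triage and scoping discussions.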