API Security

API Reconnaissance

Mapping the API attack surface before you send a single attack: OpenAPI spec discovery, JS bundle mining, recovering historical URLs from the Wayback Machine, subdomain enum, version sprawl, GraphQL introspection, and wordlist brute-force — seven recon workflows, walked through one at a time.

openapi · js mining · wayback · subdomain · version sprawl · graphql · ffuf

Overview — recon is half the engagement

Recon is the first hour of every API engagement and the last hour you should ever skip. Before sending a single attack, you want to know: how many endpoints exist, what auth they expect, what versions are still alive, what dev/staging/legacy deployments share the same code, what hidden parameters change behaviour, what the team forgot to take down. Get this right and the attacks that follow are targeted; skip it and you're fuzzing the wrong surface for a week.

We'll go through seven recon techniques in roughly the order you'd reach for them: spec discovery, JS bundle mining, wayback historical URLs, subdomain enumeration, version sweeping, GraphQL introspection, and wordlist brute-force. None of them need an authenticated session, and a few don't even send a packet to the target.

What "API recon" produces


Output	Why it matters
Endpoint inventory	Every URL × method the target exposes — official + forgotten + dev/staging. Feed into Postman / Burp for systematic testing.
Auth scheme map	Which endpoints expect API key vs Bearer vs mTLS vs nothing. Where weak auth is permitted (older versions, debug routes).
Schema awareness	Field names — including fields the UI never displays. Often more sensitive than discovered endpoints (passwordHash, internalNotes, etc.).
Tech fingerprint	Framework (Django REST, Express, FastAPI, Spring), proxy (nginx, Cloudflare), language. Drives which CVEs and behaviours to test for.
Secrets / leaks	Hardcoded keys in JS bundles, deprecated dev URLs in old commits, debug logging output. Often the engagement ends here.
Subdomain + version sprawl	Same API deployed on 5 different hosts at 3 different versions. The weakest deployment dictates the strength of the system.

Recon hygiene

⚠ Only fully passive recon is truly free — crt.sh, the Wayback Machine, public GitHub. The moment you start active probing (ffuf, kiterunner, introspection queries) you're showing up in the target's logs, so make sure your engagement scope actually covers it. Bug-bounty programs usually allow active recon, but check before you assume. And rate-limit yourself — your job is to find bugs, not to knock the thing over.

Spec discovery — find the OpenAPI, get a free inventory

If the target ships an OpenAPI / Swagger spec, recon gets roughly 10× shorter. Every endpoint, parameter, auth scheme, and example payload sits in one JSON or YAML file, so it is worth a few minutes of focused fuzzing to find.

Common spec paths to try

shell

# OpenAPI / Swagger
/openapi.json
/openapi.yaml
/swagger.json
/swagger.yaml
/v3/api-docs            # springfox / springdoc default
/v2/api-docs            # older springfox
/api-docs
/api/swagger.json
/api/v1/swagger.json
/api/v2/openapi.json
/docs/openapi.yaml
/docs/api/spec.json
/_api/spec
/_openapi
/spec
/api/v3/spec.yaml

# UI-driven explorers (often unprotected)
/swagger-ui/
/swagger-ui/index.html
/redoc
/redoc/index.html
/api/docs
/docs
/scalar
/rapidoc

# GraphQL playgrounds
/graphql
/graphiql
/playground

# RAML / WSDL legacy
/api.raml
/services?wsdl

Wordlist + ffuf

bash

# fuzz a curated wordlist of API doc paths
ffuf -u https://api.example.com/FUZZ \
     -w SecLists/Discovery/Web-Content/api/api-endpoints.txt \
     -mc 200,401,403 \
     -fs 0 \
     -t 50 \
     -o spec-discovery.json

# kiterunner (smarter — knows API patterns + verbs)
kr scan https://api.example.com -w routes-large.kite

# or curl all at once
for p in openapi.json swagger.json v3/api-docs api-docs docs/openapi.yaml ; do
  echo -n "$p: "
  curl -s -o /dev/null -w "%{http_code}\n" https://api.example.com/$p
done

What to do with a spec


#	Action	Detail
1.	Import to Postman / Bruno	Every endpoint becomes a runnable saved request. See Postman / Pentesting deep-dive.
2.	jq grep for keywords	`jq '..|.paths?|keys?'` → all paths. Then `grep -E "(admin|internal|debug|test|legacy)"`.
3.	Extract field names	`jq '..|objects|.properties?|keys?'` → response schema fields. Look for passwordHash, ssn, apiKey, internalToken.
4.	Auth scheme audit	`jq .components.securitySchemes` (`.securityDefinitions` in Swagger 2.0) — does ANY endpoint allow weaker auth (Basic alongside Bearer)?
5.	servers section	Sometimes the "servers" block lists internal URLs (http://10.x.x.x, http://internal.lan). Pivot targets.
6.	"x-" extensions	`x-internal: true`, `x-hidden: true`, `x-experimental` often mark sensitive endpoints the docs site hides — but the spec still lists them.

💡 Save the spec locally and work from that copy for the rest of the engagement. Public targets sometimes serve different spec versions or endpoint lists depending on user-agent, geo, or CDN cache state. One frozen copy gives you a stable baseline.

JS bundle mining — the most-shipped doc in your target

When there's no spec — or even when there is — the most useful API documentation in the target is the JS bundle. Frontend code contains every API call the UI can make; greppable strings reveal endpoints, auth headers, hardcoded keys, and feature flags. Modern SPAs ship 1-5 MB of minified JS per page — full app logic, ready to be mined.

Collecting bundles

bash

# 1. browse the app in a clean browser profile
# 2. DevTools → Network tab → filter: JS
# 3. select all → right-click → "Save all as HAR"
#    OR per-file: right-click → "Save as"

# 4. headless alternative — pull every JS the index references
curl -s https://app.example.com/ \
  | grep -oE 'src="[^"]+\.js[^"]*"' \
  | grep -oE '[^"]+\.js' \
  | while read url; do
      [[ "$url" == http* ]] || url="https://app.example.com$url"
      wget -q -P bundles/ "$url"
    done

# 5. or use a tool that does the SPA-aware fetch (waits for runtime chunks)
katana -u https://app.example.com -d 3 -jc -o crawl.txt
gospider -s https://app.example.com -d 3 -c 10 -t 20 --js

Mining patterns

bash

# 1. endpoints — fetch/axios/jquery
grep -rEoh 'fetch\(\s*"[^"]+"' bundles/  | sort -u
grep -rEoh 'axios\.\w+\(\s*"[^"]+"' bundles/ | sort -u
grep -rEoh '\$\.\w+\(\s*"[^"]+"' bundles/  | sort -u

# 2. URL strings containing /api/ or starting with /v<N>/
#    (grep -E has no \d and no lazy *?, so use [0-9] and [^"]*)
grep -rEoh '"[^"]*/api/[a-zA-Z0-9./_-]+' bundles/ | sort -u
grep -rEoh '"/v[0-9]+/[a-zA-Z0-9./_-]+' bundles/ | sort -u

# 3. secrets — Stripe keys, AWS, GitHub, Slack tokens
grep -rEoh 'sk_live_[a-zA-Z0-9]{24,}' bundles/
grep -rEoh 'AKIA[0-9A-Z]{16}' bundles/
grep -rEoh 'ghp_[a-zA-Z0-9]{36}' bundles/
grep -rEoh 'xox[bpsa]-[a-zA-Z0-9-]+' bundles/

# 4. config blocks — API_BASE, FEATURE_FLAGS, REGION
#    (\s is not whitespace inside [...], so use [[:space:]])
grep -rEoh '(apiKey|API_KEY|secret|token|baseUrl|API_BASE)[[:space:]:=]+"[^"]+"' bundles/
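
The endpoint patterns above only match double-quoted strings, while minified bundles also use single quotes and template literals. As a rough sketch, a small helper can cover all three quote styles in one pass. The function name `extract_api_paths` and the exact character class are assumptions made for illustration, not part of any existing tool.

```bash
#!/usr/bin/env bash
# Sketch only: extract_api_paths is a hypothetical helper, not an existing tool.
# Prints every unique quoted path under DIR that starts with /api/ or /v<N>/.
extract_api_paths() {
  local dir="$1"
  # Opening quote (", ' or backtick), then the path. [0-9] is used instead
  # of \d because grep -E does not understand \d.
  grep -rEoh "[\"'\`]/(api|v[0-9]+)/[A-Za-z0-9._/{}\$-]*" "$dir" \
    | sed -E "s/^[\"'\`]//" \
    | LC_ALL=C sort -u
}

# Example: extract_api_paths bundles/ > js-endpoints.txt
```

Template-literal placeholders such as `${id}` are kept as-is in the output, which makes parameterised routes easy to spot when reviewing the list by hand.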

Dedicated tools


Tool	Use
LinkFinder	Python tool with regexes tuned for endpoint extraction. `python linkfinder.py -i 'bundles/*.js' -o results.html`.
SecretFinder	LinkFinder's sibling for hardcoded secrets. Many overlapping regexes with gitleaks.
nuclei -t exposures/tokens	Nuclei templates for live URL secret scanning.
katana / hakrawler / gospider	SPA-aware crawlers that wait for runtime chunks.
js-beautify / prettier	Beautify before grep — sometimes regex patterns need newlines to work right.
source-map-explorer / unmin	If a .js.map shipped with the bundle (see our SPA Security article!) you can get original .ts/.tsx source back.

💡 Lazy-loaded chunks are the goldmine. Main bundle ships the public app; chunks load only when the user opens an admin / settings / billing page. Grep the chunks separately — they expose features the public spec never mentions.

Wayback Machine — endpoints the team forgot existed

The Internet Archive's Wayback Machine crawls the public web continuously. For any target with a 5+ year history, it has snapshots of URLs that today's site never links to — old API versions, legacy endpoints, backup files, .json dumps left there during a migration in 2019. Many of those URLs still respond on production today; nobody removed them.

Pulling historical URLs

bash

# waybackurls — easiest
echo "api.example.com" | waybackurls | sort -u > historical-urls.txt

# more sources at once
echo "api.example.com" | gau --providers wayback,otx,commoncrawl,urlscan > all-sources.txt

# CDX API directly (no tool)
curl -s "https://web.archive.org/cdx/search/cdx?url=api.example.com/*&output=text&fl=original&collapse=urlkey" \
  | sort -u > historical-urls.txt

# common-crawl — bigger / older
curl -s "https://index.commoncrawl.org/CC-MAIN-2024-26-index?url=api.example.com/*&output=json" \
  | jq -r .url | sort -u

# subdomain-aware: waybackurls includes subdomains by default (-no-subs turns that off)
echo "example.com" | waybackurls

What to grep for

bash

# old API versions
grep -E "/api/v[0-9]+/" historical-urls.txt

# debug / internal / admin / test endpoints
grep -Ei "/(debug|internal|admin|test|qa|legacy|old|deprecated|sandbox)/" historical-urls.txt

# extensions that scream "leftover"
grep -Ei "\.(bak|backup|orig|old|sql|swp|swagger|json\.bak)$" historical-urls.txt

# query params worth re-trying today (admin=true, debug=1)
grep -oE '\?[a-z_]+(=[^&]+)?' historical-urls.txt | sort -u
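
The live re-check that follows reads a `candidates.txt` file. One way to build it is to fold the path and extension filters above into a single pass. This is a minimal sketch: `filter_candidates` is a hypothetical helper name, and the static-asset exclusion list is an added assumption meant to cut noise, not something the filters above require.

```bash
#!/usr/bin/env bash
# Sketch only: filter_candidates is a hypothetical helper name.
# Reads URLs on stdin, keeps the ones worth re-checking, prints them deduplicated.
filter_candidates() {
  # Keep: versioned API paths, debug/admin-style path segments, leftover files.
  grep -Ei '/api/v[0-9]+/|/(debug|internal|admin|test|qa|legacy|old|deprecated|sandbox)/|\.(bak|backup|orig|old|sql|swp|swagger)(\?|$)' \
    | grep -Evi '\.(png|jpe?g|gif|svg|css|woff2?|ico)(\?|$)' \
    | LC_ALL=C sort -u
}

# Example: filter_candidates < historical-urls.txt > candidates.txt
```

The second `grep` drops images, stylesheets and fonts that happen to live under an interesting directory, so the live probe spends its request budget on URLs that can actually be API routes.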

Live re-check

bash

# probe every candidate against current production
cat candidates.txt | httpx -status-code -follow-redirects -no-color -fc 404

# or curl loop with a sane rate
while read url; do
  echo -n "$url -> "
  curl -s -o /dev/null -w "%{http_code} (%{size_download}b)\n" "$url"
  sleep 0.5
done < candidates.txt

What you commonly find


Find	Detail
Old API versions still routed	/api/v1/users returns 200, current site uses /api/v3. v1 = older auth, no rate-limit, original SQL injection bug never patched in old code.
.bak / .old files	config.php.bak with original DB credentials. dump.sql sitting in /backups/ from a 2021 migration.
Pre-launch debug endpoints	/debug/info that returned env vars in 2020 when the app was being built — quietly still working on prod.
S3 / Azure / GCS bucket URLs	Wayback caches them. The bucket itself often still public.
Old API docs / Swagger UI	/v1/swagger-ui/ from before the team disabled it on the current version.

Subdomain enum — find the weaker deployment

A target named example.com almost never has just one host. There's api., dev-api., staging-api., admin., internal., maybe old-api. Each is often a different deployment of the same code — different security posture, different oversight. Pentesting them as siblings is mandatory.

Passive sources (no traffic to target)

bash

# subfinder — multi-source aggregator
subfinder -d example.com -all -silent -o subs.txt

# amass — passive only
amass enum -passive -d example.com -o subs.txt

# crt.sh — Certificate Transparency logs (every cert ever issued for *.example.com)
curl -s "https://crt.sh/?q=%25.example.com&output=json" \
  | jq -r ".[].name_value" | sort -u

# assetfinder
echo example.com | assetfinder --subs-only

# combine them all
( subfinder -d example.com -silent; \
  amass enum -passive -d example.com; \
  assetfinder --subs-only example.com; \
  curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r ".[].name_value" \
) | sort -u > all-subs.txt
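
Merged output from these sources is usually messy: Certificate Transparency entries include wildcard names such as `*.example.com`, case varies, and some tools return out-of-scope hosts. A cleanup pass along these lines can help before probing. It is a sketch under the assumption that input is one hostname per line; `normalize_hosts` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: normalize_hosts is a hypothetical helper name.
# Reads hostnames on stdin, prints unique in-scope names for DOMAIN.
normalize_hosts() {
  local domain="$1" esc
  # Escape dots so "example.com" cannot match "exampleXcom".
  esc=$(printf '%s' "$domain" | sed 's/\./\\./g')
  tr '[:upper:]' '[:lower:]' \
    | sed -E 's/^\*\.//; s/\.$//' \
    | grep -E '(^|\.)'"$esc"'$' \
    | LC_ALL=C sort -u
}

# Example: normalize_hosts example.com < all-subs.txt > scoped-subs.txt
```

The anchored match keeps `example.com` and anything ending in `.example.com`, and drops lookalikes such as `notexample.com` or `example.com.evil.net`.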

Active probe of each

bash

# httpx — alive check, tech detect, status code
cat all-subs.txt | httpx -title -status-code -tech-detect -no-color -o alive.txt

# or with screenshots
cat all-subs.txt | httpx -screenshot -srd shots/

# DNS-only — what resolves vs what doesn't
cat all-subs.txt | dnsx -silent -resp

Subdomains that often matter


Subdomain pattern	Why interesting
api., api-v2., api-internal.	Direct API hosts. Most likely target.
dev-api., staging-api., qa-api., test-api.	Same API, weaker auth. Same vulns as prod — without prod's alerting.
admin., internal., ops.	Admin panels. Often weaker auth + assume "behind VPN" but exposed anyway.
old-api., legacy., v1.	Forgotten deployments. The team migrated away years ago.
graphql., gql.	GraphQL gateways. Often shipped without introspection disabled.
webhook., events., callback.	Inbound webhook receivers. Sometimes accept SSRF payloads as event sources.
cdn., assets., static.	Static-asset hosts. Source maps, JS bundles, dumped files.

Acquisition / takeover risk

⚠ Many subdomains point at deprovisioned services (S3 buckets that were deleted, Heroku apps no longer claimed, Azure CNAMEs dangling). Check every alive subdomain with subjack or nuclei -t takeovers. Subdomain takeover lets an attacker serve content on the target's domain — instant XSS + cookie theft against the parent site.

Version enum — old versions outlive the team that built them

Even on a single subdomain, the same API often runs at multiple versions side by side. Frontend uses v3; nginx still routes v1, v2, v4 (staging), beta, internal. Old versions outlive the teams that built them — and outlive the security improvements those teams made later.

Sweeping versions

bash

# basic
for v in v0 v1 v2 v3 v4 v5 internal beta staging dev test old legacy; do
  echo -n "/api/$v/users: "
  curl -s -o /dev/null -w "%{http_code}\n" https://api.example.com/api/$v/users
done

# ffuf — versioned wordlist
ffuf -u https://api.example.com/api/FUZZ/users \
     -w api-versions.txt \
     -mc 200,401,403,500

# Burp Intruder — same idea with full request manipulation
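
The ffuf command above expects an `api-versions.txt` wordlist. A minimal sketch for generating one is shown here. `gen_versions` is a hypothetical helper name, and including the `vN.0` form is an assumption based on APIs that expose dotted versions.

```bash
#!/usr/bin/env bash
# Sketch only: gen_versions is a hypothetical helper name.
# Prints v0..vMAX (plain and dotted) followed by common non-numeric labels.
gen_versions() {
  local max="${1:-5}" n
  for ((n = 0; n <= max; n++)); do
    printf 'v%d\nv%d.0\n' "$n" "$n"
  done
  printf '%s\n' internal beta staging dev test old legacy
}

# Example: gen_versions 5 > api-versions.txt
```

The non-numeric labels mirror the ones used in the basic curl loop, so both sweeps cover the same candidates.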

What changes between versions


Aspect	Typical drift
Auth scheme	v1 accepted Basic; v3 requires Bearer + PKCE. v1 still alive = downgrade path.
Input validation	v1 didn't parameterise queries — SQL injection still works. Validation was added in v2.
Rate limiting	Added in v3. v1 / v2 still let attackers spray credentials.
Field exposure	v1 response included passwordHash; v3 removed it. Old route still returns it.
Error format	v1 leaks stack traces in 500s; v3 returns generic JSON. Use v1's leaks to fingerprint v3's internals.
Verb support	v1 accepted DELETE on /users/{id} without confirmation; v3 requires re-auth. Use the older verb.

Beyond integer versions


Style	Probe
By Accept header	`Accept: application/vnd.example.v1+json` — some APIs version by mime-type. Try v0, v1, v2 in the header.
By query param	`?api_version=1`, `?v=1.0`. Old behaviour might trigger when omitted.
By subdomain	api-v1.example.com vs api.example.com (see the Subdomain enum section).
By path prefix	/api/v1/, /api2/, /v3/api/. Try all permutations.
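
To make the path-prefix permutations concrete, the following sketch prints the common layouts for one version label and one resource. `version_paths` is a hypothetical helper name, and the four layouts are illustrative assumptions rather than an exhaustive list.

```bash
#!/usr/bin/env bash
# Sketch only: version_paths is a hypothetical helper name.
# Prints candidate paths for VERSION (e.g. v1) and RESOURCE (e.g. users).
version_paths() {
  local v="$1" resource="$2"
  printf '/api/%s/%s\n' "$v" "$resource"     # /api/v1/users
  printf '/%s/api/%s\n' "$v" "$resource"     # /v1/api/users
  printf '/%s/%s\n' "$v" "$resource"         # /v1/users
  printf '/api%s/%s\n' "${v#v}" "$resource"  # /api1/users (leading "v" stripped)
}

# Example: version_paths v1 users
```

Feeding each printed path to the same status-code probe used for the basic sweep shows quickly which layout the router actually honours.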

GraphQL introspection — the schema is a free attack-surface map

GraphQL has built-in introspection — a query that returns the entire schema. Most GraphQL servers enable it by default, and many teams forget to disable it in production. One introspection query → every type, field, query, mutation, argument — the entire attack surface for free.

Detecting GraphQL

bash

# common paths
/graphql
/api/graphql
/v1/graphql
/v2/graphql
/query
/__graphql
/graphiql       # interactive UI
/playground     # Apollo playground

# detection probe
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{__typename}"}'
# {"data":{"__typename":"Query"}}  ← confirmed GraphQL

The introspection query

bash

# minimal
curl -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{__schema{types{name}}}"}'

# full standard introspection — too long for here, see:
# https://raw.githubusercontent.com/graphql/graphql-js/main/src/utilities/getIntrospectionQuery.ts

# tools
inql introspect -t https://api.example.com/graphql -o schema.json
graphql-introspection-query  > schema.json
clairvoyance https://api.example.com/graphql -w common-words.txt  # if introspection disabled (URL is a positional argument)

# Burp + InQL extension automates the whole flow

# render the schema as a graph
graphql-voyager   # open schema.json
graphqlcheck      # security-focused linter

What the schema reveals


Reveal	Detail
Mutations the UI never exposes	deleteUser, impersonate, runRawSQL, executeShell — exist in the schema, never wired into the app, but still callable.
Field-level secrets	User.passwordHash, User.mfaSecret — fields that should never be queryable from the public API.
Internal types	AdminUser, InternalEvent — types whose existence reveals admin functionality.
Deprecated fields	`@deprecated` directives mark soft-removed surface. Often still works, just hidden from docs.
Arg shapes	Argument types reveal what custom enums, JSON-blob inputs, file uploads exist. Each is a fuzz target.

When introspection is disabled

shell

# clairvoyance — brute-force the schema from error messages
clairvoyance https://api.example.com/graphql -w common-graphql-fields.txt -o schema.json

# how it works:
#  query { aRandomField }
#  → server: "Cannot query field 'aRandomField' on type 'Query'. Did you mean 'foo, bar, baz'?"
#  → clairvoyance parses the suggestion + iterates
#  often recovers ~80% of the schema even with introspection off
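
The suggestion-parsing step can be sketched in a few lines. The sketch assumes error text in the graphql-js style, where each suggested name is wrapped in double quotes (escaped as `\"` inside a raw JSON response); other servers may word the message differently. `extract_suggestions` is a hypothetical helper name and is not how clairvoyance itself is implemented.

```bash
#!/usr/bin/env bash
# Sketch only: extract_suggestions is a hypothetical helper name.
# Reads error messages (plain text or raw JSON) on stdin and prints the
# field names offered after "Did you mean".
extract_suggestions() {
  # Keep only the text after the last "Did you mean" on each line,
  # then pull out each quoted identifier (quotes may be JSON-escaped).
  sed -n 's/.*Did you mean //p' \
    | grep -oE '\\?"[A-Za-z_][A-Za-z0-9_]*\\?"' \
    | tr -d '"\\' \
    | LC_ALL=C sort -u
}

# Example: curl -s ... | extract_suggestions >> found-fields.txt
```

Each recovered name can go back into the wordlist for the next round of guesses, which is the iteration described above. One limitation of this sketch: when a single line carries several errors, only the last set of suggestions is kept.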

💡 GraphQL also enables other attack classes: batching, alias overloading, depth/complexity attacks, query weight bypass. See the GraphQL Security deep-dive for the attack catalogue. The recon here is the prerequisite step.

Wordlist brute-force — when there's no spec

No spec, no GraphQL, no obvious indicators — when everything else fails, brute-force the URL space. Modern wordlists (Assetnote, SecLists) cover thousands of common API patterns; modern fuzzers (ffuf, kiterunner, arjun) are smart about verbs and parameter discovery.

Endpoint brute-force

shell

# ffuf — fast, simple
ffuf -u https://api.example.com/FUZZ \
     -w SecLists/Discovery/Web-Content/api/api-endpoints.txt \
     -mc 200,401,403,500 \
     -fs 0 \
     -t 50

# Recurse into discovered directories
ffuf -u https://api.example.com/users/FUZZ \
     -w common.txt -mc 200,401,403 -recursion -recursion-depth 3

# kiterunner — verb-aware, knows API patterns
kr scan https://api.example.com -w routes-large.kite

# kiterunner brute: classic brute-force mode for plain-text wordlists
kr brute https://api.example.com -w routes-small.txt

# gobuster
gobuster dir -u https://api.example.com -w wordlist.txt -s "200,401,403" -b "" -k   # -b "" clears the default 404 blacklist, which conflicts with -s

# feroxbuster
feroxbuster -u https://api.example.com -w wordlist.txt -s 200,401,403

Parameter brute-force

shell

# arjun — hidden parameters on a known endpoint
arjun -u https://api.example.com/v1/search -m GET --include="q=test"
arjun -u https://api.example.com/v1/users -m JSON   # JSON request body; -i is for importing a list of target URLs

# x8 — alternative with smarter heuristics
x8 -u https://api.example.com/v1/search -w params.txt

# parameth — python alternative
python parameth.py -u https://api.example.com/v1/search

Wordlists worth knowing


Wordlist	Use
Assetnote wordlists	wordlists.assetnote.io — curated by API-focused researchers. Huge "kiterunner-routes-large" set.
SecLists/Discovery/Web-Content/api/	api-endpoints.txt, common-api-endpoints-mazen160.txt — classic baseline.
SecLists/Fuzzing/Polyglots	For body fuzzing when an endpoint is found.
SecLists/Discovery/Web-Content/burp-parameter-names.txt	Parameter wordlist for arjun.
raft-large-words / raft-large-files	OG OWASP raft wordlists. Still relevant.

Tuning for noise


Flag	Use
-mc / status filter	-mc 200,401,403 keeps only useful responses. Filter out the 404 baseline.
-fs / size filter	-fs 0 hides empty-body responses. -fs 1234 hides ones with same size as a known 404 page.
-mr / regex match	Match on response body content (e.g., -mr "graphql|swagger") for targeted discovery.
-t / threads	-t 50 is aggressive; -t 10 is polite. WAFs / rate limiters will ban you above ~100.
-p / delay	-p 0.1 = 100ms delay between requests. Use against fragile or production targets.
-H / headers	-H "Authorization: Bearer …" — fuzz authenticated endpoints. Often a different surface than unauth.
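
To choose a value for the size filter, one option is to request a handful of paths that certainly do not exist and take the most common body size. The sketch below assumes input lines in the `URL -> CODE (SIZEb)` shape printed by the curl loop in the Live re-check section. `baseline_size` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: baseline_size is a hypothetical helper name.
# Reads lines ending in "(<bytes>b)" on stdin and prints the most frequent size.
baseline_size() {
  sed -nE 's/.*\(([0-9]+)b\)[[:space:]]*$/\1/p' \
    | sort -n | uniq -c \
    | sort -k1,1nr -k2,2n \
    | awk 'NR == 1 { print $2 }'
}

# Example: ffuf ... -fs "$(baseline_size < probe-results.txt)"
```

If the target returns a different size for every missing path (for example because it echoes the requested URL), a fixed size filter will not help and a word-count or regex filter is the better fit.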

⚠ Wordlist fuzzing is by far the loudest recon technique. The logs will read "10,000 requests from 1.2.3.4 in 60 seconds, mostly 404s" — there's no hiding it. Some bug-bounty programs flatly ban aggressive automated scanning, so stay inside scope.

Other passive sources — GitHub, Shodan, mobile apps

Beyond the seven main techniques, a few smaller sources occasionally turn up the critical find:

Other passive sources worth checking


Source	What to look for
GitHub code search	`"api.example.com" filename:.env`, `"api.example.com" "Authorization: Bearer"`. Devs commit secrets surprisingly often.
Postman public collections	`https://www.postman.com/search?q=example.com&type=collection`. Employees sometimes publish working collections.
Shodan / Censys	`Shodan: ssl.cert.subject.cn:"*.example.com"` — finds hosts even if DNS doesn't resolve.
Pastebin / ghostbin / gist	Tokens, dumps, "help, my API isn't working" posts with full requests.
JS source maps	See SPA Security article — `bundle.js.map` reconstructs original source.
robots.txt + sitemap.xml	Sometimes lists API endpoints meant to be hidden from crawlers. Disallow lines = "here's where we don't want you to look".
.well-known/*	/.well-known/openid-configuration, /.well-known/security.txt, /.well-known/api-catalog (draft).
CSP report-uri / report-to	CSP header often points at a /report endpoint — and sometimes a CSP report leaks server-side details on accepted payloads.
error pages	Default 404 / 500 pages often reveal framework + version (Django, Rails, Spring Boot whitelabel error page).

Mobile app extraction

bash

# pull the APK
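adb shell pm list packages | grep example
adb shell pm path com.example.app      # prints "package:<path to base.apk>"
# pull the path that pm reported (the install directory name differs per device)
adb pull "$(adb shell pm path com.example.app | head -n1 | sed 's/^package://' | tr -d '\r')" base.apk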

# decode it
apktool d base.apk -o decoded/

# search for API endpoints
grep -rEoh 'https?://[^"]+' decoded/ | sort -u
grep -rEoh '/api/v[0-9]+/[^"]+' decoded/ | sort -u

# secrets
grep -rE '(api_key|secret|password)' decoded/

# iOS .ipa is similar:
unzip app.ipa -d app/
strings app/Payload/example.app/example | grep -E 'https?://'

Workflow — chaining the techniques in 2 hours

Here's how the techniques slot together in order on a real engagement:

Recon workflow — the first 2 hours


Time	Action
Min 0-10	subfinder + amass + crt.sh → all-subs.txt. httpx for alive check.
Min 10-20	For each alive host: try /openapi.json, /swagger.json, /v3/api-docs, /graphql. Pull anything that returns 200.
Min 20-30	Browse the app in a clean browser with DevTools open. Capture HAR file. Save all JS bundles.
Min 30-50	JS mining — endpoints + secrets + config. Combine with spec if found.
Min 50-70	waybackurls + gau on each subdomain. Filter for interesting paths. Live-check.
Min 70-90	Version enum on each discovered subdomain. v0/v1/v2/internal/beta sweep.
Min 90-110	GraphQL introspection if /graphql found. Schema → InQL → audit.
Min 110-120	ffuf / kiterunner on the holes — paths not covered by spec, JS, or wayback.
End of hour 2	Consolidated endpoint inventory. Imported to Postman. Ready to test.

Recon report deliverable

shell

## API Recon — Acme Corp (example.com) — 2026-05-23

### Targets
- api.example.com           [prod]    nginx + node, JWT auth
- dev-api.example.com       [dev]     no auth ⚠
- staging-api.example.com   [staging] HTTP Basic ⚠
- old-api.example.com       [legacy]  no rate limit ⚠

### Discovered specs
- https://api.example.com/openapi.json  (OpenAPI 3.0.2, 187 endpoints)
- https://staging-api.example.com/v1/swagger.json  (Swagger 2.0, 142 endpoints)

### Notable endpoint surface
- /api/v1/admin/* (12 endpoints) — listed in spec but UI never uses them
- /internal/debug/dumpdb — confirmed live, requires no auth on dev-api
- /api/v0/users — listed nowhere current, still serves 200

### Secrets found in bundles
- Stripe publishable key (public, OK)
- Stripe SECRET KEY (sk_live_…) in admin.chunk.js ☠
- Internal Slack webhook in main.bundle.js

### Versions discovered
prod: v3 (current), v0/v1/v2/beta all live ⚠
staging: v2 only
dev: v3 + experimental v4

### GraphQL
graphql.example.com — introspection ON ☠
mutations exposed: deleteUser, impersonate, runRawSQL ☠

### Next steps for active testing
1. test admin endpoints with normal-user token (BFLA)
2. fuzz /internal/debug/* for SSRF/RCE
3. test the impersonate mutation
4. report sk_live_ Stripe key leak NOW (before further testing)

Cheat sheet — tools, one-shot pipeline, defender list

A single-page reference for the whole recon workflow:

Tool / wordlist quick-ref


Tool	Use
subfinder / amass / assetfinder	subdomain enum (passive)
crt.sh / SecurityTrails / Censys	Certificate Transparency, DNS history
httpx / dnsx	alive check + tech detect
katana / hakrawler / gospider	SPA-aware crawlers
waybackurls / gau	historical URL recovery
ffuf / gobuster / feroxbuster	directory + endpoint brute-force
kiterunner	verb-aware API endpoint discovery
arjun / x8 / parameth	hidden parameter discovery
inql / clairvoyance	GraphQL introspection / brute-force
graphql-voyager	GraphQL schema visualisation
LinkFinder / SecretFinder	JS bundle mining
gitleaks / trufflehog / nosey-parker	secret hunting in repos
nuclei + templates	broad template-based scanning
Postman / Insomnia / Bruno	organise + test the discovered surface
Burp Suite	capture + active testing

One-shot recon pipeline

bash

# environment
export TARGET="example.com"

# subdomains
subfinder -d $TARGET -all -silent > subs.txt
amass enum -passive -d $TARGET >> subs.txt
sort -u subs.txt -o subs.txt

# alive
cat subs.txt | httpx -title -status-code -tech-detect -no-color -o alive.txt

# spec discovery on each alive host
for h in $(awk '{print $1}' alive.txt); do
  for p in openapi.json swagger.json v3/api-docs api-docs graphql; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "$h/$p")
    [ "$code" = "200" ] && echo "$h/$p"
  done
done > specs.txt

# wayback URLs across all subs
for h in $(awk '{print $1}' alive.txt | sed 's|https\?://||'); do
  echo "$h" | waybackurls
done | sort -u > wayback.txt

# JS bundles via crawler (alive.txt has extra columns, so pass only the URL)
awk '{print $1}' alive.txt | xargs -I{} -P 4 katana -u {} -d 3 -jc 2>/dev/null \
  | grep -E '\.js($|\?)' > js-urls.txt

# pull bundles, grep ([0-9] because grep -E does not support \d)
mkdir -p bundles && cat js-urls.txt | parallel -j 4 wget -q -P bundles/ {}
grep -rEoh '/api/v[0-9]+/[a-zA-Z0-9./_-]+' bundles/ | sort -u > js-endpoints.txt
grep -rEoh 'sk_(live|test)_[a-zA-Z0-9]{24,}' bundles/ > secrets.txt

# ffuf gaps
ffuf -u "https://api.$TARGET/FUZZ" -w api-wordlist.txt -mc 200,401,403 -o ffuf.json

# consolidate (ffuf's JSON report lists hits under .results[].url)
{ cat js-endpoints.txt wayback.txt; jq -r '.results[].url' ffuf.json; } \
  | sort -u > all-endpoints.txt
wc -l all-endpoints.txt

Defender checklist


#	Control
1.	OpenAPI spec — restrict to authenticated users in production; keep "internal" endpoints out of the public spec.
2.	JS bundles — no hardcoded secrets; sourcemaps stripped from prod deploy.
3.	Wayback — review archive.org for your domain; request takedowns of any leaked content.
4.	Subdomains — inventory all of them. Decommissioned services = takeover risk; close their DNS.
5.	API versions — decommission old versions explicitly. Block at LB if endpoint is "deprecated for security".
6.	GraphQL — disable introspection in production. Also disable field suggestions, which clairvoyance-style schema recovery relies on.
7.	Standard error pages with no framework / version info.
8.	WAF / rate-limit rules trigger on rapid 404 patterns (typical fuzz signature).
9.	GitHub secret-scan + pre-commit hooks (gitleaks); revoke any committed secret immediately.
10.	Run the same recon against yourself quarterly. Find what an attacker would find first.

Closing thoughts

Three things to take away:

Most of the attack surface isn't in the docs. The public docs describe the API the team wants you to see. Production routing, meanwhile, still knows about old versions, dev subdomains, debug endpoints and leftover backup files — and all of it is reachable. Recon's whole job is to get the complete list, not the curated one. JS bundles, wayback, a version sweep and subdomain enum together dwarf whatever the official spec admits to.

Passive recon costs nothing; active recon costs noise. Spend the first hour entirely on passive sources — crt.sh, wayback, GitHub, Shodan, public Postman collections — without sending a single packet at the target. Only once that picture is built do you start active scanning with ffuf, kiterunner or introspection queries. It keeps your footprint small and keeps you clearly distinct from a blanket scanner in the logs.

Recon is a defensive tool too. If you can find your own forgotten v0 endpoint in twenty minutes, so can anyone else. Run this exact workflow against your own infrastructure every quarter — the output is your shadow-API inventory: old versions to decommission, leaked secrets to rotate, debug endpoints to firewall off. Defensive recon is about the cheapest security investment per finding you'll ever make.

Next on the API Security track: BOLA (Broken Object Level Authorization) — the most common API vulnerability, and the obvious first attack to run against the endpoint inventory you just built.

Published	May 24, 2026
Updated	Jul 16, 2026
Reading time	19 min
Access	public
