openapi · js mining · wayback · subdomain · version sprawl · graphql · ffuf
Overview — recon is half the engagement
Recon is the first hour of every API engagement and the last hour you should ever skip. Before sending a single attack, you want to know: how many endpoints exist, what auth they expect, what versions are still alive, what dev/staging/legacy deployments share the same code, what hidden parameters change behaviour, what the team forgot to take down. Get this right and the attacks that follow are targeted; skip it and you're fuzzing the wrong surface for a week.
We'll go through seven recon techniques in roughly the order you'd reach for them: spec discovery, JS bundle mining, wayback historical URLs, subdomain enumeration, version sweeping, GraphQL introspection, and wordlist brute-force. None of them need an authenticated session, and a few don't even send a packet to the target.
What "API recon" produces
| Output | Why it matters |
|---|---|
| Endpoint inventory | Every URL × method the target exposes — official + forgotten + dev/staging. Feed into Postman / Burp for systematic testing. |
| Auth scheme map | Which endpoints expect API key vs Bearer vs mTLS vs nothing. Where weak auth is permitted (older versions, debug routes). |
| Schema awareness | Field names — including fields the UI never displays. Often more sensitive than discovered endpoints (passwordHash, internalNotes, etc.). |
| Tech fingerprint | Framework (Django REST, Express, FastAPI, Spring), proxy (nginx, Cloudflare), language. Drives which CVEs and behaviours to test for. |
| Secrets / leaks | Hardcoded keys in JS bundles, deprecated dev URLs in old commits, debug logging output. Often the engagement ends here. |
| Subdomain + version sprawl | Same API deployed on 5 different hosts at 3 different versions. The weakest deployment dictates the strength of the system. |
Spec discovery — find the OpenAPI, get a free inventory
If the target ships an OpenAPI / Swagger spec, recon gets roughly 10× faster: every endpoint, parameter, auth scheme, and example payload sits in one JSON or YAML file. That makes a few minutes of focused fuzzing to find it well worth the time.
Common spec paths to try
    # OpenAPI / Swagger
    /openapi.json
    /openapi.yaml
    /swagger.json
    /swagger.yaml
    /v3/api-docs          # springfox / springdoc default
    /v2/api-docs          # older springfox
    /api-docs
    /api/swagger.json
    /api/v1/swagger.json
    /api/v2/openapi.json
    /docs/openapi.yaml
    /docs/api/spec.json
    /_api/spec
    /_openapi
    /spec
    /api/v3/spec.yaml

    # UI-driven explorers (often unprotected)
    /swagger-ui/
    /swagger-ui/index.html
    /redoc
    /redoc/index.html
    /api/docs
    /docs
    /scalar
    /rapidoc

    # GraphQL playgrounds
    /graphql
    /graphiql
    /playground

    # RAML / WSDL legacy
    /api.raml
    /services?wsdl
Wordlist + ffuf
    # fuzz a curated wordlist of API doc paths
    ffuf -u https://api.example.com/FUZZ \
      -w SecLists/Discovery/Web-Content/api/api-endpoints.txt \
      -mc 200,401,403 \
      -fs 0 \
      -t 50 \
      -o spec-discovery.json

    # kiterunner (smarter: knows API patterns + verbs)
    kr scan https://api.example.com -w routes-large.kite

    # or curl all at once
    for p in openapi.json swagger.json v3/api-docs api-docs docs/openapi.yaml; do
      echo -n "$p: "
      curl -s -o /dev/null -w "%{http_code}\n" "https://api.example.com/$p"
    done
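A 200 on /openapi.json does not prove you found a spec: many SPAs answer every path with the same HTML shell. A minimal sketch of a content check you could run on each response body follows. The function name `looks_like_spec` is hypothetical, and the YAML branch is a cheap first-key peek (an assumption to keep it stdlib-only), not a real YAML parse.

```python
import json
from typing import Optional


def looks_like_spec(body: str) -> Optional[str]:
    """Return "openapi", "swagger", or None for a response body."""
    try:
        doc = json.loads(body)
    except ValueError:
        # Not JSON. YAML is not parsed here; just peek at the first key.
        head = body.lstrip().lower()
        for marker in ("openapi", "swagger"):
            if head.startswith(marker + ":"):
                return marker
        return None
    # A JSON spec is an object with a version marker and a "paths" map.
    if not isinstance(doc, dict) or "paths" not in doc:
        return None
    for marker in ("openapi", "swagger"):
        if marker in doc:
            return marker
    return None
```

Feed it the body of every path that returned 200 and keep only the hits that come back non-None.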
What to do with a spec
| Step | Detail |
|---|---|
| 1. Import to Postman / Bruno | Every endpoint becomes a runnable saved request. See Postman / Pentesting deep-dive. |
| 2. jq grep for keywords | jq '..|.paths?|keys?' → all paths. Then grep -E "(admin|internal|debug|test|legacy)". |
| 3. Extract field names | jq '..|objects|.properties?|keys?' → response schema fields. Look for passwordHash, ssn, apiKey, internalToken. |
| 4. Auth scheme audit | jq .components.securitySchemes — does ANY endpoint allow weaker auth (Basic alongside Bearer)? |
| 5. servers section | Sometimes the "servers" block lists internal URLs (http://10.x.x.x, http://internal.lan). Pivot targets. |
| 6. "x-" extensions | x-internal: true, x-hidden: true, x-experimental often mark sensitive endpoints the docs site hides — but the spec still lists them. |
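The checks in the table can be sketched in a few lines of Python once the spec is loaded as a dict. This is an illustration under assumptions, not a full auditor: `audit_spec` is a hypothetical name, the keyword list mirrors the grep in step 2, and only operation-level `x-` extensions are collected.

```python
import re

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
KEYWORDS = re.compile(r"(admin|internal|debug|test|legacy)", re.IGNORECASE)


def audit_spec(spec: dict) -> dict:
    """Collect operations, keyword-flagged paths, x- extensions and servers."""
    findings = {"operations": [], "flagged_paths": [], "extensions": [], "servers": []}
    for path, item in spec.get("paths", {}).items():
        if KEYWORDS.search(path):
            findings["flagged_paths"].append(path)
        if not isinstance(item, dict):
            continue
        for method, op in item.items():
            # Path items also carry non-method keys such as "parameters".
            if method.lower() not in HTTP_METHODS or not isinstance(op, dict):
                continue
            findings["operations"].append((method.upper(), path))
            for key, value in op.items():
                if key.startswith("x-"):
                    findings["extensions"].append((method.upper(), path, key, value))
    findings["servers"] = [s.get("url") for s in spec.get("servers", [])]
    return findings
```

Load the file with `json.load` and pass the result in; the `flagged_paths` and `extensions` lists are the ones to read first.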
JS bundle mining — the most-shipped doc in your target
When there's no spec — or even when there is — the most useful API documentation in the target is the JS bundle. Frontend code makes every API call your target supports; greppable strings reveal endpoints, auth headers, hardcoded keys, and feature flags. Modern SPAs ship 1-5 MB of minified JS per page — full app logic, ready to be mined.
Collecting bundles
    # 1. browse the app in a clean browser profile
    # 2. DevTools → Network tab → filter: JS
    # 3. select all → right-click → "Save all as HAR"
    #    OR per-file: right-click → "Save as"

    # 4. headless alternative: pull every JS the index references
    curl -s https://app.example.com/ \
      | grep -oE 'src="[^"]+\.js[^"]*"' \
      | grep -oE '[^"]+\.js' \
      | while read -r url; do
          [[ "$url" == http* ]] || url="https://app.example.com$url"
          wget -q -P bundles/ "$url"
        done

    # 5. or use a tool that does the SPA-aware fetch (waits for runtime chunks)
    katana -u https://app.example.com -d 3 -jc -o crawl.txt
    gospider -s https://app.example.com -d 3 -c 10 -t 20 --js
Mining patterns
    # 1. endpoints: fetch / axios / jQuery
    grep -rEoh 'fetch\(\s*"[^"]+"' bundles/ | sort -u
    grep -rEoh 'axios\.\w+\(\s*"[^"]+"' bundles/ | sort -u
    grep -rEoh '\$\.\w+\(\s*"[^"]+"' bundles/ | sort -u

    # 2. relative URL strings starting with /api or /v<N>
    #    (grep -E has no \d, so use [0-9]; digits are needed in the path class too)
    grep -rEoh '"[a-zA-Z0-9./-]*/api/[a-zA-Z0-9./_-]+' bundles/ | sort -u
    grep -rEoh '"/v[0-9]+/[a-zA-Z0-9./_-]+' bundles/ | sort -u

    # 3. secrets: Stripe keys, AWS, GitHub, Slack tokens
    grep -rEoh 'sk_live_[a-zA-Z0-9]{24,}' bundles/
    grep -rEoh 'AKIA[0-9A-Z]{16}' bundles/
    grep -rEoh 'ghp_[a-zA-Z0-9]{36}' bundles/
    grep -rEoh 'xox[bpsa]-[a-zA-Z0-9-]+' bundles/

    # 4. config blocks: API_BASE, FEATURE_FLAGS, REGION
    #    (\s is not valid inside a bracket expression; use [:space:])
    grep -rEoh '(apiKey|API_KEY|secret|token|baseUrl|API_BASE)[[:space:]:=]+"[^"]+"' bundles/
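The same mining pass can be sketched in Python, which also picks up single-quoted and template-literal strings that double-quote-only greps miss. The function name `mine_bundle` and the exact endpoint pattern are illustrative assumptions; the secret patterns are the ones used in the grep commands.

```python
import re

# Path strings that start with /api/ or /v<N>/, in any JS quote style.
ENDPOINT_RE = re.compile(r"""["'`](/(?:api|v\d+)/[A-Za-z0-9_./{}$-]*)""")
SECRET_RES = {
    "stripe_live": re.compile(r"sk_live_[A-Za-z0-9]{24,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "slack_token": re.compile(r"xox[bpsa]-[A-Za-z0-9-]+"),
}


def mine_bundle(js: str) -> dict:
    """Return unique endpoint strings and any secret-shaped tokens in js."""
    endpoints = sorted(set(ENDPOINT_RE.findall(js)))
    secrets = {}
    for name, rx in SECRET_RES.items():
        found = sorted(set(rx.findall(js)))
        if found:
            secrets[name] = found
    return {"endpoints": endpoints, "secrets": secrets}
```

Run it over each file in bundles/ and merge the results; template literals come back with their `${...}` placeholders intact, which tells you where the path parameters are.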
Dedicated tools
| Tool | Use |
|---|---|
| LinkFinder | Python tool with regexes tuned for endpoint extraction. python linkfinder.py -i 'bundles/*.js' -o results.html. |
| SecretFinder | LinkFinder's sibling for hardcoded secrets. Many overlapping regexes with gitleaks. |
| nuclei -t exposures/tokens | Nuclei templates for live URL secret scanning. |
| katana / hakrawler / gospider | Crawlers that also parse JS files for URLs. katana's -headless mode drives a real browser, so it can reach chunks that only load at runtime. |
| js-beautify / prettier | Beautify before grep — sometimes regex patterns need newlines to work right. |
| source-map-explorer / unmin | If a .js.map shipped with the bundle (see our SPA Security article!) you can get original .ts/.tsx source back. |
Wayback Machine — endpoints the team forgot existed
The Internet Archive Wayback Machine crawls the public web continuously. For any target with a 5+ year history, it has snapshots of URLs that today's site never links to: old API versions, legacy endpoints, backup files, .json dumps left there during a migration in 2019. Many of those URLs still respond on production today; nobody removed them.
Pulling historical URLs
    # waybackurls: easiest
    echo "api.example.com" | waybackurls | sort -u > historical-urls.txt

    # more sources at once
    echo "api.example.com" | gau --providers wayback,otx,commoncrawl,urlscan > all-sources.txt

    # CDX API directly (no tool)
    curl -s "https://web.archive.org/cdx/search/cdx?url=api.example.com/*&output=text&fl=original&collapse=urlkey" \
      | sort -u > historical-urls.txt

    # common-crawl: bigger / older
    curl -s "https://index.commoncrawl.org/CC-MAIN-2024-26-index?url=api.example.com/*&output=json" \
      | jq -r .url | sort -u

    # subdomain-aware: pass the apex domain; waybackurls includes subdomains
    # by default (-no-subs turns that off)
    echo "example.com" | waybackurls
What to grep for
    # old API versions
    grep -E "/api/v[0-9]+/" historical-urls.txt

    # debug / internal / admin / test endpoints
    grep -Ei "/(debug|internal|admin|test|qa|legacy|old|deprecated|sandbox)/" historical-urls.txt

    # extensions that scream "leftover"
    grep -Ei "\.(bak|backup|orig|old|sql|swp|swagger|json\.bak)$" historical-urls.txt

    # query params worth re-trying today (admin=true, debug=1)
    grep -oE '\?[a-z_]+(=[^&]+)?' historical-urls.txt | sort -u
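If you would rather triage the whole URL list in one pass, the greps translate to a small Python function. This is a sketch under assumptions: `triage` and the bucket names are hypothetical, the patterns are trimmed versions of the ones above, and matching is done on the URL path only so that query strings do not break the end-of-path anchor.

```python
import re
from urllib.parse import parse_qsl, urlsplit

RULES = {
    "old_version": re.compile(r"/api/v[0-9]+/"),
    "sensitive_path": re.compile(
        r"/(debug|internal|admin|test|qa|legacy|old|deprecated|sandbox)/", re.I
    ),
    "leftover_file": re.compile(r"\.(bak|backup|orig|old|sql|swp)$", re.I),
}


def triage(urls):
    """Bucket historical URLs by rule and collect every query parameter name."""
    buckets = {name: set() for name in RULES}
    params = set()
    for raw in urls:
        url = raw.strip()
        parts = urlsplit(url)
        for name, rx in RULES.items():
            if rx.search(parts.path):
                buckets[name].add(url)
        params.update(k for k, _ in parse_qsl(parts.query, keep_blank_values=True))
    return {name: sorted(found) for name, found in buckets.items()}, sorted(params)
```

The parameter list is worth keeping on its own: names like admin or debug are candidates to replay against current endpoints.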
Live re-check
    # probe every candidate against current production
    # (candidates.txt = the interesting matches from the greps above;
    #  -fc 404 filters on status code instead of grepping the whole line)
    cat candidates.txt | httpx -status-code -follow-redirects -no-color -fc 404

    # or curl loop with a sane rate
    while read -r url; do
      echo -n "$url -> "
      curl -s -o /dev/null -w "%{http_code} (%{size_download}b)\n" "$url"
      sleep 0.5
    done < candidates.txt
What you commonly find
| Find | Detail |
|---|---|
| Old API versions still routed | /api/v1/users returns 200, current site uses /api/v3. v1 = older auth, no rate-limit, original SQL injection bug never patched in old code. |
| .bak / .old files | config.php.bak with original DB credentials. dump.sql sitting in /backups/ from a 2021 migration. |
| Pre-launch debug endpoints | /debug/info that returned env vars in 2020 when the app was being built — quietly still working on prod. |
| S3 / Azure / GCS bucket URLs | Wayback caches them. The bucket itself often still public. |
| Old API docs / Swagger UI | /v1/swagger-ui/ from before the team disabled it on the current version. |
Subdomain enum — find the weaker deployment
A target named example.com almost never has just one host. There's api., dev-api., staging-api., admin., internal., maybe old-api. Each is often a different deployment of the same code — different security posture, different oversight. Pentesting them as siblings is mandatory.
Passive sources (no traffic to target)
    # subfinder: multi-source aggregator
    subfinder -d example.com -all -silent -o subs.txt

    # amass: passive only
    amass enum -passive -d example.com -o subs.txt

    # crt.sh: Certificate Transparency logs (every cert ever issued for *.example.com)
    curl -s "https://crt.sh/?q=%25.example.com&output=json" \
      | jq -r ".[].name_value" | sort -u

    # assetfinder
    echo example.com | assetfinder --subs-only

    # combine them all
    ( subfinder -d example.com -silent; \
      amass enum -passive -d example.com; \
      assetfinder --subs-only example.com; \
      curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r ".[].name_value" \
    ) | sort -u > all-subs.txt
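Raw crt.sh output needs cleaning before it goes into a host list: a single name_value can hold several newline-separated names, wildcard entries start with `*.`, and case varies. A minimal sketch of that normalisation, assuming the JSON shape crt.sh returns with output=json (the function name `subdomains_from_crtsh` is hypothetical):

```python
import json


def subdomains_from_crtsh(raw_json: str, domain: str) -> list:
    """Return sorted, de-duplicated hostnames under domain from crt.sh JSON."""
    names = set()
    for entry in json.loads(raw_json):
        # One certificate entry may list several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]
            # Keep only the apex and true subdomains (not look-alike domains).
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)
```

The suffix check matters because a broad certificate search can return look-alike names that merely contain the target string.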
Active probe of each
    # httpx: alive check, tech detect, status code
    cat all-subs.txt | httpx -title -status-code -tech-detect -no-color -o alive.txt

    # or with screenshots
    cat all-subs.txt | httpx -screenshot -srd shots/

    # DNS-only: what resolves vs what doesn't
    cat all-subs.txt | dnsx -silent -resp
Subdomains that often matter
| Subdomain pattern | Why interesting |
|---|---|
| api., api-v2., api-internal. | Direct API hosts. Most likely target. |
| dev-api., staging-api., qa-api., test-api. | Same API, weaker auth. Same vulns as prod — without prod's alerting. |
| admin., internal., ops. | Admin panels. Often weaker auth + assume "behind VPN" but exposed anyway. |
| old-api., legacy., v1. | Forgotten deployments. The team migrated away years ago. |
| graphql., gql. | GraphQL gateways. Often shipped without introspection disabled. |
| webhook., events., callback. | Inbound webhook receivers. Sometimes accept SSRF payloads as event sources. |
| cdn., assets., static. | Static-asset hosts. Source maps, JS bundles, dumped files. |
Acquisition / takeover risk
Check every discovered subdomain for dangling DNS records with subjack or nuclei -t takeovers. Subdomain takeover lets an attacker serve content on the target's domain, which means instant XSS and cookie theft against the parent site.
Version enum — old versions outlive the team that built them
Even on a single subdomain, the same API often runs at multiple versions side by side. Frontend uses v3; nginx still routes v1, v2, v4 (staging), beta, internal. Old versions outlive the teams that built them — and outlive the security improvements those teams made later.
Sweeping versions
    # basic
    for v in v0 v1 v2 v3 v4 v5 internal beta staging dev test old legacy; do
      echo -n "/api/$v/users: "
      curl -s -o /dev/null -w "%{http_code}\n" "https://api.example.com/api/$v/users"
    done

    # ffuf with a versioned wordlist
    ffuf -u https://api.example.com/api/FUZZ/users \
      -w api-versions.txt \
      -mc 200,401,403,500

    # Burp Intruder: same idea with full request manipulation
What changes between versions
| Aspect | Typical drift |
|---|---|
| Auth scheme | v1 accepted Basic; v3 requires Bearer + PKCE. v1 still alive = downgrade path. |
| Input validation | v1 didn't parameterise queries — SQL injection still works. Validation was added in v2. |
| Rate limiting | Added in v3. v1 / v2 still let attackers spray credentials. |
| Field exposure | v1 response included passwordHash; v3 removed it. Old route still returns it. |
| Error format | v1 leaks stack traces in 500s; v3 returns generic JSON. Use v1's leaks to fingerprint v3's internals. |
| Verb support | v1 accepted DELETE on /users/{id} without confirmation; v3 requires re-auth. Use the older verb. |
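Field-exposure drift is easy to measure once you have the same resource from two versions: flatten both JSON bodies into field paths and diff the sets. The sketch below assumes you already fetched and parsed both responses; `field_paths` and `version_drift` are hypothetical helper names.

```python
def field_paths(value, prefix=""):
    """Flatten nested JSON into dotted field paths; list items become '[]'."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= field_paths(child, path)
    elif isinstance(value, list):
        for child in value:
            paths |= field_paths(child, prefix + "[]")
    return paths


def version_drift(old_body, new_body):
    """Report fields present in only one of the two response bodies."""
    old, new = field_paths(old_body), field_paths(new_body)
    return {"only_in_old": sorted(old - new), "only_in_new": sorted(new - old)}
```

Anything in `only_in_old` is a field the team removed later, which is exactly the data the old route may still hand out.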
Beyond integer versions
| Style | Probe |
|---|---|
| By Accept header | Accept: application/vnd.example.v1+json — some APIs version by mime-type. Try v0, v1, v2 in the header. |
| By query param | ?api_version=1, ?v=1.0. Old behaviour might trigger when omitted. |
| By subdomain | api-v1.example.com vs api.example.com (cf. §5). |
| By path prefix | /api/v1/, /api2/, /v3/api/. Try all permutations. |
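To cover every versioning style from the table in one sweep, it helps to generate the probe list up front and hand it to whatever HTTP client you use. A minimal sketch: the function name `version_probes`, the `vendor` string in the Accept header, and the `api_version` parameter name are illustrative assumptions taken from the table's examples, so adjust them to what the target actually uses.

```python
def version_probes(base, resource, versions, vendor="example"):
    """Build (url, headers) probes for path, header and query versioning."""
    probes = []
    for v in versions:
        # Path-prefix permutations: /api/v1/, /v1/api/, /v1/
        for prefix in (f"/api/{v}", f"/{v}/api", f"/{v}"):
            probes.append({"url": f"{base}{prefix}/{resource}", "headers": {}})
        # Mime-type versioning through the Accept header.
        probes.append({
            "url": f"{base}/api/{resource}",
            "headers": {"Accept": f"application/vnd.{vendor}.{v}+json"},
        })
        # Query-parameter versioning: v1 becomes api_version=1.
        number = v.lstrip("v")
        probes.append({"url": f"{base}/api/{resource}?api_version={number}", "headers": {}})
    return probes
```

Each version yields five probes, so a sweep over v0 to v5 plus beta and internal stays under fifty requests per resource.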
GraphQL introspection — the schema is a free attack-surface map
GraphQL has built-in introspection: a query that returns the entire schema. By default it is enabled in development frameworks, and many teams forget to disable it in production. One introspection query returns every type, field, query, mutation, and argument, which is the entire attack surface for free.
Detecting GraphQL
    # common paths
    /graphql
    /api/graphql
    /v1/graphql
    /v2/graphql
    /query
    /__graphql
    /graphiql      # interactive UI
    /playground    # Apollo playground

    # detection probe
    curl -s -X POST https://api.example.com/graphql \
      -H "Content-Type: application/json" \
      -d '{"query":"{__typename}"}'
    # {"data":{"__typename":"Query"}}  ← confirmed GraphQL
The introspection query
    # minimal
    curl -X POST https://api.example.com/graphql \
      -H "Content-Type: application/json" \
      -d '{"query":"{__schema{types{name}}}"}'

    # full standard introspection: too long for here, see:
    # https://raw.githubusercontent.com/graphql/graphql-js/main/src/utilities/getIntrospectionQuery.ts

    # tools
    inql introspect -t https://api.example.com/graphql -o schema.json
    graphql-introspection-query > schema.json
    # clairvoyance takes the URL as a positional argument
    clairvoyance https://api.example.com/graphql -w common-words.txt   # if introspection disabled
    # Burp + InQL extension automates the whole flow

    # render the schema as a graph
    graphql-voyager   # open schema.json
    graphqlcheck      # security-focused linter
What the schema reveals
| Reveal | Detail |
|---|---|
| Mutations the UI never exposes | deleteUser, impersonate, runRawSQL, executeShell — exist in the schema, never wired into the app, but still callable. |
| Field-level secrets | User.passwordHash, User.mfaSecret — fields that should never be queryable from the public API. |
| Internal types | AdminUser, InternalEvent — types whose existence reveals admin functionality. |
| Deprecated fields | @deprecated directives mark soft-removed surface. Often still works, just hidden from docs. |
| Arg shapes | Argument types reveal what custom enums, JSON-blob inputs, file uploads exist. Each is a fuzz target. |
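A first pass over a saved introspection result can be automated. The sketch below assumes schema.json holds the standard introspection response (a `data.__schema` object with `types`, `fields` and `isDeprecated`); the function name `audit_schema` and the keyword list are hypothetical and should be tuned per target.

```python
import re

SUSPICIOUS = re.compile(
    r"(password|secret|token|hash|impersonat|delete|sql|shell|admin|internal)", re.I
)


def audit_schema(introspection: dict) -> dict:
    """List mutations, suspicious field names and deprecated fields."""
    schema = introspection["data"]["__schema"]
    mutation_type = (schema.get("mutationType") or {}).get("name")
    report = {"mutations": [], "suspicious_fields": [], "deprecated": []}
    for t in schema.get("types", []):
        type_name = t.get("name", "")
        if type_name.startswith("__"):
            continue  # skip GraphQL's own meta types
        for field in t.get("fields") or []:
            ref = f"{type_name}.{field['name']}"
            if type_name == mutation_type:
                report["mutations"].append(field["name"])
            if SUSPICIOUS.search(field["name"]):
                report["suspicious_fields"].append(ref)
            if field.get("isDeprecated"):
                report["deprecated"].append(ref)
    return report
```

Compare the mutation list against what the UI actually calls (from the JS mining step); the difference is the set of mutations nobody expected you to find.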
When introspection is disabled
    # clairvoyance: brute-force the schema from error messages
    # (the URL is a positional argument)
    clairvoyance https://api.example.com/graphql -w common-graphql-fields.txt -o schema.json

    # how it works:
    #   query { aRandomField }
    #   → server: "Cannot query field 'aRandomField' on type 'Query'. Did you mean 'foo, bar, baz'?"
    #   → clairvoyance parses the suggestion + iterates
    # often recovers ~80% of the schema even with introspection off
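The suggestion-parsing loop can be illustrated in a few lines. This is a simplified sketch of the idea, not clairvoyance's actual implementation: `suggestions` and `recover_fields` are hypothetical names, and `ask` stands for any callable you supply that sends `query { <field> }` and returns the server's error message (an empty string when the field exists).

```python
import re


def suggestions(message: str) -> list:
    """Pull suggested field names out of a 'Did you mean ...' error."""
    _, found, tail = message.partition("Did you mean")
    if not found:
        return []
    names = []
    for quoted in re.findall(r"""["']([^"']+)["']""", tail):
        # Some servers put several names inside one pair of quotes.
        names.extend(part.strip() for part in quoted.split(",") if part.strip())
    return names


def recover_fields(ask, wordlist):
    """Guess fields, then follow every suggestion the server leaks."""
    known, seen, queue = set(), set(), list(wordlist)
    while queue:
        guess = queue.pop()
        if guess in seen:
            continue
        seen.add(guess)
        message = ask(guess)
        if not message:
            known.add(guess)  # no error: the field is real
            continue
        queue.extend(n for n in suggestions(message) if n not in seen)
    return sorted(known)
```

The key point is that a wrong guess still pays off: every near miss makes the server name real fields, which then get confirmed on the next iteration.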
Wordlist brute-force — when there's no spec
No spec, no GraphQL, no obvious indicators — when everything else fails, brute-force the URL space. Modern wordlists (Assetnote, SecLists) cover thousands of common API patterns; modern fuzzers (ffuf, kiterunner, arjun) are smart about verbs and parameter discovery.
Endpoint brute-force
    # ffuf: fast, simple
    ffuf -u https://api.example.com/FUZZ \
      -w SecLists/Discovery/Web-Content/api/api-endpoints.txt \
      -mc 200,401,403,500 \
      -fs 0 \
      -t 50

    # recurse into discovered directories
    ffuf -u https://api.example.com/users/FUZZ \
      -w common.txt -mc 200,401,403 -recursion -recursion-depth 3

    # kiterunner: verb-aware, knows API patterns
    kr scan https://api.example.com -w routes-large.kite

    # kiterunner brute (traditional wordlist mode)
    kr brute https://api.example.com -w routes-small.txt

    # gobuster (v3 needs the default 404 blacklist cleared with -b "" when -s is set)
    gobuster dir -u https://api.example.com -w wordlist.txt -s "200,401,403" -b "" -k

    # feroxbuster
    feroxbuster -u https://api.example.com -w wordlist.txt -s 200,401,403
Parameter brute-force
    # arjun: hidden parameters on a known endpoint
    arjun -u https://api.example.com/v1/search -m GET --include="q=test"
    arjun -u https://api.example.com/v1/users -m JSON    # JSON request body (-i imports a target list, not a body)

    # x8: alternative with smarter heuristics
    x8 -u https://api.example.com/v1/search -w params.txt

    # parameth: python alternative
    python parameth.py -u https://api.example.com/v1/search
Wordlists worth knowing
| Wordlist | Use |
|---|---|
| Assetnote wordlists | wordlists.assetnote.io — curated by API-focused researchers. Huge "kiterunner-routes-large" set. |
| SecLists/Discovery/Web-Content/api/ | api-endpoints.txt, common-api-endpoints-mazen160.txt — classic baseline. |
| SecLists/Fuzzing/Polyglots | For body fuzzing when an endpoint is found. |
| SecLists/Discovery/Web-Content/burp-parameter-names.txt | Parameter wordlist for arjun. |
| raft-large-words / raft-large-files | OG OWASP raft wordlists. Still relevant. |
Tuning for noise
| Flag | Use |
|---|---|
| -mc / status filter | -mc 200,401,403 keeps only useful responses. Filter out the 404 baseline. |
| -fs / size filter | -fs 0 hides empty-body responses. -fs 1234 hides ones with same size as a known 404 page. |
| -mr / regex match | Match on response body content (e.g., -mr "graphql|swagger") for targeted discovery. |
| -t / threads | -t 50 is aggressive; -t 10 is polite. WAFs / rate limiters will ban you above ~100. |
| -p / delay | -p 0.1 = 100ms delay between requests. Use against fragile or production targets. |
| -H / headers | -H "Authorization: Bearer …" — fuzz authenticated endpoints. Often a different surface than unauth. |
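The status and size filters amount to one rule: drop whatever looks like the catch-all response. A sketch of that logic in Python, handy when post-processing saved fuzz results, follows. The name `filter_hits` and the `min_repeat` threshold of 3 are assumptions for illustration; real runs usually calibrate against a known-bad path instead.

```python
from collections import Counter


def filter_hits(results, keep_status=(200, 401, 403), min_repeat=3):
    """results: iterable of (path, status, size) tuples from a fuzz run."""
    results = list(results)
    shapes = Counter((status, size) for _, status, size in results)
    # A (status, size) pair shared by many paths is almost certainly the
    # catch-all page, i.e. the size you would hand to -fs.
    baseline = {shape for shape, n in shapes.items() if n >= min_repeat}
    hits = []
    for path, status, size in results:
        if status not in keep_status:      # like -mc 200,401,403
            continue
        if size == 0:                      # like -fs 0
            continue
        if (status, size) in baseline:     # like -fs <catch-all size>
            continue
        hits.append((path, status, size))
    return hits
```

What survives is the short list of responses that differ from the noise floor, which is where manual review should start.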
Other passive sources — GitHub, Shodan, mobile apps
Beyond the seven main techniques, a few smaller sources occasionally turn up the critical find:
Other passive sources worth checking
| Source | What to look for |
|---|---|
| GitHub code search | "api.example.com" filename:.env, "api.example.com" "Authorization: Bearer". Devs commit secrets surprisingly often. |
| Postman public collections | https://www.postman.com/search?q=example.com&type=collection. Employees sometimes publish working collections. |
| Shodan / Censys | Shodan: ssl.cert.subject.cn:"*.example.com" — finds hosts even if DNS doesn't resolve. |
| Pastebin / ghostbin / gist | Tokens, dumps, "help, my API isn't working" posts with full requests. |
| JS source maps | See SPA Security article — bundle.js.map reconstructs original source. |
| robots.txt + sitemap.xml | Sometimes lists API endpoints meant to be hidden from crawlers. Disallow lines = "here's where we don't want you to look". |
| .well-known/* | /.well-known/openid-configuration, /.well-known/security.txt, /.well-known/api-catalog (draft). |
| CSP report-uri / report-to | CSP header often points at a /report endpoint, and sometimes a CSP report leaks server-side details on accepted payloads. |
| error pages | Default 404 / 500 pages often reveal framework + version (Django, Rails, Spring Boot whitelabel error page). |
Mobile app extraction
    # pull the APK
    adb shell pm list packages | grep example
    adb shell pm path com.example.app
    adb pull /data/app/com.example.app-1/base.apk    # use the path printed by pm path

    # decode it
    apktool d base.apk -o decoded/

    # search for API endpoints (grep -E has no \d, so use [0-9])
    grep -rEoh 'https?://[^"]+' decoded/ | sort -u
    grep -rEoh '/api/v[0-9]+/[^"]+' decoded/ | sort -u

    # secrets
    grep -rE '(api_key|secret|password)' decoded/

    # iOS .ipa is similar:
    unzip app.ipa -d app/
    strings app/Payload/example.app/example | grep -E 'https?://'
Workflow — chaining the techniques in 2 hours
Here's how the techniques slot together in order on a real engagement:
Recon workflow — the first 2 hours
| Time | Action |
|---|---|
| Min 0-10 | subfinder + amass + crt.sh → all-subs.txt. httpx for alive check. |
| Min 10-20 | For each alive host: try /openapi.json, /swagger.json, /v3/api-docs, /graphql. Pull anything that returns 200. |
| Min 20-30 | Browse the app in a clean browser with DevTools open. Capture HAR file. Save all JS bundles. |
| Min 30-50 | JS mining — endpoints + secrets + config. Combine with spec if found. |
| Min 50-70 | waybackurls + gau on each subdomain. Filter for interesting paths. Live-check. |
| Min 70-90 | Version enum on each discovered subdomain. v0/v1/v2/internal/beta sweep. |
| Min 90-110 | GraphQL introspection if /graphql found. Schema → InQL → audit. |
| Min 110-120 | ffuf / kiterunner on the holes — paths not covered by spec, JS, or wayback. |
| End of hour 2 | Consolidated endpoint inventory. Imported to Postman. Ready to test. |
Recon report deliverable
    ## API Recon — Acme Corp (example.com) — 2026-05-23

    ### Targets
    - api.example.com          [prod]     nginx + node, JWT auth
    - dev-api.example.com      [dev]      no auth ⚠
    - staging-api.example.com  [staging]  HTTP Basic ⚠
    - old-api.example.com      [legacy]   no rate limit ⚠

    ### Discovered specs
    - https://api.example.com/openapi.json (OpenAPI 3.0.2, 187 endpoints)
    - https://staging-api.example.com/v1/swagger.json (Swagger 2.0, 142 endpoints)

    ### Notable endpoint surface
    - /api/v1/admin/* (12 endpoints) — listed in spec but UI never uses them
    - /internal/debug/dumpdb — confirmed live, requires no auth on dev-api
    - /api/v0/users — listed nowhere current, still serves 200

    ### Secrets found in bundles
    - Stripe publishable key (public, OK)
    - Stripe SECRET KEY (sk_live_…) in admin.chunk.js ☠
    - Internal Slack webhook in main.bundle.js

    ### Versions discovered
    prod:    v3 (current), v0/v1/v2/beta all live ⚠
    staging: v2 only
    dev:     v3 + experimental v4

    ### GraphQL
    graphql.example.com — introspection ON ☠
    mutations exposed: deleteUser, impersonate, runRawSQL ☠

    ### Next steps for active testing
    1. test admin endpoints with normal-user token (BFLA)
    2. fuzz /internal/debug/* for SSRF/RCE
    3. test the impersonate mutation
    4. report sk_live_ Stripe key leak NOW (before further testing)
Cheat sheet — tools, one-shot pipeline, defender list
A single-page reference for the whole recon workflow:
Tool / wordlist quick-ref
| Tool | Use |
|---|---|
| subfinder / amass / assetfinder | subdomain enum (passive) |
| crt.sh / SecurityTrails / Censys | Certificate Transparency, DNS history |
| httpx / dnsx | alive check + tech detect |
| katana / hakrawler / gospider | JS-aware crawlers |
| waybackurls / gau | historical URL recovery |
| ffuf / gobuster / feroxbuster | directory + endpoint brute-force |
| kiterunner | verb-aware API endpoint discovery |
| arjun / x8 / parameth | hidden parameter discovery |
| inql / clairvoyance | GraphQL introspection / brute-force |
| graphql-voyager | GraphQL schema visualisation |
| LinkFinder / SecretFinder | JS bundle mining |
| gitleaks / trufflehog / nosey-parker | secret hunting in repos |
| nuclei + templates | broad template-based scanning |
| Postman / Insomnia / Bruno | organise + test the discovered surface |
| Burp Suite | capture + active testing |
One-shot recon pipeline
    # environment
    export TARGET="example.com"

    # subdomains
    subfinder -d "$TARGET" -all -silent > subs.txt
    amass enum -passive -d "$TARGET" >> subs.txt
    sort -u subs.txt -o subs.txt

    # alive (first column of each output line is the URL)
    cat subs.txt | httpx -title -status-code -tech-detect -no-color -o alive.txt

    # spec discovery on each alive host
    for h in $(awk '{print $1}' alive.txt); do
      for p in openapi.json swagger.json v3/api-docs api-docs graphql; do
        code=$(curl -s -o /dev/null -w "%{http_code}" "$h/$p")
        [ "$code" = "200" ] && echo "$h/$p"
      done
    done > specs.txt

    # wayback URLs across all subs
    for h in $(awk '{print $1}' alive.txt | sed 's|https\?://||'); do
      echo "$h" | waybackurls
    done | sort -u > wayback.txt

    # JS bundles via crawler (strip httpx's status/title columns first)
    awk '{print $1}' alive.txt | xargs -I{} -P 4 katana -u {} -d 3 -jc 2>/dev/null \
      | grep -E '\.js($|\?)' > js-urls.txt

    # pull bundles, grep (grep -E has no \d, so use [0-9])
    mkdir -p bundles && cat js-urls.txt | parallel -j 4 wget -q -P bundles/ {}
    grep -rEoh '/api/v[0-9]+/[a-zA-Z0-9./_-]+' bundles/ | sort -u > js-endpoints.txt
    grep -rEoh 'sk_(live|test)_[a-zA-Z0-9]{24,}' bundles/ > secrets.txt

    # ffuf gaps
    ffuf -u "https://api.$TARGET/FUZZ" -w api-wordlist.txt -mc 200,401,403 -o ffuf.json

    # consolidate (extract_url.py: your own helper that pulls URLs and paths
    # out of the mixed text + JSON input)
    cat js-endpoints.txt wayback.txt ffuf.json | python3 extract_url.py | sort -u > all-endpoints.txt
    wc -l all-endpoints.txt
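The pipeline's last step pipes everything through extract_url.py, which is not shown anywhere in this article. One hypothetical way to write the core of such a helper is sketched below, assuming the input mixes relative paths (from JS mining), full URLs (from wayback) and ffuf's JSON report with a top-level `results` list whose items carry a `url` field.

```python
import json
import re

URL_RE = re.compile(r"""https?://[^\s"'<>]+""")


def extract_urls(lines):
    """Return sorted unique URLs and paths from mixed text / ffuf JSON lines."""
    found = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("{"):
            # Possibly ffuf's JSON report: take each result's URL and skip
            # the rest (the command line in it still contains FUZZ).
            try:
                report = json.loads(line)
            except ValueError:
                report = None
            if isinstance(report, dict) and isinstance(report.get("results"), list):
                found.update(
                    r["url"] for r in report["results"]
                    if isinstance(r, dict) and "url" in r
                )
                continue
        if line.startswith("/"):
            found.add(line)                      # relative path from JS mining
        else:
            found.update(URL_RE.findall(line))   # full URLs in free text
    return sorted(found)
```

To use it as the script the pipeline expects, save it as extract_url.py and end the file with `import sys` followed by `print("\n".join(extract_urls(sys.stdin)))`.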
Defender checklist
| # | Control |
|---|---|
| 1. | OpenAPI spec — restrict to authenticated users in production; keep "internal" endpoints out of the public spec. |
| 2. | JS bundles — no hardcoded secrets; sourcemaps stripped from prod deploy. |
| 3. | Wayback — review archive.org for your domain; request takedowns of any leaked content. |
| 4. | Subdomains — inventory all of them. Decommissioned services = takeover risk; close their DNS. |
| 5. | API versions — decommission old versions explicitly. Block at LB if endpoint is "deprecated for security". |
| 6. | GraphQL — disable introspection in production. Also disable field suggestions (the "Did you mean…" hints) to blunt clairvoyance-style schema recovery. |
| 7. | Standard error pages with no framework / version info. |
| 8. | WAF / rate-limit rules trigger on rapid 404 patterns (typical fuzz signature). |
| 9. | GitHub secret-scan + pre-commit hooks (gitleaks); revoke any committed secret immediately. |
| 10. | Run the same recon against yourself quarterly. Find what an attacker would find first. |
Closing thoughts
Three things to take away:
Most of the attack surface isn't in the docs. The public docs describe the API the team wants you to see. Production routing, meanwhile, still knows about old versions, dev subdomains, debug endpoints and leftover backup files — and all of it is reachable. Recon's whole job is to get the complete list, not the curated one. JS bundles, wayback, a version sweep and subdomain enum together dwarf whatever the official spec admits to.
Passive recon costs nothing; active recon costs noise. Spend the first hour entirely on passive sources — crt.sh, wayback, GitHub, Shodan, public Postman collections — without sending a single packet at the target. Only once that picture is built do you start active scanning with ffuf, kiterunner or introspection queries. It keeps your footprint small and keeps you clearly distinct from a blanket scanner in the logs.
Recon is a defensive tool too. If you can find your own forgotten v0 endpoint in twenty minutes, so can anyone else. Run this exact workflow against your own infrastructure every quarter — the output is your shadow-API inventory: old versions to decommission, leaked secrets to rotate, debug endpoints to firewall off. Defensive recon is about the cheapest security investment per finding you'll ever make.
Next on the API Security track: BOLA (Broken Object Level Authorization) — the most common API vulnerability, and the obvious first attack to run against the endpoint inventory you just built.