API Reconnaissance

Mapping the API attack surface before you send a single attack: OpenAPI spec discovery, JS bundle mining, recovering historical URLs from the Wayback Machine, subdomain enum, version sprawl, GraphQL introspection, and wordlist brute-force — seven recon workflows, walked through one at a time.

openapi · js mining · wayback · subdomain · version sprawl · graphql · ffuf

Overview — recon is half the engagement

Recon is the first hour of every API engagement and the last hour you should ever skip. Before sending a single attack, you want to know: how many endpoints exist, what auth they expect, what versions are still alive, what dev/staging/legacy deployments share the same code, what hidden parameters change behaviour, what the team forgot to take down. Get this right and the attacks that follow are targeted; skip it and you're fuzzing the wrong surface for a week.

We'll go through seven recon techniques in roughly the order you'd reach for them: spec discovery, JS bundle mining, wayback historical URLs, subdomain enumeration, version sweeping, GraphQL introspection, and wordlist brute-force. None of them need an authenticated session, and a few don't even send a packet to the target.

What "API recon" produces

Output | Why it matters
Endpoint inventory | Every URL × method the target exposes — official + forgotten + dev/staging. Feed into Postman / Burp for systematic testing.
Auth scheme map | Which endpoints expect API key vs Bearer vs mTLS vs nothing. Where weak auth is permitted (older versions, debug routes).
Schema awareness | Field names — including fields the UI never displays. Often more sensitive than discovered endpoints (passwordHash, internalNotes, etc.).
Tech fingerprint | Framework (Django REST, Express, FastAPI, Spring), proxy (nginx, Cloudflare), language. Drives which CVEs and behaviours to test for.
Secrets / leaks | Hardcoded keys in JS bundles, deprecated dev URLs in old commits, debug logging output. Often the engagement ends here.
Subdomain + version sprawl | Same API deployed on 5 different hosts at 3 different versions. The weakest deployment dictates the strength of the system.

Recon hygiene

Only fully passive recon is truly free — crt.sh, the Wayback Machine, public GitHub. The moment you start active probing (ffuf, kiterunner, introspection queries) you're showing up in the target's logs, so make sure your engagement scope actually covers it. Bug-bounty programs usually allow active recon, but check before you assume. And rate-limit yourself — your job is to find bugs, not to knock the thing over.

Spec discovery — find the OpenAPI, get a free inventory

If the target ships an OpenAPI / Swagger spec, most of the inventory work is already done: every documented endpoint, parameter, auth scheme, and example payload sits in one JSON or YAML file. It's worth a few minutes of focused fuzzing to find.

Common spec paths to try

shell
# OpenAPI / Swagger
/openapi.json
/openapi.yaml
/swagger.json
/swagger.yaml
/v3/api-docs            # springfox / springdoc default
/v2/api-docs            # older springfox
/api-docs
/api/swagger.json
/api/v1/swagger.json
/api/v2/openapi.json
/docs/openapi.yaml
/docs/api/spec.json
/_api/spec
/_openapi
/spec
/api/v3/spec.yaml

# UI-driven explorers (often unprotected)
/swagger-ui/
/swagger-ui/index.html
/redoc
/redoc/index.html
/api/docs
/docs
/scalar
/rapidoc

# GraphQL playgrounds
/graphql
/graphiql
/playground

# RAML / WSDL legacy
/api.raml
/services?wsdl

Wordlist + ffuf

bash
# fuzz a curated wordlist of API doc paths
ffuf -u https://api.example.com/FUZZ \
     -w SecLists/Discovery/Web-Content/api/api-endpoints.txt \
     -mc 200,401,403 \
     -fs 0 \
     -t 50 \
     -o spec-discovery.json

# kiterunner (smarter — knows API patterns + verbs)
kr scan https://api.example.com -w routes-large.kite

# or curl all at once
for p in openapi.json swagger.json v3/api-docs api-docs docs/openapi.yaml ; do
  echo -n "$p: "
  curl -s -o /dev/null -w "%{http_code}\n" https://api.example.com/$p
done
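
A 200 on /openapi.json doesn't prove you found a spec: SPAs often answer every path with the same HTML shell. The sketch below is one way to confirm a saved response body really is a spec. The function name `looks_like_spec` is made up for illustration, and it only handles JSON (YAML specs would need a parser such as PyYAML).

```python
import json

def looks_like_spec(body):
    """True if body is JSON with an OpenAPI 3.x or Swagger 2.0 version key plus a paths object."""
    try:
        doc = json.loads(body)
    except ValueError:
        return False  # HTML catch-all pages and YAML documents end up here
    if not isinstance(doc, dict):
        return False
    has_version = "openapi" in doc or "swagger" in doc
    return has_version and isinstance(doc.get("paths"), dict)
```

Run it over whatever the curl loop saved before importing anything into Postman.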

What to do with a spec

# | Action
1 | Import to Postman / Bruno: Every endpoint becomes a runnable saved request. See Postman / Pentesting deep-dive.
2 | jq grep for keywords: jq -r '.paths | keys[]' spec.json → all paths. Then grep -E "(admin|internal|debug|test|legacy)".
3 | Extract field names: jq '..|objects|.properties?|keys?' → response schema fields. Look for passwordHash, ssn, apiKey, internalToken.
4 | Auth scheme audit: jq .components.securitySchemes (Swagger 2.0: .securityDefinitions) — does ANY endpoint allow weaker auth (Basic alongside Bearer)?
5 | servers section: Sometimes the "servers" block lists internal URLs (http://10.x.x.x, http://internal.lan). Pivot targets.
6 | "x-" extensions: x-internal: true, x-hidden: true, x-experimental often mark sensitive endpoints the docs site hides — but the spec still lists them.

💡 Save the spec locally and never re-download it during the engagement. Public targets serve specs with different versions / endpoints depending on user-agent, geo, or CDN cache state. One frozen copy gives you a stable baseline.
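
The jq one-liners above can also be done in one pass over the frozen copy. This is a rough sketch, assuming the spec has already been loaded into a dict with `json.load`; the name `summarize_spec` and the keyword list are illustrative choices, not part of any tool.

```python
import re

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}
KEYWORDS = re.compile(r"admin|internal|debug|test|legacy", re.IGNORECASE)

def summarize_spec(spec):
    """List METHOD + path pairs, keyword-flagged paths, x- extensions and server URLs."""
    operations, flagged, extensions = [], [], []
    for path, item in spec.get("paths", {}).items():
        if not isinstance(item, dict):
            continue
        for method, op in item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skips path-level keys such as "parameters"
            label = f"{method.upper()} {path}"
            operations.append(label)
            if KEYWORDS.search(path):
                flagged.append(label)
            if isinstance(op, dict):
                for key, value in op.items():
                    if key.startswith("x-"):
                        extensions.append((label, key, value))
    servers = [s.get("url") for s in spec.get("servers", []) if isinstance(s, dict)]
    return {
        "operations": sorted(operations),
        "flagged": sorted(flagged),
        "extensions": extensions,
        "servers": servers,
    }
```

The `flagged` list is deliberately noisy ("test" also matches "latest"); treat it as a reading list, not a finding.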

JS bundle mining — the most-shipped doc in your target

When there's no spec — or even when there is — the most useful API documentation in the target is the JS bundle. The frontend has to know how to make every API call the UI can trigger, so greppable strings reveal endpoints, auth headers, hardcoded keys, and feature flags. Modern SPAs routinely ship megabytes of minified JS: the full client-side app logic, ready to be mined.

Collecting bundles

bash
# 1. browse the app in a clean browser profile
# 2. DevTools → Network tab → filter: JS
# 3. select all → right-click → "Save all as HAR"
#    OR per-file: right-click → "Save as"

# 4. headless alternative — pull every JS the index references
curl -s https://app.example.com/ \
  | grep -oE 'src="[^"]+\.js[^"]*"' \
  | grep -oE '[^"]+\.js' \
  | while read url; do
      [[ "$url" == http* ]] || url="https://app.example.com$url"
      wget -q -P bundles/ "$url"
    done

# 5. or use a tool that does the SPA-aware fetch (waits for runtime chunks)
katana -u https://app.example.com -d 3 -jc -o crawl.txt
gospider -s https://app.example.com -d 3 -c 10 -t 20 --js

Mining patterns

bash
# 1. endpoints — fetch/axios/jquery (double-quoted strings only; repeat with ' and ` for minified code)
grep -rEoh 'fetch\(\s*"[^"]+"' bundles/  | sort -u
grep -rEoh 'axios\.\w+\(\s*"[^"]+"' bundles/ | sort -u
grep -rEoh '\$\.\w+\(\s*"[^"]+"' bundles/  | sort -u

# 2. relative URL strings starting with /api or /vN (grep -E has no \d, so use [0-9])
grep -rEoh '"[a-zA-Z./-]*/api/[a-zA-Z./_-]+' bundles/ | sort -u
grep -rEoh '"/v[0-9]+/[a-zA-Z./_-]+' bundles/ | sort -u

# 3. secrets — Stripe keys, AWS, GitHub, Slack tokens
grep -rEoh 'sk_live_[a-zA-Z0-9]{24,}' bundles/
grep -rEoh 'AKIA[0-9A-Z]{16}' bundles/
grep -rEoh 'ghp_[a-zA-Z0-9]{36}' bundles/
grep -rEoh 'xox[bpsa]-[a-zA-Z0-9-]+' bundles/

# 4. config blocks — API_BASE, FEATURE_FLAGS, REGION
grep -rEoh '(apiKey|API_KEY|secret|token|baseUrl|API_BASE)[[:space:]:=]+"[^"]+"' bundles/
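
Minified bundles mix double quotes, single quotes and template literals, which the double-quote-only grep patterns miss. Here is a small sketch of the same endpoint extraction that accepts all three quote styles. `extract_endpoints` and the "looks like an API path" heuristic are assumptions for illustration; tune them per target.

```python
import re

# A string literal in any of the three JS quote styles whose content starts with "/".
QUOTED_PATH = re.compile(r"""(["'`])(/[A-Za-z0-9_./${}:-]+)\1""")
# Keep only paths with an api, vN or graphql segment.
API_HINT = re.compile(r"(^|/)(api|v\d+|graphql)(/|$)")

def extract_endpoints(js):
    """Return the sorted, de-duplicated API-looking paths found in JS source text."""
    found = {m.group(2) for m in QUOTED_PATH.finditer(js)}
    return sorted(p for p in found if API_HINT.search(p))
```

Template-literal placeholders such as ${id} are kept as-is, which is handy later when you turn paths into parameterised requests.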

Dedicated tools

Tool | Use
LinkFinder | Python tool with regexes tuned for endpoint extraction. python linkfinder.py -i 'bundles/*.js' -o results.html.
SecretFinder | LinkFinder's sibling for hardcoded secrets. Many of its regexes overlap with gitleaks.
nuclei -t exposures/tokens | Nuclei templates for live URL secret scanning (newer nuclei-templates releases keep them under http/exposures/tokens).
katana / hakrawler / gospider | SPA-aware crawlers that wait for runtime chunks.
js-beautify / prettier | Beautify before grep — sometimes regex patterns need newlines to work right.
source-map-explorer / unmin | If a .js.map shipped with the bundle (see our SPA Security article!) you can get original .ts/.tsx source back.

💡 Lazy-loaded chunks are the goldmine. Main bundle ships the public app; chunks load only when the user opens an admin / settings / billing page. Grep the chunks separately — they expose features the public spec never mentions.

Wayback Machine — endpoints the team forgot existed

The Internet Archive's Wayback Machine crawls the public web continuously. For any target with a history of five years or more, it holds snapshots of URLs that today's site never links to: old API versions, legacy endpoints, backup files, .json dumps left behind during a 2019 migration. Many of those URLs still respond on production today because nobody removed them.

Pulling historical URLs

bash
# waybackurls — easiest
echo "api.example.com" | waybackurls | sort -u > historical-urls.txt

# more sources at once
echo "api.example.com" | gau --providers wayback,otx,commoncrawl,urlscan > all-sources.txt

# CDX API directly (no tool)
curl -s "https://web.archive.org/cdx/search/cdx?url=api.example.com/*&output=text&fl=original&collapse=urlkey" \
  | sort -u > historical-urls.txt

# common-crawl — bigger / older
curl -s "https://index.commoncrawl.org/CC-MAIN-2024-26-index?url=api.example.com/*&output=json" \
  | jq -r .url | sort -u

# subdomain-aware (waybackurls includes subdomains by default; add -no-subs to turn that off)
echo "example.com" | waybackurls

What to grep for

bash
# old API versions
grep -E "/api/v[0-9]+/" historical-urls.txt

# debug / internal / admin / test endpoints
grep -Ei "/(debug|internal|admin|test|qa|legacy|old|deprecated|sandbox)/" historical-urls.txt

# extensions that scream "leftover"
grep -Ei "\.(bak|backup|orig|old|sql|swp|swagger|json\.bak)$" historical-urls.txt

# query params worth re-trying today (admin=true, debug=1)
grep -oE '\?[a-z_]+(=[^&]+)?' historical-urls.txt | sort -u
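
If you'd rather triage the whole list in one pass than run four greps, something like the following works. It is a sketch only: `triage` and the bucket names are invented here, and the patterns simply mirror the greps above.

```python
import re
from urllib.parse import urlsplit, parse_qsl

OLD_VERSION = re.compile(r"/api/v[0-9]+/")
SENSITIVE = re.compile(r"/(debug|internal|admin|test|qa|legacy|old|deprecated|sandbox)/", re.I)
LEFTOVER = re.compile(r"\.(bak|backup|orig|old|sql|swp)$", re.I)

def triage(urls):
    """Sort historical URLs into buckets and collect every query parameter name seen."""
    buckets = {"versions": set(), "sensitive": set(), "leftovers": set(), "params": set()}
    for url in urls:
        parts = urlsplit(url)
        if OLD_VERSION.search(parts.path):
            buckets["versions"].add(url)
        if SENSITIVE.search(parts.path):
            buckets["sensitive"].add(url)
        if LEFTOVER.search(parts.path):
            buckets["leftovers"].add(url)
        for name, _value in parse_qsl(parts.query, keep_blank_values=True):
            buckets["params"].add(name)
    return {key: sorted(values) for key, values in buckets.items()}
```

Unlike the grep on `\?[a-z_]+`, this collects every parameter in the query string, not just the first one.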

Live re-check

bash
# probe every candidate against current production
# (candidates.txt = the URLs you kept from the greps above)
cat candidates.txt | httpx -status-code -follow-redirects -no-color \
  | grep -v "404"

# or curl loop with a sane rate
while read -r url; do
  echo -n "$url -> "
  curl -s -o /dev/null -w "%{http_code} (%{size_download}b)\n" "$url"
  sleep 0.5
done < candidates.txt

What you commonly find

Find | Detail
Old API versions still routed | /api/v1/users returns 200, current site uses /api/v3. v1 = older auth, no rate-limit, original SQL injection bug never patched in old code.
.bak / .old files | config.php.bak with original DB credentials. dump.sql sitting in /backups/ from a 2021 migration.
Pre-launch debug endpoints | /debug/info that returned env vars in 2020 when the app was being built — quietly still working on prod.
S3 / Azure / GCS bucket URLs | Wayback caches them. The bucket itself often still public.
Old API docs / Swagger UI | /v1/swagger-ui/ from before the team disabled it on the current version.

Subdomain enum — find the weaker deployment

A target named example.com almost never has just one host. There's api., dev-api., staging-api., admin., internal., maybe old-api. Each is often a different deployment of the same code — different security posture, different oversight. Pentesting them as siblings is mandatory.

Passive sources (no traffic to target)

bash
# subfinder — multi-source aggregator
subfinder -d example.com -all -silent -o subs.txt

# amass — passive only
amass enum -passive -d example.com -o subs.txt

# crt.sh — Certificate Transparency logs (every cert ever issued for *.example.com)
# sed strips the leading "*." from wildcard entries
curl -s "https://crt.sh/?q=%25.example.com&output=json" \
  | jq -r ".[].name_value" | sed 's/^\*\.//' | sort -u

# assetfinder
echo example.com | assetfinder --subs-only

# combine them all
( subfinder -d example.com -silent; \
  amass enum -passive -d example.com; \
  assetfinder --subs-only example.com; \
  curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r ".[].name_value" | sed 's/^\*\.//' \
) | sort -u > all-subs.txt
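
Merged passive output is messy: crt.sh packs several names into one name_value field, wildcard entries show up as *.example.com, and aggregators occasionally return look-alike domains. A minimal clean-up sketch, assuming you have the raw values as a list of strings (`normalise_ct_names` is a made-up helper name):

```python
def normalise_ct_names(name_values, domain):
    """Split multi-name entries, drop wildcards, keep only hosts under the target domain."""
    domain = domain.lower().rstrip(".")
    hosts = set()
    for value in name_values:
        for name in value.splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]
            # Exact match or a true subdomain; "notexample.com" must not pass.
            if name == domain or name.endswith("." + domain):
                hosts.add(name)
    return sorted(hosts)
```

The suffix check uses "." + domain on purpose, so example.com.evil.net and notexample.com are both rejected.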

Active probe of each

bash
# httpx — alive check, tech detect, status code
cat all-subs.txt | httpx -title -status-code -tech-detect -no-color -o alive.txt

# or with screenshots
cat all-subs.txt | httpx -screenshot -srd shots/

# DNS-only — what resolves vs what doesn't
cat all-subs.txt | dnsx -silent -resp

Subdomains that often matter

Subdomain pattern | Why interesting
api., api-v2., api-internal. | Direct API hosts. Most likely target.
dev-api., staging-api., qa-api., test-api. | Same API, weaker auth. Same vulns as prod — without prod's alerting.
admin., internal., ops. | Admin panels. Often weaker auth + assume "behind VPN" but exposed anyway.
old-api., legacy., v1. | Forgotten deployments. The team migrated away years ago.
graphql., gql. | GraphQL gateways. Often shipped without introspection disabled.
webhook., events., callback. | Inbound webhook receivers. Sometimes accept SSRF payloads as event sources.
cdn., assets., static. | Static-asset hosts. Source maps, JS bundles, dumped files.

Acquisition / takeover risk

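Many subdomains point at deprovisioned services (S3 buckets that were deleted, Heroku apps no longer claimed, dangling Azure CNAMEs). Check every subdomain that still has a DNS record, not only the ones that answer over HTTP, with subjack or nuclei's takeover templates (nuclei -t takeovers; newer nuclei-templates releases keep them under http/takeovers). A subdomain takeover lets an attacker serve content on the target's domain, which means script execution in a trusted origin and theft of any cookies scoped to the parent domain.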

Version enum — old versions outlive the team that built them

Even on a single subdomain, the same API often runs at multiple versions side by side. Frontend uses v3; nginx still routes v1, v2, v4 (staging), beta, internal. Old versions outlive the teams that built them — and outlive the security improvements those teams made later.

Sweeping versions

bash
# basic
for v in v0 v1 v2 v3 v4 v5 internal beta staging dev test old legacy; do
  echo -n "/api/$v/users: "
  curl -s -o /dev/null -w "%{http_code}\n" https://api.example.com/api/$v/users
done

# ffuf — versioned wordlist
ffuf -u https://api.example.com/api/FUZZ/users \
     -w api-versions.txt \
     -mc 200,401,403,500

# Burp Intruder — same idea with full request manipulation

What changes between versions

Aspect | Typical drift
Auth scheme | v1 accepted Basic; v3 requires Bearer + PKCE. v1 still alive = downgrade path.
Input validation | v1 didn't parameterise queries — SQL injection still works. Validation was added in v2.
Rate limiting | Added in v3. v1 / v2 still let attackers spray credentials.
Field exposure | v1 response included passwordHash; v3 removed it. Old route still returns it.
Error format | v1 leaks stack traces in 500s; v3 returns generic JSON. Use v1's leaks to fingerprint v3's internals.
Verb support | v1 accepted DELETE on /users/{id} without confirmation; v3 requires re-auth. Use the older verb.
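
Field-exposure drift is easy to spot mechanically once you have the same resource from two versions. The sketch below assumes both responses are already parsed JSON; `field_names` and `field_drift` are illustrative names, and list items are collapsed under a "[]" marker.

```python
def field_names(value, prefix=""):
    """Collect dotted field paths from parsed JSON; list elements share a '[]' segment."""
    names = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            names.add(path)
            names |= field_names(child, path)
    elif isinstance(value, list):
        for child in value:
            names |= field_names(child, prefix + "[]")
    return names

def field_drift(old_body, new_body):
    """Fields present in only one of the two responses."""
    old, new = field_names(old_body), field_names(new_body)
    return {"only_in_old": sorted(old - new), "only_in_new": sorted(new - old)}
```

Anything in only_in_old is a field the team decided to stop returning, which is exactly the list you want to read first.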

Beyond integer versions

Style | Probe
By Accept header | Accept: application/vnd.example.v1+json — some APIs version by mime-type. Try v0, v1, v2 in the header.
By query param | ?api_version=1, ?v=1.0. Old behaviour might trigger when omitted.
By subdomain | api-v1.example.com vs api.example.com (see the subdomain enum section above).
By path prefix | /api/v1/, /api2/, /v3/api/. Try all permutations.
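
Trying "all permutations" by hand gets tedious, so it helps to generate the candidate requests first and feed them to whatever client you use. A sketch under stated assumptions: `version_probes` is a hypothetical helper, the `vendor` media-type token is a guess you would adapt per target, and only four of the possible styles are covered.

```python
def version_probes(base_url, resource, versions, vendor="example"):
    """Build candidate requests for one resource across path, query and header versioning."""
    base = base_url.rstrip("/")
    probes = []
    for v in versions:
        # Path prefix, both orderings.
        probes.append({"url": f"{base}/api/{v}/{resource}", "headers": {}})
        probes.append({"url": f"{base}/{v}/api/{resource}", "headers": {}})
        # Query parameter; "v1" becomes "1", "beta" stays "beta".
        probes.append({"url": f"{base}/api/{resource}?api_version={v.lstrip('v')}", "headers": {}})
        # Vendor media type in the Accept header.
        probes.append({
            "url": f"{base}/api/{resource}",
            "headers": {"Accept": f"application/vnd.{vendor}.{v}+json"},
        })
    return probes
```

Nothing here sends traffic; it only builds the list, so you stay in control of rate and scope when you replay it.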

GraphQL introspection — the schema is a free attack-surface map

GraphQL has built-in introspection: a query that returns the entire schema. Most server frameworks enable it by default, and many teams forget to disable it in production. One introspection query → every type, field, query, mutation, argument — the entire attack surface for free.

Detecting GraphQL

bash
# common paths
/graphql
/api/graphql
/v1/graphql
/v2/graphql
/query
/__graphql
/graphiql       # interactive UI
/playground     # Apollo playground

# detection probe
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{__typename}"}'
# {"data":{"__typename":"Query"}}  ← confirmed GraphQL (the root type name may differ)

The introspection query

bash
# minimal
curl -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{__schema{types{name}}}"}'

# full standard introspection — too long for here, see:
# https://raw.githubusercontent.com/graphql/graphql-js/main/src/utilities/getIntrospectionQuery.ts

# tools (CLI syntax differs between releases; check each tool's --help)
inql introspect -t https://api.example.com/graphql -o schema.json
npx get-graphql-schema https://api.example.com/graphql --json > schema.json
clairvoyance https://api.example.com/graphql -w common-words.txt -o schema.json  # if introspection disabled

# Burp + InQL extension automates the whole flow

# render the schema as a graph
graphql-voyager   # web UI: load the introspection result (schema.json)
graphqlcheck      # security-focused linter

What the schema reveals

Reveal | Detail
Mutations the UI never exposes | deleteUser, impersonate, runRawSQL, executeShell — exist in the schema, never wired into the app, but still callable.
Field-level secrets | User.passwordHash, User.mfaSecret — fields that should never be queryable from the public API.
Internal types | AdminUser, InternalEvent — types whose existence reveals admin functionality.
Deprecated fields | @deprecated directives mark soft-removed surface. Often still works, just hidden from docs.
Arg shapes | Argument types reveal what custom enums, JSON-blob inputs, file uploads exist. Each is a fuzz target.
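
A first pass over a saved introspection result can be automated. The sketch below assumes the full standard introspection response (the minimal types-only query above carries no field data) already parsed from schema.json; `audit_schema` and the keyword list are illustrative, not a real tool's API.

```python
import re

SENSITIVE = re.compile(r"password|secret|token|hash|ssn|apikey|internal", re.IGNORECASE)

def audit_schema(result):
    """Pull mutations, sensitive-looking fields and deprecated fields out of an introspection result."""
    schema = result["data"]["__schema"]
    mutation_type = (schema.get("mutationType") or {}).get("name")
    report = {"mutations": [], "sensitive_fields": [], "deprecated": []}
    for gql_type in schema.get("types", []):
        name = gql_type.get("name", "")
        if name.startswith("__"):
            continue  # built-in introspection types
        for field in gql_type.get("fields") or []:  # scalars and enums carry fields: null
            qualified = f"{name}.{field['name']}"
            if name == mutation_type:
                report["mutations"].append(field["name"])
            elif SENSITIVE.search(field["name"]):
                report["sensitive_fields"].append(qualified)
            if field.get("isDeprecated"):
                report["deprecated"].append(qualified)
    return {key: sorted(values) for key, values in report.items()}
```

Read the mutation list in full rather than filtering it; names like impersonate rarely match a keyword list.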

When introspection is disabled

shell
# clairvoyance — brute-force the schema from error messages
clairvoyance https://api.example.com/graphql -w common-graphql-fields.txt -o schema.json

# how it works:
#  query { aRandomField }
#  → server: Cannot query field "aRandomField" on type "Query". Did you mean "foo", "bar", or "baz"?
#  → clairvoyance parses the suggestions + iterates
#  often recovers a large share of the schema even with introspection off

💡 GraphQL also enables other attack classes: batching, alias overloading, depth/complexity attacks, query weight bypass. See the GraphQL Security deep-dive for the attack catalogue. The recon here is the prerequisite step.
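
The suggestion-parsing step at the heart of that brute-force is small enough to show. This is not clairvoyance's actual code, just a sketch of the idea, assuming error messages in the graphql-js "Did you mean" style; `parse_suggestions` is a made-up name.

```python
import re

# A quoted GraphQL name: letters, digits and underscores, not starting with a digit.
QUOTED_NAME = re.compile(r"""["']([_A-Za-z][_0-9A-Za-z]*)["']""")

def parse_suggestions(message):
    """Return the field names a server leaks after 'Did you mean' in an error message."""
    _before, marker, tail = message.partition("Did you mean")
    if not marker:
        return []
    return QUOTED_NAME.findall(tail)
```

Each recovered name goes back into the wordlist for the next round, which is why disabling suggestions on the server side blunts the technique.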

Wordlist brute-force — when there's no spec

No spec, no GraphQL, no obvious indicators — when everything else fails, brute-force the URL space. Modern wordlists (Assetnote, SecLists) cover thousands of common API patterns; modern fuzzers (ffuf, kiterunner, arjun) are smart about verbs and parameter discovery.

Endpoint brute-force

shell
# ffuf — fast, simple
ffuf -u https://api.example.com/FUZZ \
     -w SecLists/Discovery/Web-Content/api/api-endpoints.txt \
     -mc 200,401,403,500 \
     -fs 0 \
     -t 50

# Recurse into discovered directories
ffuf -u https://api.example.com/users/FUZZ \
     -w common.txt -mc 200,401,403 -recursion -recursion-depth 3

# kiterunner — verb-aware, knows API patterns
kr scan https://api.example.com -w routes-large.kite

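# kiterunner brute (classic path brute-force from a plain-text wordlist instead of .kite routes)
kr brute https://api.example.com -w routes-small.txt

# gobuster (v3 needs the default 404 blacklist cleared with -b "" before -s is accepted)
gobuster dir -u https://api.example.com -w wordlist.txt -s "200,401,403" -b "" -k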

# feroxbuster
feroxbuster -u https://api.example.com -w wordlist.txt -s 200,401,403

Parameter brute-force

shell
# arjun — hidden parameters on a known endpoint
arjun -u https://api.example.com/v1/search -m GET --include="q=test"
arjun -u https://api.example.com/v1/users -m JSON   # JSON request bodies; use -m POST for form-encoded

# x8 — alternative with smarter heuristics
x8 -u https://api.example.com/v1/search -w params.txt

# parameth — python alternative
python parameth.py -u https://api.example.com/v1/search

Wordlists worth knowing

Wordlist | Use
Assetnote wordlists | wordlists.assetnote.io — curated by API-focused researchers. Huge "kiterunner-routes-large" set.
SecLists/Discovery/Web-Content/api/ | api-endpoints.txt, common-api-endpoints-mazen160.txt — classic baseline.
SecLists/Fuzzing/Polyglots | For body fuzzing when an endpoint is found.
SecLists/Discovery/Web-Content/burp-parameter-names.txt | Parameter wordlist for arjun.
raft-large-words / raft-large-files | The original raft wordlists (shipped in SecLists). Still relevant.

Tuning for noise

Flag | Use
-mc / status filter | -mc 200,401,403 keeps only useful responses. Filter out the 404 baseline.
-fs / size filter | -fs 0 hides empty-body responses. -fs 1234 hides ones with same size as a known 404 page.
-mr / regex match | Match on response body content (e.g., -mr "graphql|swagger") for targeted discovery.
-t / threads | -t 50 is aggressive; -t 10 is polite (ffuf defaults to 40). Threads set concurrency, not requests per second; add -rate to cap req/s. WAFs / rate limiters will ban you if you push much higher.
-p / delay | -p 0.1 = 100ms delay between requests. Use against fragile or production targets.
-H / headers | -H "Authorization: Bearer …" — fuzz authenticated endpoints. Often a different surface than unauth.

Wordlist fuzzing is by far the loudest recon technique. The logs will read "10,000 requests from 1.2.3.4 in 60 seconds, mostly 404s" — there's no hiding it. Some bug-bounty programs flatly ban aggressive automated scanning, so stay inside scope.
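
Picking the right -fs value means knowing what "not found" looks like on this target. One rough approach: request a handful of random, certainly-missing paths first, record (status, size) for each, and see whether one pair dominates. The helper below is a sketch of that decision only; `baseline_filter` and the 80% threshold are assumptions.

```python
from collections import Counter

def baseline_filter(samples, min_share=0.8):
    """Given (status, size) pairs from random missing paths, return the dominant pair or None."""
    if not samples:
        return None
    (status, size), count = Counter(samples).most_common(1)[0]
    if count / len(samples) < min_share:
        return None  # no stable baseline; sizes vary, so filter on words or regex instead
    return {"status": status, "size": size}
```

If it returns a size, pass that number to -fs; if it returns None, the not-found page is dynamic and a size filter will hide real hits or let noise through.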

Other passive sources — GitHub, Shodan, mobile apps

Beyond the seven main techniques, a few smaller sources occasionally turn up the critical find:

Other passive sources worth checking

Source | What to look for
GitHub code search | "api.example.com" filename:.env (path:.env in the current code search), "api.example.com" "Authorization: Bearer". Devs commit secrets surprisingly often.
Postman public collections | https://www.postman.com/search?q=example.com&type=collection. Employees sometimes publish working collections.
Shodan / Censys | Shodan: ssl.cert.subject.cn:"*.example.com" — finds hosts even if DNS doesn't resolve.
Pastebin / ghostbin / gist | Tokens, dumps, "help, my API isn't working" posts with full requests.
JS source maps | See SPA Security article — bundle.js.map reconstructs original source.
robots.txt + sitemap.xml | Sometimes lists API endpoints meant to be hidden from crawlers. Disallow lines = "here's where we don't want you to look".
.well-known/* | /.well-known/openid-configuration, /.well-known/security.txt, /.well-known/api-catalog (RFC 9727).
CSP report-uri / report-to | CSP header often points at a /report endpoint — and sometimes a CSP report leaks server-side details on accepted payloads.
error pages | Default 404 / 500 pages often reveal framework + version (Django, Rails, Spring Boot whitelabel error page).
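
For the robots.txt row, pulling out the Disallow lines takes a few lines of code. A minimal sketch, assuming you have the file's text in a string (`disallowed_paths` is an illustrative name; it ignores which User-agent group a rule belongs to):

```python
def disallowed_paths(robots_txt):
    """Return the sorted set of non-empty Disallow values from a robots.txt body."""
    paths = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.add(value.strip())
    return sorted(paths)
```

An empty "Disallow:" means "allow everything", so it is skipped rather than reported as a path.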

Mobile app extraction

bash
# pull the APK
adb shell pm list packages | grep example
adb shell pm path com.example.app                 # prints the real install path
adb pull /data/app/com.example.app-1/base.apk     # substitute the path printed above

# decode it
apktool d base.apk -o decoded/

# search for API endpoints
grep -rEoh 'https?://[^"]+' decoded/ | sort -u
grep -rEoh '/api/v[0-9]+/[^"]+' decoded/ | sort -u

# secrets
grep -rE '(api_key|secret|password)' decoded/

# iOS .ipa is similar:
unzip app.ipa -d app/
strings app/Payload/example.app/example | grep -E 'https?://'
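
When strings isn't available (or you want the result de-duplicated and sorted in one step), the same idea fits in a few lines. This sketch assumes the binary has been read into bytes; `urls_from_binary` is a hypothetical helper and it only looks at ASCII runs, so UTF-16 strings are missed.

```python
import re

# Runs of 6+ printable ASCII bytes, roughly what `strings` prints by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")
URL = re.compile(r"https?://[^\s\"'<>]+")

def urls_from_binary(data):
    """Extract http(s) URLs embedded as ASCII text in a binary blob."""
    urls = set()
    for run in PRINTABLE_RUN.findall(data):
        urls.update(URL.findall(run.decode("ascii")))
    return sorted(urls)
```

Typical use would be `urls_from_binary(open(path, "rb").read())` on the app's main executable or on classes.dex.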

Workflow — chaining the techniques in 2 hours

Here's how the techniques slot together in order on a real engagement:

Recon workflow — the first 2 hours

Time | Action
Min 0-10 | subfinder + amass + crt.sh → all-subs.txt. httpx for alive check.
Min 10-20 | For each alive host: try /openapi.json, /swagger.json, /v3/api-docs, /graphql. Pull anything that returns 200.
Min 20-30 | Browse the app in a clean browser with DevTools open. Capture HAR file. Save all JS bundles.
Min 30-50 | JS mining — endpoints + secrets + config. Combine with spec if found.
Min 50-70 | waybackurls + gau on each subdomain. Filter for interesting paths. Live-check.
Min 70-90 | Version enum on each discovered subdomain. v0/v1/v2/internal/beta sweep.
Min 90-110 | GraphQL introspection if /graphql found. Schema → InQL → audit.
Min 110-120 | ffuf / kiterunner on the holes — paths not covered by spec, JS, or wayback.
End of hour 2 | Consolidated endpoint inventory. Imported to Postman. Ready to test.

Recon report deliverable

markdown
## API Recon — Acme Corp (example.com) — 2026-05-23

### Targets
- api.example.com           [prod]    nginx + node, JWT auth
- dev-api.example.com       [dev]     no auth ⚠
- staging-api.example.com   [staging] HTTP Basic ⚠
- old-api.example.com       [legacy]  no rate limit ⚠

### Discovered specs
- https://api.example.com/openapi.json  (OpenAPI 3.0.2, 187 endpoints)
- https://staging-api.example.com/v1/swagger.json  (Swagger 2.0, 142 endpoints)

### Notable endpoint surface
- /api/v1/admin/* (12 endpoints) — listed in spec but UI never uses them
- /internal/debug/dumpdb — confirmed live, requires no auth on dev-api
- /api/v0/users — listed nowhere current, still serves 200

### Secrets found in bundles
- Stripe publishable key (public, OK)
- Stripe SECRET KEY (sk_live_…) in admin.chunk.js ☠
- Internal Slack webhook in main.bundle.js

### Versions discovered
prod: v3 (current), v0/v1/v2/beta all live ⚠
staging: v2 only
dev: v3 + experimental v4

### GraphQL
graphql.example.com — introspection ON ☠
mutations exposed: deleteUser, impersonate, runRawSQL ☠

### Next steps for active testing
1. test admin endpoints with normal-user token (BFLA)
2. fuzz /internal/debug/* for SSRF/RCE
3. test the impersonate mutation
4. report sk_live_ Stripe key leak NOW (before further testing)

Cheat sheet — tools, one-shot pipeline, defender list

A single-page reference for the whole recon workflow:

Tool / wordlist quick-ref

Tool | Use
subfinder / amass / assetfinder | subdomain enum (passive)
crt.sh / SecurityTrails / Censys | Certificate Transparency, DNS history
httpx / dnsx | alive check + tech detect
katana / hakrawler / gospider | SPA-aware crawlers
waybackurls / gau | historical URL recovery
ffuf / gobuster / feroxbuster | directory + endpoint brute-force
kiterunner | verb-aware API endpoint discovery
arjun / x8 / parameth | hidden parameter discovery
inql / clairvoyance | GraphQL introspection / brute-force
graphql-voyager | GraphQL schema visualisation
LinkFinder / SecretFinder | JS bundle mining
gitleaks / trufflehog / nosey-parker | secret hunting in repos
nuclei + templates | broad template-based scanning
Postman / Insomnia / Bruno | organise + test the discovered surface
Burp Suite | capture + active testing

One-shot recon pipeline

bash
# environment
export TARGET="example.com"

# subdomains
subfinder -d $TARGET -all -silent > subs.txt
amass enum -passive -d $TARGET >> subs.txt
sort -u subs.txt -o subs.txt

# alive
cat subs.txt | httpx -title -status-code -tech-detect -no-color -o alive.txt

# spec discovery on each alive host
for h in $(awk '{print $1}' alive.txt); do
  for p in openapi.json swagger.json v3/api-docs api-docs graphql; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "$h/$p")
    [ "$code" = "200" ] && echo "$h/$p"
  done
done > specs.txt

# wayback URLs across all subs
for h in $(awk '{print $1}' alive.txt | sed 's|https\?://||'); do
  echo "$h" | waybackurls
done | sort -u > wayback.txt

# JS bundles via crawler
awk '{print $1}' alive.txt | xargs -I{} -P 4 katana -u {} -d 3 -jc 2>/dev/null \
  | grep -E '\.js($|\?)' > js-urls.txt

# pull bundles, grep
mkdir -p bundles && cat js-urls.txt | parallel -j 4 wget -q -P bundles/ {}
grep -rEoh '/api/v[0-9]+/[a-zA-Z./_-]+' bundles/ | sort -u > js-endpoints.txt
grep -rEoh 'sk_(live|test)_[a-zA-Z0-9]{24,}' bundles/ > secrets.txt

# ffuf gaps
ffuf -u "https://api.$TARGET/FUZZ" -w api-wordlist.txt -mc 200,401,403 -o ffuf.json

# consolidate
{ cat js-endpoints.txt wayback.txt; jq -r '.results[].url' ffuf.json; } | sort -u > all-endpoints.txt
wc -l all-endpoints.txt
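
A plain sort -u leaves /users/41 and /users/42 as separate entries and forgets which technique found what. If you want a tidier inventory, one option is to normalise ID-like segments and keep the sources. A sketch, with `normalise` and `consolidate` as invented names and only numeric and UUID segments treated as IDs:

```python
import re
from urllib.parse import urlsplit

ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)

def normalise(url_or_path):
    """Reduce a URL or path to its path with numeric/UUID segments replaced by {id}."""
    path = urlsplit(url_or_path).path or "/"
    segments = ["{id}" if ID_SEGMENT.match(s) else s for s in path.split("/")]
    return "/".join(segments).rstrip("/") or "/"

def consolidate(sources):
    """Map each normalised path to the sorted list of sources that reported it."""
    inventory = {}
    for source, entries in sources.items():
        for entry in entries:
            inventory.setdefault(normalise(entry), set()).add(source)
    return {path: sorted(found_by) for path, found_by in sorted(inventory.items())}
```

Paths reported by only one source (wayback alone, say) are the ones the official surface forgot, so they make a good starting order for testing.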

Defender checklist

# | Control
1 | OpenAPI spec — restrict to authenticated users in production; keep "internal" endpoints out of the public spec.
2 | JS bundles — no hardcoded secrets; sourcemaps stripped from prod deploy.
3 | Wayback — review archive.org for your domain; request takedowns of any leaked content.
4 | Subdomains — inventory all of them. Decommissioned services = takeover risk; remove their DNS records.
5 | API versions — decommission old versions explicitly. Block at LB if endpoint is "deprecated for security".
6 | GraphQL — disable introspection in production. Also disable field suggestions ("Did you mean …") to blunt clairvoyance-style schema recovery.
7 | Standard error pages with no framework / version info.
8 | WAF / rate-limit rules trigger on rapid 404 patterns (typical fuzz signature).
9 | GitHub secret-scan + pre-commit hooks (gitleaks); revoke any committed secret immediately.
10 | Run the same recon against yourself quarterly. Find what an attacker would find first.
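
Control 8 can be prototyped against access logs before it becomes a WAF rule. The following is a sketch of a sliding-window counter, assuming events are already parsed into (timestamp_seconds, client_ip, status) tuples in time order; `flag_fuzzers`, the 60-second window and the threshold of 100 are placeholders to tune against your own traffic.

```python
from collections import defaultdict, deque

def flag_fuzzers(events, window=60.0, threshold=100):
    """Return client IPs that produced at least `threshold` 404s within any `window` seconds."""
    recent = defaultdict(deque)  # ip -> timestamps of its recent 404s
    flagged = set()
    for timestamp, ip, status in events:
        if status != 404:
            continue
        hits = recent[ip]
        hits.append(timestamp)
        while hits and timestamp - hits[0] > window:
            hits.popleft()  # drop 404s that fell out of the window
        if len(hits) >= threshold:
            flagged.add(ip)
    return sorted(flagged)
```

A slow scanner that stays under the threshold slips through, which is the same trade-off the -p and -t flags exploit from the other side.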

Closing thoughts

Three things to take away:

Most of the attack surface isn't in the docs. The public docs describe the API the team wants you to see. Production routing, meanwhile, still knows about old versions, dev subdomains, debug endpoints and leftover backup files — and all of it is reachable. Recon's whole job is to get the complete list, not the curated one. JS bundles, wayback, a version sweep and subdomain enum together dwarf whatever the official spec admits to.

Passive recon costs nothing; active recon costs noise. Spend the first hour entirely on passive sources — crt.sh, wayback, GitHub, Shodan, public Postman collections — without sending a single packet at the target. Only once that picture is built do you start active scanning with ffuf, kiterunner or introspection queries. It keeps your footprint small and keeps you clearly distinct from a blanket scanner in the logs.

Recon is a defensive tool too. If you can find your own forgotten v0 endpoint in twenty minutes, so can anyone else. Run this exact workflow against your own infrastructure every quarter — the output is your shadow-API inventory: old versions to decommission, leaked secrets to rotate, debug endpoints to firewall off. Defensive recon is about the cheapest security investment per finding you'll ever make.

Next on the API Security track: BOLA (Broken Object Level Authorization) — the most common API vulnerability, and the obvious first attack to run against the endpoint inventory you just built.
