Find what the internet forgot to lock — operators, dorks, GitHub & Shodan, animated and defended
The search engine already did your recon
Search engines are the most thorough reconnaissance tool ever built — and someone else already paid for the crawl. Google has visited millions of servers and indexed whatever it found linked: open directories, forgotten backups, config files, admin panels, error pages that leak versions and paths. Google dorking (a.k.a. Google hacking) is simply asking for those things precisely, using search operators instead of plain keywords.
It's the purest form of passive OSINT: as long as you only read the results, you never send a single packet to the target. You query Google's index, and Google answers from what it has already crawled. That makes dorking the quiet first step of nearly every engagement and bug-bounty recon: map the attack surface, harvest exposed files, and find the low-hanging fruit before you've touched the target at all.
The operators (that still work in 2026)
A dork is a query built from operators — modifiers that tell the engine where to look (URL, title, body, file type, domain) instead of just what. Here are the ones that still work in 2026 (Google has been retiring operators steadily):
| Operator | What it does | Example |
|---|---|---|
| site: | Limit to a domain | site:example.com · site:*.example.com -www |
| inurl: / allinurl: | Term appears in the URL | inurl:admin · inurl:wp-config |
| intitle: / allintitle: | Term in the page title | intitle:"index of" |
| intext: / allintext: | Term in the body | intext:"DB_PASSWORD" |
| filetype: / ext: | Restrict to a file type | filetype:env · ext:sql |
| "..." | Exact phrase | "-----BEGIN RSA PRIVATE KEY-----" |
| - | Exclude a term | site:example.com -www |
| OR / \| | Either term | filetype:env \| filetype:ini |
| * | Wildcard (any words) | "index of" * "backup" |
| .. | Number range | "camera" 2020..2024 |
| AROUND(n) | Terms within n words | password AROUND(3) admin |
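Because a dork is just operators joined by spaces, layering them is easy to script. The sketch below is a minimal illustration, not an official API: `build_dork` and its parameter names are hypothetical, and it only covers a handful of the operators from the table.

```python
def build_dork(site=None, filetype=None, intitle=None, intext=None, exclude=()):
    """Join operator:value pairs into one query string (hypothetical helper)."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        # Phrase operators are always quoted so multi-word values stay together.
        parts.append(f'intitle:"{intitle}"')
    if intext:
        parts.append(f'intext:"{intext}"')
    # Each excluded term gets a leading minus, as in the "-" row of the table.
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_dork(site="example.com", filetype="pdf", intext="confidential"))
    print(build_dork(site="*.example.com", intitle="index of", exclude=["www"]))
```

Keeping the operators as named arguments makes it easy to add or drop one layer at a time and watch how the result set changes.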
cache: was removed in September 2024 (use the Wayback Machine instead), and link:, info:, +, phonebook: and # are gone or unreliable. Old dork lists are full of these, and they no longer fire.
Operators combine: site:example.com filetype:pdf intext:"confidential" means "PDFs on example.com whose body says confidential." The skill is layering operators until only the interesting results remain.
"Index of": open directory listings
The single most productive dork. When a web server has directory listing enabled and no index page, it renders an auto-generated "Index of /..." page — a clickable file browser of whatever is in that folder. Google indexes those pages, so you can search for open directories full of files nobody meant to publish.
    intitle:"index of" "parent directory"
    intitle:"index of" backup
    intitle:"index of" "*.sql" | "*.bak" | "*.zip"
    intitle:"index of" ".env"
    intitle:"index of" "config"
    intitle:"index of" /admin
    site:example.com intitle:"index of"
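The same "Index of" title that makes these pages searchable also makes them easy to recognise on your own servers. Here is a rough sketch, assuming you already have the HTML of a response in hand; the helper names and the sample page are made up for illustration, and real listings vary by server and theme.

```python
from html.parser import HTMLParser


class _ListingParser(HTMLParser):
    """Collect the <title> text and every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_directory_listing(html):
    """True when the page title starts with 'Index of' (Apache/Nginx autoindex style)."""
    parser = _ListingParser()
    parser.feed(html)
    return parser.title.strip().lower().startswith("index of")


def listed_files(html):
    """Return linked entries, skipping sort links, parent links and absolute paths."""
    parser = _ListingParser()
    parser.feed(html)
    return [h for h in parser.links if not h.startswith(("?", "../", "/"))]


# Hypothetical sample of an auto-generated listing.
SAMPLE = """<html><head><title>Index of /backup</title></head><body>
<h1>Index of /backup</h1>
<a href="?C=N;O=D">Name</a>
<a href="../">Parent Directory</a>
<a href="db.sql">db.sql</a>
<a href="site.zip">site.zip</a>
</body></html>"""

if __name__ == "__main__":
    print(looks_like_directory_listing(SAMPLE), listed_files(SAMPLE))
```

Run against pages fetched from hosts you own, a check like this flags listings before a crawler does.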
Defence: turn off directory listing (Apache: Options -Indexes; Nginx: autoindex off;). A listing left enabled by a distribution default or a stray config line is what exposes whole folders. Then make sure nothing sensitive sits in a web-served path to begin with.
Exposed files: .env, .git, dumps & keys
The crown jewels of dorking are files that should never have been web-reachable: environment files with database passwords and API keys, SQL dumps, exposed .git folders, log files, private keys. filetype:/ext: combined with a telltale string inside finds them.
    # Environment files with secrets
    filetype:env "DB_PASSWORD"
    ext:env intext:"APP_KEY"
    # SQL dumps
    filetype:sql "INSERT INTO" "password"
    intitle:"index of" "dump.sql"
    # Exposed .git (then dump it with git-dumper)
    inurl:".git" intitle:"index of"
    intext:"[core]" ext:git config
    # Config / credentials / keys
    filetype:xml | filetype:conf | filetype:ini "password"
    intext:"-----BEGIN RSA PRIVATE KEY-----"
    filetype:log intext:"password"
    ext:json intext:"aws_secret_access_key"
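Seen from the defender's side, the file patterns these dorks target can be checked before anything is deployed. The sketch below assumes you can list the paths that will end up under the web root; the name sets are illustrative and far from complete, and `risky_paths` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Illustrative, non-exhaustive patterns taken from the dorks in this section.
SENSITIVE_PARTS = {".env", ".git"}
SENSITIVE_SUFFIXES = {".sql", ".bak", ".log"}


def is_risky(path):
    """True if any path component or the file suffix matches a sensitive pattern."""
    p = PurePosixPath(path)
    for part in p.parts:
        # Catches .env, .git/ and variants such as .env.production
        if part in SENSITIVE_PARTS or part.startswith(".env."):
            return True
    return p.suffix.lower() in SENSITIVE_SUFFIXES


def risky_paths(paths):
    """Filter a list of web-root-relative paths down to the risky ones."""
    return [p for p in paths if is_risky(p)]


if __name__ == "__main__":
    deploy = [
        "public/index.html",
        "public/.env",
        "public/.git/config",
        "public/backup/dump.sql",
        "public/css/site.css",
    ]
    for path in risky_paths(deploy):
        print("do not publish:", path)
```

Wired into a deploy script, a non-empty result is a reasonable signal to stop the release.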
An exposed .env or .git is often instant game-over: the .git folder reconstructs the whole source (git-dumper), and a .env hands over DB creds, API keys and signing secrets. These rank among the highest-impact findings in bug bounties precisely because dorking surfaces them in seconds.
Defence: never deploy .env, .git, backups or dumps to a public directory. Block dotfiles and known-sensitive paths at the web server, rotate any key that has ever been web-reachable, and scan your own surface with these dorks before an attacker does.
Login & admin panels
Dorking maps the authentication surface fast — admin consoles, device logins, database UIs, dashboards that were never meant to face the internet. You're not breaking in; you're building a target list of every door.
    inurl:admin intitle:login
    intitle:"phpMyAdmin" "Welcome to phpMyAdmin"
    inurl:/wp-admin/ | inurl:/wp-login.php
    intitle:"Login" inurl:"/dashboard"
    intitle:"Grafana" inurl:login
    inurl:":8080/manager/html"    # exposed Tomcat manager
    intitle:"Jenkins" inurl:8080
    site:example.com inurl:login | inurl:signin | inurl:admin
Defence: put panels behind a VPN or IP allowlist, and mark any that must stay public noindex so they never enter the index. Default-credential and weak-auth panels are a primary target once found, so enforce MFA on them and rename predictable paths.
Error messages & version disclosure
Error pages are reconnaissance gold — they leak software versions, file-system paths, stack traces, and SQL structure. Dorking for telltale error strings finds servers that are verbose about their own internals.
    # Database / SQL errors (often hint at SQLi too)
    intext:"SQL syntax" "near"
    intext:"Warning: mysql_connect()"
    intext:"Microsoft OLE DB Provider for SQL Server"
    # Verbose PHP / framework info
    "PHP Parse error" | "PHP Warning" | "PHP Error"
    intitle:"phpinfo()" "PHP Version"
    # Stack traces / debug pages
    intext:"Whoops, looks like something went wrong"    # Laravel debug
    intext:"Traceback (most recent call last)"    # Python debug
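The strings these dorks search for are the same ones you can grep for in your own responses. A small sketch follows, assuming the response body is already available as text; the signature table simply reuses the phrases above and is not a complete fingerprint set.

```python
# Signature phrases copied from the dorks above, grouped by what they leak.
ERROR_SIGNATURES = {
    "SQL error": (
        "sql syntax",
        "warning: mysql_connect()",
        "microsoft ole db provider for sql server",
    ),
    "PHP error": ("php parse error", "php warning", "php error"),
    "debug page": (
        "whoops, looks like something went wrong",
        "traceback (most recent call last)",
    ),
}


def leaked_error_types(body):
    """Return the categories whose signature strings appear in a response body."""
    lowered = body.lower()
    return [
        kind
        for kind, needles in ERROR_SIGNATURES.items()
        if any(needle in lowered for needle in needles)
    ]


if __name__ == "__main__":
    page = "Traceback (most recent call last):\n  File \"app.py\", line 3"
    print(leaked_error_types(page))
```

A plain substring match is crude and will misfire on pages that merely discuss these errors, but it is enough to catch a debug page left on in production.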
Defence: serve custom error pages and turn debug off in production. A phpinfo() page or a framework debug trace tells an attacker your exact versions, paths and extensions, so remove them.
Exposed devices, cameras & dashboards
Internet-connected cameras, printers, NAS boxes and industrial panels announce themselves with recognisable titles and URL patterns — and far too many are indexed with no authentication. This is where dorking shades into device hunting (and where Shodan takes over — §9).
    # Network cameras (classic, still finds live feeds)
    inurl:"viewerframe?mode="
    intitle:"Live View / - AXIS"
    inurl:/view/index.shtml
    # Printers / NAS / panels
    intitle:"HP LaserJet" inurl:hp/device
    intitle:"Synology DiskStation"
    intitle:"Router" intext:"login" inurl:8080
Defence: keep devices off the public internet behind a VPN or firewall rather than relying on noindex. If a device must be reachable, change defaults and restrict by IP.
The Google Hacking Database (GHDB)
You don't have to invent dorks from scratch. The Google Hacking Database (GHDB) on Exploit-DB is a curated, categorised library of thousands of working dorks, maintained for exactly this purpose — the canonical reference Johnny Long started and Offensive Security now hosts.
| GHDB category | What it collects |
|---|---|
| Files Containing Passwords | Dorks that surface credentials in indexed files |
| Sensitive Directories | "index of" and exposed-folder patterns |
| Vulnerable Servers / Files | Software with known issues, by fingerprint |
| Error Messages | Version/path disclosure patterns |
| Pages Containing Login Portals | Admin/auth surface |
| Network / Vulnerability Data | Exposed configs, devices, monitoring |
GitHub dorking — secrets in public code
The biggest secret-leak source today isn't Google — it's GitHub. Developers commit .env files, API keys, tokens and private keys, then push to public repos. GitHub's code search is a dorking engine of its own, with its own operators, and it searches the actual file contents of millions of repositories.
    # GitHub code-search syntax (search across public code)
    path:.env DB_PASSWORD
    "AKIA" language:JSON    # AWS access key IDs start with AKIA
    path:.npmrc _auth    # npm auth tokens (path: replaces the legacy filename: qualifier)
    org:targetorg "api_key"    # scoped to one org
    path:**/credentials AWS_SECRET_ACCESS_KEY
    "-----BEGIN PRIVATE KEY-----" language:PEM
    # Pivot: also search commit history, gists, and forks; secrets removed from
    # HEAD often survive in old commits. Tools: trufflehog, gitleaks, github-search.
    trufflehog github --org=targetorg
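What these searches match is pattern-shaped text, which is also how secret scanners work. The sketch below is a deliberately tiny stand-in for tools like gitleaks or trufflehog, not a description of how they are implemented; the three patterns and the `find_secrets` helper are assumptions chosen to mirror the queries above.

```python
import re

# Illustrative patterns only; real scanners ship hundreds plus entropy checks.
PATTERNS = [
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("private_key", re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----")),
    (
        "secret_assignment",
        re.compile(
            r"\b(?:DB_PASSWORD|APP_KEY|api_key|aws_secret_access_key)\b\s*[:=]\s*\S+",
            re.IGNORECASE,
        ),
    ),
]


def find_secrets(text):
    """Return (line_number, pattern_name) for every pattern hit, in file order."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS:
            if pattern.search(line):
                hits.append((number, name))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = "DB_PASSWORD=hunter2\nname=app\nkey = AKIAIOSFODNN7EXAMPLE\n"
    for line_no, kind in find_secrets(sample):
        print(f"line {line_no}: {kind}")
```

Run it over the files in a commit before pushing; for anything beyond a quick check, the dedicated scanners also walk history, which a single-file pass like this does not.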
Defence: keep secrets out of repos with a .gitignore for .env, environment variables or a secrets manager, and pre-commit secret scanning (gitleaks, trufflehog). If a secret was ever pushed, rotate it; removing the file does NOT remove it from history. Enable GitHub push protection and secret scanning on your org.
Beyond Google: Shodan, Censys & FOFA
Google indexes web pages. Shodan, Censys and FOFA index the internet itself: they continuously scan the public IPv4 space across a wide range of ports and catalogue the service banners, so they see databases, RDP, IoT and ICS that Google never crawls. This is the natural escalation from dorking: from "what's been published" to "what's listening."
    # Shodan (web UI or CLI: shodan search '<query>')
    org:"Target Inc"    # everything owned by an org
    ssl.cert.subject.cn:target.com    # by TLS certificate name
    http.title:"index of"    # open directories, internet-wide
    port:9200 product:Elasticsearch    # exposed Elasticsearch (often no auth)
    product:MongoDB port:27017    # exposed Mongo
    port:3389    # RDP exposed
    "230 Anonymous access granted"    # anonymous FTP
    # Censys query language
    services.tls.certificates.leaf_data.subject.common_name: target.com
    services.service_name: ELASTICSEARCH
    # Shodan CLI quickstart
    shodan init <API_KEY>
    shodan search --fields ip_str,port,org 'ssl.cert.subject.cn:target.com'
    shodan host <IP>    # full profile of one host
Chaining OSINT — from one dork to a full map
Dorking is one lens. Real OSINT recon chains sources — each finding feeds the next, building a map of the target's surface no single tool produces. A typical pivot:
    # 1) Enumerate subdomains (passive)
    subfinder -d target.com -silent | tee subs.txt
    amass enum -passive -d target.com
    # 2) Dork each host for exposures
    site:*.target.com intitle:"index of" | inurl:admin | filetype:env
    # 3) Pull historical URLs that may still resolve (forgotten endpoints)
    gau target.com | tee urls.txt    # getallurls (Wayback + others)
    katana -u https://target.com -jc    # active crawl: this one does touch the target
    # 4) Harvest emails / hosts / metadata
    theHarvester -d target.com -b all
    # 5) GitHub for leaked code/secrets, Shodan for live services (as above)
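The hand-off between steps 1 and 2 is mostly text plumbing: merge the subdomain lists, deduplicate, and stamp each host into a dork. A minimal sketch of that glue follows; the function names and the three templates are illustrative choices, not part of any of the tools above.

```python
# Per-host templates mirroring the step 2 dork; adjust to taste.
DORK_TEMPLATES = (
    'site:{host} intitle:"index of"',
    "site:{host} inurl:admin",
    "site:{host} filetype:env",
)


def normalise_hosts(lines):
    """Lowercase, trim, strip wildcard prefixes, and dedupe in first-seen order."""
    seen = {}
    for line in lines:
        host = line.strip().lower().rstrip(".")
        if host.startswith("*."):
            host = host[2:]
        if host:
            seen.setdefault(host, None)
    return list(seen)


def dorks_for(hosts):
    """Expand every host into one query per template."""
    return [t.format(host=h) for h in hosts for t in DORK_TEMPLATES]


if __name__ == "__main__":
    raw = ["A.target.com\n", "a.target.com", "", "*.dev.target.com", "b.target.com."]
    for query in dorks_for(normalise_hosts(raw)):
        print(query)
```

In practice the input would be the lines of subs.txt from step 1; the output is a query list you can work through by hand or feed to a throttled runner.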
The Wayback Machine (web.archive.org) is the de-facto replacement for the dead cache: operator. It holds historical snapshots of pages, including ones since taken down. Old snapshots frequently contain endpoints, parameters and content the live site has removed.
Automation & tooling
Running dorks by hand is fine for a few; automation scales them — carefully, because Google aggressively rate-limits and CAPTCHAs automated querying.
| Tool | Use |
|---|---|
| pagodo / dorks-eye | Automate GHDB dorks against a target (throttle hard to avoid bans) |
| theHarvester | Emails, subdomains, hosts from many public sources |
| recon-ng | Modular OSINT framework (dorking, profiling, pivots) |
| Shodan / Censys CLI | Scriptable internet-wide service search + monitoring |
| gau / waybackurls / katana | Historical & crawled URL discovery |
| trufflehog / gitleaks | Secret scanning across GitHub/repos/history |
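"Throttle hard" usually means long, randomised pauses that grow after every block or CAPTCHA. The sketch below shows one way to compute such a schedule; the default numbers are arbitrary assumptions, not limits published by any search engine, and `backoff_delays` is a hypothetical helper.

```python
import random


def backoff_delays(attempts, base=30.0, factor=2.0, cap=600.0, jitter=0.25, rng=None):
    """Return a list of wait times (seconds) that grow exponentially up to a cap.

    Each delay is nudged by up to +/- jitter (a fraction of the delay) so that
    requests do not land on a fixed rhythm.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        spread = delay * jitter
        delays.append(delay + rng.uniform(-spread, spread))
    return delays


if __name__ == "__main__":
    # A seeded generator keeps the printed schedule reproducible.
    for wait in backoff_delays(5, rng=random.Random(1)):
        print(f"sleep {wait:.1f}s before the next query")
```

Pass each value to `time.sleep` between queries, and reset to the first delay once requests succeed again.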
Defence — keeping yourself out of the index
The whole offensive chapter is also your hardening checklist — because the best defence is to never be in the index in the first place. The realistic priorities:
| Control | How |
|---|---|
| Keep secrets out of web roots | No .env, .git, backups, dumps in public paths; block dotfiles at the server |
| Disable directory listing | Apache Options -Indexes / Nginx autoindex off (kills "index of") |
| noindex on sensitive pages | X-Robots-Tag: noindex / meta; the only reliable way to stay out of the index |
| Don't expose admin/devices | VPN or IP allowlist for panels, dashboards, databases, management ports |
| Kill verbose errors | Custom error pages, debug off in prod (no phpinfo/stack traces) |
| Secret scanning + rotation | gitleaks/trufflehog + push protection; rotate anything ever exposed |
| Remove what's already indexed | Google Search Console "Removals", then fix the underlying exposure |
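The noindex row is the easiest one to verify mechanically, because the signal is either in the response or it is not. Here is a small sketch, assuming you already have a response's headers and body from your own site; `is_noindexed` is a hypothetical helper and it ignores crawler-specific variants such as `googlebot` meta tags.

```python
from html.parser import HTMLParser


class _RobotsMeta(HTMLParser):
    """Set .noindex when a <meta name="robots"> tag carries noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            if "noindex" in (attributes.get("content") or "").lower():
                self.noindex = True


def is_noindexed(headers, html=""):
    """True if the X-Robots-Tag header or a robots meta tag says noindex."""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    parser = _RobotsMeta()
    parser.feed(html)
    return parser.noindex


if __name__ == "__main__":
    print(is_noindexed({"X-Robots-Tag": "noindex, nofollow"}))
    print(is_noindexed({}, '<meta name="robots" content="noindex">'))
    print(is_noindexed({"Content-Type": "text/html"}, "<title>Login</title>"))
```

Checking both places matters: the header is the only option for non-HTML files such as PDFs, while the meta tag is what most CMS plugins emit.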
robots.txt does NOT keep pages out of search. It asks crawlers not to fetch them, but a disallowed URL can still be indexed if linked elsewhere, and your robots.txt itself becomes a map of the paths you most want hidden. Use noindex, not robots.txt, to stay out of the index, and leave the page crawlable so the crawler can actually see the noindex.
Dork yourself first: site:yourdomain.com plus the file-type and "index of" patterns. The fastest way to know what an attacker can find is to look first.
Closing: ask precisely
Google dorking endures because the failure it exploits never goes away: somebody, somewhere, publishes something they shouldn't, and a crawler finds it. The operators change (pour one out for cache:), the engines multiply (Shodan, Censys, GitHub code search), but the move is constant — ask precisely for what people forgot to lock down.
Two habits make you good at it. Offensively: layer operators until only signal remains, and always chain — a subdomain feeds a dork feeds a Shodan query feeds a GitHub search. Defensively: assume everything you put on a server will be indexed, and design so that even when it is, there's nothing worth finding.
Spend an hour dorking your own domain and a repo you own. The first time filetype:env or a GitHub secret-search lights up on something you forgot about, the value of this technique — and the urgency of the defences — stops being theoretical.