Google Dorking & OSINT

Search engines have already crawled the things people forgot to lock down — exposed .env files, open directories, admin panels, leaked keys. Google dorking is the art of asking for them precisely. This is the complete, animated field guide: every operator that still works in 2026, the dorks that surface credentials and devices, GitHub code-search and the Shodan/Censys pivot, full automation — and how to keep your own assets out of the index.

Find what the internet forgot to lock — operators, dorks, GitHub & Shodan, animated and defended

The search engine already did your recon

Search engines are the most thorough reconnaissance tool ever built — and someone else already paid for the crawl. Google has visited millions of servers and indexed whatever it found linked: open directories, forgotten backups, config files, admin panels, error pages that leak versions and paths. Google dorking (a.k.a. Google hacking) is simply asking for those things precisely, using search operators instead of plain keywords.

It's the purest form of passive OSINT: you never send a single packet to the target. You query Google's index, and Google answers from its cache of what it already saw. That makes dorking the quiet first step of nearly every engagement and bug-bounty recon — map the attack surface, harvest exposed files, and find the low-hanging fruit before you've touched the target at all.

The searching is passive and legal. Acting on what you find is not. Viewing an exposed admin panel's login page is one thing; logging in, downloading a leaked database, or using found credentials is unauthorised access — a crime. Dork your own assets, your bug-bounty scope, or a lab. Treat everything else as look-but-don't-touch.

The operators (that still work in 2026)

A dork is a query built from operators — modifiers that tell the engine where to look (URL, title, body, file type, domain) instead of just what. Here are the ones that still work in 2026 (Google has been retiring operators steadily):

| Operator | What it does | Example |
| --- | --- | --- |
| site: | Limit to a domain | site:example.com · site:*.example.com -www |
| inurl: / allinurl: | Term appears in the URL | inurl:admin · inurl:wp-config |
| intitle: / allintitle: | Term in the page title | intitle:"index of" |
| intext: / allintext: | Term in the body | intext:"DB_PASSWORD" |
| filetype: / ext: | Restrict to a file type | filetype:env · ext:sql |
| "..." | Exact phrase | "-----BEGIN RSA PRIVATE KEY-----" |
| - | Exclude a term | site:example.com -www |
| OR / \| | Either term | filetype:env \| filetype:ini |
| * | Wildcard (any words) | "index of" * "backup" |
| .. | Number range | "camera" 2020..2024 |
| AROUND(n) | Terms within n words (undocumented; results vary) | password AROUND(3) admin |

Dead / degraded operators (don't waste dorks on these): cache: was removed in September 2024 (use the Wayback Machine instead), and link:, info:, +, phonebook: and # are gone or unreliable. Old dork lists are full of these, and they no longer fire.

Operators combine with AND logic by default, so stack them to narrow hard: site:example.com filetype:pdf intext:"confidential" means "PDFs on example.com whose body says confidential." The skill is layering operators until only the interesting results remain.
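
The layering described above can be sketched as a small query builder. This is a minimal illustration, not a tool from the article: `build_dork` and its parameter names are hypothetical, and it assumes the engine honours parentheses around an OR group.

```python
def build_dork(*terms, any_of=(), exclude=()):
    """Join terms with spaces (implicit AND), then add an OR group and exclusions."""
    parts = list(terms)
    if any_of:
        # Parentheses keep the OR alternatives separate from the AND-ed terms.
        parts.append("(" + " | ".join(any_of) + ")")
    # A leading minus excludes a term or operator.
    parts.extend("-" + term for term in exclude)
    return " ".join(parts)


query = build_dork(
    "site:example.com",
    'intext:"confidential"',
    any_of=("filetype:pdf", "filetype:docx"),
    exclude=("inurl:blog",),
)
print(query)
# site:example.com intext:"confidential" (filetype:pdf | filetype:docx) -inurl:blog
```

Keeping each operator as a separate argument makes it easy to add or drop one layer at a time and watch how the result set changes.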

"Index of" — open directory listings

The single most productive dork. When a web server has directory listing enabled and no index page, it renders an auto-generated "Index of /..." page — a clickable file browser of whatever is in that folder. Google indexes those pages, so you can search for open directories full of files nobody meant to publish.

```shell
intitle:"index of" "parent directory"
intitle:"index of" backup
intitle:"index of" ("*.sql" | "*.bak" | "*.zip")
intitle:"index of" ".env"
intitle:"index of" "config"
intitle:"index of" /admin
site:example.com intitle:"index of"
```

🛡 Defend: Disable automatic directory listing (Apache: Options -Indexes; Nginx: autoindex off;). Many stock Apache configurations ship with Indexes enabled, while Nginx only lists directories if someone turned autoindex on. Then make sure nothing sensitive sits in a web-served path to begin with.
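
For auditing your own hosts, the "Index of /..." title that these dorks key on can be checked directly. The sketch below is an assumption-laden example: `looks_like_directory_listing` is a hypothetical helper, it only inspects HTML you have already fetched from a server you control, and it assumes the Apache/Nginx style title.

```python
import re

# Apache and Nginx auto-generated listings both use a title of the form "Index of /path".
LISTING_TITLE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Return True when the page title matches the auto-index pattern."""
    return bool(LISTING_TITLE.search(html))


listing = "<html><head><title>Index of /backups</title></head><body></body></html>"
homepage = "<html><head><title>Welcome</title></head><body></body></html>"
print(looks_like_directory_listing(listing))   # True
print(looks_like_directory_listing(homepage))  # False
```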

Exposed files — .env, .git, dumps & keys

The crown jewels of dorking are files that should never have been web-reachable: environment files with database passwords and API keys, SQL dumps, exposed .git folders, log files, private keys. Pairing filetype: or ext: with a telltale string from inside the file finds them.

```bash
# Environment files with secrets
filetype:env "DB_PASSWORD"
ext:env intext:"APP_KEY"

# SQL dumps
filetype:sql "INSERT INTO" "password"
intitle:"index of" "dump.sql"

# Exposed .git (then dump it with git-dumper)
inurl:".git" intitle:"index of"
inurl:".git/config" intext:"[core]"

# Config / credentials / keys
(filetype:xml | filetype:conf | filetype:ini) "password"
intext:"-----BEGIN RSA PRIVATE KEY-----"
filetype:log intext:"password"
ext:json intext:"aws_secret_access_key"
```

An exposed .env or .git is often instant game-over: the .git folder reconstructs the whole source (git-dumper), and a .env hands over DB creds, API keys and signing secrets. These rank among the highest-impact findings in bug bounties precisely because dorking surfaces them in seconds.

🛡 Defend: Keep secrets out of web roots entirely; never deploy .env, .git, backups or dumps to a public directory. Block dotfiles and known-sensitive paths at the web server, rotate any key that has ever been web-reachable, and scan your own surface with these dorks before an attacker does.
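
One way to act on this before deployment is to check the list of files headed for the web root against the same patterns the dorks target. This is a rough sketch: `flag_sensitive` and the pattern list are illustrative assumptions, not an exhaustive rule set.

```python
import re

SENSITIVE = re.compile(
    r"""
    (^|/)\.(env|git|svn|htpasswd)(/|$|\.)          # dotfiles and VCS folders
    | \.(sql|bak|old|dump|pem|key)$                # dumps, backups, key material
    | (^|/)(backup|dump)[^/]*\.(zip|tar|gz|tgz)$   # archive-style backups
    """,
    re.IGNORECASE | re.VERBOSE,
)


def flag_sensitive(paths):
    """Return the paths that match a known-sensitive pattern, in input order."""
    return [p for p in paths if SENSITIVE.search(p)]


manifest = [
    "public/index.html",
    "public/.env",
    "public/.git/config",
    "public/db/dump.sql",
    "public/backup-2024.zip",
    "public/css/site.css",
]
print(flag_sensitive(manifest))
# ['public/.env', 'public/.git/config', 'public/db/dump.sql', 'public/backup-2024.zip']
```

A check like this fits naturally into a CI step that fails the build when the returned list is not empty.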

Login & admin panels

Dorking maps the authentication surface fast — admin consoles, device logins, database UIs, dashboards that were never meant to face the internet. You're not breaking in; you're building a target list of every door.

```shell
inurl:admin intitle:login
intitle:"phpMyAdmin" "Welcome to phpMyAdmin"
inurl:/wp-admin/ | inurl:/wp-login.php
intitle:"Login" inurl:"/dashboard"
intitle:"Grafana" inurl:login
inurl:":8080/manager/html"           # exposed Tomcat manager
intitle:"Jenkins" inurl:8080
site:example.com inurl:login | inurl:signin | inurl:admin
```

🛡 Defend: Don't expose admin/management interfaces to the internet. Put them behind a VPN or IP allowlist, and add noindex so they never enter the index. Default-credential and weak-auth panels are a primary target once found, so MFA them and rename predictable paths.

Error messages & version disclosure

Error pages are reconnaissance gold — they leak software versions, file-system paths, stack traces, and SQL structure. Dorking for telltale error strings finds servers that are verbose about their own internals.

```shell
# Database / SQL errors (often hint at SQLi too)
intext:"SQL syntax" "near"
intext:"Warning: mysql_connect()"
intext:"Microsoft OLE DB Provider for SQL Server"

# Verbose PHP / framework info
"PHP Parse error" | "PHP Warning" | "PHP Error"
intitle:"phpinfo()" "PHP Version"

# Stack traces / debug pages
intext:"Whoops, looks like something went wrong"   # Laravel debug
intext:"Traceback (most recent call last)"          # Python debug
```

🛡 Defend: Disable verbose errors and debug mode in production (custom error pages only). A phpinfo() page or a framework debug trace tells an attacker your exact versions, paths and extensions, so remove them.
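
The same telltale strings can be turned into a self-check on response bodies from your own application, for example in an integration test. The sketch below is illustrative only: `find_leaks` and the label names are hypothetical, and the signature list simply reuses the strings from the dorks above.

```python
# Substrings taken from the error dorks above, mapped to a rough category.
LEAK_SIGNATURES = {
    "SQL syntax": "database error",
    "Warning: mysql_connect()": "PHP/MySQL warning",
    "PHP Parse error": "PHP error",
    "Traceback (most recent call last)": "Python traceback",
    "Whoops, looks like something went wrong": "framework debug page",
}


def find_leaks(body: str):
    """Return the sorted categories of error signatures present in a response body."""
    return sorted({label for needle, label in LEAK_SIGNATURES.items() if needle in body})


verbose = "You have an error in your SQL syntax; check the manual near ''' at line 1"
print(find_leaks(verbose))                       # ['database error']
print(find_leaks("<h1>Something went wrong</h1>"))  # []
```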

Exposed devices, cameras & dashboards

Internet-connected cameras, printers, NAS boxes and industrial panels announce themselves with recognisable titles and URL patterns, and far too many are indexed with no authentication. This is where dorking shades into device hunting, and where Shodan takes over (see "Beyond Google" below).

```shell
# Network cameras (classic, still finds live feeds)
inurl:"viewerframe?mode="
intitle:"Live View / - AXIS"
inurl:/view/index.shtml

# Printers / NAS / panels
intitle:"HP LaserJet" inurl:hp/device
intitle:"Synology DiskStation"
intitle:"Router" intext:"login" inurl:8080
```

Viewing a live camera feed or device dashboard you don't own can itself cross legal lines, and acting on it certainly does. This is exactly where curiosity becomes unauthorised access.

🛡 Defend: Never expose device admin/streams directly to the internet. Put them behind a VPN, require authentication, and set noindex. If a device must be reachable, change defaults and restrict by IP.

The Google Hacking Database (GHDB)

You don't have to invent dorks from scratch. The Google Hacking Database (GHDB) on Exploit-DB is a curated, categorised library of thousands of working dorks, maintained for exactly this purpose — the canonical reference Johnny Long started and Offensive Security now hosts.

| GHDB category | What it collects |
| --- | --- |
| Files Containing Passwords | Dorks that surface credentials in indexed files |
| Sensitive Directories | "index of" and exposed-folder patterns |
| Vulnerable Servers / Files | Software with known issues, by fingerprint |
| Error Messages | Version/path disclosure patterns |
| Pages Containing Login Portals | Admin/auth surface |
| Network / Vulnerability Data | Exposed configs, devices, monitoring |

Browse it at exploit-db.com/google-hacking-database. Each entry is a ready dork with a date and author. Treat it as a checklist to run against your own scope, and to audit your own exposure.

GitHub dorking — secrets in public code

The biggest secret-leak source today isn't Google — it's GitHub. Developers commit .env files, API keys, tokens and private keys, then push to public repos. GitHub's code search is a dorking engine of its own, with its own operators, and it searches the actual file contents of millions of repositories.

```shell
# GitHub code-search syntax (search across public code)
path:.env DB_PASSWORD
"AKIA" language:JSON                       # AWS access key IDs start with AKIA
path:.npmrc _auth                          # npm auth tokens (filename: is legacy syntax)
org:targetorg "api_key"                    # scoped to one org
path:**/credentials AWS_SECRET_ACCESS_KEY
path:*.pem "-----BEGIN PRIVATE KEY-----"   # PEM-encoded private keys

# Pivot: also search commit history, gists, and forks. Secrets removed from
# HEAD often survive in old commits. Tools: trufflehog, gitleaks, github-search.
trufflehog github --org=targetorg          # shell command, not a search query
```

A leaked cloud key from a public repo is one of the fastest paths to a full breach. Automated scanners find them within minutes of a push, which is why providers now auto-scan and quarantine. Assume any secret ever committed is compromised.

🛡 Defend: Never commit secrets. Use .gitignore for .env, environment variables or a secrets manager, and pre-commit secret scanning (gitleaks, trufflehog). If a secret was ever pushed, rotate it; removing the file does NOT remove it from history. Enable GitHub push protection / secret scanning on your org.
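
To make the idea of pre-commit secret scanning concrete, here is a deliberately small sketch of what such a scanner does. It is not how gitleaks or trufflehog are implemented; `scan` and the three patterns are assumptions chosen to mirror the searches above, and the key shown is the placeholder AWS uses in its own documentation.

```python
import re

PATTERNS = {
    # Long-term AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private key headers, with or without an algorithm prefix such as "RSA".
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    # A DB_PASSWORD assignment with a non-empty value.
    "env_password": re.compile(r"^\s*DB_PASSWORD\s*=\s*\S+"),
}


def scan(text):
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = "\n".join([
    "APP_ENV=production",
    "DB_PASSWORD=hunter2",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "-----BEGIN RSA PRIVATE KEY-----",
])
print(scan(sample))
# [(2, 'env_password'), (3, 'aws_access_key_id'), (4, 'private_key_header')]
```

Real scanners add entropy checks, many more provider patterns and history traversal, which is why the dedicated tools are the better choice in practice.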

Beyond Google — Shodan, Censys & FOFA

Google indexes web pages. Shodan, Censys and FOFA index the internet itself: they continuously scan the public IPv4 space on a large set of ports and catalogue the service banners, so they see databases, RDP, IoT and ICS that Google never crawls. This is the natural escalation from dorking: from "what's been published" to "what's listening."

```shell
# Shodan (web UI or CLI: shodan search '<query>')
org:"Target Inc"                           # everything owned by an org
ssl.cert.subject.cn:target.com             # by TLS certificate name
http.title:"index of"                      # open directories, internet-wide
port:9200 product:Elasticsearch            # exposed Elasticsearch (often no auth)
product:MongoDB port:27017                 # exposed Mongo
port:3389                                  # RDP exposed
"230 Anonymous access granted"             # anonymous FTP

# Censys query language (legacy Search syntax; field names may differ on newer Censys versions)
services.tls.certificates.leaf_data.subject.common_name: target.com
services.service_name: ELASTICSEARCH

# Shodan CLI quickstart
shodan init <API_KEY>
shodan search --fields ip_str,port,org 'ssl.cert.subject.cn:target.com'
shodan host <IP>                           # full profile of one host
```

🛡 Defend: Find your own footprint first: search Shodan/Censys for your org name and certificate CN, and close anything that shouldn't be public (databases, RDP, management ports). Put services behind a VPN/allowlist and monitor for new exposures continuously.
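
For the "find your own footprint" step, the output of the `shodan search --fields ip_str,port,org` command above can be filtered for ports that should rarely be public. This sketch assumes the CLI output is tab-separated with the fields in that order; `risky_hosts`, the port list and the sample addresses (documentation ranges) are illustrative assumptions.

```python
# Ports from the queries above that usually should not face the internet.
RISKY_PORTS = {21: "FTP", 3389: "RDP", 9200: "Elasticsearch", 27017: "MongoDB"}


def risky_hosts(cli_output: str):
    """Return (ip, port, service) for each result line whose port is in RISKY_PORTS."""
    hits = []
    for line in cli_output.splitlines():
        fields = line.split("\t")
        # Skip blank or malformed lines instead of failing on them.
        if len(fields) < 2 or not fields[1].strip().isdigit():
            continue
        ip, port = fields[0].strip(), int(fields[1].strip())
        if port in RISKY_PORTS:
            hits.append((ip, port, RISKY_PORTS[port]))
    return hits


sample_output = (
    "203.0.113.10\t443\tTarget Inc\t\n"
    "203.0.113.11\t9200\tTarget Inc\t\n"
    "198.51.100.7\t3389\tTarget Inc\t\n"
)
print(risky_hosts(sample_output))
# [('203.0.113.11', 9200, 'Elasticsearch'), ('198.51.100.7', 3389, 'RDP')]
```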

Chaining OSINT — from one dork to a full map

Dorking is one lens. Real OSINT recon chains sources — each finding feeds the next, building a map of the target's surface no single tool produces. A typical pivot:

```bash
# 1) Enumerate subdomains (passive)
subfinder -d target.com -silent | tee subs.txt
amass enum -passive -d target.com

# 2) Dork each host for exposures (a Google query, not a shell command)
#    site:*.target.com (intitle:"index of" | inurl:admin | filetype:env)

# 3) Pull historical URLs that may still resolve (forgotten endpoints)
gau target.com | tee urls.txt              # getallurls (Wayback + others)
katana -u https://target.com -jc           # active crawl of the live site, not passive

# 4) Harvest emails / hosts / metadata
theHarvester -d target.com -b all

# 5) GitHub for leaked code/secrets, Shodan for live services (as above)
```

The Wayback Machine (web.archive.org) is the de-facto replacement for the dead cache: operator. It holds historical snapshots of pages, including ones since taken down. Old snapshots frequently contain endpoints, parameters and content the live site has removed.
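
Historical URLs can also be pulled from the Wayback Machine's CDX index directly. The sketch below only builds the request URL and performs no network call; `cdx_query_url` is a hypothetical helper, and the parameter choices (`fl`, `collapse`, `limit`) are one reasonable combination rather than a recommended default.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str, limit: int = 500) -> str:
    """Build a CDX query listing archived URLs under a domain, one row per unique URL."""
    params = {
        "url": f"{domain}/*",                     # trailing wildcard: everything under the domain
        "output": "json",
        "fl": "original,timestamp,statuscode",    # columns to return
        "collapse": "urlkey",                     # de-duplicate repeated captures of the same URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(cdx_query_url("example.com", limit=100))
```

Fetching that URL for a domain in your scope returns rows you can diff against the live site to spot endpoints that were removed from navigation but never taken down.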

Automation & tooling

Running dorks by hand is fine for a few; automation scales them — carefully, because Google aggressively rate-limits and CAPTCHAs automated querying.

| Tool | Use |
| --- | --- |
| pagodo / dorks-eye | Automate GHDB dorks against a target (throttle hard to avoid bans) |
| theHarvester | Emails, subdomains, hosts from many public sources |
| recon-ng | Modular OSINT framework (dorking, profiling, pivots) |
| Shodan / Censys CLI | Scriptable internet-wide service search + monitoring |
| gau / waybackurls / katana | Historical & crawled URL discovery |
| trufflehog / gitleaks | Secret scanning across GitHub/repos/history |

Automating raw Google queries gets your IP CAPTCHA'd or blocked quickly, and hammering targets you find is no longer passive. Prefer the official APIs (Shodan/Censys), throttle everything, and stay strictly inside authorised scope.
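
"Throttle everything" can be as simple as a generator that spaces out whatever you iterate over. This is a minimal sketch with made-up defaults: `throttled`, the 30 second base delay and the jitter range are assumptions, and the right values depend on the service's published rate limits.

```python
import random
import time


def throttled(items, base_delay=30.0, jitter=15.0, sleep=time.sleep):
    """Yield items one at a time, pausing a randomised interval between them."""
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            sleep(base_delay + random.uniform(0, jitter))
        yield item


# Demonstration with a fake sleep so the example finishes instantly.
recorded = []
queries = ['site:example.com filetype:env', 'site:example.com intitle:"index of"']
for q in throttled(queries, base_delay=1.0, jitter=0.5, sleep=recorded.append):
    print("would run:", q)
print(len(recorded))  # 1
```

Passing the sleep function as a parameter keeps the pacing logic testable without actually waiting.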

Defence — keeping yourself out of the index

The whole offensive chapter is also your hardening checklist — because the best defence is to never be in the index in the first place. The realistic priorities:

| Control | How |
| --- | --- |
| Keep secrets out of web roots | No .env, .git, backups, dumps in public paths; block dotfiles at the server |
| Disable directory listing | Apache Options -Indexes / Nginx autoindex off (kills "index of") |
| noindex on sensitive pages | X-Robots-Tag: noindex header or robots meta tag; the reliable way to keep a crawlable page out of the index |
| Don't expose admin/devices | VPN or IP allowlist for panels, dashboards, databases, management ports |
| Kill verbose errors | Custom error pages, debug off in prod (no phpinfo/stack traces) |
| Secret scanning + rotation | gitleaks/trufflehog + push protection; rotate anything ever exposed |
| Remove what's already indexed | Google Search Console "Removals", then fix the underlying exposure |

A common myth: robots.txt does NOT keep pages out of search. It asks crawlers not to fetch them, but a disallowed URL can still be indexed if linked elsewhere, and your robots.txt itself becomes a map of the paths you most want hidden. Use noindex, not robots.txt, to stay out of the index. The two do not stack: a crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex.

🛡 Defend: Run the dorks from this article against your own domains quarterly: site:yourdomain.com plus the file-type and "index of" patterns. The fastest way to know what an attacker can find is to look first.
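
As part of that self-audit, it helps to confirm that sensitive pages really do send a noindex signal. The sketch below checks a response you have already fetched from your own site; `has_noindex` is a hypothetical helper, and the meta-tag pattern assumes the `name` attribute appears before `content`.

```python
import re

# Matches <meta name="robots" content="... noindex ..."> with name before content.
META_NOINDEX = re.compile(
    r"<meta[^>]+name=[\"']robots[\"'][^>]*content=[\"'][^\"']*noindex",
    re.IGNORECASE,
)


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the X-Robots-Tag header or a robots meta tag contains noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value.lower()
    return "noindex" in header_value or bool(META_NOINDEX.search(html))


print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))              # True
print(has_noindex({}, '<meta name="robots" content="noindex">'))           # True
print(has_noindex({"Content-Type": "text/html"}, "<html></html>"))         # False
```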

Closing — ask precisely

Google dorking endures because the failure it exploits never goes away: somebody, somewhere, publishes something they shouldn't, and a crawler finds it. The operators change (pour one out for cache:), the engines multiply (Shodan, Censys, GitHub code search), but the move is constant — ask precisely for what people forgot to lock down.

Two habits make you good at it. Offensively: layer operators until only signal remains, and always chain — a subdomain feeds a dork feeds a Shodan query feeds a GitHub search. Defensively: assume everything you put on a server will be indexed, and design so that even when it is, there's nothing worth finding.

Spend an hour dorking your own domain and a repo you own. The first time filetype:env or a GitHub secret-search lights up on something you forgot about, the value of this technique — and the urgency of the defences — stops being theoretical.
