OSINT Reconnaissance for Penetration Testing: Advanced Information Gathering Techniques

[2026-06-15] Author: Ing. Calogero Bono

You have a target to test. Before you write a single line of code or launch a scan, you need to know everything about it. Name, emails, domains, suppliers, technologies, people. Without this phase, your penetration test is a shot in the dark. At Meteora Web, we have seen companies with sensitive data exposed on public boards, expired SSL certificates, and forgotten subdomains. OSINT reconnaissance is the Swiss army knife of any ethical hacker. In this guide we go beyond simple Google dorks: we will cover tools, techniques, and automation to gather information systematically, legally, and operationally.

OSINT is not just Google: the full spectrum of public sources

OSINT (Open Source Intelligence) means exploiting publicly accessible data. But be careful: not everything public is easy to find. You need to look at domains, search engines, social networks, leak databases, SSL certificates, WHOIS records, public APIs. We always start with one question: what could an attacker know about you without ever touching a firewall?

The digital perimeter of the target

First step: list all domains and subdomains. Not just the main site. Tools like theHarvester and Sublist3r give you an initial list. But the real value comes when you cross-reference data with SSL certificates (Certificate Transparency logs).

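# Install theHarvester (Kali package; on other systems follow the README
# at https://github.com/laramies/theHarvester)
sudo apt install theharvester

# Basic domain search (available sources vary by version; list them with theHarvester -h)
theHarvester -d example.com -b bing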

Action now: take your domain (or a test one) and run the command above. How many subdomains appear? Compare with known ones. If you find something your client didn't know about, you've already won the first battle.

WHOIS and registries: not just expiration dates

WHOIS can reveal the registrant's name and contacts, the DNS servers, and the creation date. An attacker can use these for social engineering. We have seen companies with personal phone numbers in WHOIS records. Use whois from the terminal or an online service like WhoisXMLAPI. Caution: many registrars offer privacy protection, but it is often not applied to secondary domains.

whois example.com

Action now: check at least 3 domains in your perimeter (including .com, .net, .org) and see if any personal data is exposed.
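
One way to script this check, as a minimal sketch: the hypothetical helper whois_contacts below pulls anything that looks like an email address out of raw WHOIS text. The regular expression is a loose heuristic, and the loop in the comment assumes the standard whois client is installed.

```shell
# whois_contacts: read raw WHOIS text on stdin and print the unique
# email addresses it contains, lowercased, one per line.
# Hypothetical helper; the regex is a loose heuristic, not an RFC validator.
whois_contacts() {
  grep -Eio '[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}' \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u
}

# Example (requires network access and the whois client):
#   for d in example.com example.net example.org; do
#     echo "== $d"; whois "$d" | whois_contacts
#   done
```

Addresses that belong to the registrar or a privacy proxy are expected; a personal mailbox in the output is the finding worth reporting.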

OSINT on people: emails, social, and work profiles

Gathering information on individuals is sensitive and must respect privacy laws. But during an authorized penetration test, knowing employee email addresses lets you test password spraying, targeted phishing, and more. We always start with LinkedIn and public sources.

Email harvesting with theHarvester and Hunter.io

In addition to Google, theHarvester supports sources like LinkedIn, Yahoo, Bing, PGP key servers (the exact list depends on the version). Hunter.io provides an API to find emails associated with a domain. Beware of false positives: use holehe to check whether an address is registered on known services, which is a good sign that the mailbox is real and in use.

# Install holehe
pip install holehe

# Verify email
holehe email@example.com

Action now: choose an authorized target domain, extract emails with theHarvester -b linkedin, then verify the most likely one with holehe. Record which services the email is present on (important for credential stuffing attacks).
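
Recording the results for several addresses can be scripted. The sketch below assumes holehe is on your PATH and simply saves its raw output per address; check_emails and the output directory name are illustrative and not part of holehe.

```shell
# check_emails: read email addresses on stdin (one per line) and run
# holehe on each, saving the raw output to OUTDIR/<email>.txt.
# Assumes holehe is installed; the directory layout is illustrative.
check_emails() {
  outdir="${1:-holehe-results}"
  mkdir -p "$outdir"
  while IFS= read -r email; do
    [ -n "$email" ] || continue        # skip blank lines
    # </dev/null keeps holehe from consuming the loop's input
    holehe "$email" > "$outdir/$email.txt" 2>&1 < /dev/null \
      || echo "holehe failed for $email" >&2
    echo "checked: $email"
  done
}

# Example: check_emails results < emails.txt
```

Keeping one file per address makes it easy to attach the raw evidence to the final report.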

Advanced Google Dorking for people

Dorking isn't just for vulnerable files. You can search for public profiles with site:linkedin.com "Company" "Role" or PDF documents with filetype:pdf "example.com". Powerful combos: intext:"password" site:example.com (if you find clear-text passwords, you have a critical flaw).

Minimum dorks to try:

  • site:example.com ext:log – exposed log files
  • site:example.com intitle:"index of" – directory listing
  • "example.com" "confidential" filetype:pdf – sensitive documents
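
To reuse these queries across engagements, a small helper can print them for any domain. This is only a sketch: print_dorks is a hypothetical name, and the list covers just the dorks mentioned in this section.

```shell
# print_dorks: print the dork queries from this section for one domain.
# The domain is inserted as-is; paste each line into the search engine by hand.
print_dorks() {
  domain="$1"
  printf 'site:%s ext:log\n' "$domain"
  printf 'site:%s intitle:"index of"\n' "$domain"
  printf '"%s" "confidential" filetype:pdf\n' "$domain"
  printf 'intext:"password" site:%s\n' "$domain"
}

# Example: print_dorks example.com > dorks.txt
```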

Infrastructure and technologies: Shodan, Censys, Certificate Transparency

You don't need server access to know what software is running. Shodan indexes banners of exposed services. Censys does the same for hosts and the certificates they present. Certificate Transparency (CT) logs reveal subdomains and issue dates. An attacker looks for outdated versions of Apache, Nginx, or OpenSSH. At Meteora Web, we use this data to anticipate vulnerabilities.

Shodan: find your target on the network

# One-time setup: register your API key with the CLI
shodan init YOUR_API_KEY

# Shodan filter by domain
shodan search hostname:example.com

# Search by specific port
shodan search "port:443 hostname:example.com"

Action now: (with a free API key) search your target on Shodan. How many services are exposed? Any open databases (MongoDB, Elasticsearch)? Immediately report the most critical ones.
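
To triage results for open databases, one option is to filter Shodan output by port. The sketch below assumes tab-separated ip and port columns, which is how the CLI's --fields option is expected to print them; flag_db_ports and its port map are illustrative and cover only a few common defaults.

```shell
# flag_db_ports: read tab-separated "ip<TAB>port" lines on stdin and print
# the ones whose port is a common default for a database service.
# The port-to-service map is a small illustrative subset, not a complete list.
flag_db_ports() {
  awk -F '\t' '
    BEGIN {
      svc[27017] = "MongoDB"; svc[9200] = "Elasticsearch"
      svc[6379]  = "Redis";   svc[3306] = "MySQL"; svc[5432] = "PostgreSQL"
    }
    ($2 in svc) { printf "%s:%s %s\n", $1, $2, svc[$2] }
  '
}

# Example (requires an initialized Shodan CLI):
#   shodan search --fields ip_str,port 'hostname:example.com' | flag_db_ports
```

A match only means the port is open on a default number; confirming that the service is actually unauthenticated belongs to the authorized testing phase.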

Certificate Transparency: the subdomain goldmine

crt.sh is a search front end for Certificate Transparency logs, which record publicly trusted certificates as they are issued. A single query can surface subdomains you did not know about.

curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u

Action now: run that curl against your target domain. How many subdomains do you get? Compare with Sublist3r. The difference is the “hidden” subdomains.
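
The comparison can be done with standard Unix tools. A minimal sketch, assuming each tool's results are saved one hostname per line (for example the crt.sh pipeline redirected to crt.txt, and Sublist3r run with its -o option); the function names are hypothetical.

```shell
# normalize_names: read hostnames on stdin, lowercase them, drop a leading
# "*." wildcard label and blank lines, and print the unique sorted set.
normalize_names() {
  tr '[:upper:]' '[:lower:]' | sed -e 's/^\*\.//' -e '/^$/d' | sort -u
}

# only_in_first: print names present in file $1 but missing from file $2.
only_in_first() {
  a="$(mktemp)"; b="$(mktemp)"
  normalize_names < "$1" > "$a"
  normalize_names < "$2" > "$b"
  comm -23 "$a" "$b"      # column 1 only: lines unique to the first file
  rm -f "$a" "$b"
}

# Example: only_in_first crt.txt sublist3r.txt
```

Swapping the two arguments shows the opposite gap: names Sublist3r found that never appeared in a logged certificate.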

Automation with OSINT frameworks: Recon-ng and Maltego

Doing everything by hand is inefficient. A serious penetration test requires automation and correlation. Recon-ng is a modular OSINT framework. Modules range from WHOIS lookups and Shodan to Have I Been Pwned. Maltego is visual: it creates graphs of relationships between domains, emails, people. We prefer Recon-ng because it's scriptable and integrates with our workflow.

Recon-ng workspace example

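# Launch Recon-ng
recon-ng

# Create workspace
workspaces create pentest_target

# Install and load the module (Recon-ng v5 ships without modules)
marketplace install recon/domains-hosts/google_site_web
modules load recon/domains-hosts/google_site_web
# Set source (v5 syntax; v4 used "set SOURCE")
options set SOURCE example.com
# Run
run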

Action now: install Recon-ng, create a workspace for a test target, and run at least 5 collection modules (hosts, contacts, emails). Export the report in HTML.
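
Running several modules by hand gets repetitive. Recon-ng can read commands from a resource file with its -r option, so a sketch like the following generates one. It assumes Recon-ng v5 command syntax and modules that take a SOURCE option; make_recon_rc is a hypothetical helper.

```shell
# make_recon_rc: print a Recon-ng resource script that creates a workspace
# and runs each given module against one domain.
# Usage: make_recon_rc WORKSPACE DOMAIN MODULE [MODULE...]
make_recon_rc() {
  workspace="$1"; domain="$2"; shift 2
  printf 'workspaces create %s\n' "$workspace"
  for module in "$@"; do
    printf 'marketplace install %s\n' "$module"
    printf 'modules load %s\n' "$module"
    printf 'options set SOURCE %s\n' "$domain"
    printf 'run\n'
    printf 'back\n'
  done
  printf 'exit\n'
}

# Example:
#   make_recon_rc pentest_target example.com \
#     recon/domains-hosts/google_site_web > target.rc
#   recon-ng -r target.rc
```

Keeping the resource file with the engagement notes also documents exactly which modules were run.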

Operational OSINT: how not to burn the reconnaissance phase

Anecdote: a client once had an exposed Jenkins server with default credentials. We found it in 10 minutes with Shodan, and a real attacker with no authorization could have found it just as quickly and logged in immediately. Golden rules:

  • Never test on a target without written authorization.
  • Use a dedicated VM or isolated container so reconnaissance traffic and artifacts stay separate from your everyday environment.
  • Do not download sensitive files (if you find a database dump, stop and report).
  • Document every step: for the final report you need reproducible evidence.
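
For the documentation rule, a small wrapper can capture evidence as you work. This is a sketch and not a full audit trail: log_run and the EVIDENCE_LOG variable are hypothetical names.

```shell
# log_run: run a command, show its output, and append a UTC timestamp,
# the command line, the output, and the exit status to an evidence log.
# EVIDENCE_LOG is a hypothetical variable; it defaults to ./evidence.log.
log_run() {
  log="${EVIDENCE_LOG:-evidence.log}"
  {
    printf '### %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    printf '$ %s\n' "$*"
    status=0
    "$@" 2>&1 || status=$?
    printf '(exit status: %s)\n\n' "$status"
  } | tee -a "$log"
}

# Example: log_run whois example.com
```

Because the command line is logged next to its output, each entry can be replayed later when writing the report.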

Tools you should always have at hand

Tool                 Use
theHarvester         Emails and subdomains
Sublist3r            Subdomains
Shodan               Exposed services
crt.sh               SSL certificates
Recon-ng             Modular framework
holehe               Email presence on services
Google dork queries  Indexed data

What to do now – Operational checklist

  1. Define the perimeter: main domains, subdomains, known IPs.
  2. Collect WHOIS and certificates: use crt.sh and whois for every domain.
  3. Extract emails and users: theHarvester + holehe.
  4. Scan public services: Shodan with hostname filter.
  5. Automate: create a bash script that unifies all commands into a single report.
  6. Check Google dorks: at least 5 specific queries for the target sector.
  7. Document everything: screenshots, commands, JSON output. Don't trust your memory.
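
Step 5 could start from a skeleton like this one. It is a sketch under stated assumptions: run_step, recon_all, and REPORT_DIR are hypothetical names, the theHarvester source is only an example, and each step is skipped when its tool is not installed.

```shell
# run_step NAME TOOL [ARGS...]: run TOOL only when it is installed and
# write its output to REPORT_DIR/NAME.txt; otherwise report the skip.
run_step() {
  name="$1"; shift
  dir="${REPORT_DIR:-recon-report}"
  mkdir -p "$dir"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" > "$dir/$name.txt" 2>&1 || echo "step $name exited non-zero" >&2
    echo "done: $name"
  else
    echo "skipped: $name ($1 not installed)"
  fi
}

# recon_all DOMAIN: run the collection steps from this guide in sequence.
# Only call this against targets you are authorized to test.
recon_all() {
  domain="$1"
  run_step whois whois "$domain"
  run_step harvester theHarvester -d "$domain" -b bing
  run_step shodan shodan search "hostname:$domain"
}

# Example: REPORT_DIR=report-example recon_all example.com
```

One output file per step keeps the raw evidence separate, and the files can be concatenated into a single report once the run finishes.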

OSINT reconnaissance is what separates an amateur penetration test from a professional one. At Meteora Web, we place it at the heart of every assessment. If you want to dive deeper into the full ethical hacking cycle, read our definitive pillar guide. Remember: an ethical hacker's skill is measured by how much information they can gather without ever touching the target.

Ing. Calogero Bono is a computer engineer and co-founder of Meteora Web, with expertise in software architecture, cybersecurity, and scalable systems development.