Building an OSINT Engine for Typosquatting Detection
Detecting typosquatting domains is one of the foundational capabilities of any brand protection platform. Here’s how BrandGuard approaches it.
The Attack Surface
When a threat actor wants to impersonate your brand online, they have several techniques at their disposal:
- Typosquatting: stryxintel.com → stryxitel.com (missing character)
- Homoglyph substitution: stryxintel.com → strуxintel.com (Cyrillic ‘у’ instead of Latin ‘y’)
- TLD hopping: stryxintel.com → stryxintel.co, .org, .xyz
- Combosquatting: stryxintel-security.com, stryxintel-login.com
- Bitsquatting: domains that differ from the original by a single flipped bit
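Bitsquatting is the least intuitive item on this list, so here is a minimal sketch of how such candidates could be enumerated. The function name `bitsquat_variants` and the constant `VALID_LABEL_CHARS` are illustrative assumptions, not part of the BrandGuard codebase, and the input is assumed to be a lowercase ASCII label without the TLD.

```python
import string

# Characters allowed in a DNS label: lowercase letters, digits, hyphen.
VALID_LABEL_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitsquat_variants(label: str) -> set[str]:
    """Return labels that differ from `label` by exactly one flipped bit."""
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            # Flip one bit of the character; DNS is case-insensitive, so lowercase it.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            # Keep only results that are still valid hostname characters.
            if flipped in VALID_LABEL_CHARS and flipped != ch:
                candidate = label[:i] + flipped + label[i + 1:]
                # A label may not start or end with a hyphen.
                if not candidate.startswith("-") and not candidate.endswith("-"):
                    variants.add(candidate)
    return variants
```

For example, flipping the lowest bit of the leading `s` (0x73) yields `r` (0x72), so `rtryxintel` is among the variants of `stryxintel`.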
The OSINT Pipeline
Our detection engine runs through several stages:
1. Name Generation
We generate candidate domain mutations using algorithmic transformations:
    def generate_mutations(brand: str) -> list[str]:
        mutations = set()

        # Missing character
        for i in range(len(brand)):
            mutations.add(brand[:i] + brand[i+1:])

        # Double character
        for i in range(len(brand)):
            mutations.add(brand[:i] + brand[i] + brand[i] + brand[i+1:])

        # Homoglyph substitution
        homoglyphs = {
            'a': ['а', 'à', 'á', 'ä', 'α'],
            'e': ['е', 'è', 'é', 'ë', 'ε'],
            'i': ['і', 'ì', 'í', 'ï', 'ι'],
            'o': ['о', 'ò', 'ó', 'ö', 'ο'],
            'u': ['υ', 'ù', 'ú', 'ü', 'μ'],
            'c': ['с', 'ç', 'ς'],
            'y': ['у', 'ÿ'],
        }
        # ... substitution logic ...

        return list(mutations)
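The substitution step is elided in the listing above. One plausible way to do it, shown here purely as a sketch and not as BrandGuard's actual logic, is to swap one character at a time for each of its look-alikes and then convert the result to its ASCII `xn--` form for DNS lookups. The names `HOMOGLYPHS`, `homoglyph_variants`, and `to_ascii` are hypothetical, the table is a trimmed Cyrillic-only subset, and the conversion relies on Python's built-in `idna` codec (which implements IDNA 2003).

```python
# Trimmed, hypothetical subset of a homoglyph table (Cyrillic look-alikes only).
HOMOGLYPHS = {
    "a": ["\u0430"],  # CYRILLIC SMALL LETTER A
    "e": ["\u0435"],  # CYRILLIC SMALL LETTER IE
    "o": ["\u043e"],  # CYRILLIC SMALL LETTER O
    "y": ["\u0443"],  # CYRILLIC SMALL LETTER U
}


def homoglyph_variants(label: str, table: dict[str, list[str]]) -> set[str]:
    """Replace one character at a time with each of its look-alikes."""
    variants = set()
    for i, ch in enumerate(label):
        for lookalike in table.get(ch, []):
            variants.add(label[:i] + lookalike + label[i + 1:])
    return variants


def to_ascii(label: str) -> str:
    """Encode a Unicode label in the ASCII-compatible form used in DNS queries."""
    return label.encode("idna").decode("ascii")


# Usage: each variant is encoded before it is handed to the resolver.
candidates = {v: to_ascii(v) for v in homoglyph_variants("stryxintel", HOMOGLYPHS)}
```

Substituting a single position per variant keeps the candidate set linear in the label length. Allowing several substitutions at once grows it combinatorially, which may or may not be worth the extra lookups.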
2. DNS Resolution
For each candidate domain, we check:
- DNS A/AAAA records (is the domain registered and resolving?)
- WHOIS registration data (creation date, registrar, registrant)
- SSL certificate issuance (has someone put a cert on it?)
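The first of these checks can be sketched with nothing but the standard library. The helper name `resolves` and its injectable `lookup` parameter are assumptions made for illustration; the parameter exists so the function can be exercised without network access.

```python
import socket
from typing import Callable


def resolves(domain: str, lookup: Callable = socket.getaddrinfo) -> bool:
    """Return True if `domain` resolves to at least one IPv4 or IPv6 address."""
    try:
        # With no address family given, getaddrinfo returns both A and AAAA results.
        return len(lookup(domain, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name did not resolve; UnicodeError: the name is not encodable.
        return False
```

A domain can be registered without resolving (parked, or with no A/AAAA records yet), so a `False` here does not prove the name is unclaimed. That is why the WHOIS and certificate checks sit alongside the DNS lookup.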
3. Threat Scoring
Each live domain gets a threat score based on:
- Age: Domains registered in the last 30 days are prioritized
- Content: Is it serving a login page? A phishing kit?
- Association: Does the IP host other suspicious domains?
- HTTP response: 200 vs 404 vs redirect
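To make the idea concrete, here is a minimal sketch of how the four signals could be folded into a single number. The `DomainSignals` class, the `threat_score` function, and every weight below are illustrative assumptions; the post does not state the actual scoring formula.

```python
from dataclasses import dataclass


@dataclass
class DomainSignals:
    age_days: int               # days since the WHOIS creation date
    serves_login_page: bool     # content check found a login form or phishing kit
    shared_ip_suspicious: bool  # the IP also hosts other flagged domains
    http_status: int            # status code returned for the root path


def threat_score(s: DomainSignals) -> int:
    """Combine the signals into a 0-100 score (weights are illustrative only)."""
    score = 0
    if s.age_days <= 30:
        score += 30  # recently registered
    if s.serves_login_page:
        score += 40  # strongest indicator of active phishing
    if s.shared_ip_suspicious:
        score += 20
    if s.http_status == 200:
        score += 10  # live content
    elif 300 <= s.http_status < 400:
        score += 5   # redirects may lead to a phishing page elsewhere
    return score
```

With these weights, a five-day-old domain serving a login page from a flagged IP with a 200 response scores 100, while an old domain returning 404 scores 0.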
The Free vs Premium Split
The free BrandGuard check returns a partial report: the top 3 most suspicious findings and an overall risk score. The premium subscription unlocks:
- Full daily monitoring across all TLDs
- Historical tracking of domain registrations
- Automated alerts via email
- PDF executive summaries
This model lets us deliver immediate value while keeping the core infrastructure sustainable.
What’s Next
We’re currently integrating social media monitoring (fake LinkedIn/X profiles) and expanding our homoglyph database to cover Unicode ranges from Cyrillic to Hangul.
The full engine runs every morning as an async worker via Celery, processing thousands of domains per scan cycle. At our current scale, a complete scan finishes in under 90 seconds per client.
Stay tuned for the next post on email security auditing with SPF/DKIM/DMARC analysis.