engineering · Nov 30, 2025 · 2 min read

Engineering Notes: Entity Resolution in Noisy Public Data

A technical look at how we resolve identities across messy public data while keeping false matches low and auditability high.

ODIN Engineering

Entity resolution sounds simple until you try it on real public data. Names change, locations drift, and the same person can appear in half a dozen formats across a single month. If you treat every match as equal, you end up with confident mistakes.

Direct answer: Reliable entity resolution combines conservative matching with explicit uncertainty. We score overlaps across identifiers, penalize conflicts, and keep the evidence trail visible. That approach reduces false positives while still surfacing likely matches that analysts can verify quickly.

The three signals we trust most

We weight signals by how stable they are. A professional license number is strong evidence; a self-reported job title is weak. Ranking signals this way keeps a weak agreement from outweighing a strong one.
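
To make the weighting idea concrete, here is a minimal sketch in Python. It is not ODIN's scoring code: the field names, the weights, and the rule of subtracting a signal's weight on conflict are all assumptions chosen for illustration.

```python
# Hypothetical weights: field names and numbers are illustrative assumptions,
# not a real production configuration.
SIGNAL_WEIGHTS = {
    "license_number": 0.9,  # stable identifier, rarely changes
    "job_title": 0.1,       # self-reported, changes often
}


def score_pair(a: dict, b: dict, weights: dict = SIGNAL_WEIGHTS) -> float:
    """Score two records against each other.

    Adds a signal's weight when both records agree on it and subtracts
    the same weight when they conflict. A signal missing from either
    record contributes nothing.
    """
    score = 0.0
    for field, weight in weights.items():
        left, right = a.get(field), b.get(field)
        if left is None or right is None:
            continue  # no evidence either way
        score += weight if left == right else -weight
    return score


if __name__ == "__main__":
    rec_a = {"license_number": "A1", "job_title": "Engineer"}
    rec_b = {"license_number": "A1", "job_title": "Analyst"}
    # The strong signal agrees and the weak one conflicts,
    # so the score stays clearly positive.
    print(round(score_pair(rec_a, rec_b), 2))
```

With these assumed weights, a conflicting job title barely dents a license-number match, while a conflicting license number pushes the score negative even if the job titles agree.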

How we keep the model auditable

Every match is tied to a set of source features, not just a probability. That means an analyst can see why the system suggested a match and decide whether it belongs in the case file.
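
One way to picture a match that carries its evidence is a record that stores each compared feature alongside the suggestion. The sketch below is an assumption about shape, not ODIN's data model; `Evidence`, `MatchSuggestion`, and `suggest_match` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    feature: str   # which source feature was compared
    left: str      # value in the first record
    right: str     # value in the second record
    agrees: bool   # whether the two values matched exactly


@dataclass
class MatchSuggestion:
    record_a: str
    record_b: str
    evidence: list = field(default_factory=list)

    def is_explainable(self) -> bool:
        # Assumed rule: a suggestion needs at least one compared feature.
        return len(self.evidence) > 0

    def explain(self) -> list:
        """Return one human-readable line per compared feature."""
        return [
            f"{e.feature}: {e.left!r} vs {e.right!r} "
            f"({'agrees' if e.agrees else 'conflicts'})"
            for e in self.evidence
        ]


def suggest_match(id_a: str, a: dict, id_b: str, b: dict, features: list) -> MatchSuggestion:
    """Compare only the features present in both records and keep every comparison."""
    evidence = [
        Evidence(f, a[f], b[f], a[f] == b[f])
        for f in features
        if f in a and f in b
    ]
    return MatchSuggestion(id_a, id_b, evidence)


if __name__ == "__main__":
    suggestion = suggest_match(
        "rec-1", {"license_number": "A1", "job_title": "Engineer"},
        "rec-2", {"license_number": "A1", "job_title": "Analyst"},
        ["license_number", "job_title"],
    )
    for line in suggestion.explain():
        print(line)
```

Because the comparisons travel with the suggestion, an analyst reads the agreeing and conflicting features directly instead of reverse-engineering a single number.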

Where the trade-offs live

Precision wins when decisions are high-stakes, but it can lower recall. We bias toward fewer, cleaner matches and rely on OSINT workflows to expand from there.
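
The trade-off is easy to see on a small labeled sample. The sketch below assumes each candidate pair has a score and a known true/false label; the scores, labels, and thresholds are invented for illustration, and treating an empty denominator as 1.0 is a convention chosen here.

```python
def precision_recall(scored_pairs, threshold):
    """Compute precision and recall for a given score threshold.

    scored_pairs: iterable of (score, is_true_match) from a labeled sample.
    A pair is predicted as a match when its score is >= threshold.
    """
    tp = fp = fn = 0
    for score, is_match in scored_pairs:
        predicted = score >= threshold
        if predicted and is_match:
            tp += 1
        elif predicted and not is_match:
            fp += 1
        elif not predicted and is_match:
            fn += 1
    # Convention for this sketch: an empty denominator counts as 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


if __name__ == "__main__":
    # Invented labeled sample: (score, is_true_match)
    labeled = [(0.95, True), (0.85, True), (0.70, False), (0.60, True), (0.30, False)]
    for threshold in (0.5, 0.8):
        p, r = precision_recall(labeled, threshold)
        print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this sample, raising the threshold from 0.5 to 0.8 removes the one false match but also drops a true match that scored 0.60. That is the cost a precision-first bias accepts.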

If a match can't be explained, it doesn't ship. Transparency beats complexity in real investigations.

Why this matters in practice

If your workflow depends on trust, your data pipeline should earn it. That's the lens we use when we build ODIN, and it's why our system prefers clear evidence over clever shortcuts.

#entity-resolution #data-quality #engineering #ml