Fuzzy matching, scored honestly
Anyone can report a dedupe accuracy number. The question is whether they tuned it on the data they're reporting on. I ran fuzzy matching on the Fodors–Zagat benchmark and scored it the honest way: threshold picked on train, precision/recall/F1 reported on a held-out test split.
How do you actually match messy records?
First use blocking to skip pairs that obviously cannot match, then score each remaining candidate pair by a weighted similarity across the fields that identify a record: name, address, city, phone. A pair scoring above the threshold is a match. The model is simple; the honesty is in the evaluation.
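To make the recipe concrete, here is a minimal sketch of blocking plus weighted field similarity. It is an illustration, not the code behind the numbers in this post: the field weights, the same-city blocking key, and the use of `difflib.SequenceMatcher` as the string similarity are all assumptions chosen for brevity.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the post does not state the ones actually used.
WEIGHTS = {"name": 0.4, "address": 0.3, "city": 0.1, "phone": 0.2}


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(value.lower().split())


def field_similarity(a, b):
    """String similarity in [0, 1]; SequenceMatcher is a stand-in choice."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def pair_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-field similarities, normalized to [0, 1]."""
    total = sum(weights.values())
    weighted = sum(
        w * field_similarity(rec_a[field], rec_b[field])
        for field, w in weights.items()
    )
    return weighted / total


def same_block(rec_a, rec_b):
    """Hypothetical blocking key: only compare records in the same city."""
    return normalize(rec_a["city"]) == normalize(rec_b["city"])


def is_match(rec_a, rec_b, threshold=0.88):
    """A pair is a match if it survives blocking and scores above the threshold."""
    return same_block(rec_a, rec_b) and pair_score(rec_a, rec_b) > threshold
```

Blocking on an exact city string is deliberately crude; a real pipeline would likely use a more forgiving key so that a typo in the city does not hide a true match.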
Why the train/test split is the whole story
A threshold tuned on the same data you report on is a magic trick, not a measurement. Here the threshold (0.88) was chosen using only the training labels, then frozen. On the untouched test split: F1 0.9333, precision 0.9130, recall 0.9545, with 2 false matches and 1 miss out of 189 pairs.
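The protocol can be sketched in a few lines of Python. The selection rule here (pick the candidate threshold with the best F1 on train) is an assumption, since the post only says the threshold was chosen from training labels; the function names are hypothetical. The final line rechecks the reported test metrics from the error counts: 2 false matches and 1 miss at recall 0.9545 imply 21 true matches, because 21 / 22 ≈ 0.9545.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def confusion(scores, labels, threshold):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y)
    return tp, fp, fn


def pick_threshold(train_scores, train_labels, candidates):
    """Assumed rule: choose the threshold with the best F1 on TRAIN only."""
    return max(
        candidates,
        key=lambda t: prf(*confusion(train_scores, train_labels, t))[2],
    )


def evaluate(test_scores, test_labels, frozen_threshold):
    """Score the test split once, with the threshold already frozen."""
    return prf(*confusion(test_scores, test_labels, frozen_threshold))


# Counts implied by the reported results: 21 true matches, 2 false, 1 missed.
print([round(x, 4) for x in prf(tp=21, fp=2, fn=1)])  # [0.913, 0.9545, 0.9333]
```

The key design point is that `pick_threshold` never sees test data and `evaluate` never searches over thresholds, so the reported number cannot be tuned after the fact.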
Is a high score bragging?
No, and saying so is the point. Fodors–Zagat is a well-separated benchmark on which strong methods score near-perfectly. A high number here reflects the dataset, not a claim that your CRM is this clean. The audit trail shows the actual matches, false positives, and misses.
Key takeaways
- Score against labels or you're guessing — dedupe accuracy needs ground truth.
- Tune on train, report on test — a number from the data you tuned on is meaningless.
- 0.9333 test F1, threshold 0.88 frozen from train — the method generalized.
- Publish the audit trail — real false positives beat a single accuracy figure.