Case study / AI-assisted CRM cleanup
A 50,000-record CRM deduplication effort seems straightforward — just give it to the AI. That is, until you do the math. Fifty thousand records means almost 1.25 billion possible record pairs.
The standard approach is naive. It asks, “Which records are duplicates?” The more important question is, “Which records are confidently not duplicates?” Once you can identify that, the search space becomes instantly manageable.
This is the rigth way to use AI.
At a glance
For companies using Zoho CRM, duplicate contacts and accounts are rarely just a cosmetic problem. They affect sales follow-up, ownership, reporting, email history, Zoho Books sync assumptions, and confidence in the CRM. Native deduplication helps with obvious exact matches, but it does not solve the messy middle.
Duplicates quietly corrupt almost everything downstream. They double-count (or under-count) revenue in reports, split a customer's history across two records so nobody sees the full picture, route the same person to two sales reps, and produce the embarrassing “we already have you in our system” moment on support calls. The cost is rarely a single dramatic failure. It's the slow erosion of trust in the data — people stop believing the numbers, and start working around the CRM instead of through it.
Most teams handle it one of three ways: ignore it and hope it stays manageable, pay someone to scrub the data manually every few years, or buy a deduplication add-on that auto-merges on rules nobody fully understands — until it merges two records that should have stayed separate, and trust drops another notch.
Any record cleanup situation seems simple at first: just find the duplicates. It sounds reasonable, but that's the wrong starting point for messy data at this scale.
With 20,000 records, there are roughly 200 million possible pairs. At 50,000 records, it's 1.25 billion. Most of those pairs are plainly not duplicates. Asking AI to inspect the whole space is impractical. Asking fuzzy matching to identify duplicates directly is dangerous. We know from experience that it produces confident-looking merge suggestions that can collapse coworkers at the same company, different locations of the same business, or two legitimate records that share a phone number or address.
The real problem was not “can AI find duplicates?” The real problem was how to reduce a huge comparison space into a smaller, auditable review queue without throwing away the pairs that matter.
That changes the design. The first pass should not try to prove that two records are duplicates. The first pass should remove pairs that are very likely not duplicates. Then AI and human judgment can be used where they are actually useful: reviewing the uncertain middle.
AI is useful in this kind of work, but not as an oracle. The mistake is to treat deduplication as a single prompt: here is the CRM, tell us what to merge. That is exactly the kind of workflow that sounds impressive until it breaks trust in the data.
The safer use of AI is bounded. Let deterministic tools handle exact matches. Let code handle blocking and scoring. Let AI critique the design, identify edge cases, and review records after the candidate set has been reduced to a manageable size. Let a human control the merge.
The pattern we used here is unusual enough that it's worth describing. One AI was responsible for proposing logic — scoring weights, blocking strategies, the structure of the review categories. A second AI took the role of skeptical reviewer: challenging the proposed weights, looking for missed cases, and pushing on thresholds that looked too aggressive.
Several real design issues only surfaced because the second AI was specifically tasked with finding them. In one case, the original scoring weights would have confidently discarded pairs where a person had simply changed jobs — same name, new email, new phone, new company. The bug never reached production because the critique loop caught it before a single line of code was written.
The human kept architectural authority and arbitrated when the agents disagreed. That is what separates this kind of work from "vibe coding" — letting an AI produce code you ship without reading it. The discipline that made the difference here was working in pseudocode first: the logic was expressed in human-readable form before any code was written, and it was read and audited at key points along the way.
This is what good AI use usually looks like in real operations: not a silver bullet, but a disciplined process that combines human judgment with automation, critique, and auditability.
The workflow was designed as a four-phase process. The goal was not to automate merges. The goal was to make review possible without pretending the system had more certainty than it did.
Native Zoho CRM deduplication handles exact-match cases such as identical email or phone values. Those should be removed from the custom process before doing anything more expensive or interpretive. If two records match deterministically, there is no need to spend AI review effort on them.
The script generates candidate pairs using blocking strategies — surname phonetics, company tokens, zip/name combinations, email local parts, street-number-plus-zip, and other recall-oriented passes. It then scores the pairs using independent signal families: email, phone, name, address, and company.
The scoring direction is intentionally inverted from what people expect. Positive scores mean evidence the records are not duplicates. Negative scores mean evidence they are duplicates. A pair is discarded only when multiple independent positive signals agree and no strong duplicate signal contradicts them.
The output is not a merge file. It is a review queue. Each pair includes the score, category, blocking passes, and scoring contributors so the reviewer can see why the pair survived. That explanation layer matters. It keeps the review auditable instead of reducing the process to an unresolvable black box whose output must be accepted on faith.
The final merge remains controlled. For Zoho CRM environments connected to Zoho Books, the Books-synced record may need to be selected as the master to avoid breaking accounting relationships or downstream sync assumptions. That is not the kind of decision an AI score should make casually.
Consider an example: two records with the same first name, same email, same phone, same address, and same company — but different surnames. In CRM data, this pattern is most often a name change, not two different people. Naive fuzzy matching tends to discard pairs like this because the name comparison fails badly. The inverted-scoring approach catches it reliably: matching email, phone, address, and company together produce a strong duplicate signal that easily outweighs the surname mismatch.
Importantly, in our approach, these records are not automatically resolved. In this case, the record pair lands in the high-confidence-duplicate review category, exactly where it belongs. This is a significant point of departure from other methods, which would auto-merge this pair or miss it entirely rather than flag it for review.
Most record pairs are not duplicates. The fastest route to a useful workflow is to remove the obvious non-duplicates and preserve the ambiguous cases for review. This is less glamorous than the claims you'll find online, but it is more reliable.
Missing data should not be treated as evidence. If one record lacks a phone number, that does not mean the two records have different phone numbers. It means there is no phone comparison to make.
A different email address alone is not enough. People change companies. Phone numbers go stale. Addresses move. The discard category requires multiple independent non-duplicate signals and no strong duplicate signal.
AI was useful for challenging the scoring logic, identifying edge cases, and helping prepare mock records for testing. It also reviews the output of every code run, where it can readily identify obvious duplicates from those flagged by scoring. Only the fuzziest cases remain for human review. The AI is never given authority to merge records directly.
Every surviving pair carries its reasoning with it: which blocking pass found it, which scorers contributed, and why it landed in a given category. That makes review faster and makes the process defensible. This is especially important in regulated industries like healthcare or banking.
The project moved from an unbounded “find duplicates” problem to a controlled data-quality workflow.
A cleanup like this is normally a multi-week effort: weeks of manual spreadsheet work, or weeks of waiting for a consultant to grind through the records by hand. With this workflow, the heavy lifting collapses to an afternoon. The script reduces an almost 1.25-billion-pair theoretical comparison space to a manageable candidate set that can be scored, categorized, and reviewed in one focused sitting. The review output is typically measured in hundreds of pairs, making an easy pass for any first-class AI.
The more important value is trust. A reckless dedupe process can damage trust in the CRM faster than the original duplicate problem did. This design avoids pretending that AI has certainty where it does not. It uses AI to improve the process while keeping the dangerous action — merging records — under human control.
This process combines established deduplication techniques into a practical, human-led CRM cleanup workflow that is unusually well-suited to small and midsize businesses: inexpensive, explainable, cautious about merge risk, and designed to reduce review volume before asking AI or humans to judge ambiguous records.
We searched for comparable CRM deduplication workflows and did not find another example with this structure — especially the use of inverted scoring to identify likely non-duplicates first. That inversion is the key move. It makes the process fast enough to be practical while elevating the riskiest merge decisions to a human reviewer.
Assuming the scoring script is already in place, here is what the actual cleanup looks like from start to finish. The work is sequential but most steps take minutes. The one step that takes real time is the one that should: human review of the pairs the system couldn't resolve.
| Step | What happens | What you see |
|---|---|---|
| 1 | Take a full export of the CRM module before doing anything else. | A timestamped backup file stashed somewhere safe. If anything goes sideways later, this is what you restore from. |
| 2 | Run your CRM's native deduplication to clear the obvious exact matches. | A small number of automatic merges resolved by the platform itself, no judgment required. |
| 3 | Export the remaining records to a CSV file. | One spreadsheet with thousands of rows and the standard contact fields. |
| 4 | Verify that the column headers in the export match what the script expects. | Either a clean match — proceed — or a one-line adjustment to the script's column mapping so the right fields feed into the right scorers. (If you need to, ask an AI to help you here, including how to install and run a Python script.) |
| 5 | Run the scoring script against the file. First time on a new dataset, run it against a 50-row slice first to confirm the categories look sensible. | A short terminal summary showing how many candidate pairs were generated and how many landed in each category. |
| 6 | Send the resulting review queue to an AI for first-pass analysis. | The AI reads each pair along with its score and reasoning, then adds a recommendation column: merge, keep separate, or needs human judgment. The hardest calls float to the top. |
| 7 | Open the AI-annotated queue in Excel and focus on the pairs the AI flagged as needing human judgment, spot-checking its other calls along the way. | Hundreds, not thousands, of pairs. This is where the time goes — but only on the cases that genuinely need it. The clear merges and clear non-duplicates are already labeled. |
| 8 | Make the final calls and mark each pair: merge, keep separate, or flag for the client. | Your decisions in a final column, with the full audit trail behind them: blocking pass, scoring contributors, AI reasoning, your call. |
| 9 | Apply the confirmed merges in the CRM, preserving Books-linked records as master where applicable, then sanity-check that reports tie and downstream connections still resolve. | Getting results into CRM is its own effort. For anything more than a few dozen duplicates, you'll want to use the merge records API, which is out of scope for this case study, partly because of length but also because nothing new is proposed. Follow Zoho's documented procedures. The most important point is that a dry run is a necessity in any scenario. Revert to the backup from step 1 if anything looks off. |
The pattern is deliberate: the machine does the cheap work (eliminating obvious non-duplicates), and the human spends their time only on the pairs where judgment actually matters.
The Python script behind this workflow is available for download. It is the same scoring pipeline described above, written for ordinary CRM contact/account exports and designed to be adjusted to match your field names.
Requires Python 3.9 or newer. Installation and setup take a few minutes on Windows or macOS. If you'd rather have it run against your data without setting it up yourself, that's exactly the kind of engagement we do.
Are you in need of practical, human-led AI solutions? Let's talk.
Start a Conversation