Ultimate Salesforce Duplicate Management Guide (Best practices)
Note: At the time of writing, we are still in the process of being accepted into AppExchange. This post jumps ahead to show you what can be done with Dedupely.
Duplicates in Salesforce are a persistent challenge that results in costly issues. In this guide I’ll cover the key aspects of duplicate problems, then walk through strategies and duplicate management best practices.
Chances are your team has endless headaches from duplicates. This guide is a plan of attack.
Table of Contents:
- Salesforce Duplicates 101
- Step One: De-dupe existing duplicated Salesforce records
- Finding duplicates in Salesforce
- Start merging one-by-one
- Using merge rules to save time
- Set your master records
- When should I bulk merge or turn on auto merge?
- Audit your merges
- “Help! I’m still seeing duplicates”
- Best practices to prevent merge disasters
- Pro-tip: Use SigParser to enrich your data
- Initial de-dupe start to finish
- Step Two: Preventing Duplicate record entry
- Step Three: Automate
- Budgeting your Salesforce De-dupe
- In-house vs. AppExchange Solutions
Salesforce Duplicates 101
How duplicates enter Salesforce
Based on some research, we learned that duplicates enter Salesforce in the following ways:
- Imported data.
- Third-party integrations that sync or generate new records.
- Web forms with insufficient checks.
- In-house API usage that needs tweaking to make sure it’s not creating dupes.
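If your developers build on the Salesforce APIs, the simplest guard against the last point is to query for an existing record before creating a new one. Here is a minimal Python sketch; the function names and the email-only match are illustrative assumptions, not a prescription:

```python
def escape_soql(value: str) -> str:
    """Escape backslashes and single quotes so user input can't break the query."""
    return value.replace("\\", "\\\\").replace("'", "\\'")

def duplicate_check_query(email: str) -> str:
    """Build a SOQL query that looks for an existing Contact with this email."""
    return f"SELECT Id FROM Contact WHERE Email = '{escape_soql(email.strip().lower())}'"

# In your integration code you would run this query through your Salesforce
# API client (e.g. simple-salesforce) and only create the record when it
# returns zero rows.
print(duplicate_check_query("Ann@Example.com "))
```

Matching on a normalized email alone is a simplification; your own check might combine several fields.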
Why are Salesforce duplicates costly and who should worry?
Everyone in the company who cares about the customer and the brand.
The following most likely have to keep an eye on data quality:
- Sales reps.
- Sales managers.
- Sales and Marketing VPs and CMOs.
Why? Data duplication issues cost businesses in various ways:
- Duplicate leads mean salespeople end up contacting the same person twice.
- Call records and notes go missing.
- Employees can waste up to 50% of their time on mundane data quality tasks (MIT Sloan).
- Prevention costs $1, correction costs $10 and leaving duplicates costs up to $100 (SiriusDecisions).
- Bad data affects lead generation, marketing, finance and customer relationships.
A study on lead data health shows that 15% of company sales leads are duplicated. Another study shows that companies can recover up to 70% of their revenue just on clean data alone.
Everyone in the company should be concerned about data quality and duplicate lead data. It affects customer relationships which in turn affect our jobs. In the end it’s bad for everyone.
Should I worry about duplicates today?
The best answer is: not today, yesterday! The second best answer is: right now.
You might say, “our next newsletter isn’t until next month, we can start fixing duplicates then…”
Waiting until the last minute to take care of your duplicates is a perfect recipe for disaster. I’ve seen customers come to us time and time again needing 300,000+ records de-duped right now. This is dangerous: you’re risking your data and rushing a process that takes time and tweaking to get right.
Don’t wait until your next deadline to get serious about fixing your duplicate issues. Start a few weeks ahead of time, at least. This will give you a decent head start.
What Salesforce does to reduce duplicates
Salesforce, being an enterprise-level CRM, gives you a few native options to reduce duplicate entries.
Preventing sales reps from creating duplicates
You can create rules in Salesforce to alert the sales rep when they are about to enter a duplicate record.
You can learn more about how to create duplicate rules at the Stop Users from Creating Duplicate Records page.
Showing potential duplicate records (de-dupe on the fly)
When viewing records in Salesforce you’ll see duplicate detection hints inside the profile. This helps your team de-dupe records as they see them.
To learn how to set this up, go to the Show Duplicate Records in Lightning Experience page.
Running duplicate jobs
Duplicate jobs can be set up to run and find duplicate matches globally across your Salesforce instance.
This gives you bulk detection plus rules for how to handle duplicates once they’re found.
You can learn about duplicate jobs in greater detail here.
Native duplicate management in Salesforce is quite extensive, and all of its areas are covered in this document.
Step One: De-dupe existing duplicated Salesforce records
The initial cleanup consists of doing a bit of homework to find out which types of duplicates you have. In this initial clean you’ll do the preventative work and research needed to repeat this step later or set it to run automatically.
Sign up for Dedupely
Go to the Salesforce integration page and sign up for Dedupely. Once there, you’ll be asked to sign in using your Salesforce account. After that, you’ll be asked which objects you want to sync. Select the objects to sync and click “Download” and you’ll be on your way.
Finding duplicates
Start by selecting the fields you want to match. You can start by using the most basic fields (first and last names) and adding more fields later, narrowing down the possibilities.
Once the scan is finished it’s time to review the duplicates. Start by scrolling through the duplicate list looking for any inconsistencies in the matches.
Incorrect matches show you how much confidence you can place in your match settings. Tweak the settings until you see no incorrect matches before running any bulk merges.
Start merging one-by-one
One-by-one merging at the beginning will give you a deep level of understanding as to which details need to be addressed in the merges.
Custom merging gives you control over the field values that win the merge and over the master contact.
Using merge rules to save time
Merge rules automate the custom merging field value selection. You can set merge rules to decide automatically which value should win based on its value, its attributes or associated attributes of the record.
This saves you a lot of custom merging time and is invaluable for bulk and automatic merging.
Set your master records
Master records (the record that is kept after the merge) and the losing records (the records that get deleted) can also be set in merge rules by specifying which ID is kept.
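To make the idea concrete, here is a hedged Python sketch of what a merge rule can do: pick the oldest record as master and fill its blank fields from the newer duplicates. The "oldest wins" and "first non-blank value wins" strategies are example rules for illustration, not Dedupely’s actual implementation:

```python
from datetime import date

def merge_duplicates(records):
    """Pick the oldest record as master, then fill each of its blank fields
    with the first non-blank value found in the newer duplicates."""
    master, *others = sorted(records, key=lambda r: r["CreatedDate"])
    merged = dict(master)
    for dup in others:
        for field, value in dup.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged

recs = [
    {"Id": "b", "CreatedDate": date(2021, 3, 1), "Phone": "555-0100", "Email": ""},
    {"Id": "a", "CreatedDate": date(2020, 1, 5), "Phone": "", "Email": "ann@example.com"},
]
merged = merge_duplicates(recs)
# Record "a" (the oldest) wins as master; its blank Phone is filled from "b".
```

In practice you would also decide, per field, whether "non-blank", "most recent" or "longest value" should win.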
When should I bulk merge or turn on auto merge?
This depends entirely on how confident you are in your matches. If you’ve noticed incorrect matches, there’s a good chance there are more that you won’t notice further down the line, so avoid match settings that carry a high risk of bad merges.
Keep creating different match settings until they leave nearly zero room for error. For example, first name, last name and email together should be very safe: the chances of two people having the exact same name and email address are slim to none.
Once you’re confident in your match, then start merging in bulk and running automations.
Audit your merges
Your merge history gives you a brief overview of the merged records. Also check Salesforce and hit “refresh” in the browser to see the latest updates to your merged data. If everything looks good, you’re well on your way to a duplicate free Salesforce instance!
“Help! I’m still seeing duplicates”
This is normal. You can’t catch every dupe on the first round. At this point we start digging deeper. The first and easiest way is to play around with the match settings.
By default Dedupely always adds “Similar match” to pre-selected text fields. Each type of field matcher will give you the ability to fine-tune your matches.
Changing this setting can help you find new matches. Each one can be used for finding different types of duplicates based on that field:
- Exact match is the strictest match type. It ignores case, but the text must otherwise match exactly.
- Similar match is also fairly strict but ignores punctuation and other noise that can safely be removed.
- Match first similar word/match last similar word match on the first or last word only. This works well for company names, or for first names with a middle name appended.
- Fuzzy match is the most aggressive match type and works roughly on sound-alike matches. It is also the most inaccurate match type and should never be relied on alone to produce correct matches.
After this, take a look at the prefixes and suffixes in your data. You can improve matching by ignoring common terms. By adding ignored terms you reduce the extra noise preventing proper matching.
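To make these match types concrete, here is a rough Python sketch of how "similar" and "fuzzy" matching might normalize and compare values. This is an illustration built on the standard library’s difflib, not Dedupely’s actual matching code, and the ignored-terms list is a made-up example:

```python
import difflib
import string

IGNORED_TERMS = {"inc", "llc", "ltd"}  # example prefixes/suffixes to ignore

def similar_key(text: str) -> str:
    """'Similar match': lowercase, strip punctuation, drop ignored terms."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = [w for w in cleaned.split() if w not in IGNORED_TERMS]
    return " ".join(words)

def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """'Fuzzy match': aggressive ratio-based comparison. Least accurate."""
    return difflib.SequenceMatcher(None, similar_key(a), similar_key(b)).ratio() >= threshold

print(similar_key("Acme, Inc."))               # -> "acme"
print(fuzzy_match("Jon Smith", "John Smith"))  # -> True
```

Note how ignoring "Inc." lets "Acme, Inc." and "Acme" collapse to the same key, which is exactly the noise reduction described above.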
Best practices to prevent merge disasters
Yes, disasters can happen. They are entirely preventable, though, and with the following best practices you won’t ever have to worry about your data being mangled:
- Always have backups of your data handy. Create a special backup before large bulk merges.
- Always test and audit your matching setup. It’s easy to make a mistake by rushing. Dedupely does everything possible to prevent users from making simple mistakes. However, always closely review before bulk merges and audit the changes after the merge.
- Be aware of how and when your data evolves. Adapt your match setup and merge rules accordingly to avoid collisions with changes in your Salesforce data.
- Be in tune with errors and nuances in your data. A default like “000-0000” in a phone number field will match every record carrying that placeholder, because to a computer a non-blank default doesn’t look blank. Dedupely does try to make sure each input passes validation for what it’s supposed to look like.
- Never automate merges that you haven’t tested and reviewed. Never run bulk merges until you’ve looked over a good portion of the duplicates and are confident of the results.
- Take attribution into account. Decide how lead and contact owners are preserved through merge rules. Sam’s lead might become Jack’s lead because Jack has a duplicate of Sam’s lead. Who wins the ownership in this case?
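The placeholder-default pitfall above can be guarded against by normalizing values before matching. A small sketch; the placeholder list is a hypothetical example, so build yours from what actually appears in your own data:

```python
import re

# Hypothetical placeholder values often seen in CRM data; extend for your org.
PLACEHOLDERS = {"n/a", "none", "unknown", "-"}

def normalize_phone(raw: str) -> str:
    """Treat placeholder defaults like '000-0000' as blank so they never
    cause every record carrying the same default to match each other."""
    digits = re.sub(r"\D", "", raw or "")
    if not digits or set(digits) == {"0"}:
        return ""
    return digits

def normalize_text(raw: str) -> str:
    """Blank out common text placeholders before comparing values."""
    value = (raw or "").strip().lower()
    return "" if value in PLACEHOLDERS else value

print(normalize_phone("000-0000"))  # -> "" (treated as blank)
print(normalize_phone("555-0134"))  # -> "5550134"
```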
Pro Tip: Use SigParser to enrich your data
Why not kill two birds with one stone? Enrich your data while making it easier to de-dupe. The more unique data you have the easier it is to de-dupe. More relevant data helps boost efforts in sales, financial reporting, biz dev and lead scoring.
I find that SigParser does a great job of this. Not only is it enrichment software, it sources prospect data from signatures already sitting in your email archives. It's also super affordable and you can't beat the price anywhere.
The signature data found in your email is scraped and added to Salesforce as contacts and leads. Later you can use Dedupely to clean the duplicates and better match newly enriched records than you could before.
Initial de-dupe start to finish
The initial de-dupe can take anywhere from a day to two weeks. It really depends on the shape of your data and how much time you can dedicate to seeing the task through.
The initial cleanup will give you a close look at your duplicate data and how fields should be preserved.
You can learn more about how to use Dedupely here in our support center.
Step Two: Preventing duplicate record entry
Prevention is the least costly step of them all. There are a number of Salesforce methods for preventing manual duplicate entry.
At the time of writing this article Dedupely doesn’t have a Salesforce prevention tool. However there are a few others you can look into in the meantime by going to AppExchange.
Review the common duplicate sources
Knowing where your duplicates are coming from will help you find solutions for preventing them from entering.
If your development team uses Salesforce APIs or builds custom solutions for Salesforce, consult with them on the possibility of their code causing duplicates and what can be done to fix that.
Check any third-party apps that create new records: web-to-lead form integrations, or integrations that synchronize records across platforms without duplicate checks.
Educating your team on data etiquette
Each team member who enters or modifies data in Salesforce should be schooled on the importance of data etiquette. Your team should know how to handle each of the following:
- How data should be formatted
- How to enter a record without duplicating it
- The tools they need to use to de-dupe or find a duplicate record
- How to avoid incorrect sales rep attribution
If your team understands how to protect and maintain your data as it enters Salesforce, you cut out a large portion of bad data.
Step Three: Automate
Once you’ve gotten past the hump of step one and the tinkering of step two it’s now time to automate.
Prevention is the best way to solve most of your duplicate problems (and maybe most problems). However, dupes are inevitable, and automation will pay dividends over the days, weeks, months and years your team interacts with Salesforce. They will thank you as well!
Use auto merge to pick up daily duplicates
Auto merge will help you avoid the work of merging obvious duplicates.
That said, you will have to put in some manual reviewing week to week. Dedupely will still alert you on non-auto merge match settings that are catching duplicates. You can then review them and make a decision on the individual dupes. If you see that each week these duplicates are obvious catches you can also automate those match settings.
With the help of your team and some preventative measures you’ll stay on top of your data.
Budgeting your Salesforce De-dupe
How de-dupes are priced
Based on the vendors I’ve seen out there, your record count is the largest factor in pricing.
Smaller record bases of fewer than 30,000-50,000 records will start at around $500/yr.
Larger record bases of 150,000+ records can cost anywhere from $2,000 to $20,000, one-off or per year.
Costs are also influenced by:
- the level of customization of your Salesforce instance
- a consulting-based engagement instead of self-serve
- the amount of time required to complete the de-dupe
Each data cleansing company handles pricing a little differently. The best way to get an idea is to contact them directly and work it out one on one.
Realistic time frames for initial de-dupe
Just as pricing is based on record count, so is the amount of time you should budget.
Smaller Salesforce databases will often take an hour or more to clean up. Larger databases can take days or weeks to complete: the amount of data, the extra care that needs to be taken and the variety of duplicates to catch are much greater in a larger database.
The best way to avoid last-minute panic is to de-dupe your Salesforce instance weeks ahead of time, not days or hours ahead. You want to take a few factors into consideration when it comes to deadlines:
- The speed of the company you contract to do your de-dupe.
- Extra customization, testing and care needed to avoid data loss and errors.
- Nuances and errors that need to be fixed in your existing data.
- Extra time needed to sync and move data around.
To name a few. The worst ones are the unexpected issues that botch your newsletter or launch announcement deadline. Be safe, do it ahead of time.
Who should be involved in the de-dupe?
Your data guy/gal, sales managers and anyone who needs to get their input on the data and how it’s modified.
How do I calculate the ROI of de-duping Salesforce?
There are a number of studies showing how duplicate data causes some heavy damage to your company. However that may not be enough to justify the spend on data cleansing.
A simple way to calculate the ROI would be to calculate the lost hours reps spend un-mangling, finding duplicates and merging their data. You can measure the amount of time it takes to find one group of duplicates and merge them by hand. Then ask your reps how many times a day/week they have to fix duplicates. Multiply those together and you have an hour figure over a certain period of time. How much are those hours costing your company instead of benefiting it?
The calculation could look like this:
(number of duplicates merged by reps daily) * (minutes to merge a duplicate / 60) = (lost hours per day); (lost hours per day) * (hourly base pay) = (daily cost)
Let’s test some fictional numbers. Our reps merge up to 20 duplicates a day, on average it takes 4 minutes to merge each one.
(20 * 4 minutes = 80 minutes, or 1.33 hours) * $30.00/hr = $40 per day
Then take into account the number of reps on the team (let’s say 3) and you’re spending about $120 a day, or roughly $2,400 a month at 20 working days, on hours dedicated to merging duplicate data.
You can factor in lost opportunities and other estimations. However if this calculation doesn’t appear significant you probably don’t have a duplicate problem. If you do it should be pretty easy to convince your team and peers to get on board with you in your de-dupe efforts.
NOTE: Obviously I’m no mathematician. This is a rough estimate, just to give you an idea of how to gauge the financial loss incurred by duplicate data.
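The calculation can be expressed as a small function. A sketch; the 20 working days per month is an assumption you should adjust:

```python
def monthly_merge_cost(dupes_per_day, minutes_per_merge, hourly_pay, reps, workdays=20):
    """Rough monthly cost of reps merging duplicates by hand."""
    lost_hours_per_day = dupes_per_day * minutes_per_merge / 60
    return lost_hours_per_day * hourly_pay * reps * workdays

# The fictional numbers from above: 20 merges/day, 4 minutes each, $30/hr, 3 reps.
cost = monthly_merge_cost(20, 4, 30.0, 3)
print(f"${cost:,.0f} per month")
```

Plug in your own team’s figures; even a conservative estimate usually makes the spend easy to justify.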
In-house vs. AppExchange Solutions
We’ve all been in the DIY rabbit hole where, in the end, we just threw in the towel and paid someone else to do it.
Then there are scenarios where in-house just makes so much sense.
What are the options we have for effective de-duping?
API solution built by an in-house developer
You have developers. So why not put them to work on a custom solution that fits all of the requirements? Not so fast!
Some developers love writing their own apps. Believe me, at one point I would have designed our entire software stack–and nearly did–before I was reminded of the true costs of in-house.
Your developers are there to build the product or service you offer, and developers are not cheap to pay. The true cost of an in-house build includes 1) the initial development time, 2) bug fixes and back-and-forth over specs, 3) maintenance and updates over the years and 4) the cost of hosting and running it.
The hours invested add up over time until the project is no longer cost-effective. Not to mention you’ll have to wait a good while before it runs smoothly.
Salesforce Native Duplicate Management
Salesforce sets itself apart from the rest by having native deduplication management.
Cloudingo wrote a fantastic article on the shortcomings of Salesforce’s native duplicate management and how it’s actually limited in a number of ways.
- Not enough ways to match fields; you’re stuck with exact and fuzzy.
- Only five match rules can be created and used at a time.
- Duplicate jobs only work on top-tier Salesforce editions.
- No automatic merging or mass merging.
- Limited number of records that can be grouped into each match (3).
- Overall somewhat cumbersome.
The Salesforce native duplicate management, while a cut above other CRMs, still misses the mark in terms of time saving duplicate management. While it does the job it still doesn’t completely solve the problem in a significant way.
“Heck, it’s the sales team’s job!”
When was the last time “data cleansing and cleaning up duplicates” was in the job description of a sales rep? Do any of your sales reps think their job is cleaning up bad data?
Training personnel to clean up after themselves is pretty basic. Everyone should actively clean up the data as they enter it into Salesforce. It’s just proper etiquette not to mention respect for fellow team members.
However, if you’re not aiding sales in gathering, handling and cleaning data, you’re making their lives harder, which in turn makes selling harder and means fewer sales for your company. By “aiding” I mean providing them with the proper tools to do their job.
Why AppExchange solutions are a no-brainer
There are dozens of de-dupe apps on the Salesforce AppExchange. Some of them are not so great and others are pretty awesome. These apps are built by companies that dedicate large amounts of resources to building solutions that save you insane amounts of time and money. They are more affordable and give a higher return than the above alternatives.
You’ll save time on DIY hacks that end in misery, heavy projects that distract your team from serving your customer and overcomplicated native features that don’t really cut it.
What to look for in a de-dupe provider
I’ve reviewed the pros and cons of all of the existing AppExchange apps, and there are a lot right now. Some are free, some don’t list prices. Either way you can’t go wrong with most of them.
If you’re going with a consultancy make sure they understand your specific needs. Don’t take anything for granted. They might not understand what X and Y field are for and not think they’re important. Tell them everything you can and ask as many questions as possible. In most cases they will be very in tune with what you need.
Self-serve solutions give your team full control over the process, although expect a learning curve and be prepared to dig through a bit of documentation and videos.
The software you choose should have the following features:
- Field matching options, including the ability to ignore common terms and to set exact, fuzzy or sound-alike matching.
- Bulk merging, custom merging and automated merging once you’ve done the homework and are confident in your settings.
- Merge rules that cover all your master record, attribution and custom field needs.
- Alerts that keep you up to date on new duplicates entering your Salesforce data.
- Extremely responsive support.
This is what I’m leading Dedupely to accomplish. If self-serve doesn’t work for you, our team will pitch in at no extra cost. Take a closer look at the Salesforce Dedupely connector here.