Ultimate HubSpot Deduplication Guide

Clinton Skakun on May 9 2020

HubSpot deduplication has evolved over the past few years. Before 2018, when we launched Dedupely <> HubSpot, HubSpot users didn't have much to work with. Soon after, HubSpot launched its own AI deduplication tool. Now there are a few options, but pitfalls still exist.

This is a complete guide to duplicate management in HubSpot. I've done my best to cover every area with as much detail as possible. The end goal is to get rid of duplicate contacts, companies and so on in your HubSpot lists so your team can have sales-ready interactions.

Table of Contents

  1. HubSpot Duplicates 101
    1. How duplicate contacts enter HubSpot
    2. Why HubSpot duplicates are costly and who should worry?
    3. Should I worry about deduplicating HubSpot today?
    4. What HubSpot currently does to prevent duplicates
      1. Preventing duplicates with emails and company domain name
      2. (Easily) Import de-dupe in HubSpot
      3. Using the HubSpot deduplication tool (for HubSpot Pro users)
  2. Step One: Finding and merging existing duplicates in HubSpot
    1. Addressing merge risk
    2. Best practices to prevent merge disasters
    3. Finding duplicates in HubSpot
    4. Start merging duplicates one-by-one
    5. Using merge rules to save time
    6. Set the master record
    7. Audit your merged HubSpot records
    8. “Help! I’m still seeing duplicates in HubSpot”
    9. How long should the initial HubSpot de-dupe take?
  3. Step Two: Preventing duplicate record entry in HubSpot
    1. Review the common duplicate sources
    2. Educating your team on data etiquette
  4. Step Three: Automate
    1. Use auto merge to pick up daily duplicates
  5. Budgeting your HubSpot De-dupe
    1. How de-dupes are priced
    2. Realistic time frames for initial HubSpot de-dupe
    3. Who should be involved in the de-dupe?
    4. How do I calculate the ROI of de-duping HubSpot?
  6. In-house vs. HubSpot Marketplace Solutions
    1. API solution built by an in-house developer
    2. HubSpot Native Duplicate Management
    3. “Heck, it’s the sales team’s job!”
    4. Why HubSpot Marketplace solutions are a no-brainer
    5. And no! HubSpot doesn’t earn kick-back from their integration providers, as some might have you believe.
    6. What to look for in a de-dupe provider

HubSpot Duplicates 101

How duplicate contacts enter HubSpot

Based on our experience, our research and what we hear from our customers, these are the most common ways duplicates enter HubSpot:

  • Creating new records by hand without checking for existing ones first.
  • Improperly imported records or importing the same records more than once without including emails (the import de-dupe key) in the CSVs.
  • Third-party integrations that don't run checks for existing contacts before creating new records.
  • Web-form-to-lead setups that just insert and don't properly check for existing records.
  • Incorrect in-house API implementations.

At the end of the day, HubSpot is as susceptible to duplicates as any spreadsheet, database or other CRM or record-based system.

Why HubSpot duplicates are costly and who should worry?

Everyone on your team should be aware of why duplicate data is hurting the company's bottom line. Not only does it hurt your company's wallet, but also the brand.

You should be worried about data quality if you are one of the following:

  • Sales reps
  • Marketing or sales managers
  • Sales and Marketing VPs, CMOs, Marketing Directors ...

Data duplication is costing your business in the following ways:

  • Forces sales reps to wade through more data, contact the same person twice by accident and waste time researching and preparing for calls and follow-ups that were already made.
  • Skews valuable numbers from attribution reports to financial accounts.
  • Employees can waste up to 50% of their time on mundane data quality tasks (MITSloan).
  • Prevention costs $1 per record, correction costs $10 and leaving duplicates in place costs up to $100 (SiriusDecisions).
  • Hurts your campaign performance, brand and customer relationships.

Should I worry about deduplicating HubSpot today?

As the list above shows, preventing a duplicate costs about 100 times less than leaving it in place, and correcting it costs about 10 times less. The sooner you act, the less money goes out the window.

Not every repeated record is a problem. Some data should be duplicated, such as repeat orders, which tell us how many return customers we have.

Data decay is like smoking, a dirty house or weight gain. The sooner you deal with it, the more you gain.

A study on lead data health shows that 15% of company sales leads are duplicated. Another study shows that companies can recover up to 70% of their revenue on clean data alone.

Don't wait until you have three days before your next campaign to de-dupe your database. The time is today.

What HubSpot currently does to prevent duplicates

Because duplicates are such a common data-entry problem, HubSpot already has a few safeguards in place to prevent them.

Preventing duplicates with emails and company domain name

HubSpot doesn't allow you to add two contacts with the same email. Yes! This means you have fewer duplicates long term. This also means you need to be extra careful not to use company-generic (info@) email addresses that more than one employee uses.

For companies, Company Domain Name is the unique key that can't be used more than once. For example, you can't have two companies with the same domain name.

(Easily) Import de-dupe in HubSpot

Again, when importing in HubSpot, it's easy to import duplicates if you forget to add the de-dupe key (email for contacts or domain for companies).

We've covered this before, but it's as easy as importing with the correct keys in your CSV or Excel file.
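Before importing, it can help to check the file for rows that are missing the de-dupe key or repeat it. The following is a minimal sketch using only Python's standard library. The column name `Email` and the sample rows are assumptions for illustration; adjust them to match your own export.

```python
import csv
import io
from collections import Counter

def check_import_rows(csv_text, key_column="Email"):
    """Return (rows_missing_key, repeated_keys) for a CSV about to be imported."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = []      # 1-based data row numbers that have no de-dupe key
    seen = Counter()  # normalized key -> number of occurrences
    for row_number, row in enumerate(reader, start=1):
        key = (row.get(key_column) or "").strip().lower()
        if not key:
            missing.append(row_number)
        else:
            seen[key] += 1
    repeated = sorted(k for k, n in seen.items() if n > 1)
    return missing, repeated

# Made-up sample data: row 2 repeats row 1's email, row 3 has no email.
sample = """First Name,Last Name,Email
Ana,Lopez,ana@example.com
Ana,Lopez,ANA@example.com
Sam,Reed,
"""
missing, repeated = check_import_rows(sample)
print(missing)   # data rows with no email
print(repeated)  # emails that appear more than once
```

Rows flagged as missing the key are the ones most likely to create duplicates on import, so they are worth fixing in the spreadsheet first.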

Also, you can use existing HubSpot IDs to re-import back into HubSpot, updating existing records...

"You can use an object ID to specify any records that already exist in your CRM. All objects include contact, company, deal, ticket, and product. If you import an object that already exists in HubSpot, any matching properties will be updated with the latest data from your import." -- HubSpot

Using the HubSpot deduplication tool (for HubSpot Pro users)

Hooray! HubSpot is one of the few CRMs that gives enough of a damn to have a duplicate finder tool. According to HubSpot, the duplicate finder uses AI to know which contacts are similar, improving the duplicate matches.

The feature finds duplicates on a periodic basis (we're not sure if it's daily or weekly) and provides a list of (two-by-two) duplicates in a paginated list.

The manage duplicates feature is only for Pro and above users. This means you'll have to upgrade if you're using a Free account.

Although HubSpot has received a lot of acclaim for this tool, it's still lacking in a number of ways, according to the HubSpot community.

For one, there are only two records per match. Duplicates come in any numbers.

Second of all, there's no bulk merging. However, implementing this on HubSpot's part can be risky, since there's no differentiation between partially matching duplicates and exactly matching ones. This creates incredible risk of corrupted data and accidental merges.

Step One: Finding and merging existing duplicates in HubSpot

The first large job you have is to clean up the bulk of the duplicates currently sitting in your HubSpot account. Once that's done, we'll look into automating and preventing duplicates from entering.

Note: We're going to use Dedupely for the following examples. You can sign up for a free trial to see how this works on your account.

Addressing merge risk

Are two people with the same first and last name in real life duplicates of each other? No! However, unless your company has 100 million customers, you probably don't have to worry about people with the same name.

So we can start by guarding against common-sense mistakes. Some matches occur naturally between records that are not duplicates: different people sometimes share a phone number or other attributes.

In short, the more fields we use to match duplicates, the lower the chance of incorrect merging. However, the more fields we use, the fewer duplicates we're going to find. So there's a risk trade-off that, at some point, you'll have to make.
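The trade-off can be seen in a small sketch. The field names (`first`, `last`, `company`) and the records below are hypothetical, and the grouping logic is a simple illustration, not how Dedupely or HubSpot match records.

```python
from collections import defaultdict

def group_duplicates(records, fields):
    """Group record IDs whose values match on every field in `fields`."""
    groups = defaultdict(list)
    for record in records:
        key = tuple((record.get(f) or "").strip().lower() for f in fields)
        if all(key):  # skip records with a blank value in any match field
            groups[key].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Made-up contacts: record 3 shares a name with 1 and 2 but works elsewhere.
contacts = [
    {"id": 1, "first": "John", "last": "Smith", "company": "Acme"},
    {"id": 2, "first": "john", "last": "smith", "company": "Acme"},
    {"id": 3, "first": "John", "last": "Smith", "company": "Globex"},
]

loose = group_duplicates(contacts, ["first", "last"])
strict = group_duplicates(contacts, ["first", "last", "company"])
print(loose)   # [[1, 2, 3]]
print(strict)  # [[1, 2]]
```

Matching on name alone sweeps in record 3, which may be a different person. Adding the company field drops it from the group, at the cost of missing a true duplicate whose company field happens to differ.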

The amount of risk you take depends entirely on you, and I urge you to proceed as risk-averse as possible. Why? Because once done, it's nearly impossible to fully and quickly recover from large amounts of incorrectly merged contacts. With the amount of data that moves around in a merge, it's very hard from a technical standpoint to undo merges (we still haven't found a reliable way to do it, and therefore we don't).

Best practices to prevent merge disasters

Yes, disasters can happen. They are entirely preventable and with the following best practices you don’t ever have to worry about data being mangled:

  1. Always have backups of your data handy. Create a special backup before large bulk merges.
  2. Always test and audit your matching setup. It’s easy to make a mistake by rushing. Dedupely does everything possible to prevent users from making simple mistakes. However, always closely review before bulk merges and audit the changes after the merge.
  3. Be aware of how and when your data evolves. Adapt your match setup and merge rules accordingly to avoid collisions with changes in your HubSpot data.
  4. Be in tune with errors and nuances in your data. For example, a default like “000-0000” in a phone number field would match every record that carries the same placeholder, even though those phone numbers are really blank. To a computer, non-blank defaults don’t look blank. Dedupely does try to make sure each input passes validation of what it’s supposed to look like.
  5. Never automate merges that you haven’t tested and reviewed. Never run bulk merges until you’ve looked over a good portion of the duplicates and are confident of the results.
  6. Take attribution into account. Decide how lead and contact owners are preserved through merge rules. Sam’s lead might become Jack’s lead because Jack has a duplicate of Sam’s lead. Who wins the ownership in this case?
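The placeholder problem from point 4 can be sketched as a small cleaning step that runs before matching. The placeholder list and the "all digits identical" rule are assumptions for illustration only; they are not Dedupely's validation logic.

```python
import re

# Hypothetical placeholder values; extend with the defaults found in your data.
PLACEHOLDERS = {"n/a", "na", "none", "unknown", "test"}

def clean_phone(value):
    """Return digits only, or None when the value is blank or a filler default."""
    digits = re.sub(r"\D", "", value or "")
    if not digits or len(set(digits)) == 1:  # e.g. "000-0000" or "1111111"
        return None
    return digits

def clean_text(value):
    """Return trimmed lowercase text, or None for blanks and placeholders."""
    text = (value or "").strip().lower()
    return None if not text or text in PLACEHOLDERS else text

print(clean_phone("000-0000"))        # None
print(clean_phone("(555) 010-2030"))  # 5550102030
print(clean_text(" N/A "))            # None
```

Treating a cleaned value of `None` as "no data" keeps two records from matching just because they share the same filler value.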

Finding duplicates in HubSpot

Finding duplicates in Dedupely HubSpot

Start by selecting the fields you want to match. You can start by using the most basic fields (first and last names) and adding more fields later, narrowing down the possibilities.

Finding duplicates matching loading

Once the scan is finished, it’s time to review the duplicates. Start by scrolling through the duplicate list, looking for any inconsistencies in the matches.

Results of duplicate matches.

The number of incorrect matches you discover shows how much confidence you can put in your match settings. If you find any, tweak the settings and rescan until you see no incorrect matches, and only then move on to bulk merges.

Start merging duplicates one-by-one

Merging one-by-one selecting the duplicates you want to merge.

One-by-one merging at the beginning will give you a deep level of understanding as to which details need to be addressed in the merges.

Custom merging gives you control over the field values that win the merge and over the master contact.

Using merge rules to save time

Form for merge rules in Dedupely

Merge rules automate the custom merging field value selection. You can set merge rules to decide automatically which value should win based on its value, its attributes or associated attributes of the record.

This will save you a lot of time on custom merging and is invaluable in bulk and automatic merging.

Set the master record

The master record (the record that is kept after the merge) and the slave records (the records that get deleted) can also be set in merge rules by setting the ID that is kept.
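To make the idea of merge rules and a master record concrete, here is a minimal sketch. The rules it applies (keep the oldest record as master, take the newest non-blank value for listed fields and the oldest non-blank value otherwise) and the field names are assumptions chosen for illustration. They are not Dedupely's actual rule set.

```python
def merge_group(records, prefer_newest=("phone",)):
    """Merge duplicate records into one dict, keeping the oldest record's ID."""
    by_age = sorted(records, key=lambda r: r["created"])  # ISO dates sort as text
    master = dict(by_age[0])  # the oldest record is kept as the master
    fields = {f for r in records for f in r} - {"id", "created"}
    for field in fields:
        # Candidate values from oldest to newest, skipping blanks.
        values = [r[field] for r in by_age if r.get(field)]
        if values:
            # Rule: newest value wins for listed fields, oldest value otherwise.
            master[field] = values[-1] if field in prefer_newest else values[0]
    return master

# Made-up duplicates: Jack's newer record has a fresher phone but no email.
duplicates = [
    {"id": 20, "created": "2020-03-01", "email": "", "phone": "555-0200", "owner": "Jack"},
    {"id": 10, "created": "2019-01-15", "email": "sam@example.com", "phone": "555-0100", "owner": "Sam"},
]
print(merge_group(duplicates))
```

In this example record 10 survives as the master, the owner stays with Sam, and the phone number comes from the newer record. Which record and which values should win is a business decision, so the rules are worth writing down before any bulk merge.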

Audit your merged HubSpot records

Audit and history of merged HubSpot duplicates.

Your merge history gives you a brief overview of the merged records. Also check HubSpot and hit “refresh” in the browser to see the latest updates to your merged data. If everything looks good, you’re well on your way to a duplicate-free HubSpot account!

“Help! I’m still seeing duplicates in HubSpot”

This is normal. You can’t get every dupe on the first round. At this point we start digging deeper. The first and easiest way is to play around with the match settings.

By default Dedupely always adds “Similar match” to pre-selected text fields. Each type of field matcher will give you the ability to fine-tune your matches.

Changing this setting can help you find new matches. Each one can be used for finding different types of duplicates based on that field:

  • Exact match is the strictest match type. It ignores uppercase and lowercase but must match the text exactly.
  • Similar match is also fairly strict but ignores punctuation marks, among other common elements that are deemed removable.
  • Match first similar word/match last similar word both work to match the first or last words. This can work well for company names or first names that have the middle name added.
  • Fuzzy match is the most aggressive match type and works roughly on sound-alike matches. However, this is the most inaccurate match type and should never be relied on to produce correct matches.
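The differences between these match types can be approximated in a few lines. This is a rough sketch only and not Dedupely's implementation. In particular, the fuzzy matcher below swaps in edit similarity from Python's `difflib` as a stand-in for sound-alike matching, and the `0.85` threshold is an arbitrary assumption.

```python
import re
from difflib import SequenceMatcher

def exact_match(a, b):
    """Case-insensitive, otherwise character for character."""
    return a.strip().lower() == b.strip().lower()

def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())

def similar_match(a, b):
    """Equal once case and punctuation are ignored."""
    return normalize(a) == normalize(b)

def first_word_match(a, b):
    """Equal on the first word only, e.g. a first name with a middle name added."""
    wa, wb = normalize(a).split(), normalize(b).split()
    return bool(wa and wb) and wa[0] == wb[0]

def fuzzy_match(a, b, threshold=0.85):
    """Edit-similarity stand-in for sound-alike matching (most aggressive)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(exact_match("Acme, Inc.", "Acme Inc"))    # False
print(similar_match("Acme, Inc.", "Acme Inc"))  # True
print(first_word_match("Mary Ann", "Mary"))     # True
print(fuzzy_match("Jon Smith", "John Smith"))   # True
```

Each step down the list catches more duplicates and also more false positives, which is why the looser match types call for a closer manual review.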

After this, take a look at the prefixes and suffixes in your data. You can improve matching by ignoring common terms. By adding ignored terms you reduce the extra noise preventing proper matching.
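As an illustration of ignored terms, the sketch below strips a hypothetical list of common company suffixes and prefixes before names are compared. The list is an assumption; build yours from the terms that actually appear in your data.

```python
import re

# Hypothetical ignore list for company names.
IGNORED_TERMS = {"inc", "llc", "ltd", "corp", "co", "the"}

def strip_ignored_terms(name):
    """Lowercase a name, drop punctuation and remove ignored terms."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    kept = [w for w in words if w not in IGNORED_TERMS]
    return " ".join(kept or words)  # never reduce a name to nothing

print(strip_ignored_terms("The Acme Co., Inc."))  # acme
print(strip_ignored_terms("Acme LLC"))            # acme
```

With the noise removed, "The Acme Co., Inc." and "Acme LLC" compare as equal, where an exact match on the raw names would have missed them.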

How long should the initial HubSpot de-dupe take?

This initial process can take anywhere from a few hours to a few weeks. It depends entirely on how much data you have and what shape your data is in.

Step Two: Preventing duplicate record entry in HubSpot

Like we talked about above, preventing duplicates is way cheaper than fixing them. Obviously, fixing them is also cheaper than leaving them be.

HubSpot has a few features in place to prevent duplicates from entering. However, nothing is perfect and human error still prevails.

Review the common duplicate sources

Knowing where your duplicates are coming from will help you find solutions to preventing them from entering.

If your development team uses HubSpot's APIs or builds custom solutions for HubSpot, consult with them on the possibility of their code causing duplicates and what can be done to fix that.

Check any third-party apps that create new records, such as web-to-lead form integrations and integrations that synchronize records across platforms without duplicate checks.
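The check that such integrations often skip is a "find before create" step. The sketch below shows the pattern against an in-memory stand-in. `ContactStore` and its `upsert` method are hypothetical names and do not represent HubSpot's API; a real integration would perform the same lookup through whichever API client it uses.

```python
class ContactStore:
    """In-memory stand-in for a CRM, used only to show the pattern."""

    def __init__(self):
        self._by_email = {}
        self._next_id = 1

    def upsert(self, email, **properties):
        """Update the contact with this email if it exists, otherwise create it."""
        key = email.strip().lower()  # normalize so case and spaces don't split a contact
        if not key:
            raise ValueError("email is required as the de-dupe key")
        contact = self._by_email.get(key)
        created = contact is None
        if created:
            contact = {"id": self._next_id, "email": key}
            self._by_email[key] = contact
            self._next_id += 1
        contact.update(properties)
        return contact["id"], created

store = ContactStore()
print(store.upsert("Sam@Example.com", firstname="Sam"))     # (1, True)
print(store.upsert(" sam@example.com ", phone="555-0100"))  # (1, False)
```

The second call updates the existing contact instead of inserting a new one, which is the behavior to ask your developers and integration vendors about.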

Educating your team on data etiquette

Each team member who enters or modifies data in HubSpot should be schooled on the importance of data etiquette. Your team should know how to handle each of the following:

  • How data should be formatted
  • How to enter a record without duplicating it
  • The tools they need to use to de-dupe or find a duplicate record
  • How to avoid incorrect sales rep attribution

If your team understands how to protect and maintain your data as it enters HubSpot you cut out a large portion of data incorrectness.

Step Three: Automate

Once you’ve gotten past the hump of step one and the tinkering of step two it’s now time to automate.

Prevention is the best way to solve most of your duplicate problems (and maybe all problems). However, dupes are inevitable, and automation will pay dividends over the days, weeks, months and years your team is interacting with HubSpot. They will thank you as well!

Use auto merge to pick up daily duplicates

Turning on auto merge in Dedupely

Auto merge will help you avoid the work of merging obvious duplicates.

That said, you will have to put in some manual reviewing week to week. Dedupely will still alert you on non-auto merge match settings that are catching duplicates. You can then review them and make a decision on the individual dupes. If you see that each week these duplicates are obvious catches you can also automate those match settings.

With the help of your team and some preventative measures you’ll stay on top of your data.

Budgeting your HubSpot De-dupe

How de-dupes are priced

Based on the vendors I’ve seen out there I can tell you your record count is the largest factor in pricing.

Smaller record bases of fewer than 30,000-50,000 records will start at around $500/yr.

Larger record bases of 150,000+ records can cost anywhere from $2,000 to $20,000, either one-off or per year.

Costs are also influenced by:

  • the level of customization of your HubSpot account
  • whether the work is done on a consulting basis instead of self-serve
  • the amount of time required to complete the de-dupe

Each data cleansing company handles its pricing a little differently. The best way to get an idea is to contact them directly and handle it one on one.

Realistic time frames for initial HubSpot de-dupe

Just as pricing is based on the number of records, so is the amount of time you should take into consideration.

Smaller HubSpot databases will often take an hour or more to be cleaned up. Larger databases can take days or weeks to complete. The amount of data, the extra care that needs to be taken and the variations of duplicates to be caught are much greater in a larger database.

The best way to avoid last-minute panic is to de-dupe your HubSpot database weeks ahead of time, not days or hours ahead. You want to take a few factors into consideration when it comes to deadlines:

  • The speed of the company you contract to do your de-dupe.
  • Extra customization, testing and care needed to avoid data loss and errors.
  • Nuances and errors that need to be fixed in your existing data.
  • Extra time needed to sync and move data around.

To name a few. The worst ones are the unexpected issues that botch your newsletter or launch announcement deadline. Be safe, do it ahead of time.

Who should be involved in the de-dupe?

Your data guy/gal, sales managers and anyone who needs to give input on the data and how it’s modified.

How do I calculate the ROI of de-duping HubSpot?

There are a number of studies showing how duplicate data causes some heavy damage to your company. However that may not be enough to justify the spend on data cleansing.

A simple way to calculate the ROI would be to calculate the lost hours reps spend un-mangling, finding duplicates and merging their data. You can measure the amount of time it takes to find one group of duplicates and merge them by hand. Then ask your reps how many times a day/week they have to fix duplicates. Multiply those together and you have an hour figure over a certain period of time. How much are those hours costing your company instead of benefiting it?

The calculation could look like this:

(Number of duplicates merged per rep daily) * (minutes to merge a duplicate / 60) = (lost hours per day)

(lost hours per day) * (hourly base pay) = (daily cost per rep)

Let’s test some fictional numbers. Our reps each merge up to 20 duplicates a day, and on average it takes 4 minutes to merge each one.

(20 * 4 minutes = 80 minutes, or about 1.33 hours) * $30.00/hr = $40 per day

Then take into account the number of reps in the team (let’s say 3) and, over a 30-day month, you spend anywhere from $1,200 (one rep) to $3,600 (all three reps) a month on hours dedicated to merging duplicate data.
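The same estimate can be written as a small function so you can plug in your own numbers. This is a rough sketch: the function and parameter names are made up for illustration, and the 30-day month is an assumption inferred from the monthly figures in this section.

```python
def monthly_duplicate_cost(merges_per_day, minutes_per_merge, hourly_pay,
                           reps=1, days_per_month=30):
    """Estimate the monthly cost of reps merging duplicates by hand."""
    lost_hours_per_day = merges_per_day * minutes_per_merge / 60
    daily_cost_per_rep = lost_hours_per_day * hourly_pay
    return round(daily_cost_per_rep * reps * days_per_month, 2)

# The fictional numbers from the example: 20 merges a day, 4 minutes each, $30.00/hr.
print(monthly_duplicate_cost(20, 4, 30.00))          # one rep
print(monthly_duplicate_cost(20, 4, 30.00, reps=3))  # three reps
```

If your reps only work weekdays, passing something like `days_per_month=22` gives a more conservative figure.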

You can factor in lost opportunities and other estimations. However if this calculation doesn’t appear significant you probably don’t have a duplicate problem. If you do it should be pretty easy to convince your team and peers to get on board with you in your de-dupe efforts.

NOTE: Obviously I’m no mathematician. This is a rough estimate, just to give an idea of how to estimate the financial loss incurred through duplicate data.

In-house vs. HubSpot Marketplace Solutions

We’ve all been in the DIY rabbit hole where, in the end, we just threw in the towel and paid someone else to do it.

Then there are scenarios where in-house just makes so much sense.

What are the options we have for effective de-duping?

API solution built by an in-house developer

You have developers. So why not put them to work on a custom solution that fits all of the requirements? Not so fast!

Some developers love writing their own apps. Believe me, at one point I would have designed our entire software stack–and nearly did–before I was reminded of the true costs of in-house.

Your developers are here to build the product or service you offer. Developers are also not cheap to pay. A custom build costs you 1) initial development time, 2) time spent on bug fixes and specs going back and forth, 3) maintenance and updates over the years and 4) hosting and running costs.

The hours invested in this add up over time and eventually stop being cost-effective. Not to mention you’ll have to wait a good while before this runs smoothly.

HubSpot Native Duplicate Management

HubSpot sets itself apart from the rest by having native deduplication management.

We mentioned above the shortcomings of HubSpot's native duplicate management:

  • Not enough ways to match fields; we’re stuck with AI thinking for us
  • No automatic merging or bulk merging
  • Limited to two records per match
  • Overall somewhat cumbersome

The HubSpot native duplicate management, while a cut above other CRMs, still misses the mark in terms of time saving duplicate management.

“Heck, it’s the sales team’s job!”

When was the last time “data cleansing and cleaning up duplicates” was in the job description of a sales rep? Do any of your sales reps think their job is cleaning up bad data?

Training personnel to clean up after themselves is pretty basic. Everyone should actively clean up the data as they enter it into HubSpot. It’s just proper etiquette not to mention respect for fellow team members.

However, if you’re not aiding sales in gathering, handling and cleaning data, you’re making their lives harder, which in turn makes selling harder and means fewer sales for your company. By “aiding” I mean providing them with the proper tools to do their job.

Why HubSpot Marketplace solutions are a no-brainer

And no! HubSpot doesn't earn kick-back from their integration providers, as some might have you believe.

There are a few de-dupe apps on the HubSpot Marketplace. Some of them are not so great and others are pretty awesome. These apps are built by companies that dedicate large amounts of resources to building solutions that save you insane amounts of time and money.

You’ll save time on DIY hacks that end in misery, heavy projects that distract your team from serving your customer and overcomplicated native features that don’t really cut it.

What to look for in a de-dupe provider

I’ve done a lot of reviews of the pros and cons of all of the existing HubSpot Marketplace apps. There are a lot right now. Some are free, some don’t list prices. Either way, you can’t go wrong with most of them.

If you’re going with a consultancy make sure they understand your specific needs. Don’t take anything for granted. They might not understand what X and Y field are for and not think they’re important. Tell them everything you can and ask as many questions as possible. In most cases they will be very in tune with what you need.

Self-serve solutions will give your team full control over the process. Expect a learning curve, though, and be prepared to dig through a bit of documentation and videos.

The software you choose should:

  • Offer field matching options, including exact, fuzzy or sound-alike matching and the ability to ignore common terms.
  • Allow you to merge in bulk, customize merges and automate merges once you’ve done the homework and are confident in your settings.
  • Provide merge rules that cover all your master, attribution and custom field needs.
  • Keep you up to date on new duplicates that enter your HubSpot data.
  • Have extremely responsive support.

This is what I’m leading Dedupely to accomplish. If self-serve doesn’t work for you, our team will pitch in at no extra cost. Take a closer look at the HubSpot <> Dedupely connector here.