What is CRM Data Duplication? Why It's a Trillion Dollar Problem

Clinton Skakun on Jan 2 2018

What's "duplicate data" or, more specifically, "duplicate CRM data"? Simply put, it's any piece of data that's been created more than once or that appears identical to another. For example, three contacts named "John Smith" who refer to the same person (not actually three John Smiths, but the same one) would be considered duplicates.

CRM software has only been around for a few decades, but duplicate data has been an issue since the printing press, or earlier. It could be argued that even the ancient Sumerians ran into this problem when carving cuneiform script into their clay tablets.

Duplicate data really became an issue with the advent of computers. Once data could easily be copied without much human input, algorithms had to be created to combat the problem. Imagine the pains Dropbox ran into when synchronizing folders and files across multiple machines that might already have the same data.

We don't give duplicate data much thought when it's easily taken care of by software and virtually invisible. When we do encounter it, however, we usually have to take extra steps to clean it up. Oddly enough, computer technology has become highly advanced, yet we still have trouble finding tools to organize and cleanse the digital mess we leave behind.

Perhaps the place we notice it most is in the personal contact lists on our mobile phones. Earlier feature phones didn't have this issue because contacts were entered manually, which was rarely quick or easy. Once smartphones and cloud syncing came on the scene, duplicate contacts became an instant problem. You might have dozens of the same people across multiple social networks, email accounts, and previously imported or migrated contact lists. You might also manually enter a contact that already exists in your list because you simply forgot it was there, or because you didn't have the time to search for the name before entering it.

With all this automation and ease of use, it's easy to see (and it's something we experience on a daily basis) how duplicate contacts quickly take over our phones and email clients.

And that's just what happens on the phone you keep for personal use.

In any company that has customers, this becomes an instant problem as well, but one that's ten times easier to get wrong, especially in our modern age where everything is kept digitally. Marketing and sales companies are especially prone to it. Most, if not all, tech companies have experienced this issue across their user or customer base, email lists, lead lists and, well, let's just get to the point: the CRM.

CRM is commonly referred to as an industry, but the term can also apply to anything we use to maintain a list of business contacts, along with the data that helps us stay on top of sales, marketing or any other business-related use.

The CRM industry has this issue mostly because of the sheer bulk of records involved. It's not just customers, but also:
  • Email subscribers synchronized from the likes of Mailchimp (which sometimes also contains duplicate emails).
  • Leads collected from marketing campaigns, later imported into the CRM.
  • Followers and fans from social networks like Facebook, Twitter and LinkedIn, which are often automatically synced with the CRM.
  • Cold leads collected from the phone book, purchased email lists, phone lists, etc.
  • Web2CRM-style leads fed in through forms and other opt-in methods.
  • Contacts synced through integrations that don't check for duplicates.
  • Records brought in through imports that lack duplicate checks.
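
As a rough illustration of the last two items, here is a minimal Python sketch of what a duplicate check on import might look like. It assumes contacts are matched on a normalized email address only. The `Contact` class and the `import_contacts` function are hypothetical names made up for this example, not part of any particular CRM's API, and real-world matching usually also has to weigh names, phone numbers and near-identical spellings.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Contact:
    # Hypothetical, stripped-down contact record for illustration only.
    name: str
    email: str


def normalize_email(email: str) -> str:
    """Trim and lowercase an email so trivially different spellings match."""
    return email.strip().lower()


def import_contacts(
    existing: Iterable[Contact], incoming: Iterable[Contact]
) -> Tuple[List[Contact], List[Contact]]:
    """Split incoming contacts into (added, skipped).

    A contact is skipped when its normalized email already exists in the
    CRM or appeared earlier in the same import batch. Contacts without an
    email are always added, since this sketch has nothing to match them on.
    """
    seen = {normalize_email(c.email) for c in existing if c.email.strip()}
    added: List[Contact] = []
    skipped: List[Contact] = []
    for contact in incoming:
        key = normalize_email(contact.email)
        if key and key in seen:
            skipped.append(contact)
        else:
            if key:
                seen.add(key)
            added.append(contact)
    return added, skipped


# Example: one contact already in the CRM, three arriving in an import.
crm = [Contact("John Smith", "john.smith@example.com")]
batch = [
    Contact("john smith", " John.Smith@Example.com "),  # same person as in the CRM
    Contact("Jane Doe", "jane@example.com"),            # new contact
    Contact("Jane Doe", "JANE@example.com"),            # repeated within the batch
]
added, skipped = import_contacts(crm, batch)
print(len(added), "added;", len(skipped), "skipped")
```

Only the first "Jane Doe" is added here. The other two rows are skipped because their emails match an existing record once whitespace and letter case are ignored.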
And those are just a few. With all of the technology moving back and forth, and people's lack of ability to combat this themselves, it's no wonder nearly all businesses have this issue.

So why is this really an issue? Who's actually getting hurt, and why do we even need to think about this?

You don't have to look too far to find the downsides of duplicate data. There are major drawbacks that affect not only your company from the inside, but also your customers, your credibility, and your overall brand.

According to the 2016 Data Science Report from CrowdFlower, "60% [of data scientists] said they spent most of their time cleaning and organizing data." That's a huge amount of time that could be better spent on actually doing their jobs.

In the sales sector, duplicate data confuses salespeople who make thousands of calls per month. Ever get cold calls from the same company multiple times, even after you've asked to be removed from the list (also multiple times)? This is because they can't always simply remove you from the list (assume there is no single "the list"). You've probably been entered more than once. One of your entries has a note to stop calling, but the other duplicates haven't been updated. Which salesperson on a quota would consider updating or deleting multiple entries, even if they bother to "strike you off the list" at all?

This leaves anyone interacting with your company frustrated. Why would a credible company make such a gross mistake? From within the company, it's easy to see why. From outside, there's little room to save face, especially when you combine personnel who don't care with a mix of data issues.

Even real pros, who interact with hundreds of clients a year, can't possibly remember every name. How are they supposed to remember that they called this person three months ago when there was no note or record, or when the record of the last interaction was attached to another duplicate?

Another example is when you receive the same newsletter multiple times from the same company. You probably signed up for the same newsletter more than once, or opted in somewhere else and landed on the same list.

Bad data messes with everyone. Duplicates are everywhere, and they work like a cancer, eventually growing too big to handle.

Okay, okay! Duplicate contacts aren't as bad as government debt, actual cancer, world hunger or the hundreds of far worse issues society faces. But they are a problem, one that's worse than your website not working in Internet Explorer 8, the odd broken link or the extra 40 cents you could save on another brand of disposable coffee cups. Duplicate data does have a dark side, one that affects your business's bottom line. IBM estimates that bad data costs US businesses roughly $3.1 trillion a year. In other words, not handling your duplicate data (and not cleansing general "bad data") is leaking real money from your company.

Really, we can't blame the team for not being better at data cleansing:
"The reason bad data costs so much is that decision makers, managers, knowledge workers, data scientists, and others must accommodate it in their everyday work. And doing so is both time-consuming and expensive. The data they need has plenty of errors, and in the face of a critical deadline, many individuals simply make corrections themselves to complete the task at hand. They don’t think to reach out to the data creator, explain their requirements, and help eliminate root causes."
-- Harvard Business Review: Bad Data Costs the U.S. $3 Trillion Per Year


Duplicate data is a common problem across all businesses. And since technology touches nearly every industry, the problem now accelerates without any effort, yet it's that much more challenging to combat, with real penalties for those who try.

Financial reports derived from marketing, sales and general customer data get skewed. Credibility and morale suffer. Teams are much less productive. Proper decisions can't be made because the data is inaccurate. Extra time has to be spent hunting for the right contact, note or piece of data.

Can we really trust our data to do the job we rely on it for? It depends on how much time you and your team are spending on "fixing" the data. The extent to which your data is corrupt makes itself known. The reassuring part is that it's never silent for long. So yes, you can probably trust your Google Analytics, your bank statements, your call history or anything else that isn't synced, touched and updated on a continual basis by dozens of people and services.

Then there's your CRM: your center of data collection, the data you collect and organize, your customer data, your recorded interactions. Data that constantly moves, morphs and populates is the data you need to pay attention to, and it's probably your largest source of pain, lost productivity and lost opportunity.