Overview
This article outlines how Element451 searches for and scores possible duplicate records.
Duplicate Search Trigger
Updating any of the following student data fields triggers an automatic search for possible duplicates:
First Name
Last Name
Email
Phone
Addresses
Identities
Date of Birth
Step 1: Initial Search
The initial search looks at first name, last name, and email.
It searches for the exact first name and all variations of that name and nickname.
It searches only for the email prefix, not the "@domain.com" portion.
The initial search returns a list of potential duplicates, each scored on its similarity to the name and email of the record in question.
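The search terms described above can be sketched roughly as follows. This is a minimal illustration, not Element451's implementation: the `NICKNAMES` table, the function names, and the record keys (`first_name`, `last_name`, `email`) are all hypothetical, and the real list of name variations is not documented in this article.

```python
from __future__ import annotations

# Hypothetical nickname table; the actual variation list is not published here.
NICKNAMES = {
    "william": {"will", "bill", "billy"},
    "elizabeth": {"liz", "beth", "lizzie"},
}


def name_variations(first_name: str) -> set[str]:
    """Return the name itself plus any known variations and nicknames."""
    name = first_name.strip().lower()
    variations = {name}
    for formal, nicks in NICKNAMES.items():
        if name == formal or name in nicks:
            variations |= {formal} | nicks
    return variations


def email_prefix(email: str) -> str:
    """Return the lowercased part of the address before the '@'."""
    return email.strip().lower().split("@", 1)[0]


def build_search_terms(record: dict) -> dict:
    """Collect the values the initial search would look for (assumed shape)."""
    return {
        "first_names": name_variations(record["first_name"]),
        "last_name": record["last_name"].strip().lower(),
        "email_prefix": email_prefix(record["email"]),
    }
```

For example, a record for "Liz Smith" with the email "liz@school.edu" would produce search terms that include "elizabeth" and "beth" as first-name variations and "liz" as the email prefix.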
Step 2: Logic-Based Scoring
Element451 then calculates a probability that the two records are duplicates by applying logic based on flags and bonuses from other matching fields.
Flags
As Element notices patterns in the duplicates that enter Element451, flags ensure that known duplicates with specific matching fields are recorded at a certain probability. Each flag is scored as a simple binary value:
Field Matches | 1 |
Field Does Not Match | 0 |
Bonuses
Each field is compared individually to add bonuses to its initial match score. Different bonuses are applied depending on how strong an indicator that field is.
Date of Birth | 20% |
Address | 30% |
Phone | 30% |
15% |
Matching first and last names alone will not flag duplicates. Additional matches are needed: email, date of birth, address, or phone number.
Fields Matching, Normalization, and Comparison
Person Name
All names are made lowercase.
Leading and trailing spaces are removed.
A list of name parts is made based on spaces or hyphens.
First and last names are compared separately. If any first name part from one record appears in the other record's first name parts, the first names match; last names are checked the same way. If both conditions are positive, the name is considered a match.
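The name comparison above can be sketched as a set intersection over name parts. This is an illustrative sketch only; the function names are hypothetical, and it assumes last names are compared the same way as first names.

```python
import re


def name_parts(name: str) -> set:
    """Lowercase, trim, and split a name on spaces or hyphens."""
    cleaned = name.strip().lower()
    return {part for part in re.split(r"[\s-]+", cleaned) if part}


def names_match(first_a: str, last_a: str, first_b: str, last_b: str) -> bool:
    """Both the first names and the last names must share at least one part."""
    first_ok = bool(name_parts(first_a) & name_parts(first_b))
    last_ok = bool(name_parts(last_a) & name_parts(last_b))
    return first_ok and last_ok
```

Under these assumptions, "Mary-Anne Smith Jones" and "mary JONES" would match, because "mary" appears in both sets of first name parts and "jones" in both sets of last name parts.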
Email
Emails from both records are stripped of their domains, so only the prefix before “@” is compared. Since these records have already passed the initial search, this round looks only for an exact match.
Identity
If either record has an “EMAIL” identity, Element compares it against the other record's primary email address.
Date of Birth
The day, month, and year of the records' dates of birth are compared for an exact match.
Addresses
Element uses a third-party API to normalize each address's “Street 1” field.
If there are multiple addresses, Element checks if any of the first record's normalized addresses match any of the second record's normalized addresses exactly.
Phone
Using the clean 7-digit number only, if any of the first record's phone numbers match any of the second record's phone numbers exactly, it is a positive match.
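A rough sketch of the phone comparison follows. It assumes that the "clean 7-digit number" means the last seven digits once all non-digit characters are removed; that interpretation and the function names are assumptions, not confirmed behavior.

```python
import re


def clean_phone(phone: str) -> str:
    """Strip non-digits and keep the last seven digits (assumed meaning of 'clean')."""
    digits = re.sub(r"\D", "", phone)
    return digits[-7:]


def phones_match(phones_a: list, phones_b: list) -> bool:
    """True if any cleaned number from one record equals any from the other."""
    cleaned_a = {clean_phone(p) for p in phones_a}
    cleaned_b = {clean_phone(p) for p in phones_b}
    # Ignore values that are too short to form a 7-digit number.
    cleaned_a = {p for p in cleaned_a if len(p) == 7}
    return bool(cleaned_a & cleaned_b)
```

Note that under this reading, two numbers that differ only in country or area code would still count as a match.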
Relationships
If the first and second records are related, they will not be considered duplicates.
Special Cases
When the last name matches but the other fields do not, the record is considered a potential family member.
Step 3: Calculating Probability
Once bonuses are added, Element451 compares the new score to the maximum possible score to calculate the probability that the pair is a duplicate.
If the match probability is higher than 70%, the record is treated as a duplicate in the database and is displayed in the Deduplication Module.
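The article does not give the exact formula, so the following is only a sketch under stated assumptions: the initial search score is taken to be on a 0 to 100 scale, bonuses are added as percentage points, and the maximum score is the top initial score plus every bonus. Flags are left out, and the 15% bonus is omitted because its field is not labeled in the table. All names here are hypothetical.

```python
# Bonus weights in percentage points, from the Bonuses table.
# The 15% row is omitted because its field label is missing in the source.
BONUSES = {"date_of_birth": 20, "address": 30, "phone": 30}

THRESHOLD = 0.70  # "higher than 70%"
MAX_INITIAL_SCORE = 100  # assumed scale of the initial search score


def match_probability(initial_score: int, matched_fields: set) -> float:
    """Score plus earned bonuses, divided by the maximum possible score."""
    earned = sum(BONUSES[field] for field in matched_fields if field in BONUSES)
    max_score = MAX_INITIAL_SCORE + sum(BONUSES.values())
    return (initial_score + earned) / max_score


def is_duplicate(initial_score: int, matched_fields: set) -> bool:
    """A pair is flagged only when the probability strictly exceeds the threshold."""
    return match_probability(initial_score, matched_fields) > THRESHOLD
```

Under these assumptions, a perfect name and email score with no additional matching fields stays below the threshold, while the same score with matching address and phone rises above it.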
Selection of a Master Record
When Element451 finds a possible duplicate, it performs a "smart" selection to determine which record should become the new "master" record. First, Element looks for recent activity such as:
Logging in
Opening an email
Filling out a form
Registering for an event
Then, Element will prioritize the record without hard email bounces or unsubscribe milestones.
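One way to picture this selection is as a two-part ranking. The sketch below assumes that the most recent activity is compared first and that a record without hard bounces or unsubscribes wins a tie; the actual ordering and the record keys (`last_activity`, `hard_bounce`, `unsubscribed`) are assumptions for illustration.

```python
from datetime import datetime


def pick_master(records: list) -> dict:
    """Pick the record with the latest activity, preferring clean email status on ties."""

    def priority(record: dict) -> tuple:
        # Records with no activity sort before any record that has activity.
        last_activity = record.get("last_activity") or datetime.min
        clean_email = not (record.get("hard_bounce") or record.get("unsubscribed"))
        return (last_activity, clean_email)

    return max(records, key=priority)


candidates = [
    {"id": "a", "last_activity": datetime(2024, 1, 10), "hard_bounce": False},
    {"id": "b", "last_activity": datetime(2024, 5, 1), "hard_bounce": False},
]
master = pick_master(candidates)  # record "b" has the more recent activity
```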
Deduplication is only available with the Element Ignite and Engage packages.