
Deduplication Logic

How Element451 determines duplicates by comparing and scoring fields.

Written by Ardis Kadiu
Updated over 2 weeks ago

Overview

Element451 automatically searches for and scores possible duplicate records to help you maintain clean, accurate student data. This article outlines how the duplicate detection logic works—from the fields that trigger a check to how matches are scored and presented in the Deduplication Module.

If you're looking for a more foundational overview of how to use the Deduplication Module, check out our Getting Started with Deduplication article.


When Does Element451 Search for Duplicates?

A duplicate search is automatically triggered when any of the following fields on a student's record are updated:

  • First Name

  • Last Name

  • Email

  • Phone

  • Addresses

  • Identities

  • Date of Birth
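
As a rough illustration of the trigger rule above, the check amounts to asking whether an update touches any of the listed fields. This is a minimal sketch, not Element451's implementation: the field keys and the function name `should_search_for_duplicates` are hypothetical labels chosen for the example.

```python
# Hypothetical keys standing in for the trigger fields listed above.
TRIGGER_FIELDS = frozenset({
    "first_name", "last_name", "email", "phone",
    "addresses", "identities", "date_of_birth",
})


def should_search_for_duplicates(updated_fields):
    """Return True if at least one updated field is a trigger field."""
    return not TRIGGER_FIELDS.isdisjoint(updated_fields)


print(should_search_for_duplicates({"email", "major"}))  # True
print(should_search_for_duplicates({"major"}))           # False
```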


How Duplicate Match Scores Are Calculated

When a potential match is found, Element451 assigns it a duplicate probability score. This score estimates how likely it is that the two records belong to the same person.

Scoring Logic + Thresholds

  • Matching first and last names alone will not flag duplicates. An additional field is required for a score to be generated. At least one of these fields must match: email, phone, address, date of birth, or social security number.

  • Each field contributes differently to the score—some fields carry more weight than others.

  • A minimum score of 70% is required to flag a duplicate. The more valuable fields that match, the higher the duplicate score. For a list of common matching field combinations and the calculated score, see the Matching Field Combinations and Their Scores section below.

Special Cases + Considerations

  • Person Name

    • All names are made lowercase.

    • Leading and trailing spaces are removed.

    • Hyphenated and space-separated names (like “Mary-Jane” or “Mary Jane”) are split into parts and compared individually.

    • First and last names are compared separately. If any of the first name parts from one record exist in the other record's first name parts, the first names match. The same check is applied to the last name parts. If both checks are positive, the name is considered a match.

    • Element451 uses a comprehensive nickname database with over 1,300 variations to catch common name substitutions (e.g., "Bill" for "William").

  • Email

    • Emails from both records are reduced to the prefix before “@”, and only those prefixes are compared. Because these records have already passed the initial search, only an exact match counts in this round.

    • When a record only includes name and email, the email must be an exact match. This prevents false positives when limited data is available.

  • Identity

    • If either record has an “EMAIL” identity, Element compares it against the other record's primary email address.

  • Date of Birth

    • The day, month, and year of the records' dates of birth are compared for an exact match.

  • Addresses

    • Element uses a third-party API to normalize each address's “Street 1” field.

    • If there are multiple addresses, Element checks if any of the first record's normalized addresses match any of the second record's normalized addresses exactly.

  • Phone

    • Using the clean 7-digit number only, if any of the first record's phone numbers match any of the second record's phone numbers exactly, it is a positive match.

  • Relationships

    • If two records are connected in Element451 (e.g., siblings or parents), they will not be flagged as duplicates, even if other fields match.

  • Social Security Numbers

    • SSNs are considered highly reliable. When both records include an SSN and the two match, the duplicate score is significantly boosted.

    • SSNs must contain exactly 9 digits to be used in duplicate detection.

    • When comparing records, Element451 automatically ignores any dashes, spaces, or formatting characters, so 123-45-6789, 123 45 6789, and 123456789 are all treated as the same SSN.

    • Important: If two records have the same SSN with different formatting, you'll need to decide which format to keep when merging, based on your data preferences.
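
The name and SSN rules above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Element451's code: the `NICKNAMES` dictionary is a tiny hypothetical stand-in for the nickname database, the last-name check is assumed to work the same way as the first-name check, and stripping every non-digit character is one simple reading of "ignores any dashes, spaces, or formatting characters".

```python
import re

# Hypothetical stand-in for the nickname database (the real one has 1,300+ variations).
NICKNAMES = {"bill": "william", "will": "william", "liz": "elizabeth"}


def name_parts(name):
    """Lowercase, trim, and split a name on hyphens and whitespace."""
    parts = re.split(r"[-\s]+", name.strip().lower())
    # Map nicknames to a canonical form so "Bill" and "William" compare equal.
    return {NICKNAMES.get(part, part) for part in parts if part}


def names_match(first_a, last_a, first_b, last_b):
    """Match only when first-name parts overlap AND last-name parts overlap."""
    first_ok = bool(name_parts(first_a) & name_parts(first_b))
    last_ok = bool(name_parts(last_a) & name_parts(last_b))
    return first_ok and last_ok


def normalize_ssn(raw):
    """Strip formatting; return the 9 digits, or None if the SSN is unusable."""
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 9 else None


print(names_match("Mary-Jane", "Smith", " mary ", "SMITH"))      # True
print(normalize_ssn("123-45-6789") == normalize_ssn("123 45 6789"))  # True
```

Using sets for the name parts keeps the "any part matches" rule to a single intersection, and returning `None` for malformed SSNs makes it easy to skip them during comparison.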

Matching Field Combinations and Their Scores

The table below shows how different combinations of matching fields contribute to the duplicate probability score. A score of 70% or higher is required for records to be flagged as potential duplicates.

Matching Fields                     Score
Name + Email                        70%
Name + DOB                          75%
Name + Address                      75%
Name + Phone                        75%
Name + Email + DOB                  85%
Name + Email + Address              85%
Name + Email + Phone                85%
Name + SSN                          95%
Name + Email + SSN                  95%
Name + Phone + SSN                  95%
Name + Address + SSN                95%
Name + DOB + SSN                    100%
Name + Email + DOB + SSN            100%
Name + DOB + Address + SSN          100%
Name + Address + Phone + SSN        100%
Name + DOB + Address                100%
Name + DOB + Phone                  100%
Name + Address + Phone              100%
All fields except SSN               100%
All fields including SSN            100%

Note: These combinations reflect how the deduplication algorithm scores matches behind the scenes. Element451 does not require every field to be present, only enough matching data to reach the 70% threshold.
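
To make the table and threshold concrete, the sketch below encodes the listed combinations as a lookup and applies the 70% cutoff. It is illustrative only: the actual per-field weights are not published here, so combinations that do not appear in the table return `None` rather than a guessed score. The names `SCORES`, `lookup_score`, and `is_flagged` are hypothetical, and the sketch assumes a name match is always required, as every row of the table suggests.

```python
FLAG_THRESHOLD = 70  # minimum score (percent) for a record pair to be flagged

# Scores copied from the table; keys are the matching fields besides name.
SCORES = {
    frozenset({"email"}): 70,
    frozenset({"dob"}): 75,
    frozenset({"address"}): 75,
    frozenset({"phone"}): 75,
    frozenset({"email", "dob"}): 85,
    frozenset({"email", "address"}): 85,
    frozenset({"email", "phone"}): 85,
    frozenset({"ssn"}): 95,
    frozenset({"email", "ssn"}): 95,
    frozenset({"phone", "ssn"}): 95,
    frozenset({"address", "ssn"}): 95,
    frozenset({"dob", "ssn"}): 100,
    frozenset({"email", "dob", "ssn"}): 100,
    frozenset({"dob", "address", "ssn"}): 100,
    frozenset({"address", "phone", "ssn"}): 100,
    frozenset({"dob", "address"}): 100,
    frozenset({"dob", "phone"}): 100,
    frozenset({"address", "phone"}): 100,
    frozenset({"email", "dob", "address", "phone"}): 100,
    frozenset({"email", "dob", "address", "phone", "ssn"}): 100,
}


def lookup_score(name_matches, other_matching_fields):
    """Return the tabulated score, or None if no score applies or is listed."""
    if not name_matches or not other_matching_fields:
        return None  # a name match alone never produces a score
    return SCORES.get(frozenset(other_matching_fields))


def is_flagged(score):
    """A pair is flagged only when a score exists and meets the threshold."""
    return score is not None and score >= FLAG_THRESHOLD


print(lookup_score(True, {"email"}))         # 70
print(is_flagged(lookup_score(True, set())))  # False
```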


Selection of a Master Record

When Element451 finds a possible duplicate, it performs a "smart" selection to determine which record should become the new "master" record. First, Element looks for recent activity such as:

  • Logging in

  • Opening an email

  • Filling out a form

  • Registering for an event

Then, Element will prioritize the record without hard email bounces or unsubscribe milestones.
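
One way to read the two-step selection above is as a sort on two criteria: recent activity first, then the absence of hard bounces or unsubscribes. The sketch below follows that reading. The `Record` class, its field names, and `pick_master` are hypothetical, and the article does not specify how ties are broken or how "recent" is measured, so the example simply uses a boolean flag and keeps the first record on a tie.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    has_recent_activity: bool      # login, email open, form submission, event registration
    has_hard_bounce: bool = False
    has_unsubscribe: bool = False


def pick_master(a, b):
    """Prefer recent activity, then the record free of bounces/unsubscribes."""
    def priority(record):
        is_clean = not (record.has_hard_bounce or record.has_unsubscribe)
        # Tuples compare element by element, so activity outranks cleanliness.
        return (record.has_recent_activity, is_clean)

    # max() returns the first of equal candidates, so `a` wins exact ties.
    return max((a, b), key=priority)


active_bounced = Record("A", has_recent_activity=True, has_hard_bounce=True)
active_clean = Record("B", has_recent_activity=True)
print(pick_master(active_bounced, active_clean).record_id)  # B
```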
