How Middesk matches data
Understanding how Middesk matches data helps you interpret its results. Middesk’s matching is designed to return accurate results even when submitted data differs from source records in common ways, such as casing, punctuation, or abbreviations, while saving you time.
Data matching process
Middesk performs these three steps to process data:
Normalize
Middesk first normalizes the data you submit into a “cleaned version.” This includes removing case sensitivity, extra whitespace, and punctuation. Middesk then converts the cleaned text to a canonical version (for example, Street may become ST).
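The normalization step can be sketched roughly as follows. This is a minimal illustration, not Middesk's implementation: the function name `normalize` and the `CANONICAL_TOKENS` table are hypothetical, and the real canonical mappings are not listed in this document.

```python
import string

# Hypothetical canonical forms, for illustration only.
CANONICAL_TOKENS = {"street": "ST", "avenue": "AVE", "suite": "STE"}

# Translation table that deletes ASCII punctuation.
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Return a cleaned, canonical version of a submitted value."""
    # Remove case sensitivity and punctuation.
    cleaned = text.lower().translate(_PUNCTUATION)
    # split() with no arguments drops leading, trailing, and repeated whitespace.
    tokens = cleaned.split()
    # Convert known tokens to their canonical form; upper-case everything else.
    return " ".join(CANONICAL_TOKENS.get(token, token.upper()) for token in tokens)


print(normalize("  123 Main Street,  Suite 4. "))
```

Under these assumptions, differently formatted spellings of the same value collapse to one string, which is what makes the later comparison steps possible.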
Search
Using the “cleaned version” data, Middesk then attempts to verify the business. To do so, Middesk searches its internal database to find the best set of records across Middesk’s authoritative data sets. The search looks for the following attributes:
- Business name
- Business address
- Tax Identification Number (TIN)
- Associated people
Match reporting
Depending on the attribute, Middesk returns a value indicating the match type: Success, Warning, Failure, or Alternate.
Success
The normalized submitted attribute exactly matches at least one normalized, retrieved attribute.
Warning
The normalized submitted attribute is similar to at least one normalized, retrieved attribute.
Failure
The normalized submitted attribute does not match any normalized, retrieved attributes.
Alternate (TIN-only)
The submitted attribute is associated with an alternative attribute according to the IRS. Middesk returns the alternative attribute associated with the submitted TIN.
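The four match types can be modeled as a small enumeration with a generic classifier. This is a sketch under assumptions: `MatchType`, `report_match`, and the `is_similar` callback are hypothetical names, and the string values are not taken from the Middesk API. What counts as "similar" depends on the attribute, so it is passed in as a function.

```python
from enum import Enum
from typing import Callable, Iterable


class MatchType(Enum):
    SUCCESS = "success"
    WARNING = "warning"
    FAILURE = "failure"
    ALTERNATE = "alternate"  # TIN-only; not produced by report_match below


def report_match(
    submitted: str,
    retrieved: Iterable[str],
    is_similar: Callable[[str, str], bool],
) -> MatchType:
    """Classify one normalized submitted value against normalized retrieved values."""
    candidates = list(retrieved)
    # Success: exact match with at least one retrieved value.
    if submitted in candidates:
        return MatchType.SUCCESS
    # Warning: similar to at least one retrieved value (attribute-specific rule).
    if any(is_similar(submitted, candidate) for candidate in candidates):
        return MatchType.WARNING
    # Failure: no retrieved value matches.
    return MatchType.FAILURE
```

Both inputs are expected to be normalized already, so the exact comparison is a plain string equality check.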
Attribute-specific matching
Each attribute type has specific normalization and matching rules.
Name matching
Normalization
Middesk normalizes business names by:
- Removing case sensitivity
- Converting entity suffixes to canonical versions
- Removing extra whitespace at the beginning and end
- Removing punctuation and extraneous characters during comparison
Matching rules
Success: All characters in the normalized submitted name match all characters in the normalized retrieved name.
Warning: The normalized submitted name is similar to a normalized retrieved name. This includes:
- Entity suffix differences (for example, “Middesk Inc” matches “Middesk LLC”)
- Minor character differences based on name length:
Failure: The normalized submitted name does not match any normalized retrieved name, or the submitted TIN is associated with a different business name.
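The name rules above can be illustrated with a short sketch. It covers only exact matches and entity suffix differences. The minor-character-difference rule and the TIN-based failure condition are not modeled, because their details are not given here. The names `ENTITY_SUFFIXES`, `split_name`, and `match_name` are hypothetical, as is the suffix table itself.

```python
import string
from typing import Iterable, Tuple

# Hypothetical suffix table mapping spellings to a canonical form.
ENTITY_SUFFIXES = {
    "inc": "INC", "incorporated": "INC",
    "llc": "LLC",
    "corp": "CORP", "corporation": "CORP",
    "ltd": "LTD", "limited": "LTD",
}

_PUNCTUATION = str.maketrans("", "", string.punctuation)


def split_name(name: str) -> Tuple[str, str]:
    """Normalize a business name into (base name, canonical entity suffix)."""
    tokens = name.lower().translate(_PUNCTUATION).split()
    suffix = ""
    # Treat the last token as an entity suffix only if something precedes it.
    if len(tokens) > 1 and tokens[-1] in ENTITY_SUFFIXES:
        suffix = ENTITY_SUFFIXES[tokens.pop()]
    return " ".join(tokens).upper(), suffix


def match_name(submitted: str, retrieved_names: Iterable[str]) -> str:
    """Return 'success', 'warning', or 'failure' for a submitted name."""
    submitted_key = split_name(submitted)
    candidates = [split_name(name) for name in retrieved_names]
    if submitted_key in candidates:
        return "success"  # base name and suffix both match
    if any(submitted_key[0] == base for base, _ in candidates):
        return "warning"  # same base name, different entity suffix
    return "failure"


print(match_name("Middesk Inc", ["Middesk LLC"]))
```

In this sketch, "Middesk, Inc." and "MIDDESK INC" are a success, while "Middesk Inc" against "Middesk LLC" is a warning because only the suffix differs.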
Address matching
Normalization
Before matching, Middesk uses a third-party provider to normalize and geocode all addresses. Addresses are converted to a standard format.
Matching rules
Success: The normalized submitted address exactly matches at least one normalized retrieved address.
Warning: The normalized submitted address is similar or approximate to a retrieved address. There are two scenarios:
- Similar address: Addresses are more than 0.2 miles apart but share the same state and city, or share the same postal code and street.
- Approximate address: Addresses are within 0.2 miles of each other with the same street, state, and postal code. This typically occurs due to typos or missing suite numbers.
Failure: The normalized submitted address does not match any normalized retrieved address.
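The address rules above combine a distance check with component comparisons. A minimal sketch follows, assuming addresses have already been normalized and geocoded into a hypothetical `Address` record. The haversine distance and the field layout are illustrative choices, not a description of Middesk's provider. Pairs that fit none of the documented scenarios fall through to failure in this sketch.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8
THRESHOLD_MILES = 0.2  # distance threshold stated in the matching rules


@dataclass(frozen=True)
class Address:
    street: str
    unit: str
    city: str
    state: str
    postal_code: str
    lat: float
    lon: float

    def text_key(self):
        # Normalized text components, ignoring the geocode.
        return (self.street, self.unit, self.city, self.state, self.postal_code)


def miles_between(a: Address, b: Address) -> float:
    """Great-circle distance between two geocoded addresses (haversine formula)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def classify_pair(submitted: Address, retrieved: Address) -> str:
    """Return 'success', 'approximate', 'similar', or 'failure' for one pair."""
    if submitted.text_key() == retrieved.text_key():
        return "success"
    same_street = submitted.street == retrieved.street
    same_city = submitted.city == retrieved.city
    same_state = submitted.state == retrieved.state
    same_postal = submitted.postal_code == retrieved.postal_code
    if miles_between(submitted, retrieved) <= THRESHOLD_MILES:
        if same_street and same_state and same_postal:
            return "approximate"  # reported as Warning
    elif (same_state and same_city) or (same_postal and same_street):
        return "similar"  # reported as Warning
    return "failure"
```

For example, a submitted address that omits a suite number but geocodes to the same point as a retrieved record would be classified as approximate here.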
TIN matching
Normalization
Middesk normalizes TINs by:
- Removing dashes and other extraneous characters
- Requiring exactly 9 digits
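TIN normalization can be sketched in a few lines. The function name `normalize_tin` is hypothetical, and treating every non-digit character as extraneous is an assumption. Middesk may handle unexpected characters differently.

```python
import re
from typing import Optional


def normalize_tin(raw: str) -> Optional[str]:
    """Strip non-digit characters and return the TIN only if 9 digits remain."""
    digits = re.sub(r"\D", "", raw)  # assumption: all non-digits are extraneous
    return digits if len(digits) == 9 else None


print(normalize_tin("12-3456789"))
```

Returning `None` for anything other than nine digits keeps invalid input from reaching the matching step.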
Matching rules
Middesk uses the IRS TIN Matching program to verify whether the submitted TIN and business name match.
Success: The submitted business name is associated with the submitted TIN according to the IRS.
Alternate: The submitted TIN is associated with a different business name according to the IRS. Middesk returns the alternative business name associated with the submitted TIN.
Failure: The submitted TIN is not associated with the submitted business name according to the IRS, and is not associated with any other business name.
Person matching
Normalization
Middesk normalizes person names by:
- Removing case sensitivity
- Removing punctuation
- Removing extra whitespace at the beginning and end
- Removing extraneous characters during comparison
Matching rules
Success: The normalized submitted person name matches at least one normalized retrieved person name. Middesk accounts for common variations:
Failure: The normalized submitted person name does not match any normalized retrieved person name.