Data Matching

This page outlines how we normalize submitted attribute inputs (referred to as “Submitted Attribute”) and how we determine a match when they are compared against stored records (referred to as “Retrieved Attribute”) pulled from the Secretary of State data sources and, if Web Analysis is ordered, the business’s website.

Name Attribute

Normalization

  • Removes case sensitivity
  • Converts the submitted and registered business name’s entity suffix to a canonical version
    • Examples:
      • ABC Limited Liability Company → ABC LLC
      • ABC company llc → ABC LLC
      • ABC Incorporated → ABC CORP
      • ABC Corporation → ABC CORP
  • Removes extra white space characters (extra spaces and new lines at the beginning and end of a string)
  • Removes punctuation and extraneous characters (during comparison)
! " # $ % & ' ( ) * + , - . : ; < = > ? @ [ ] \ ^ _ ` { | } ~

Note that we don’t normalize the submitted inputs unless used in matching.


Examples:

  • Middesk, Inc. → MIDDESK INC
  • Apple+Inc.; → APPLE INC
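
The normalization steps above can be sketched in Python as follows. This is a minimal illustration, not the production implementation: the function name, the suffix table (built only from the examples on this page), and the choice to replace punctuation with a space rather than delete it (inferred from the Apple+Inc.; example) are all assumptions.

```python
import re

# Characters from the list above; treating the quote-like mark as a backtick is an assumption.
PUNCTUATION = "!\"#$%&'()*+,-.:;<=>?@[]\\^_`{|}~"
_TO_SPACE = str.maketrans({ch: " " for ch in PUNCTUATION})

# Hypothetical suffix rules derived only from the examples on this page.
_SUFFIX_RULES = [
    (re.compile(r"\b(?:COMPANY )?(?:LIMITED LIABILITY COMPANY|LLC)$"), "LLC"),
    (re.compile(r"\b(?:INCORPORATED|CORPORATION)$"), "CORP"),
]


def normalize_business_name(name: str) -> str:
    """Upper-case, replace punctuation with spaces, collapse whitespace, canonicalize the suffix."""
    text = name.upper().translate(_TO_SPACE)
    text = " ".join(text.split())  # trims the ends and collapses runs of whitespace
    for pattern, canonical in _SUFFIX_RULES:
        text, replaced = pattern.subn(canonical, text)
        if replaced:
            break
    return text
```

Under these assumptions, normalize_business_name("Middesk, Inc.") returns "MIDDESK INC" and normalize_business_name("ABC company llc") returns "ABC LLC".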

Matching

  • Success
    • The normalized submitted name matches at least one normalized retrieved name.
    • This case is satisfied if:
      • All of the characters in the submitted name match all of the characters in the retrieved name.
  • Warning
    • The normalized submitted name is similar to at least one normalized retrieved name.
    • This case is satisfied if:
      • The only difference is found in the entity suffixes (‘Middesk Inc’ would be considered a similar match to ‘Middesk LLC’) OR
      • There are differences in characters between the submitted name and the retrieved name.
        • For business names that are less than 10 characters, we allow for a 0-character difference
          • Any submitted name that is less than 10 characters long and has at least 1 character difference when compared to the retrieved name would be considered a “Failure” status
        • For business names between 10 and 30 characters, we allow for a 1-character difference
          • Example: ‘Prudential Inc’ was submitted vs. ‘Prudentiel Inc’ was the name that was retrieved on the SOS filing
        • For business names with 30+ characters, we allow for a 2-character difference
          • Example: ‘ABC Brothers Construction Company LLC’ was submitted vs. ‘DEC Brothers Construction Company LLC’ was the name retrieved on the SOS filing
  • Failure
    • The submitted normalized name does not match any normalized retrieved name OR the submitted TIN is associated with a different business name.
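
The name matching rules above can be sketched as a small classifier. Several details here are assumptions rather than documented behavior: "character difference" is measured as Levenshtein edit distance, the length is taken from the submitted string as passed in, a name of exactly 30 characters falls in the 1-character bucket, and the suffix set is hypothetical. The TIN-related failure condition is not modeled.

```python
from typing import Iterable

ENTITY_SUFFIXES = {"LLC", "INC", "CORP"}  # hypothetical; the real suffix list is not given here


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def allowed_difference(length: int) -> int:
    """Character differences tolerated for a name of the given length."""
    if length < 10:
        return 0
    if length <= 30:  # assumption: exactly 30 characters belongs to this bucket
        return 1
    return 2


def strip_suffix(name: str) -> str:
    """Drop a trailing entity suffix, if the last word is one."""
    head, _, tail = name.rpartition(" ")
    return head if head and tail in ENTITY_SUFFIXES else name


def match_name(submitted: str, retrieved_names: Iterable[str]) -> str:
    """Return 'success', 'warning', or 'failure' for already-normalized names."""
    status = "failure"
    for retrieved in retrieved_names:
        if submitted == retrieved:
            return "success"
        same_base = strip_suffix(submitted) == strip_suffix(retrieved)
        close = edit_distance(submitted, retrieved) <= allowed_difference(len(submitted))
        if same_base or close:
            status = "warning"
    return status
```

With these assumptions, match_name("PRUDENTIAL INC", ["PRUDENTIEL INC"]) and match_name("MIDDESK INC", ["MIDDESK LLC"]) both return "warning", while a one-character difference in a name shorter than 10 characters returns "failure".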

Address Attribute

Normalization

  • All addresses are normalized and geocoded prior to matching
  • All full addresses are normalized to this format: 3618 Elinburg Cove Trl, Buford, GA 30519-5337
  • A third-party address normalization / geocoding provider is used

Examples:

  • 85 2ND STREET SAN FRANCISCO CA → 85 2nd St, San Francisco, CA 94105-3459
  • Middesk Inc 85 2ND ST ste 710 san francisco ca → 85 2nd St Ste 710, San Francisco, CA 94105-3465

Matching

  • Success
    • The normalized submitted address exactly matches at least one normalized retrieved address.
    • This case is satisfied if:
      • All of the characters in the submitted address match all of the characters in the retrieved address.
  • Warning
    • The normalized submitted address is similar to at least one normalized retrieved address OR is an approximate address to at least one normalized retrieved address.
    • For the "Similar Address" match cases, the addresses are:
      • Different, because they are more than 0.2 miles away from each other, but they may share the same state and city OR the same postal code and street. This scenario is extremely rare.
        • Example:
          • Submitted Address: 4322 N Hall St, Dallas, TX 75219-2731
          • Retrieved Address: 4104 N Hall St Apt 107, Dallas, TX 75219-5627
        • Example:
          • Submitted Address: 3755 Redwine Rd Apt 9323, Atlanta, GA 30344-5970
          • Retrieved Address: 3755 Redwine Rd Apt 9215, East Point, GA 30344-5983
    • For the "Approximate Address" match cases, the addresses could either be:
      • Different addresses, but they have the same Street, State, and Postal Code and are within 0.2 miles of each other. We've seen that this is usually due to a typo made by the end user.
        • Example:
          • Submitted Address: 21098 E Duncan St, Queen Creek, AZ 85142-4867
          • Retrieved Address: 21089 E Duncan St, Queen Creek, AZ 85142-4868
      • The same address and within 0.2 miles, but with a missing or extra suite number. Once again, we've seen that this is usually just a typo by the end user.
        • Example:
          • Submitted Address: 4201 Cypress Creek Pkwy Ste 540 #1197, Houston, TX 77068-3458
          • Retrieved Address: 4201 Cypress Creek Pkwy Ste 540, Houston, TX 77068-3458
  • Failure
    • The submitted normalized address does not match any normalized retrieved address, and is neither similar nor approximate to one.
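
The address rules above can be sketched as follows, assuming the geocoding provider has already returned parsed components and coordinates. The Address fields, the status labels, the haversine distance, and the use of the 5-digit ZIP as "postal code" (the approximate-match example differs only in the ZIP+4 part) are all assumptions. The "Similar Address" branch follows a literal reading of the text above, and the real logic may well be stricter.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8
NEARBY_MILES = 0.2


@dataclass(frozen=True)
class Address:
    """Hypothetical shape of a normalized, geocoded address."""
    full: str         # normalized single-line form
    street: str       # street name without house number or unit, e.g. "E Duncan St"
    city: str
    state: str
    postal_code: str  # 5-digit ZIP (assumption)
    lat: float
    lon: float


def miles_between(a: Address, b: Address) -> float:
    """Great-circle distance in miles using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def match_address(submitted: Address, retrieved: Address) -> str:
    """Classify one submitted/retrieved pair under the assumptions stated above."""
    if submitted.full == retrieved.full:
        return "success"
    same_street = submitted.street == retrieved.street
    same_city = submitted.city == retrieved.city
    same_state = submitted.state == retrieved.state
    same_zip = submitted.postal_code == retrieved.postal_code
    near = miles_between(submitted, retrieved) <= NEARBY_MILES
    if near and same_street and same_state and same_zip:
        return "warning (approximate)"
    if not near and ((same_state and same_city) or (same_zip and same_street)):
        return "warning (similar)"
    return "failure"
```

Because the approximate check compares street, state, and postal code rather than the full string, it covers both the mistyped house number case and the missing suite number case.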




TIN Attribute

Normalization

  • 9 digits are required
  • Removes any dashes or other extraneous characters



Examples:

  • 12-3456789 becomes 123456789
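
A minimal sketch of this normalization, assuming that "extraneous characters" means anything other than the digits 0-9 and that an input without exactly 9 digits is rejected with an error (the function name and the error behavior are assumptions):

```python
import re


def normalize_tin(raw: str) -> str:
    """Strip everything except the digits 0-9 and require exactly 9 of them."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) != 9:
        raise ValueError("a TIN must contain exactly 9 digits")
    return digits
```

For example, normalize_tin("12-3456789") returns "123456789".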



Matching

  • Success
    • The submitted Business Name is associated with the submitted TIN according to the IRS.
      • Note: Middesk uses the IRS TIN Matching program to verify whether the submitted TIN and the submitted entity name match. The IRS TIN Matching program only matches on the first four characters of the submitted name, regardless of the length of the actual entity name (you can read about the program in depth on the IRS site).
  • Alternate
    • The submitted TIN is associated with an alternative Business Name according to the IRS. Middesk will return the alternative Business Name that is associated with the submitted TIN.
    • Example:
      • Submitted Business Name was “Middesk Inc.” and Submitted TIN was “123456789”. However, the IRS indicated that the Submitted TIN “123456789” was associated with the Business Name “ABC Bank” instead.
  • Failure
    • The submitted TIN is not associated with the submitted Business Name according to the IRS, nor is it associated with another Business Name.

Person Attribute

Normalization

  • Removes case sensitivity
  • Removes punctuation
  • Removes extra white space characters (extra spaces and new lines at the beginning and end of a string)
  • Removes punctuation and extraneous characters (during comparison)
! " # $ % & ' ( ) * + , - . : ; < = > ? @ [ ] \ ^ _ ` { | } ~

Note that we don’t normalize the submitted inputs unless used in matching.


Examples:

  • Kyle Mack → KYLE MACK
  • John+Smith → JOHN SMITH



Matching

  • Success
    • The normalized submitted person name matches at least one normalized retrieved person name.
    • This case is satisfied if:
      • The normalized submitted person name exactly matches the normalized retrieved person name or one of the following cases is met:
        • Simple typo: Keith Morgan matches Kieth Morgan
        • Name suffixes: John Smith Jr matches John Smith Junior
        • Simple transpositions: John Smith matches Smith John
        • Middle name missing: John Smith matches John Roger Smith
        • Middle initial: John Roger Smith matches John R Smith
        • Maiden names: Jane Johnson Smith matches Jane Smith
  • Failure
    • The submitted normalized person name does not match any normalized retrieved person name.
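
The person matching cases above can be approximated with a token-based comparison. This is a rough sketch under several assumptions, not the production logic: the suffix alias table is hypothetical, a "simple typo" is taken to be one substitution, one adjacent swap, or one missing letter in a token of at least 4 characters, and any extra tokens on the longer name are treated as middle or maiden names. It is therefore looser than the listed cases in places (for example, it also accepts a first-name initial).

```python
from typing import Iterable, List

SUFFIX_ALIASES = {"JUNIOR": "JR", "SENIOR": "SR"}  # hypothetical alias table


def tokens(name: str) -> List[str]:
    """Split a normalized name into words, mapping long suffix forms to short ones."""
    return [SUFFIX_ALIASES.get(t, t) for t in name.split()]


def is_simple_typo(a: str, b: str) -> bool:
    """True for one substitution, one adjacent swap, or one inserted/missing letter."""
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])
    if abs(len(a) - len(b)) != 1:
        return False
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def tokens_agree(a: str, b: str) -> bool:
    """Two words agree if equal, if one is the other's initial, or if they differ by a typo."""
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return min(len(a), len(b)) >= 4 and is_simple_typo(a, b)


def same_person(a: str, b: str) -> bool:
    """Order-insensitive: every word of the shorter name must pair with a word of the longer."""
    short, long_ = sorted((tokens(a), tokens(b)), key=len)
    if len(short) < 2:
        return False
    remaining = list(long_)
    for tok in short:
        # Prefer an exact pairing before falling back to initials and typos.
        hit = next((r for r in remaining if r == tok), None) or \
            next((r for r in remaining if tokens_agree(tok, r)), None)
        if hit is None:
            return False
        remaining.remove(hit)
    return True


def match_person(submitted: str, retrieved_names: Iterable[str]) -> str:
    """Return 'success' if any retrieved name matches the submitted name, else 'failure'."""
    return "success" if any(same_person(submitted, r) for r in retrieved_names) else "failure"
```

Pairing words regardless of position is what lets a single routine cover transpositions, missing middle names, and maiden names, at the cost of ignoring any unmatched words on the longer name.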