Defining match rules


Match rules determine how matching is performed between incoming records and existing records within the defined data groups. Individual data elements such as first, middle, and last name are the key elements for final match rules, and are used to create features.

Features

Features are combined in one or more ways to create feature sets. Each feature set is then given a confidence level that dictates what Network does with any record pair that meets the rules for that feature set.

Each classifier begins with the <ruleClassifier> tag. The closing tag, </ruleClassifier>, concludes the entire classifier.

Feature sets

Feature sets consist of feature groupings that are ultimately assigned confidence levels to determine how Network treats record pairs that meet the rules for the feature set. You define feature sets and features for both the HCP entity and the HCO entity. Feature sets are categorized by those that should result in an ACT (automatically merge) or an ASK (create a suspect match) result.

Confidence levels

Within the match rules, feature sets define comparison methods for selected features and are configured with a confidence level, or threshold. The confidence level dictates how Network deals with the corresponding matched data pairs. Three types of actions determine how Network responds:

  • ACT actions tell Network to automatically merge pairs that are considered a strong match.
  • ASK actions tell Network to send the pairs that are a possible match to customer data stewards for review. The reviewer can choose to merge the records or keep them separate. Keeping them separate results in a new customer-owned record.
  • ADD actions tell Network to automatically create a new customer-owned record for an incoming record that is not considered a match to any Network or customer-owned record in the customer instance.

ACT and ASK confidence levels are defined in the match rules; ADD actions are not, as they are implied upon failure of the other confidence levels.

The ACT threshold should be set at 0.9 and the ASK threshold at 0.8. This means that a record pair matching a feature set with a confidence level of 0.9 or higher is merged automatically, and a record pair matching a feature set with a confidence level of 0.8 or higher but less than 0.9 is sent to a data steward for review.
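
The threshold behavior can be read as a simple mapping from a confidence value to an action. The following Python sketch is illustrative only: the function name `action_for_confidence` and the two constants are assumptions made for this example and are not part of Network's configuration or API. It also assumes that a pair whose best feature set falls below the ASK threshold is treated the same as a pair with no match.

```python
# Illustrative sketch only; these names are hypothetical, not Network APIs.
ACT_THRESHOLD = 0.9  # recommended ACT level
ASK_THRESHOLD = 0.8  # recommended ASK level


def action_for_confidence(confidence=None):
    """Map the confidence of the best matching feature set to an action.

    Pass None when no feature set matched the record pair.
    """
    if confidence is not None and confidence >= ACT_THRESHOLD:
        return "ACT"  # strong match: merge automatically
    if confidence is not None and confidence >= ASK_THRESHOLD:
        return "ASK"  # possible match: send to a data steward for review
    return "ADD"  # neither ACT nor ASK applied: create a new record


print(action_for_confidence(0.94))  # ACT
print(action_for_confidence(0.85))  # ASK
print(action_for_confidence(None))  # ADD
```

ADD has no threshold of its own in this sketch, which mirrors the fact that ADD is implied when neither ACT nor ASK applies.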

Note: ASK matches for Veeva OpenData records that have not been downloaded do not appear in suspect matches; suspect matches only include records that are already in an instance. Veeva IDs for either match scenario (whether the record is downloaded or not) appear in the match logs.

Confidence values in features

Within a feature set, the features referenced in that set each have their own confidence values that factor into the final ACT or ASK outcome. A match outcome is determined by the highest-confidence feature set in which the matching records passed all features within the set.

For example, the following feature set contains these features: names are identical and licenses match.

<featureSet>
	<name>names are identical and licenses match</name>
	<confidence>0.94</confidence>
	<feature>names are identical</feature>
	<feature>licenses match</feature>
</featureSet>

Each feature in the feature set has its own confidence threshold; for example, 0.85 for names are identical and 0.9 for licenses match. The combination of features within this feature set, and the confidence value of the feature set itself, must be unique.
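
To make the selection rule concrete, the following Python sketch parses feature sets written in the `<featureSet>` format and returns the highest-confidence feature set whose features all passed for a record pair. The `<featureSets>` wrapper element, the second feature set and its 0.85 confidence, the `passed` set, and both function names are assumptions for illustration. How Network evaluates each individual feature against its own threshold is not modeled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element and second feature set, for illustration only.
RULES_XML = """
<featureSets>
  <featureSet>
    <name>names are identical and licenses match</name>
    <confidence>0.94</confidence>
    <feature>names are identical</feature>
    <feature>licenses match</feature>
  </featureSet>
  <featureSet>
    <name>names are similar and city is the same</name>
    <confidence>0.85</confidence>
    <feature>names are similar</feature>
    <feature>city is the same</feature>
  </featureSet>
</featureSets>
"""


def parse_feature_sets(xml_text):
    """Return a list of (name, confidence, [feature names]) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (
            fs.findtext("name"),
            float(fs.findtext("confidence")),
            [f.text for f in fs.findall("feature")],
        )
        for fs in root.findall("featureSet")
    ]


def best_feature_set(feature_sets, passed):
    """Pick the highest-confidence set whose features are all in `passed`."""
    matching = [fs for fs in feature_sets if all(f in passed for f in fs[2])]
    return max(matching, key=lambda fs: fs[1], default=None)


feature_sets = parse_feature_sets(RULES_XML)
# Features assumed to have passed for one record pair: licenses did not match,
# so only the second feature set qualifies.
passed = {"names are identical", "names are similar", "city is the same"}
print(best_feature_set(feature_sets, passed))
```

When no feature set has all of its features in `passed`, `best_feature_set` returns `None`, which corresponds to the case where neither ACT nor ASK applies.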

Default match rules

Network includes default match rules that are tuned to align with the characteristics of data in a particular country. As of v3.0, default match rules are tailored by country to consider differences between Latin and Chinese character sets.

All customer-created source subscriptions as well as the change request and merge request subscriptions use one set of default match rules. When a new source subscription is created, Network provides default match rules for every country that you can tailor to each individual data set. Ad hoc match jobs use a second set of default match rules that administrators can configure in the Admin console.

Each match configuration shows whether you are using the default rules or whether they have been overridden to suit your specific data.

Note: The change request (change_request__v) and merge request (merge_request__v) subscriptions are internal, hidden subscriptions used by Network when processing all add, change, or merge requests done through the UI or API calls. API calls are used by external systems like CRM.

All source subscriptions perform key matching first, and then use match rules for any incoming data. Match rules are defined per country and applied according to the primary country of each incoming record.
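
The order of operations described here (key matching first, then country-specific match rules) can be sketched in Python as follows. Everything in this example is hypothetical: the record layout, the `keys` and `primary_country` dictionary entries, the `key_index` lookup, and the per-country rule functions are stand-ins chosen for illustration and do not reflect Network's internal data model.

```python
def match_record(record, key_index, rules_by_country):
    """Try key matching first, then the match rules for the primary country.

    record: dict with hypothetical "keys" and "primary_country" entries.
    key_index: dict mapping a known source key to an existing record ID.
    rules_by_country: dict mapping a country code to a rule function.
    """
    # Step 1: key matching against already-known keys.
    for key in record.get("keys", []):
        if key in key_index:
            return ("key", key_index[key])

    # Step 2: match rules selected by the record's primary country.
    rules = rules_by_country.get(record.get("primary_country"))
    if rules is None:
        return ("none", None)  # no rules defined for this country
    return ("rules", rules(record))


# Hypothetical inputs for demonstration.
key_index = {"SRC:123": "existing-record-1"}
rules_by_country = {"US": lambda rec: "result of US match rules"}

print(match_record({"keys": ["SRC:123"], "primary_country": "US"},
                   key_index, rules_by_country))
print(match_record({"keys": ["SRC:999"], "primary_country": "US"},
                   key_index, rules_by_country))
```

The first call resolves through key matching and never reaches the match rules; the second falls through to the rules registered for the record's primary country.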

The following sections provide examples of the default rules used in source subscriptions, change requests, and merge requests.

Example: US default match rules

The US default match rules use the following rules in descending order of importance.

ACT match rules (HCP)

  1. ME is identical (and NPI is not different)
  2. NPI and last name are identical (and NPI is not different)
  3. AOA and last name are identical (and NPI is not different)
  4. ME and NPI are identical
  5. names are identical and licenses match (and NPI is not different)
  6. names are identical and addresses match (and NPI is not different)
  7. names are identical with middle initial and licenses match (and NPI is not different)
  8. names are identical with middle initial and addresses match (and NPI is not different)
  9. names are identical and address line 1 is the same (and NPI is not different)
  10. concatenated full names are identical and addresses match (and NPI is not different)

ASK match rules (HCP)

  1. names are similar and address line 1 is the same (and NPI is not different)
  2. full names are similar and city is the same (and NPI is not different)

ACT match rules (HCO)

  1. names are very similar and licenses match
  2. names are very similar using nGram and licenses match
  3. names are very similar and addresses match
  4. names are very similar using nGram and addresses match
  5. names are very similar and address line 1 is the same
  6. names are very similar using nGram and address line 1 is the same

ASK match rules (HCO)

  1. names are very similar and city is the same
  2. names are very similar and address line 1 is similar

Example: Chinese default match rules

The Chinese default match rules use the following rules in descending order of importance.

ACT match rules (HCP)

  1. parent HCO name, HCP name, gender, and professional title are identical and specialties match
  2. parent HCO name, HCP name, and gender are identical and specialties match
  3. parent HCO name, HCP name, gender, and professional title are identical

ASK match rules (HCP)

  1. parent HCO name and HCP name are identical
  2. parent HCO name, pinyin name, and gender are identical and specialties match
  3. parent HCO name, pinyin name, gender, and professional title are identical
  4. parent HCO name and pinyin name are identical
  5. parent HCO name is similar, HCP name and gender are identical, and specialties match
  6. parent HCO name is similar; HCP name, gender and professional title are identical
  7. parent HCO name is similar, pinyin name and gender are identical, and specialties match
  8. parent HCO name is similar; pinyin name, gender, and professional title are identical
  9. parent HCO name is similar and HCP name is identical
  10. parent HCO name is similar and pinyin name is identical

The name comparisons leverage different algorithms to evaluate match pairs.

Note: Matching does not occur on the formatted_name__v field; it is a derived field.

ACT match rules (HCO)

  1. names are identical, addresses match, and HCO type is the same
  2. names are identical, city name matches, and HCO type is the same
  3. names are identical and addresses match

ASK match rules (HCO)

  1. pinyin names are identical and addresses match
  2. pinyin names are identical and city name matches
  3. names are similar using nGram and addresses match
  4. names are similar using nGram and city name matches

The name comparisons leverage different algorithms to evaluate match pairs.