Match, merge, and survivorship
Data matching is the process of comparing incoming records with existing records in a Network instance to determine whether they are the same. Data merging is the process of combining incoming records, new or existing, with the data in a Network instance once they have been matched or manually identified as the same. Survivorship is the process used during a data merge to determine which parts of the existing records can be updated by incoming records, resulting in winning and losing values.
The Network match process enables you to load data into Network and match or align incoming records with what already exists in your Network instance. You can fine-tune the criteria by which matches and merges are made for both the regular match process (during data loads) and the ad hoc match process. Each process is configured in a similar way, but separately, in the Admin console.
The matching process consists of the following steps:
- Address cleansing validates incoming address data by running it through an address cleansing engine that ensures address formats are identical and that postal addresses are correct. Addresses are corrected against their respective location where possible but cannot necessarily be validated specifically for each particular entity. Note that address cleansing functionality can be disabled.
- Key matching occurs to complete the simplest matches before other matches. This step does not apply to first-time loads, as Network has no prior external keys to compare. On subsequent loads, once an external key is linked to a Network entity, matching can be done on the external key to accelerate the matching process by avoiding fuzzy matching for those records. Fuzzy matching is matching that is not exact: a comparison of similar values (for example, Smith/Smyth) that depends on the match rules set by the customer administrator.
- Creating data groups (or blocks) identifies sets of candidate matches for incoming records, according to data groups defined in the match rules. Grouping data reduces the number of comparisons that are required. Each data group definition is used as search criteria against data in the Network instance and the results of each search create the individual data groups used in the match process. The search criteria are exact; the data returned from the customer database is an exact match to the search criteria. The search includes the entire customer instance, including both Network and customer-owned records in the customer org.
- Applying match rules to each data group compares the data in the group to the incoming records, one at a time, using the predetermined match rules (fuzzy match).
- Actioning the data applies the specified action in the match rules – either ACT, ASK, or ADD – for each incoming record.
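The key matching and data group steps above can be sketched as follows. This is a minimal illustration, not Network's implementation: the record layout, the field names (`ext_key`, `last_name`, `zip`), and the function names are all hypothetical. It shows an external key short-circuiting the process, and exact searches on data group fields producing the candidates that fuzzy match rules would then compare.

```python
# Hypothetical existing records; the field names are illustrative only.
existing = [
    {"id": "N1", "last_name": "Smith", "zip": "10001", "ext_key": "CRM:100"},
    {"id": "N2", "last_name": "Smyth", "zip": "10001", "ext_key": None},
    {"id": "N3", "last_name": "Jones", "zip": "94105", "ext_key": None},
]


def key_match(incoming, records):
    """Return the record already linked to the incoming external key, if any."""
    key = incoming.get("ext_key")
    if key is None:
        return None
    for rec in records:
        if rec["ext_key"] == key:
            return rec
    return None


def build_data_group(incoming, records, group_fields):
    """Exact search: keep records equal to the incoming record on every group field."""
    return [r for r in records if all(r[f] == incoming[f] for f in group_fields)]


def candidates_for(incoming, records, group_definitions):
    """Try the key match first; otherwise collect candidates from each data group."""
    hit = key_match(incoming, records)
    if hit is not None:
        return "key", [hit]
    seen, candidates = set(), []
    for fields in group_definitions:
        for rec in build_data_group(incoming, records, fields):
            if rec["id"] not in seen:  # a record may fall into several groups
                seen.add(rec["id"])
                candidates.append(rec)
    return "fuzzy", candidates
```

In this sketch, an incoming record with a known external key never reaches the data groups, while a record without one is compared only against the candidates that share its group fields rather than against every record in the instance.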
You configure match by defining data groups, match rules, and confidence:
- fields may include Name, NPI, Specialty, and so on
- field groups may include Address, which includes other fields such as street, city, or zip code
- match strength can be weak, strong, or exact
Match behavior is dependent on how you configure match rules. Depending on the rules, for example, match might require name fields to be exactly the same between source and Network data. Additionally, it might require specialties to be similar, but not exact. A similar match would occur, for example, where an HCP might include only one specialty in the source data and multiple specialties in the Network data.
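As a rough illustration of the difference between an exact and a similar comparison, the sketch below treats two specialty lists as similar when they share at least one value. The overlap rule and the function names are assumptions made for this example; the actual comparison depends on the match rules you configure.

```python
def names_match_exactly(source_name, network_name):
    """Exact comparison: the two values must be identical."""
    return source_name == network_name


def specialties_similar(source_specialties, network_specialties):
    """Similar comparison (assumed rule): at least one specialty in common."""
    source = {s.casefold() for s in source_specialties}
    network = {s.casefold() for s in network_specialties}
    return bool(source & network)


# One specialty in the source data, several in the Network data.
print(specialties_similar(["Cardiology"], ["Cardiology", "Internal Medicine"]))
```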
Match rules (fuzzy matches) use the following concepts:
- Feature - A feature is a defining characteristic used when two records are compared to determine a match. Examples of features include Name, Address, Specialty, ME number, and AOA number.
- Feature set - A feature set defines combinations of features for comparison. For example, if you compare two records with similar names, the same addresses and ME numbers, you can be confident that those records are the same. In this case, Name, Address, and ME number are part of the same feature set.
- Confidence - Confidence determines whether Network can automatically merge two records that match on a particular feature set, whether a manual review is required, or whether the records are clearly not matches.
- Matches on feature sets that have a high confidence score (above the defined ACT threshold) result in an automatic merge of records, without any human review.
- Matches on feature sets that have a confidence score below the defined ACT threshold but above the defined ASK threshold result in a suspect match, which must be reviewed by data stewards.
- Where no match is found using the defined feature sets, the incoming record is added to the Network instance as a new customer-stewarded record.
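The relationship between feature sets, confidence, and the resulting action can be sketched as below. The feature sets, the 0 to 100 confidence scale, and the threshold values are hypothetical, and the handling of a score exactly equal to a threshold is an assumption, since only "above" and "below" are described here.

```python
# Hypothetical feature sets: (features that must all match, confidence score).
FEATURE_SETS = [
    ({"name", "address", "me_number"}, 95),
    ({"name", "address"}, 70),
]


def best_confidence(matched_features, feature_sets):
    """Highest confidence among feature sets whose features all matched, else None."""
    scores = [score for features, score in feature_sets if features <= matched_features]
    return max(scores, default=None)


def action_for(confidence, act_threshold=90, ask_threshold=60):
    """Map a confidence score to ACT, ASK, or ADD (threshold values are illustrative)."""
    if confidence is None:
        return "ADD"  # no feature set matched: add as a new record
    if confidence > act_threshold:
        return "ACT"  # automatic merge
    if confidence > ask_threshold:
        return "ASK"  # suspect match for data steward review
    return "ADD"      # assumed: scores at or below the ASK threshold are not matches


matched = {"name", "address"}  # features that matched for one record pair
print(action_for(best_confidence(matched, FEATURE_SETS)))
```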
When an incoming record merges with a record from a master data source, the values provided by the master data source always survive for Veeva standard fields. That is, an incoming record can never directly overwrite values provided by the master data source.
If an incoming record has the same primary key as a record from the same source that was previously loaded into the Network instance, the incoming record is a potential update. It will then go through the merge process to apply survivorship rules, and to determine which parts of the existing record can be updated by the incoming record.
If the incoming record has not previously been received by the Network instance, it goes through the fuzzy match process. This process compares the incoming record against records in the Network instance that are potentially similar and determines whether the records represent the same person or organization.
For non-Veeva fields, survivorship is based on the precedence order defined for the source systems. For customer-stewarded records, survivorship on all fields is based on the precedence order defined for the source systems.
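The survivorship rules described above can be sketched for a single field as follows. The source names, field names, and function signature are hypothetical, and the tie-breaking choice (an incoming value from a source of equal precedence replaces the current one) is an assumption made for the example.

```python
def surviving_value(field, current, incoming, precedence, master_sources, standard_fields):
    """Pick the surviving (source, value) pair for one field.

    current and incoming are (source, value) tuples; precedence lists source
    systems from highest to lowest priority.
    """
    current_source, _ = current
    incoming_source, _ = incoming

    # Master data values in Veeva standard fields are never overwritten directly.
    if field in standard_fields and current_source in master_sources:
        return current

    # Otherwise the higher-precedence source wins; unknown sources rank last.
    rank = {source: i for i, source in enumerate(precedence)}
    lowest = len(precedence)
    if rank.get(incoming_source, lowest) <= rank.get(current_source, lowest):
        return incoming
    return current


# Hypothetical configuration.
precedence = ["CRM", "EVENTS"]
master_sources = {"OpenData"}
standard_fields = {"last_name"}

print(surviving_value("last_name", ("OpenData", "Smith"), ("CRM", "Smyth"),
                      precedence, master_sources, standard_fields))
```

Here the master-provided last name survives even though the CRM source is ranked first, because the precedence order is only consulted for values that are not protected by the master data rule.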
The Network merge process automatically de-duplicates child records (such as addresses, custom keys, licenses, and relationships) when two HCP records or HCO records merge.
Note: Inactivated external keys that are no longer used as a primary key for their corresponding record are ignored during the match process.
In the United States, Network leverages knowledge gathered from Veeva OpenData and makes use of alternate names and aliases for HCO names, which include short forms and former names for numerous HCOs. Additionally, Network leverages general short forms and acronyms to standardize corporate names prior to the matching process.
For example, many different ways to denote the term pharmacy are used within Network and standardized to the value pharmacy. The standard form of the term is then used in matching to remove ambiguity within corporate names.
- Aliases are not used for HCP names in Network.
- For Chinese data, Network standardizes numbers in one format prior to matching.
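The standardization of corporate names described above can be illustrated with a small sketch. The variant table below is invented for the example; the actual short forms, acronyms, and aliases are maintained by Network and are not listed here.

```python
import re

# Hypothetical variants of "pharmacy"; not Network's actual list.
VARIANTS = {
    "pharm": "pharmacy",
    "phcy": "pharmacy",
    "drugstore": "pharmacy",
}


def standardize_name(name):
    """Lowercase, strip punctuation, and replace known variants with the standard term."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(VARIANTS.get(token, token) for token in tokens)


# Both spellings reduce to the same standardized form before matching.
print(standardize_name("Main St. Pharm") == standardize_name("MAIN ST PHARMACY"))
```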