Deduplicating source data
In source subscriptions, use the Source Dedupe option so that duplicate records from a source system are matched and merged with other records from the same data feed, even if those records do not match any records currently in Network.
By default, Network matches incoming records against records that are currently in Network. If no matches are found for an incoming record, the incoming record is inserted into Network as a new, unique record.
However, when the subscription itself contains duplicates from the source, the default behavior is not sufficient, because each unmatched duplicate would be inserted as a separate new record. Duplicates within a data feed are common in initial loads from source systems, and can occur in subsequent loads as well.
Managing duplicate records
When Source Dedupe has been set in the subscription configuration, each data load undergoes a second iteration of matching to make sure any duplicates within the subscription that do not match current records in Network can be deduplicated. This occurs prior to final merge so that duplicates are not added to your Network data.
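The two matching passes described above can be sketched as follows. This is an illustration only, not Network's implementation: the function name `load_with_source_dedupe`, the `is_match` callback, and the record fields are hypothetical, and real matching uses the configured grouping and match rules rather than a single field comparison.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def load_with_source_dedupe(
    incoming: List[Record],
    network: List[Record],
    is_match: Callable[[Record, Record], bool],
    source_dedupe: bool = True,
) -> List[List[Record]]:
    """Return groups of unmatched incoming records; each group becomes one new record."""
    # Pass 1 (default behavior): match incoming records against Network.
    unmatched = [
        rec for rec in incoming
        if not any(is_match(rec, existing) for existing in network)
    ]
    if not source_dedupe:
        # Without source dedupe, every unmatched record is inserted on its own.
        return [[rec] for rec in unmatched]

    # Pass 2 (source dedupe): match the unmatched records against each other.
    groups: List[List[Record]] = []
    for rec in unmatched:
        for group in groups:
            if any(is_match(rec, member) for member in group):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


def same_npi(a: Record, b: Record) -> bool:
    # Hypothetical match rule used only for this illustration.
    return a["npi"] == b["npi"]


feed = [
    {"name": "Bob Smith", "npi": "9925125"},
    {"name": "Bob Smith", "npi": "9925125"},
    {"name": "Ann Lee", "npi": "1234567"},
]
groups = load_with_source_dedupe(feed, network=[], is_match=same_npi)
```

With source dedupe enabled, the two Bob Smith rows fall into one group and would be added once; with it disabled, all three rows would be added as separate records.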
Configuring source deduplication
Source deduplication settings determine whether grouping and match rules are run to deduplicate ASK or unmatched source records. An ASK match is one where a single incoming record matches multiple incoming or Network records without one clear "best" match; ambiguous matches typically require human review. You can configure source deduplication for a source subscription in one of the following ways:
- User interface - In the Settings section, select Source Dedupe.
- Advanced Mode
  - For entities (HCPs, HCOs), use the job.match.dedup property. By default, this property is set to False.
  - For sub-objects, use "job.merge.childDedup": "value". The value can be any combination of ADDRESS, LICENSE, or PARENTHCO. By default, there is no value.

  Note: The value must be in uppercase letters for the configuration to work. For example, "ADDRESS" cannot be in lowercase or mixed case letters.
Deduplicating sub-objects
Administrators and data managers can choose to merge any duplicate sub-objects that are loaded during a source subscription job.
This feature does not deduplicate existing sub-objects in your Network instance. It applies only to data that is loaded through source subscriptions after the feature is enabled in each subscription. Sub-object duplicates cannot be matched and merged in Veeva OpenData or third-party source subscriptions.
Matching sub-objects
Duplicate sub-objects are matched and merged using key matching (the custom keys are the same) and field matching (using duplicate detection rules). Key matching is not available for Parent HCO objects; they are matched using field matching (duplicate detection rules) only.
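As a rough sketch of the matching order described above, the function below tries key matching first and falls back to field matching, skipping key matching for Parent HCO objects. The function name, the `custom_keys` field, and the `fields_match` callback are hypothetical names chosen for illustration; they are not part of Network.

```python
from typing import Callable, Dict


def sub_objects_match(
    kind: str,
    a: Dict,
    b: Dict,
    fields_match: Callable[[Dict, Dict], bool],
) -> bool:
    """Decide whether two sub-objects of the same kind are duplicates."""
    # Key matching: the two sub-objects share a custom key.
    # Not available for Parent HCO objects.
    if kind != "PARENTHCO" and set(a["custom_keys"]) & set(b["custom_keys"]):
        return True
    # Field matching: stands in for the duplicate detection rules.
    return fields_match(a, b)


def never(a: Dict, b: Dict) -> bool:
    # Placeholder field comparison that never matches.
    return False


first = {"custom_keys": ["SAP:Address:4292151"]}
second = {"custom_keys": ["SAP:Address:4292151"]}
key_matched = sub_objects_match("ADDRESS", first, second, never)
parent_hco_matched = sub_objects_match("PARENTHCO", first, second, never)
```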
Duplicate detection rules
Network contains a set of default sub-object comparison rules that are applied in any merge comparison. These rules are not visible in the Network UI.
Addresses
Addresses are compared to each other only if they have the same address verification status.
If the verification status is identical, the default comparison rules are then used. If the addresses are byte-for-byte identical based on the compared fields, the addresses are considered a match and are merged.
Even if two addresses are otherwise identical, when their address verification statuses differ, the incoming address is not merged into the existing one; it is added as a new address.
This behavior applies to addresses that are added in any job in Network.
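The address comparison above can be sketched as a status check followed by an exact field comparison. The dictionary keys used here (`verification_status`, `line1`, `city`, `zip`) and the status codes are hypothetical stand-ins; the actual compared fields depend on the verification status and are defined by Network's default rules.

```python
from typing import Dict, Iterable


def addresses_match(a: Dict[str, str], b: Dict[str, str], compare_fields: Iterable[str]) -> bool:
    """Return True only if both addresses have the same verification status
    and are exactly equal on every compared field."""
    # Addresses with different verification statuses are never compared further.
    if a["verification_status"] != b["verification_status"]:
        return False
    # Exact (byte-for-byte) equality on each compared field.
    return all(a.get(field) == b.get(field) for field in compare_fields)


fields = ["line1", "city", "zip"]
existing = {"verification_status": "V", "line1": "123 Main Street", "city": "Salem", "zip": "97310"}
incoming_same = dict(existing)
incoming_other_status = dict(existing, verification_status="U")

merged = addresses_match(existing, incoming_same, fields)
added_as_new = not addresses_match(existing, incoming_other_status, fields)
```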
Overridden addresses
- Existing addresses - Addresses that have been overridden by a Data Steward will never be merged with an incoming address because the address verification statuses will not be the same.
- New addresses - When new addresses are included in an add or change request and they are overridden by the Data Steward who is processing the request, they are not re-cleansed.
Fields used in comparison rules
The following fields are used for each sub-object to compare for duplicates.
Address | License | Parent HCO |
---|---|---|
The following fields are compared for these address verification statuses: Any other verification status | | |
Tie-breaker rules
For sub-object source deduplication, if duplicate sub-objects exist (there's more than one match) then Network breaks the tie by matching the objects to the best sub-object based on the following criteria:
- Address: status, ordinal__v, lowest Network Entity ID (VID)
- License: status, best_state_license__v, lowest VID
- ParentHCO: status, is_primary_relationship__v, lowest VID
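The tie-breaker order listed above can be expressed as a sort key. The sketch below covers addresses only and rests on assumptions the text does not state: that an active status (written here as "A") is preferred, that a lower ordinal__v value is preferred, and that VIDs can be compared numerically. The `status` and `vid` dictionary keys are hypothetical names.

```python
from typing import Dict, List


def pick_best_address(candidates: List[Dict]) -> Dict:
    """Choose one address among several matches using status, ordinal__v, then VID."""
    def sort_key(address: Dict):
        return (
            0 if address["status"] == "A" else 1,  # assumption: active addresses win
            address["ordinal__v"],                 # assumption: lower ordinal wins
            address["vid"],                        # lowest VID breaks remaining ties
        )
    return min(candidates, key=sort_key)


candidates = [
    {"vid": 300, "status": "A", "ordinal__v": 2},
    {"vid": 200, "status": "A", "ordinal__v": 1},
    {"vid": 100, "status": "I", "ordinal__v": 1},
    {"vid": 250, "status": "A", "ordinal__v": 1},
]
best = pick_best_address(candidates)
```

The same pattern would apply to licenses and Parent HCOs by swapping the second element of the key for best_state_license__v or is_primary_relationship__v, with the preferred direction again being an assumption.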
Sub-object deduplication examples
Review the following examples to understand how Network deduplicates sub-objects.
Example 1 - Deduplicating entities and sub-objects using duplicate detection rules
In this example, multiple HCPs are loaded with the same address multiple times. The HCPs have already been determined to match and merge.
The custom key uses the following configuration:
- Source: SAP®
- Item: ADDRESS
- Value: Network ID (VID)
HCP
VID | First Name | Last Name | NPI Number | Custom Key |
---|---|---|---|---|
59259 | Bob | Smith | 9925125 | SAP:HCP:59259 |
7513B | Bob | Smith | 9925125 | SAP:HCP:7513B |
Address
VID | Address_ID | Address Line 1 | City | ZIP | Country | Custom Key |
---|---|---|---|---|---|---|
59259 | 4292151 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292151 |
59259 | 4292152 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292152 |
20012 | 4292153 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292153 |
20012 | 4292154 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292154 |
Result
When the data is loaded, Network first checks the custom keys for matches. In this example, the keys for the HCP and the Address do not match. Network uses the duplicate detection rules to determine if the objects are duplicates. The addresses are the same so one address is loaded. In this example, the address with VID 20012 wins because it was the last row to be added. All of the custom keys are added to the sub-object.
HCP
VID | First Name | Last Name | NPI Number | Custom Key |
---|---|---|---|---|
VID1 | Bob | Smith | 9925125 | SAP:HCP:7513B, SAP:HCP:59259 |
Address
VID | Entity_ID | Address_ID | Address Line 1 | City | ZIP | Country | Custom Key |
---|---|---|---|---|---|---|---|
VID1A | | 4292154 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292154, SAP:Address:4292153, SAP:Address:4292152, SAP:Address:4292151 |
Example 2 - Duplicate detection rules
A record is loaded with duplicate addresses. There are no duplicate custom keys, so Network uses duplicate detection rules to deduplicate the sub-objects.
The custom key uses the following configuration:
- Source: SAP
- Item: ADDRESS
- Value: Network ID (VID)
VID | Address Line 1 | City | ZIP | Country | Phone | Custom Key |
---|---|---|---|---|---|---|
2215A | 123 Main Street | Salem | 97310 | United States | 978-224-5000 | SAP:Address:2215A |
7513B | 123 Main Street | Salem | 97310 | United States | 351-555-0000 | SAP:Address:7513B |
Result
When the data is loaded, Network first checks the custom keys for matches. In this example, the keys do not match, so Network uses the duplicate detection rules to determine if the objects are duplicates. The addresses are the same, so one address is loaded. In this example, the address with Source ID 7513B wins because it was the last row to be added. The phone number from the other address object is not included.
VID | Address Line 1 | City | ZIP | Country | Phone | Custom Key |
---|---|---|---|---|---|---|
VID2A | 123 Main Street | Salem | 97310 | United States | 351-555-0000 | SAP:Address:7513B, SAP:Address:2215A |
Sub-object merging occurs after the HCP and HCO entities are matched together through source deduplication.
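The outcome of Example 2 can be reproduced with a small sketch: the last row loaded supplies the field values, and the custom keys from every duplicate are kept on the surviving sub-object. The function name and the `custom_keys` and `phone` dictionary keys are hypothetical, and the sketch assumes the rows have already been identified as duplicates.

```python
from typing import Dict, List


def merge_duplicate_rows(rows: List[Dict]) -> Dict:
    """Merge rows already known to be duplicates, in load order."""
    # The last row to be added wins; its field values are kept as-is.
    winner = dict(rows[-1])
    # Collect custom keys from all rows, winner first, without repeats.
    keys: List[str] = []
    for row in reversed(rows):
        for key in row["custom_keys"]:
            if key not in keys:
                keys.append(key)
    winner["custom_keys"] = keys
    return winner


rows = [
    {"line1": "123 Main Street", "phone": "978-224-5000", "custom_keys": ["SAP:Address:2215A"]},
    {"line1": "123 Main Street", "phone": "351-555-0000", "custom_keys": ["SAP:Address:7513B"]},
]
merged = merge_duplicate_rows(rows)
```

As in the example, the phone number from the first row is dropped because field values come only from the winning row.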
Job properties
When source subscriptions run, Network uses the following advanced properties for source deduplication:
- Entity source dedupe - "job.match.dedup": "true". The value can be either true or false. By default, the value is false.
- Child source dedupe - "job.merge.childDedup": "ADDRESS,PARENTHCO". The value can be any combination of ADDRESS, LICENSE, or PARENTHCO. By default, there is no value. The value must be in uppercase letters for the configuration to work.
Selecting the source deduplication options in the source subscription populates these job properties, but they can also be manually updated using Advanced Mode.
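When editing these properties by hand, it can help to check the values before pasting them into Advanced Mode. The helper below is a hypothetical convenience sketch, not a Network API: it builds the two properties as a Python dictionary and rejects child object names that are not uppercase or not one of the three supported values.

```python
from typing import Dict, Sequence

ALLOWED_CHILD_OBJECTS = {"ADDRESS", "LICENSE", "PARENTHCO"}


def build_dedupe_properties(entity_dedupe: bool, child_objects: Sequence[str] = ()) -> Dict[str, str]:
    """Build the source deduplication job properties as string key/value pairs."""
    props = {"job.match.dedup": "true" if entity_dedupe else "false"}
    for name in child_objects:
        # Values must be uppercase; "address" or "Address" is rejected.
        if name not in ALLOWED_CHILD_OBJECTS:
            raise ValueError(f"Unsupported childDedup value: {name!r}")
    if child_objects:
        props["job.merge.childDedup"] = ",".join(child_objects)
    return props


properties = build_dedupe_properties(True, ["ADDRESS", "PARENTHCO"])
```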
Job settings summary
To review the deduplication settings for the subscription, see Source Dedupe in the Match Settings. The objects that were selected for deduplication are listed. Entity refers to both HCPs and HCOs.