Deduplicating source data

AD
DM

In source subscriptions, use the Source Dedupe option to ensure that duplicate records from a source system can be matched and merged with other records from the same data feed, even if those records do not match any records currently in Network.

By default, Network matches incoming records against records that are currently in Network. If no matches are found for an incoming record, the incoming record is inserted into Network as a new, unique record.

However, when the data feed itself contains duplicates from the source, the default behavior is not sufficient: duplicate incoming records that match nothing in Network would each be inserted as a new record. Finding duplicates within a data feed is common in initial loads from source systems, and can occur in subsequent loads as well.

Managing duplicate records

When Source Dedupe has been set in the subscription configuration, each data load undergoes a second iteration of matching so that duplicates within the subscription that do not match current records in Network are deduplicated. This occurs before the final merge, so the duplicates are not added to your Network data.
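The two matching passes described above can be sketched as follows. This is a minimal illustration, not Network's implementation: the function name `new_record_groups`, the `is_match` callback, and the record shape are all hypothetical.

```python
def new_record_groups(incoming, network, is_match, source_dedupe=False):
    """Group the incoming records that would be added as new records.

    Each returned group stands for one new record. Hypothetical sketch only.
    """
    # First pass: keep only incoming records that match nothing already in Network.
    unmatched = [
        rec for rec in incoming
        if not any(is_match(rec, existing) for existing in network)
    ]
    if not source_dedupe:
        # Default behavior: every unmatched record becomes its own new record.
        return [[rec] for rec in unmatched]

    # Second pass: match the unmatched records against each other.
    groups = []
    for rec in unmatched:
        for group in groups:
            if is_match(rec, group[0]):
                group.append(rec)  # duplicate within the same feed
                break
        else:
            groups.append([rec])
    return groups


same_npi = lambda a, b: a["npi"] == b["npi"]
feed = [{"npi": "111"}, {"npi": "111"}, {"npi": "222"}]
in_network = [{"npi": "222"}]
without_dedupe = new_record_groups(feed, in_network, same_npi)
with_dedupe = new_record_groups(feed, in_network, same_npi, source_dedupe=True)
```

In this sketch, the two records that share an NPI produce two new records without source deduplication and a single group when it is enabled.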

Configuring source deduplication

Configuring source deduplication determines whether grouping and match rules are applied to deduplicate ASK or unmatched source records. An ASK match occurs when a single incoming record matches multiple incoming or Network records without one clear "best" match; these ambiguous matches typically require human review. You can configure source deduplication for a source subscription in one of the following ways:

  • User interface - In the Settings section, select Source Dedupe

  • Advanced Mode
    • For entities (HCPs, HCOs), use the job.match.dedup property. By default, this property is set to False.
    • For sub-objects, use "job.merge.childDedup": "value". The value can be any combination of ADDRESS, LICENSE, or PARENTHCO. By default, there is no value.

      Note: The value must be in uppercase letters for the configuration to work. For example, "ADDRESS" cannot be in lowercase or mixed case letters.
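Because the sub-object value is case-sensitive, it can help to check it before pasting it into Advanced Mode. The helper below is a hypothetical pre-check, not part of Network. It assumes the value is a comma-separated list with no spaces, as in "ADDRESS,PARENTHCO".

```python
ALLOWED_CHILD_OBJECTS = {"ADDRESS", "LICENSE", "PARENTHCO"}


def validate_child_dedup(value: str) -> list:
    """Return the sub-object names in a job.merge.childDedup value.

    Raises ValueError for anything that is not an exact, uppercase name.
    Hypothetical helper; the comma-separated format is an assumption.
    """
    if value == "":
        return []  # no value: the default
    items = value.split(",")
    invalid = [item for item in items if item not in ALLOWED_CHILD_OBJECTS]
    if invalid:
        raise ValueError(f"Unsupported childDedup value(s): {invalid}")
    return items


checked = validate_child_dedup("ADDRESS,PARENTHCO")
```

A lowercase or mixed-case value such as "Address" raises a ValueError in this sketch, which mirrors the uppercase requirement in the note above.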

Deduplicating sub-objects

Administrators and data managers can choose to merge any duplicate sub-objects that are loaded during a source subscription job.

This feature does not deduplicate existing sub-objects in your Network instance. It applies only to data that is loaded through source subscriptions after the feature is enabled in each subscription. Sub-object duplicates cannot be matched and merged in Veeva OpenData or third-party source subscriptions.

Matching sub-objects

Duplicate sub-objects are matched and merged using key matching (the custom keys are the same) and field matching (using duplicate detection rules). Key matching is not available for Parent HCO objects; they are matched using field matching (duplicate detection rules) only.
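One way to picture the two matching methods is a lookup that tries custom keys first and falls back to field comparison. This is a sketch under assumptions: the `custom_keys` list, the `fields_match` callback, and the function name are hypothetical, and the order (keys first) follows the examples later on this page.

```python
def find_duplicate(incoming, candidates, object_type, fields_match):
    """Return the first candidate sub-object that duplicates `incoming`, or None."""
    # Key matching: skipped for Parent HCO objects, which have no key matching.
    if object_type != "PARENTHCO":
        incoming_keys = set(incoming["custom_keys"])
        for candidate in candidates:
            if incoming_keys & set(candidate["custom_keys"]):
                return candidate
    # Field matching: apply the duplicate detection rules.
    for candidate in candidates:
        if fields_match(incoming, candidate):
            return candidate
    return None


same_line = lambda a, b: a["address_line_1__v"] == b["address_line_1__v"]
existing = [
    {"custom_keys": ["SAP:Address:1"], "address_line_1__v": "123 Main Street"},
    {"custom_keys": ["SAP:Address:2"], "address_line_1__v": "9 Elm Road"},
]
by_key = find_duplicate(
    {"custom_keys": ["SAP:Address:2"], "address_line_1__v": "Other"},
    existing, "ADDRESS", same_line)
by_fields = find_duplicate(
    {"custom_keys": ["SAP:Address:3"], "address_line_1__v": "123 Main Street"},
    existing, "ADDRESS", same_line)
```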

Duplicate detection rules

Network contains a set of default sub-object comparison rules that are applied in any merge comparison. These rules are not visible in the Network UI.

Addresses

Addresses are compared to each other only if they have the same address verification status.

If the verification status is identical, the default comparison rules are then used. If the addresses are byte-to-byte identical based on the fields, the addresses are considered a match and are merged.

Addresses could be identical, but if their address verification status differs, the incoming address will not merge into the existing one; it will be added as a new address.

This behavior applies to addresses that are added in any job in Network.

Overridden addresses

  • Existing addresses - Addresses that have been overridden by a Data Steward will never be merged with an incoming address because the address verification statuses will not be the same.

  • New addresses - When new addresses are included in an add or change request and they are overridden by the Data Steward who is processing the request, they are not re-cleansed.

Fields used in comparison rules

The following fields are used for each sub-object to compare for duplicates.

Address

The following fields are compared for these address verification statuses:

V - Verified, A - Ambiguous, or P - Partially Verified

  • thoroughfare__v
  • premise_number__v
  • locality__v
  • country__v
  • postal_code_primary__v (except in China, use administrative_area__v instead)

Any other verification status

  • address_line_1__v
  • locality__v
  • country__v
  • postal_code__v

License

  • license_number__v
  • type_value__v
  • license_degree__v

Parent HCO

  • parent_hco_vid__v
  • relationship_type__v (except in Japan, use department_name__v instead)
  • hierarchy_type__v
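The address rules can be read as a two-step check: the verification statuses must be the same, and then the fields for that status must be identical. The sketch below illustrates that reading. The status field name `address_verification_status__v` and the country code "CN" for China are assumptions for this example; the compared field names come from the lists above.

```python
STATUS_FIELD = "address_verification_status__v"  # assumed field name
VERIFIED_STATUSES = {"V", "A", "P"}
CHINA = "CN"  # assumed country__v value for China


def comparison_fields(address: dict) -> list:
    """Pick the fields to compare, based on the address verification status."""
    if address.get(STATUS_FIELD) in VERIFIED_STATUSES:
        # China uses administrative_area__v in place of postal_code_primary__v.
        last = ("administrative_area__v"
                if address.get("country__v") == CHINA
                else "postal_code_primary__v")
        return ["thoroughfare__v", "premise_number__v", "locality__v",
                "country__v", last]
    return ["address_line_1__v", "locality__v", "country__v", "postal_code__v"]


def addresses_match(a: dict, b: dict) -> bool:
    """Addresses match only if the statuses are equal and every field is identical."""
    if a.get(STATUS_FIELD) != b.get(STATUS_FIELD):
        return False  # different statuses are never compared
    return all(a.get(field) == b.get(field) for field in comparison_fields(a))


verified = {STATUS_FIELD: "V", "thoroughfare__v": "Main Street",
            "premise_number__v": "123", "locality__v": "Salem",
            "country__v": "US", "postal_code_primary__v": "97310"}
same_fields_other_status = dict(verified, **{STATUS_FIELD: "O"})
```

Here `addresses_match(verified, dict(verified))` is true, while comparing `verified` with `same_fields_other_status` is false because the statuses differ, so the incoming address would be added as a new address.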

Tie-breaker rules

For sub-object source deduplication, if duplicate sub-objects exist (there's more than one match) then Network breaks the tie by matching the objects to the best sub-object based on the following criteria:

  • Address: status, ordinal__v, lowest Network Entity ID (VID)
  • License: status, best_state_license__v, lowest VID
  • ParentHCO: status, is_primary_relationship__v, lowest VID
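The tie-breaker criteria can be expressed as a sort key evaluated in the listed order. The page does not state which value wins for each criterion, so this sketch assumes that an active status ("A"), a lower ordinal__v, and a true best_state_license__v or is_primary_relationship__v rank first. The `status` and `vid__v` keys and the numeric VID comparison are also assumptions.

```python
SECOND_CRITERION = {
    "ADDRESS": "ordinal__v",
    "LICENSE": "best_state_license__v",
    "PARENTHCO": "is_primary_relationship__v",
}


def tie_break_key(sub_object: dict, object_type: str) -> tuple:
    """Build a sort key: status, then the type-specific field, then VID."""
    value = sub_object[SECOND_CRITERION[object_type]]
    if object_type == "ADDRESS":
        second = value  # assumption: lower ordinal ranks first
    else:
        second = 0 if value else 1  # assumption: flagged sub-objects rank first
    status_rank = 0 if sub_object["status"] == "A" else 1  # assumption: active first
    return (status_rank, second, int(sub_object["vid__v"]))  # lowest VID last


def best_sub_object(matches: list, object_type: str) -> dict:
    """Return the best match among several duplicate sub-objects."""
    return min(matches, key=lambda s: tie_break_key(s, object_type))


addresses = [
    {"vid__v": "300", "status": "A", "ordinal__v": 2},
    {"vid__v": "200", "status": "A", "ordinal__v": 1},
    {"vid__v": "100", "status": "I", "ordinal__v": 1},
]
best = best_sub_object(addresses, "ADDRESS")
```

With these assumptions, the address with VID 200 is selected: it is active and has the lowest ordinal, so the VID comparison is never needed.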

Sub-object deduplication examples

Review the following examples to understand how Network deduplicates sub-objects.

Example 1 - Deduplicating entities and sub-objects using duplicate detection rules

In this example, multiple HCPs are loaded with the same address multiple times. The HCPs have already been determined to match and merge.

The custom key uses the following configuration:

  • Source: SAP®
  • Item: ADDRESS
  • Value: Network ID (VID)

HCP

VID | First Name | Last Name | NPI Number | Custom Key
59259 | Bob | Smith | 9925125 | SAP:HCP:59259
7513B | Bob | Smith | 9925125 | SAP:HCP:7513B

Address

VID | Address_ID | Address Line 1 | City | ZIP | Country | Custom Key
59259 | 4292151 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292151
59259 | 4292152 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292152
20012 | 4292153 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292153
20012 | 4292154 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292154

Result

When the data is loaded, Network first checks the custom keys for matches. In this example, the keys for the HCP and the Address do not match. Network uses the duplicate detection rules to determine if the objects are duplicates. The addresses are the same so one address is loaded. In this example, the address with VID 20012 wins because it was the last row to be added. All of the custom keys are added to the sub-object.

HCP

VID | First Name | Last Name | NPI Number | Custom Key
VID1 | Bob | Smith | 9925125 | SAP:HCP:7513B, SAP:HCP:59259

Address

VID | Entity_ID | Address_ID | Address Line 1 | City | ZIP | Country | Custom Key
VID1A | | 4292154 | 123 Main Street | Salem | 97310 | United States | SAP:Address:4292154, SAP:Address:4292153, SAP:Address:4292152, SAP:Address:4292151

Example 2 - Duplicate detection rules

A record is loaded with duplicate addresses. There are no duplicate custom keys, so Network uses duplicate detection rules to deduplicate the sub-objects.

The custom key uses the following configuration:

  • Source: SAP
  • Item: ADDRESS
  • Value: Network ID (VID)

VID | Address Line 1 | City | ZIP | Country | Phone | Custom Key
2215A | 123 Main Street | Salem | 97310 | United States | 978-224-5000 | SAP:Address:2215A
7513B | 123 Main Street | Salem | 97310 | United States | 351-555-0000 | SAP:Address:7513B

Result

When the data is loaded, Network first checks the custom keys for matches. In this example, the keys do not match, so Network uses the duplicate detection rules to determine if the objects are duplicates. The addresses are the same, so one address is loaded. In this example, the address with VID 7513B wins because it was the last row to be added. The phone number from the other address object is not included.

VID | Address Line 1 | City | ZIP | Country | Phone | Custom Key
VID2A | 123 Main Street | Salem | 97310 | United States | 351-555-0000 | SAP:Address:7513B, SAP:Address:2215A

Sub-object merging occurs after the HCP and HCO entities are matched together through source deduplication.
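The outcome of Example 2 can be reproduced with a small sketch: the last row loaded supplies the field values, and the custom keys from every duplicate are kept on the surviving sub-object. The function name and the dictionary layout are hypothetical; only the behavior shown in the examples is modeled.

```python
def merge_duplicate_rows(rows: list) -> dict:
    """Merge rows already identified as duplicates, given in load order."""
    winner = dict(rows[-1])  # the last row added wins, as in the examples
    # Collect every custom key, listing the winning row's keys first.
    winner["custom_keys"] = [
        key for row in reversed(rows) for key in row["custom_keys"]
    ]
    return winner


loaded = [
    {"address_line_1": "123 Main Street", "phone": "978-224-5000",
     "custom_keys": ["SAP:Address:2215A"]},
    {"address_line_1": "123 Main Street", "phone": "351-555-0000",
     "custom_keys": ["SAP:Address:7513B"]},
]
merged = merge_duplicate_rows(loaded)
```

As in the result table, `merged` keeps the phone number 351-555-0000 from the last row and carries both custom keys; the phone number from the first row is dropped.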

Job properties

When source subscriptions run, Network uses the following advanced properties for source deduplication:

Entity source dedupe - "job.match.dedup": "true". The value can be either true or false. By default, the value is false.

Child source dedupe - "job.merge.childDedup": "ADDRESS,PARENTHCO". The value can be any combination of ADDRESS, LICENSE, or PARENTHCO. By default, there is no value. The value must be in uppercase letters for the configuration to work.

Selecting the source deduplication options in the source subscription populates these job properties, but they can also be manually updated using Advanced Mode.
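As an illustration of how the subscription options relate to these properties, the sketch below builds the two property values from a set of selections. It is a hypothetical helper, not Network code. It assumes that values are written as strings, that multiple sub-objects are joined with commas, and that the childDedup property is omitted when nothing is selected.

```python
CHILD_ORDER = ("ADDRESS", "LICENSE", "PARENTHCO")


def build_dedupe_properties(entity_dedupe: bool, child_objects=()) -> dict:
    """Translate deduplication selections into advanced job properties."""
    selected = {name.upper() for name in child_objects}  # values must be uppercase
    unknown = selected - set(CHILD_ORDER)
    if unknown:
        raise ValueError(f"Unsupported sub-object(s): {sorted(unknown)}")
    properties = {"job.match.dedup": "true" if entity_dedupe else "false"}
    if selected:
        properties["job.merge.childDedup"] = ",".join(
            name for name in CHILD_ORDER if name in selected
        )
    return properties


props = build_dedupe_properties(True, ["Address", "ParentHCO"])
```

In this sketch, `props` contains "job.match.dedup": "true" and "job.merge.childDedup": "ADDRESS,PARENTHCO", which matches the property examples above.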

Job settings summary

To review the deduplication settings for the subscription, see Source Dedupe in the Match Settings. The objects that were selected for deduplication are listed. Entity refers to both HCPs and HCOs.