Match logs


When you create source subscriptions and data duplication maintenance subscriptions, you can specify options to assist in analyzing and tracking match data.

Data group analysis log

Within the subscription settings, you can choose to export the data group details to your FTP server. A data group is a component of the match process that groups data into blocks to reduce the overall amount of data to compare.

The exported file identifies the data group definitions that were created as part of this job and how many data groups exist for each definition, in both the incoming data and your Network instance.

Network adds default match rules into every instance for every country. A match rule is a definition that determines which fields in a record are a possible match and when a record comparison is considered a suspect match (see features and feature sets). The match rules include data groups, so every subscription has data groups that can be used or modified. You can also add your own data groups. For more information about data groups, considerations, and recommendations, see Create data groups.

Export the logs

Within the source or data deduplication subscription, you can choose the objects for the data group export. A .csv file is created for each selected object.

In the Export Settings section, expand the Data Group Analysis option. Select the objects that you want to export logs for. Veeva standard objects and custom objects (source subscriptions only) are supported.

When the job runs, the data group analysis is exported to a .csv file; for example, the HCO file name is <subscription_name>HCO-DATA-GROUP-ANALYSIS-<date>-job-xxx.csv.

Data group information

The exported .csv file contains the following information:

  • Key attributes - lists the data group definition. The data groups that are created are the unique combinations of these fields. For example, a data group definition could be: first_name__v + last_name__v + primary_country__v. The definition might create the following data groups: John + Smith + US and Jane + Smith + US. One definition can create many data groups.

    Note: Because all matching is done within a country, primary_country__v is automatically added to each data group definition. This isn't visible and cannot be changed.

  • Number of blocks - the number of distinct data groups created for the definition. If zero (0) data groups are created, the search didn't find any records for that data group definition.

  • Source Size - the columns for source minimum, maximum, and median size identify how the records in the incoming data were grouped based on the data group definition.

  • Master Size - the columns for master minimum, maximum, and median size identify how the records were grouped in your Network instance based on the data group definition.

Data group performance

In addition, the following columns can help you understand how the data groups are performing.

  • Search Time - The time it took to create the data groups. You can see which blocks are taking a long time to create to help you know if you should reconfigure your data groups.

  • Number of entities matched by this group - Displays the number of matches that were found by the group.

  • Number of entities belonging to this group but without a winning match - Matches were found for these groups, but the records weren't matched within the group.

  • Number of entities not belonging to this group - Displays the groups that are not finding matches. Use this information to understand which data groups are the least helpful.

Use this information to see which groups produced matches and which ones did not.
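As a rough illustration of this kind of review, the following Python sketch lists the data group definitions that found no matches, slowest first. The header names, the numeric format of Search Time, and the sample rows are assumptions made for this example; check them against your exported file before reusing the code.

```python
import csv
import io

# Hypothetical header names; confirm them against your exported .csv file.
COL_KEYS = "Key attributes"
COL_SEARCH_TIME = "Search Time"
COL_MATCHED = "Number of entities matched by this group"


def unproductive_groups(csv_text):
    """Return (definition, search_time) pairs for definitions that matched
    nothing, ordered from the longest search time to the shortest."""
    rows = csv.DictReader(io.StringIO(csv_text))
    idle = [
        (row[COL_KEYS], float(row[COL_SEARCH_TIME]))
        for row in rows
        if int(row[COL_MATCHED]) == 0
    ]
    return sorted(idle, key=lambda item: item[1], reverse=True)


# Made-up sample data, for illustration only.
SAMPLE = """Key attributes,Search Time,Number of entities matched by this group
first_name__v + last_name__v,12,40
npi_num__v,3,0
last_name__v + specialty_1__v,950,0
"""

for definition, search_time in unproductive_groups(SAMPLE):
    print(f"{definition}: no matches, search time {search_time}")
```

Definitions at the top of this list are slow to create and found nothing, so they are the first candidates to reconfigure or remove.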

Example

Using the data in this log, we can make the following conclusions about the groups:

  • The first data group didn't take long to create but it didn't find any matches. It is not a useful group.

  • The middle rows indicate multiple data groups found matches. Not all of them may be required.

  • The last data group took a much longer time to create and didn't yield any matches. It should be removed.

Match analysis log

You can configure your source and data deduplication subscriptions to export the match results output to your FTP server. The list contains all Veeva standard objects, sub-objects, and relationship objects that are enabled in your Network instance. Custom objects are supported for source subscriptions only.

When the job runs, the match analysis is exported to a .csv file; for example, the HCO file name is <subscription_name>HCO-MATCH+DATA-GROUP-ANALYSIS-<date>-job-xxx.csv.

This .csv file provides details on which records matched and which rule found the match. It also includes details about both the incoming data and the data in the matching record. All records found in the incoming file that did not match are also listed. Separate log files are created for sub-objects.

The match analysis log contains the following columns of information:

  • Features: The actual feature names included in the feature set shown in the previous column, or blank, if no match was found.
  • Advice: Indicates whether this rule results in a merge (ACT), a suspect match task (ASK), or no match (unmatched). ACT is a high confidence match between two records; ACT matches result in a merge without any human review. ASK means that a single incoming record matches multiple incoming or Network records without one clear "best" match; these ambiguous matches typically require human review.

  • Mode: Indicates where or how the match was found: in the local instance (Local Network Link), in the master instance (OpenData Master Link), within the incoming file (Dedup), or not at all (NA).

  • Source Archive: The temporary ID assigned to the incoming record when the file is loaded.

  • Source Type: The name of the system used by the source subscription.

  • Source Value: The value of the matching key.

    Tip: Use the values in the Source Value and Source Item Type columns to understand and identify key matches that occur during jobs.

  • Source Item Type: The "item" value of the matching custom key. This can help you identify the exact key that matched.
  • Source Country: The primary country of the incoming record.

  • Match ID: The ID of the matching record, including the instance number. This can be a Veeva ID (VID), which is a unique record identifier across all Network customers and instances, from the local instance or master instance, or a source archive ID if a match was found within the incoming file.

  • Numeric Match ID: The VID or archive ID.

  • Instance: The instance where the matching VID was found.

  • Match Country: The primary country of the matching record.

  • Source field_name__v: The value in this field in the incoming file.

  • Match field_name__v: The value in this field in the matching record.

  • Data Group: field names: All data groups created by this job are listed in the remaining columns. For more information, see the section below on Data group information.

In fields with multiple values, a tilde (~) is used to separate each attribute. In fields with multiple rows, a tilde (~) is used instead of a carriage return.
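When you post-process the log, the tilde convention and the Advice column can be handled with a few lines of Python. This is a minimal sketch: the three-column sample is made up, and a real log contains many more columns than shown here.

```python
import csv
import io
from collections import Counter


def split_multi(value):
    """Split a tilde-separated log field into its parts; an empty field gives []."""
    return value.split("~") if value else []


def advice_counts(csv_text):
    """Count log rows per Advice value (for example ACT, ASK, unmatched)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Advice"] for row in reader)


# Made-up sample rows, for illustration only.
SAMPLE = """Advice,Mode,Features
ACT,Local Network Link,last_name__v~npi_num__v
ASK,Dedup,last_name__v~first_name__v
unmatched,NA,
"""

print(advice_counts(SAMPLE))
print(split_multi("last_name__v~npi_num__v"))
```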

Add fields to the match log

An advanced setting called job.match.additionalColumns can be used to provide a list of additional columns for entity and sub-object fields in the log. By default, the log shows both match and source values for all fields included in the match rules for that subscription. Use this advanced setting to add more values to that log.

For source files, the fields and values are shown in the match log only if the fields are included either in the match rules, or in both the data groups and the job.match.additionalColumns setting.

For matches, any rule that matches will display the field and value in the match log if the field is included in the job.match.additionalColumns setting.

To show fields, list the fields separated by commas (,). The name of the field and child field must be used exactly as it is used in the data model and the names are case-sensitive.

Each column name should be in the following format:

Match or Source: fieldName or fieldName.childFieldName

Note: If the column name is not in the correct format, the subscription job might fail.

Examples:

  • fieldName = addresses__v
  • childFieldName = thoroughfare__v
  • childFieldName = locality__v

To show the match value for the thoroughfare__v sub-object field:

"job.match.additionalColumns": "Match:addresses__v.thoroughfare__v"

To show both the source and match values for the thoroughfare__v and locality__v fields:

"job.match.additionalColumns": "Match:addresses__v.thoroughfare__v,Match:addresses__v.locality__v, Source:addresses__v.thoroughfare__v,Source:addresses__v.locality__v"

Any field from an HCP, HCO, or sub-object can be included in the match log but only attributes from records in the instance are shown. Additional columns might display for fields used in the match rules for other countries (ZZ) or China (CN). The additional columns will be added to the match logs for all entities (HCO and HCP).

For unmatched records, the columns will be empty.
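Because a badly formatted column name might cause the subscription job to fail, it can help to generate the setting value rather than type it by hand. The sketch below is one possible approach; the helper name additional_columns and its validation rules are hypothetical and only mirror the format described in this section.

```python
import json


def additional_columns(pairs):
    """Build the setting value from (side, field) pairs.

    side must be "Match" or "Source"; field is fieldName or
    fieldName.childFieldName, exactly as it is named in the data model.
    """
    parts = []
    for side, field in pairs:
        if side not in ("Match", "Source"):
            raise ValueError(f"side must be Match or Source, got {side!r}")
        if not field or " " in field or field.count(".") > 1:
            raise ValueError(f"unexpected field name: {field!r}")
        parts.append(f"{side}:{field}")
    return ",".join(parts)


value = additional_columns([
    ("Match", "addresses__v.thoroughfare__v"),
    ("Match", "addresses__v.locality__v"),
    ("Source", "addresses__v.thoroughfare__v"),
    ("Source", "addresses__v.locality__v"),
])

# Print the property in the quoted form used in Advanced Mode.
print(json.dumps({"job.match.additionalColumns": value}))
```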

Add fields by entity type

Additional columns of fields can also be added for each entity type (HCO and HCP).

Each column name should be in the following format:

HCO{Match or Source: fieldName or fieldName.childFieldName}

or

HCP{Match or Source: fieldName or fieldName.childFieldName}

A comma-separated list of additional columns can be defined for each entity type. The entity types must be separated by a semi-colon (;).
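To check a value in this format before saving it, you could parse it back into its parts. The following Python sketch is an assumption-laden illustration: the regular expressions encode only the format described in this section, the function name is hypothetical, and stripping spaces after separators is a leniency of the sketch, not documented Network behavior.

```python
import re

# One entity block: HCO{...} or HCP{...}
ENTITY_BLOCK = re.compile(r"^(HCO|HCP)\{(.*)\}$")
# One column: Match:fieldName or Source:fieldName.childFieldName
COLUMN = re.compile(r"^(Match|Source):(\w+(?:\.\w+)?)$")


def parse_entity_columns(value):
    """Return {entity_type: [(side, field), ...]} or raise ValueError."""
    result = {}
    for block in value.split(";"):
        entity_match = ENTITY_BLOCK.match(block.strip())
        if entity_match is None:
            raise ValueError(f"bad entity block: {block!r}")
        entity, body = entity_match.groups()
        columns = []
        for item in body.split(","):
            column_match = COLUMN.match(item.strip())
            if column_match is None:
                raise ValueError(f"bad column name: {item!r}")
            columns.append((column_match.group(1), column_match.group(2)))
        result[entity] = columns
    return result


parsed = parse_entity_columns(
    "HCO{Match:primary_country__v};"
    "HCP{Source:addresses__v.vid__v, Match:first_name__v,"
    "Match:primary_country__v,Match:addresses__v.address_line_1__v}"
)
for entity, columns in parsed.items():
    print(entity, columns)
```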

Example

"job.match.additionalColumns": "HCO{Match:primary_country__v};HCP{Source:addresses__v.vid__v, Match:first_name__v,Match:primary_country__v,Match:addresses__v.address_line_1__v}"

Add Veeva IDs to the match log

The Veeva IDs (VIDs) of matched addresses can also be included in the match log.

Use the following advanced setting and value:

"job.match.additionalColumns":"Match:addresses__v.vid__v"

To add this property to the source subscription, click Advanced Mode and include it in the Edit Module Properties dialog.

Data group information

The match log analysis file includes information about the data groups that contain the matching records and which data group produced the winning match. For any records with ACT or ASK outcomes, the following information is included in the match analysis file:

  • (X) – shows the data groups that each record was placed in.
  • (-) – shows the data groups that each record was not placed in.
  • (M) – shows the data groups that produced the match.

Note: If you have "key matching" turned on in a source subscription, and the incoming file matches solely on keys, the new data group columns do not display in the log. When a record matches on an external key, it doesn't go through the rest of the matching process; no data groups are created and the regular rules are not run.

For more information about data groups and examples of fields with null values, see Create data groups.
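The markers can also be tallied to see which data groups produce the winning matches. This Python sketch assumes that the data group columns have headers starting with "Data Group:" and that each cell holds one of the markers listed above; the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Assumed header prefix for the data group columns.
PREFIX = "Data Group:"


def winning_groups(csv_text):
    """Count, per data group column, the rows marked (M)."""
    wins = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, cell in row.items():
            if column and column.startswith(PREFIX) and "(M)" in (cell or ""):
                wins[column] += 1
    return wins


# Made-up sample rows, for illustration only.
SAMPLE = """Advice,Data Group: last_name__v,Data Group: npi_num__v
ACT,(X),(M)
ASK,(M),(-)
ACT,(-),(M)
"""

for column, count in winning_groups(SAMPLE).most_common():
    print(f"{column}: {count} winning matches")
```

A data group column that never shows (M) across a representative set of jobs is a candidate for review.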

Sub-object match logs

Match logs for sub-objects (previously called child objects) help administrators and data managers track the sub-objects that are matched and merged during a source subscription job. A separate log file is created for each sub-object. Sub-object match logs can be used independently of sub-object source deduplication. All Veeva standard and custom sub-objects are supported.

Create a match log

To create a sub-object match log, select the object from the Match Analysis list and ensure that the Apply Updates & Merge setting is selected. This must be selected for the sub-object logs to be created.

The logs are created when the source subscription job completes.

Note: Job performance will slow if match logs are exported for all sub-objects.

Exported match log

Match analysis logs are exported as a .csv file to the outbound folder in your FTP server. Each log provides details on which records matched and how they matched. It also includes details about both the incoming data and the data in the matching record.

The sub-object match analysis log contains the following columns of information:

  • Reason / Rule Name - The method by which the match was found: N/A (Key Match), Field Match (duplicate detection rules), or Network Entity ID (matched using the VID).
  • Advice: ACT, ASK, or UNMATCHED. Sub-objects are only ACT or UNMATCHED.
  • Entity id - The Network ID (VID) of the matching main object.
  • Entity Type - The entity type of the main object; either HCP or HCO.
  • Source Address ID - The Network ID (VID) or Archive ID of the source object.
  • Match Address VID - The Network ID (VID) or Archive ID of the matching object.
  • Source Custom Key - The custom keys of the source object.
  • Match Custom Key - The custom keys of the matching object.

Example

Review the example to see the address match analysis log that is created when the following HCP and address data is loaded into Network.

HCP

The HCP.csv file contains the following information:

ID   | first_name__v | last_name__v | npi_num__v | primary_country__v
2152 | Chris         | Woodson      | 9251950021 | US
992  | Christopher   | Woodson      | 9251950021 | US

When the subscription runs, Network matches these HCP records together using the NPI number.

Address

The address.csv file contains the following information:

Address | ID   | Address_ID | address_Line_1 | locality__v | administrative_area__v | country__v | postal_code__v
1       | 2152 | 129889     | 15 York St     | New York    | US-NY                  | US         | 52152
2       | 2152 | 129889     | 15 York St     | New York    | US-NY                  | US         | 52152
3       | 2152 | 129890     | 62 York St     | New York    | US-NY                  | US         | 52152
4       | 2152 | 129891     | 62 York St     | New York    | US-NY                  | US         | 52152
5       | 992  | 129892     | 62 York St     | New York    | US-NY                  | US         | 52152
6       | 992  | 129892     | 62 York St     | New York    | US-NY                  | US         | 52152
7       | 992  | 129894     | 52 Water St    | New York    | US-NY                  | US         | 2150

Note: The Address column is added to help explain this example - it is not part of the .csv file.

Address match log

The exported match log tells you what existing address the incoming addresses matched on.

Results

  • Address 1 and 2 matched using key match because the custom keys are identical. In the match log, one entry displays. The Advice column displays UNMATCHED because the record did not match an existing record in Network. A new address record is created in Network.

    If addresses are matched using key matching the match log shows one entry; the new address.

  • Address 3 was processed next. There were no existing records in Network for this address to match, so the match log shows it as UNMATCHED; a new record is created in Network.
  • Addresses 4, 5, and 6 were processed next and they matched to Address 3 using duplicate detection rules. The match log advises that these records ACT matched and displays Address 3's custom key as the matching key in the Match Custom Key column.
  • Address 7 did not match any existing records in Network so the match log advises that it was UNMATCHED. A new record for the address is created in Network.

Log file names

When sub-object match logs are exported, they are zipped together in one file.

File Name: <System>-CHILD-MATCH-ANALYSIS-<Date and Time Stamp>-job-<Job ID>.zip

Example: SAP-CHILD-MATCH-ANALYSIS-2017-05-15T09-29-0700-job-5001.zip

Extract the file to view each sub-object match log.

File Name: <System>-<Sub-ObjectName>-MATCH-ANALYSIS-<Date and Time Stamp>-job-<Job ID>.csv

Example: SAP-ADDRESS-MATCH-ANALYSIS-2017-05-15T09-29-0700-job-5001.csv
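If you process the extracted logs with a script, the naming pattern can be split into its parts. The Python sketch below assumes that neither the system name nor the sub-object name contains a hyphen, and it keeps the date and time stamp as an uninterpreted string; the function name is hypothetical.

```python
import re

# Pattern for the extracted .csv logs, based on the file name format above.
LOG_NAME = re.compile(
    r"^(?P<system>[^-]+)-(?P<sub_object>[^-]+)-MATCH-ANALYSIS-"
    r"(?P<timestamp>.+)-job-(?P<job_id>[^.]+)\.csv$"
)


def parse_log_name(name):
    """Return the parts of a sub-object match log file name, or None."""
    match = LOG_NAME.match(name)
    return match.groupdict() if match else None


parts = parse_log_name(
    "SAP-ADDRESS-MATCH-ANALYSIS-2017-05-15T09-29-0700-job-5001.csv"
)
print(parts)
```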

Sub-object match logs are deleted from the FTP folder after three days.

Advanced job property

When source subscriptions run, Network uses the following advanced property for match analysis logs:

"Job.match.export": "HCP,HCO,Address,License,ParentHCO".

The value can be any combination of HCP, HCO, Address, License, ParentHCO, or custom object name. By default, there is no value.

Selecting the match analysis log options in the source subscription populates this job property, but it can also be manually updated using Advanced Mode.

Analyze the logs with Network Reports

Administrators and data managers can run reports on match logs. The logs contain many records, so using reports makes it easier to analyze the data to discover the effectiveness of your match rules and data groups. For more information, see Match log report.

Logs that have been created in the past three days are immediately available for use in Network Reports. Logs that are older than three days can be retrieved using the Analyze Match Log button on the Job Details page.