Data quality

Data validation rules

22R2

Data Validation Rules have been renamed to Profile Validation Rules. Validation rules for data loading jobs are introduced in version 22R2.0. This name change helps to differentiate the two types of validation rules that Network now provides.

  • Profile Validation Rules - Apply to individual records and are used as data quality checks; for example, to ensure that mandatory fields are populated or that HCPs have an address.

  • Job Validation Rules - Apply to data loading jobs to prevent unintended changes to large numbers of records.

Validating suspect match data

22R2

Administrators can configure profile validation rules (previously called data validation rules) to run immediately after suspect match tasks are processed. Data Stewards no longer need to click Validate on the profile to run the validation rules.

This feature is not enabled by default.

How it works

After Data Stewards apply a suspect match task, the profile page opens to display the surviving record and profile validation rules will run. If any of the validation rules fail, the Validation box displays the issues. If no validation rules fail, a success message displays.

Note: If Data Stewards navigate from the profile page without resolving the validation issues, it does not prevent the merge. The records are merged at this point, but the data issues will remain until they are resolved.

Enable the feature

Administrators can enable this enhancement for their Network instance.

  1. In the Admin console, click Settings > General Settings.

  2. Select Edit.

  3. In the Application Setting section, under Validation Rules, select the Automatically run profile validation rules after a suspect match task is processed.

  4. Save your changes.

Whenever suspect matches are processed in your Network instance, the applicable profile validation rules will run.

Job validation rules

22R2

Administrators can create and manage data validation rules for source subscription and data updater jobs. Job validation rules will fail the data loading job if critical value changes try to occur on a large number of records.

Examples of critical value changes

  • updating mandatory name fields to NULL

  • deleting, invalidating, and inactivating records

  • opting out records

If a validation rule is set and a defined number of records violate the rule, the job will fail before the data is loaded and these changes take effect.

Note: Job validation rules reject certain changes for existing records; they do not ensure data quality for newly loaded records. Validation for newly added records is not supported.

Job Validation Rules apply to source subscriptions and Data Updater jobs.

This feature is available by default in your Network instance.

A set of predefined rules are provided for Veeva standard objects. These rules are enabled by default but they do not impact existing data loading jobs until Administrators enable the rule settings for these jobs.

Enable job validation rules for source subscriptions

The job validation rules setting is enabled on new source subscriptions by default. The setting is not enabled on existing subscriptions to ensure the rules have no impact until Administrators choose to enable it.

Job validation rules are supported for source subscriptions that load data from all source systems, including third party data providers.

Important: Job validation rules run during the merge stage of the source subscription job, so the rules are only applied if the Apply Updates & Merge setting is also selected.

To set the option for an existing source subscription:

  1. In the Admin console, click System Interfaces > Source Subscription.

  2. Select an existing subscription.

  3. In the General Settings section, select Enable All Job Validation Rules.

    To review the rules, click the Job Validation Rules link to navigate to that page.

Job validation rules are applied if the source feed contains fields that are part of a rule.

Enable job validation rules for Data Updater jobs

Administrators can enable job validation rules to run on data updater jobs. By default, this option is not enabled so it does not impact data updater jobs.

  1. In the Admin console, click Settings > General Settings.

  2. Select Edit.

  3. In the Application Settings section, select Apply all Enabled Job Validation Rules to Data Updater.

  4. Save your changes.

The job validation rules are applied if fields defined in the rules are part of the data feed.

Job validation rules page

Network provides a set of predefined job validation rules for each Veeva standard object.

To view the rules in the Admin console, click System Interfaces > Job Validation Rules.

The page contains a section for each enabled object in your Network instance. The objects are listed alphabetically by main object and then sub-objects and relationship objects.

The predefined rules are enabled by default but they do not impact data loading jobs until they are enabled to run for a source subscription or they are enabled for data updater jobs. For a list of the provided rules, see the Predefined rules section below.

Creating validation rules

Create validation rules to prevent critical field values changes.

Rules apply to all countries. If you create a rule using a field that is not available in a country and you run a job for that country, the job validation rule is not applied.

To create a rule:

  1. In the Admin console, click System Interfaces > Job Validation Rules.

  2. On the Job Validation Rules page, all objects that are enabled in your Network instance display. Expand an object and click Add Rule.

    Example

    Create a validation rule to ensure that a large number of HCP types are not changed from Prescriber.

    The rule will check that the hcp_type__v field value does not change from Prescriber to any value that is not Prescriber.

  3. In the New Rule section, define the following settings:
    • Rule Name - Type the name of the rule. The name must be unique.
    • Error Message - Type a message that Administrators and Data Managers will see if the rule triggers the job to fail.
    • Description - Type a description of the rule that displays on the Job Validation Rules page.
    • Threshold - Define the number of records to meet the rule criteria for the job to fail. The number must be between 1 and 20,000.
    • Reject Records Below Threshold - Select this option to reject any number of records below the threshold that meet the rule criteria. If this option is not set, records that meet the rule conditions below the threshold will be updated during the job.

      Tip: Select this option when the change will result in bad data quality; for example, if the update changes the HCP name to Null.

    • Conditions - Identify the field and values as the criteria for this rule. For most rules, this is a comparison between the old (existing value) and the new (incoming) value.
      • Field - Expand the list and choose the field. Only the fields for the object display.
      • Old Value - The existing field value on the record. Expand the list to choose the operator (Equals, Not Equals, Find, and so on) and then type the field value.

        The available operators depend on the selected field.

        Not all operators required a value. For example, the Is Changed, Is Null and Is Not Null operators do not require values.

      • New Value - The incoming value of the field that will update the record. Expand the list to choose the condition and then type the field value.

      Supported operators

      The following operators are available, depending on the selected field type:

      • Equals / Not Equals - Values are not case-sensitive.

      • Find / Not Find - Use this operator for substring matching. Think of it as "contains / not contains". For example, if you specify the FIND operator value as "abc" then the rule is violated for all fields that contain the substring "abc" ("abcxyz", "xyzabcxyz", "xyzabc", and so on). Values are not case-sensitive.
      • In / Not In - Used for string and reference fields. It specifically looks for values in fields and it ignores blanks or nulls. Values are not case-sensitive.
      • Is Null / Is Not Null - Use to check when there is no value or when existing values are removed and are not replaced. This is helpful to ensure that records contain critical values (for example, HCP name).
      • Is Changed - Used to determine if a field value is changed. If this operator is used, the New Value is not required.
      • Match Regular Expression / Not Match Regular Expression - Can be used to check if the old or new value match a certain expression. (Network uses Java Regular Expressions). Values are case sensitive.
      • Less Than / Greater Than - Can be used for fields containing numeric values; for example, fields with rankings or dates.
      • Between - Can be used for fields containing numeric values; for example, fields with rankings or dates.

        This operator looks at the values between what's specified, not including what's specified. For example, if you provide the values of 1 and 3, the rule will only fire/fail for a value of 2. If you provide values of 1 and 2, the rule will never fire because there is no value between 1 and 2.

      Click Add Condition to define another field. Each rule can link multiple conditions using the AND or the OR operator. All conditions must be joined through the same operator; you cannot mix AND and OR within the same rule.

  4. Save the rule.

The rule displays in the object section. It is enabled by default.

Edit rules

Predefined and custom rules can be edited; for example, you can raise or lower the threshold. All of the rule properties can be changed except the object and the Code. The Code is used for exporting configurations to other Network environments.

Copy rules

Copy a validation rule to create a similar rule.

To copy a rule:

  1. Expand a rule and click Copy.

    You can also copy a rule from the row in the list view. Click Options > Copy

  2. In the Copy Rule pop-up, click Yes, Copy.

    The copied rule opens so you can edit it. The Rule Name is appended with _Copy.

  3. When you have finished editing the rule, select Save.

    The rule is enabled by default.

Delete rules

All predefined and custom rules can be deleted.

To delete a rule:

  1. Expand a rule and click Delete.

    You can also delete a rule from the row in the list view. Click Options > Delete.

  2. In the Delete Job Validation Rule pop-up, click Yes, Delete.

    The rule is removed from your Network instance.

Triggered validation rules

Validation rules run on source subscription and data updater jobs if the source feed or file contains the fields defined in the rule conditions.

The job outcome depends on the configuration of any triggered rules.

Failed jobs

A rule is violated and the job fails if the number of records that meet that rule condition is equal or higher than the defined threshold. For example, if the rule threshold is 100 and the job tries to change a critical value for 100 or more records, the job fails and no records are updated.

Rejected records

If a rule threshold is not met and the job completes, individual records that violate the rule might be rejected depending on the Reject Records Below Threshold rule setting. Each rule contains this setting so you can determine the number of acceptable records to be updated.

Example

A rule threshold is 100 records and 3 records met the rule conditions.

  • Reject Records Below Threshold option is not set - The 3 records are updated by the completed job.

  • Reject Records Below Threshold option is set - The 3 records are rejected by the completed job.

Job errors

Errors are logged when a job fails because of a validation rule and when records are rejected because of a validation rule.

Source subscription job error log

The source subscription error log contains entries for the following issues:

  • Job fails because a validation rule threshold was met. The log entry contains the rule name and error message defined in the rule.

  • Job completes but records were rejected because the validation rule was configured to reject records below the rule threshold. The error log contains the rule and the record that the job tried to change so you can investigate the update.

    Source subscription jobs

  • Data updater jobs

    For data updater jobs, if the job completes but a number of records below the threshold were rejected, they will not be updated and will display as Skipped Records on the Job Details page. The Job Error Log will display the job validation rule that failed.

Job validation error log

All records that met any rule conditions are logged in a separate log file so you can investigate the records that caused the job rule violations. This includes records that violated rules without hitting the rule thresholds (records that were updated). The logged records stop near the defined threshold; for example, if the threshold is 100, the log might contain 100 or more records.

The log file is located in the outbound/job_validation_rules folder. All job validation rule logs for source subscriptions and data updater jobs are contained in this folder.

Log file naming convention:

  • Source subscription jobs - <subscription_name>-<timestamp>-job-<job_ID>.zip

  • Data updater jobs - data_updater_update_records__v-<timestamp>-job-<job_ID>.zip

Download the file to review information about the records and rules that were violated.

Example log file

This job failed because a large number of HCPs were inactivated. The log file displays all of the critical changes that the job made based on the rules that were triggered.

This log displays the records that violated the HCPIsInactivated rule; this is the rule that caused the job to fail. It also displays some records that violated the HCPTypeIsChanged validation rule. These records did not trigger the job to fail because it was below the threshold, but they are still logged so you can see that a critical value would be changed.

The log files contain the following columns:

  • VID - The Veeva ID.

  • Type - The object type.

  • Parent VID - Veeva ID of the parent object.

  • Parent Type - Type of parent object.

  • Native Key - If the source file contains a native key, it displays so you can identify which incoming record experienced the error.

  • Rule Name - The job validation rule name.

  • Rule ID - The rule code (automatically defined by Network)

  • Old Value - The existing field value on the record.

  • New Value - The new incoming value from the source feed or file.

Predefined rules

These following validation rules are available in your Network instance. They are enabled by default. You can edit these rules to customize them for your requirements.

 

Bulk updates

If you are intentionally updating a large number of records, for example, you are inactivating more than 1000 addresses, you will not want the job validation rule to run.

There are three options to perform the bulk update:

  • Increase the threshold of the job validation rule. For example, if there are 1000 addresses to be activated, temporarily increase the rule threshold to 2000. This will impact any jobs started by other users.

  • Temporarily disable the individual job validation rule on the Job Validation Rule page. This will impact all other source subscription and data updater jobs that are running while the rule is disabled.

  • Disable all job validation rules in the source subscription. This means that all other job validation rules that can prevent other critical value changes during the job are disabled as well.

All of these actions are tracked in the System Audit Log.

Logging

The System Audit Log contains any changes that are made to job validation rules.

The log tracks the following changes:

  • Enable or disable the setting to apply job validation rules to Data Updater jobs.

    Search by the GeneralSettings Object Types to filter the log records.

  • Change job validation rule settings for a subscription.

    Search by the Subscription Object Types to filter the log records. Open the log to view the change.

  • Create, edit, or delete job validation rules.

    Search for the JobValidationRule Object Types to filter the log records. Open the record to review the JSON so you can see the details of the change.

Exporting configurations

Administrators can export the job validation rules to a target environment using configuration packages.

All of the job data validation rules and the setting to enable job validation rules for the data updater can be exported.

  • Job validation rules - Export one rule, all rules for an object, or the Job Validation Rules section to move all of the rules.

    If you include job validation rules for custom objects, the custom fields and objects for those rules are automatically added to the export package.

  • Data updater setting - Select the General Settings configuration to export the setting.

For more information about exporting configurations, see the Managing configurations topic in the Veeva Network Online Help.

Data cleansing

22R1.1

Administrators and data managers can now define rules to cleanse and standardize data in fields.

Addresses are cleansed using Network's third party cleansing tool, but previously, there was no efficient way to cleanse and standardize other fields. Now you can create a data cleansing dictionary to do things like remove noise characters and words and replace strings with better strings.

Use the data cleansing dictionary with a new Network expression function in a source subscription or a data model field to update specific field values.

This feature is available by default in your Network instance.

Standardize field values

Cleansing fields helps you to improve the values. This can be helpful for matching and to ensure high data quality in general.

Example use cases:

  • Standardize letter case - Make the first character of each word uppercase (for example, cleanse the Kroger company as The Kroger Company).

  • Standardize names - Standardize the spelling or capitalization of names (for example, cleanse L'oreal to L'Oreal or usa to USA).

  • Formatting legal entities - Use the correct format for all entities (for example, cleanse co to Co.)

  • Remove noise characters or noise words - Remove commas, dashes, parentheses, and so on.

  • Add or remove spaces - Example: Add a space between the number and the measurement (for example, 2 mg).

  • Remove values - Blank out placeholder values ("unknown", "not available", and so on) that users submit.

  • Remove special control characters - Replace characters like tab, returns, line feed, and so on.

Data cleansing process

Cleansing field values involves the following steps:

  1. Profile your data to understand where data issues are and understand what cleansing rules are needed to fix these data issues.

  2. Create cleansing rules in a .csv file. This is your data cleansing dictionary.
  3. Upload the .csv file as a lookup table in your Network instance.

  4. Create a NEX rule in your source subscription or in a data model field.

If the NEX function is used on a source subscription, the incoming data is cleansed when the job runs. If the NEX rule is applied to a data model field, then the rule triggers and cleanses the data every time the record is updated (for example, in a source subscription or through a DCR).

Important considerations

When you are creating cleansing rules and defining the NEX rule, consider these key practices:

  • Profile the data and understand the issues so you can define the correct cleansing rules.

  • Test the rules to ensure that they cleanse your data in the right way and do not make any unexpected replacements.

  • Store the cleansed value and the original (raw) value in separate fields so you have a record of the change. This will help you to troubleshoot any unexpected replacements.

Create the data cleansing dictionary

In any spreadsheet application, create a .csv file that contains your data cleansing rules. This is your data cleansing dictionary. Each cleansing rule addresses a specific issue.

Example data cleansing dictionary

This data cleansing dictionary, company_name_dcd, contains rules to standardize a field called Company Name. It has rules for specific company names (for example, L'Oreal and RB Health) but it also has rules to standardize capitalization and legal entities for all company names.

Mandatory columns

The file must contain the following columns with a value:

  • string or regex - Each rule must include either a string or a regular expression (regex).

    Only one of these columns can be populated for a rule. If you have a value in both columns, an error occurs.

    • string - Means that the value is interpreted as a string constant.

    • regex - Means that the value is interpreted as a regular expression.

      These two alternative columns are there to make it easier for you if you want to match and replace just on a string. In that case, enter the value into the string column. You do not have to take care of escaping any characters that have a special meaning in regular expressions because everything in that column is treated as a string constant.

      However, if you want to use regular expressions to do some advanced matching and replacing, then enter the expression into the regex column. Everything in this column is treated as a regular expression, meaning that you might have to escape some characters depending on your specific expression and use case.

      Example

      If you specify a dot (.) character in the string column, then this means only every occurrence of a dot is replaced (for example, if you replace (.) with an empty string ("), n.a. is cleansed to na).

      However, in regular expression, the dot (.) character has a completely different meaning; it represents any character or number. So, if you replace the

      dot (.) character as a regular expression in the regex column with an empty string (") , all characters are replaced ("").

  • replacement - The value that you want the matching string or substring to be replaced with. The value can also be empty if you are stripping the field of the value.

Optional columns

The following columns can be included in the file:

  • case_sensitive - Supported values are Y or N. If the column is not included in the file, the default value is N. If you want the rule to match only on a specific case, specify Y in the column.

    Case applies to both string column values and regex column values.

    Example

    These regex rules are similar, but one is case sensitive.

    regex,replacement,case_sensitive
    [a-z],x,Y
    [a-z],x,N
    • The first rule replaces only all lowercase characters with x.

    • The second rule replaces all characters (regardless of letter case) with x.

  • matching_behavior - Identify the type of string or substring to match for replacement. If the column is not included in the file, the default matching behavior is SUBSTRING.

    • SUBSTRING - Every occurrence of the matching string is replaced.

    • TOKEN - Occurrences are replaced only if they are words separated by other words through white spaces. A TOKEN is a sequence of non-white space characters separated by white spaces (blanks, tabs, and so on). If a word is separated by a dash (-), comma (,), parentheses (), or other character, it is not treated as a TOKEN.

Tip: Document your cleansing rules by including an optional column to describe each rule.

Example rules

Example 1

This cleansing rule standardizes all occurrences of USA as a token.

string regex replacement case_sensitive matching_behavior
usa   USA N TOKEN

Column values

  • string - Match the entire string usa.

  • replacement - Replace the matching string with USA.

  • case_sensitive - Replace any occurrence of usa regardless of letter case. This includes usa, Usa, USa, and so on.

  • matching_behavior - TOKEN means only match when usa is an entire word (it is separated by white space). A string that includes usa with other characters; for example, thousand, would not be cleansed as thoUSAnd.

Example 2

This cleansing rule standardizes the legal entity Inc..

string regex replacement case_sensitive matching_behavior
  (, inc$| inc$) , Inc. N SUBSTRING

Column values

  • regex - Match any occurrence of ,inc or inc at the end of the string.

  • replacement - Replace the matching string with , Inc..

  • case_sensitive - Replace any occurrence of the matching substring regardless of letter case.

  • matching_behavior - SUBSTRING means match any occurrence of the substring.

Sequence of the rules

The sequence of the columns does not matter, but the sequence of the rules does matter. Depending on the sequence, you could get a different result.

Example

In this example, we want to cleanse the string n.a..

Dictionary example 1

Result: With this sequence of cleansing rules, the output is na.

Dictionary example 2

Result: With this sequence of cleansing rules, the output is an empty string.

Ensure that the rules are ordered so that you get the expected output.

Upload the file to Network

When you have created the data cleansing rules, upload the .csv file as a lookup table.

  1. In the Admin console, click Data Model > Lookup Tables.

  2. Click Create Lookup Table.

  3. Type the Table Name and Description, confirm whether the file contains third party data, and upload the .csv file.

    The file is validated to ensure that it complies with supported lookup table requirements, but the file is not checked to ensure that the mandatory data cleansing columns are included.

    For more information about lookup tables, see Create a lookup table.

  4. On the File Preview tab, review the data and click Create Table.

The __t suffix is automatically appended to the table. Remember to add the suffix when you reference the data cleansing dictionary file in the NEX rule.

Edit the dictionary

To change the data cleansing rules, download the lookup table. When the changes are complete you can re-upload the .csv file.

Apply the data cleansing function

The data cleansing function can be applied to a NEX Rule in a source subscription or a on data model field.

NEX rule syntax

CLEANSE_STRING(<field_name>, '<data_cleansing_dictionary_name>')
  • field_name - The field that will be cleansed.

  • data_cleansing_dictionary_name - The name of the dictionary that you added as a lookup table.

Example rule

[
    "company_original_name__c = company_name__c",
    "company_name__c = CLEANSE_STRING(company_name__c, 'company_name_dcd__t')"
]

This rule means that first the existing company name will be saved to a second field, company_original_name__c, and then cleanse the company name. The cleansed name will be stored in the company_name__c field.

To apply data cleansing for a field, add it to a source subscription or a data model field.

  • Source subscription - In the Network Expression Rules section, add the rule.

    You will get better match results if you cleanse and standardize the values of the fields that you use in matching (for example, fields like hospital names, company names, product names, and so on). To ensure that the fields are cleansed and standardized before matching, add the rule to the earliest possible stage (File Preparation or Transformation rule points).

  • Data model field - Expand the Country Visibility and Field Rules section and add the NEX rule.

Replacing special characters

You can remove special control characters like tab, returns, line feeds, and so on by replacing them with a defined string (for example, an empty string).

Considerations

  • All special characters must be defined in the regex column (not in the 'string' column).

  • You can replace any special characters supported by Java Regex; only encodings that are supported by Java RegEx can be used. For more information, see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html.

  • All backslashes (\) must be escaped with another backslash (for example: "\\t" instead of "\t")

  • The value in the regex column must be in quotes (").

Nulling existing values

If the final outcome of the cleansing is an empty string, then the function will return a NULL value.

Example

This data cleansing rule expects to replace field values that are n/a with an empty string.

string,replacement
n/a,

With this cleansing rule, the data cleansing function will return NULL, not an empty string.

To nullify any existing field values through the data cleansing function in a source subscription, add the following property to the Advanced Mode:

    "feed.retain.null": "true"

Trimming white space

By default, the string cleansed by the function is trimmed; the whitespace on the left and right of the cleansed string is removed.

Delete the data cleansing dictionary

You can remove the dictionary from the Lookup Tables page. When you remove a data cleansing dictionary, you should also remove the NEX rule. Run time errors will occur for any NEX rule that references a deleted data cleansing dictionary.

Data cleansing errors

When you upload the .csv file as a lookup table, the data cleansing rules are not validated for consistency and completeness. If mandatory information is missing or if there are ambiguities, the data cleansing function will throw a run-time error with details about these issues.

Errors will occur in the following situations:

  • An expected column occurs more than once in the data cleansing dictionary (for example, two columns named string).

  • Values are present in both the string and regex columns.

  • The string and regex columns are both missing from the dictionary (at least one of them is required).

  • The replacement is missing from the dictionary.

  • The lookup table referenced by the data cleansing function does not exist.

  • Unsupported values are present in the columns.

Source subscription

When the data cleansing function is called and an error occurs, the errors display in the Job Error Log section on the Job Details page. The first five errors display.

In this example the following errors occurred:

  • Substring is spelled incorrectly.

  • The case_sensitive column contained an unsupported value (F). Only Y or N are supported.

  • A rule contained values in both the string and regex columns.