Transforming inbound data

AD
DM

Use transformation queries to pre-process data before it is processed and loaded during source subscription jobs.

Queries can also be used to post-process data before it is exported by target subscriptions. For more information, see Transforming outbound data.

Examples of inbound data transformations

Use transformation queries to pre-process inbound data.

Process the output of a report

Example: Run a report to find HCPs that are flagged for opt out, but that are not opted out yet. The output of that query would then be the input file for the source subscription to opt out these HCPs.
Create custom tables by loading data directly from source subscriptions

For detailed information, see Loading data into custom tables using transformation queries.
Process dynamic attributes from Veeva CRM

Example: This requires pivoting the data that is coming from CRM (converting rows into columns).
Simplify data feeds and model maps

Example: If a source system provides a single file containing many entities, split the data into individual files so the model map can be simplified.

For details, see the example scenario below.
Define field values based on certain conditions
Split or concatenate fields
Create lookup tables by loading data directly from source subscriptions

For detailed information, see Creating lookup tables using transformation queries.

How transformation queries work

Use transformation queries for the following activities:

Transform files before they are loaded by source subscriptions.
Use a report to generate the input for your source subscription.
Join incoming source files loaded by source subscriptions with tables (custom tables, reporting tables, lookup tables) from the reporting database.

Note: If you use the query to read from the reporting database, all applicable records from all countries and with all record states will be included in the inbound file; data visibility profiles are not considered and records are not filtered by their record state.

Query process overview

Transformation queries are SQL based (SELECT statements). Test them in the SQL Query Editor, create the query on the Transformation Query page, and then link them to source subscriptions.

Follow this process:

Upload source files as custom tables.

This enables you to test your query before you run it in a source subscription.
Write and test your query in the SQL Query Editor (Reports).
Create a transformation query (System Interfaces).

Copy and paste the tested SQL query from the SQL Query Editor into the transformation query configuration.
Link the transformation query to a source subscription.

When the source subscription runs, the following process occurs:
- Apply queries - Transformation queries pre-process the source files from the inbound FTP folder.
- Process files - The source files and query output file are processed by the source subscription, as defined by the source subscription, and the data is loaded into Network.

Source file requirements

Source files are created as temporary tables when they are read by transformation queries during the subscription job.

The file names and their column headers must comply with the following requirements:

Supported characters - File names and column headers can contain alphanumeric characters (0-9, a-z, A-Z) and underscore (_) characters only.
- File names must start with an alphabetic (a-z, A-Z) or underscore (_) character.
- Column headers can start with a number (0-9).
Number of characters - File names and column headers must be between 1 and 100 characters.
SQL keywords
- File names can be reserved SQL keywords (for example, order.csv and new.csv).
- Column headers can be reserved SQL keywords but they must be in double quotes("") in the transformation query. (for example, select "select" from test_table__csv).
File format - Files must use delimited formats. Files with fixed length formatting will be skipped.
File extensions - Delimited text files with any file extension are supported.

If the file does not have a .csv extension, you can use a file name alias in the tranformation query.

Example - Use delimited files with any file extension

The HCP source file has a .txt extension (for example, HCP.txt).

In the source subscription configuration, define the file name as HCP and define the alias as HCP_INPUT_FILE.

In the transformation query, refer to the source file by its alias (HCP_INPUT_FILE). Notice that the __csv file extension is required to distinguish this as source file; otherwise, it will be considered a reporting table name.
Headers - Column headers must be unique in each source file.
File names - Static or non-static file names are supported.

You can use file aliases to identify non-static file names in your query.

Example - Using non-static file names

The HCP source file that you want to process has a timestamp (for example, HCP_2023-01-30_04-45-07.csv) in its file name. In the source subscription configuration, define the file name as HCP* and define the alias as HCP_INPUT_FILE.

Aliases must be defined with uppercase letters.

In the transformation query, refer to the source file by its alias (HCP_INPUT_FILE) instead of its file name. Notice that the __csv file extension is required to distinguish the file from other regular table names in reporting.

Tip: File names and file aliases are not case-sensitive in transformation queries.

Validation

Source files are validated when the source subscription job runs.

If the column headers do not comply with the naming requirements, the job will fail.
Source files that uses a fixed length format are skipped by transformation queries.

Example data transformation scenario

We'll use the following example to walk through the steps for this feature.

Scenario

We need to split up a complex source file (pharma_data.csv) from our data provider, Pharma.

The file contains multiple product entities (Market Basket, Brand, Product) in one row. We'll use transformation queries to create individual files for each entity so the model map is simplified in the source subscription.

Example file

Upload source files as custom tables

Transformation queries for inbound files cannot be tested until runtime because the inbound table doesn't exist yet. To test the query to ensure that it is valid, create a custom table based on the source file (pharma_data.csv).

To create a custom table:

On the Network menu bar, click Reports > SQL Query Editor.
In the tree view, beside My Custom Tables, click Create > Custom Table.
On the Upload File step, define the required information and upload the sample file.
On the File Preview step, review the data. Click Create Table.
The table is added to the Custom Table section in the tree view.

Write and test the transformation query

In the SQL Query Editor (Reports), write the transformation query.

Use the custom table so the SQL syntax and field names can be validated and to ensure that it performs the data transformation as expected.

Example query

This query transforms the data in the incoming feed by creating a separate file for the Brand entity:

SELECT
        DISTINCT bid AS id,
        parent_mid AS parent_id,
        brand AS brand_v_name__c
    FROM
        pharma_data__ct

Note: Instead of the source file name, pharma_data.csv, the query uses the custom table name, pharma_data_ct, to ensure that the query is valid. When we create the transformation query in the next step, we'll replace the custom table name with the source file name or file name alias (defined in the source subscription).

Create a transformation query

After the SQL query is tested, create the transformation query.

For our example scenario, we'll create a transformation query for each entity (Market Basket, Brand, and Product) that we want to separate from the pharma_data.csv file.

In the Admin console, click System Interfaces > Transformation Queries.
Click Add Query
Type a Name and Description.

Using our example, we'll create the transformation query to separate the Brand entity data into its own file.
Choose the Inbound Query Type.

The Outbound query type is used to post-process data before it is exported using a target subscription.

Note: After the query is saved, the Query Type cannot be changed.
In the Query Output section, you can select CSV File, Custom Table, Lookup Table, or all options.

Example
- CSV File - If selected, type a File Name for the output file. The file name is automatically appended with .csv.
  
  File and table name support:
  - can contain numbers (0-9), letters (a-z, A-Z), and underscore (_) characters
  - must begin with a letter or underscore (_)
  - cannot contain more than 100 characters
  - cannot be any SQL keyword
  - cannot be reference.csv
- Custom Table - If selected, complete the following:
  - Type a Table Name and Table Description. The table name will automatically be appended with the __ct suffix.
  - Choose the folder where the table will be saved. If you have folders created in the Shared Folders category in the SQL Query Editor, they will display in this list.
    
    Transformation queries cannot be saved to personal folders (My Custom Tables) because the data is loaded through subscriptions, which are not applied to users.
  - Choose Table Name Options.
    - Static table name - The table name is always the same, so it means any existing custom table with the same name is replaced each time the query runs. For example: hco_table__ct
    - Add timestamp to table name - Append a timestamp to the table name. For example: hco_table_20221102t045341z__ct.
    - Add job ID to table name - Append the subscription job ID to the table name. For example: hco_table_15953__ct
- Lookup table - If selected, complete the following:
  - Table Name - It is automatically appended with the __t suffix.
    
    Table name support:
    - Supported characters: lowercase letters (a-z), numbers (0-9) and underscore (_) characters.
    - Maximum length: 100 characters.
    - The name cannot begin with a number.
    Note: Transformation queries replace lookup tables that have the same name. If you want to append or upsert data to an existing lookup table, you can define that logic in the transformation query.
  - Table Description - Define a meaningful description.
For our example, we'll choose the CSV File option and provide a name.

Note: The query output file will not be saved to the inbound FTP folder where the source subscription reads the source files. It can be downloaded from the Job Details page after the source subscription job runs.
Paste the tested SQL query into the query box.

Query considerations
- Table names - If you tested the query using a custom table, replace the __ct suffix on the table name with the __csv file extension if you want to pre-process a feed loaded by the source subscription.
  
  Note: If you use a file name alias (defined in the source subscription), the __csv extension is still required so it is identified as a file and not a reporting table name.
  
  To read directly from a table in the reporting database, do not add the __csv extension.
  
  Inbound queries can query from any table as long as it exists in reporting or ends with the __csv extension.
- Unique column names - The source file must contain unique column names. Identical column names are valid in the SQL Query Editor for reports but must be changed here so the transformation query can be saved.
- Record state - All record states are included in the results if the transformation query reads directly from the table in the reporting database. You can use a WHERE clause to filter records; for example, to only include valid records in the results, add WHERE record_state__v = VALID to your query.
- Number and date fields - When transformation queries process data that is loaded by the source subscription, all fields are internally treated as varchar fields. So, if your transformation query is intended to aggregate numbers or to do calculations on number or data fields, use the CAST function to convert the data type from varchar to the correct data type. This prevents issues because the export file processing uses varchar as the data type. For more information about CAST, see SQL functions in the Veeva Network Online Help.
- LIMIT clause - The LIMIT clause is not supported for transformation queries. If you want to limit the number of rows returned, wrap another query around the query that has the LIMIT clause.
Save your query.
When the transformation query is saved, Network validates the query; for example, to ensure that column names are unique, to check if the query contains the LIMIT clause, or if the table names and output files are valid. If warnings display, fix the issues and save your changes.

For our example, we'll also create transformation queries for the Market Basket and Product entities.

Market Basket query

SELECT
        DISTINCT mid AS id,
        marketbasket AS marketbasket_v_name__c
    FROM
        pharma_data__csv

Product query

SELECT
        DISTINCT pid AS id,
        product AS product_v_name__c,
        parent_bid AS parent_id
    FROM
        pharma_data__csv

Configure a source subscription

After the transformation query is created, add it to a source subscription so you can pre-process the inbound files.

Create or edit a source subscription

In the Admin console, click System Interfaces > Source Subscriptions.
Open an existing subscription or click Add Subscription to create a new one.

Note: Transformation queries are supported for Classic Mode only. If you create a subscription using Wizard Mode, you can add it to the subscription after the configuration is saved.
Job Run Outcome - Choose to commit or test the outcome of the job.
- Save Changes to the Database - The data will be committed to the database.
- Run Job in Test Mode (Default) - Test the subscription and transformation queries. The subscription job stops after the match stage so no changes are applied. You can review the job stats on the Job Details page.
  - Enable Simulation Mode - Test the subscription and transformation queries. Preview the changes to the production data. No changes are committed to the database. For details, see Simulating data updates.
Tip: First run the job in test or simulation mode to test the outcome. Then, run the job again and choose Save Changes to the Database to commit the data.
Details section
- Type a Name and Description for the subscription.
- System - Choose the source system.
- Status - The subscription is enabled by default.
Settings section - Accept all of the other default options in this section.

For detailed information about the settings in these sections see the Add a source subscription.
Source Files section
- Network FTP Path - Identify the inbound folder where the source files are located.
  
  Note: If the transformation query reads directly from the reporting database, an inbound folder is still required, but it will be empty.
- File Definitions - Click Add File to provide the following information about each file that you want to process through the source subscription.
  
  Note: Include all of the source files that will be processed by transformation queries (even if it isn't used by the model map) so Network knows which delimiter and text qualifier to use when parsing the file.
  - File name - The name of the file. Do not include the .csv extension.
  - Alias - The simplified name of the file that is used to reference it in the model map.
    
    Tip: Use the alias in the transformation query instead of the source file name. This enables you to transform delimited text files with any file extension and non-static file names. Add the __csv extension to the alias in the query so it is identified as a file and not a reporting table.
  - Key Columns - Type the keys from the source system, delimited by commas (,).
  - Text Qualifier: Select a qualifier to use for the beginning and end of text.
  - Format: Choose Delimited. Files with Fixed Length formats are not supported for data transformations; they will be skipped during the job
  - Delimiter: Select the delimiter that is used to separate text in the source file.
  - Header Row?: Select the checkbox to indicate that a comma separated list of headers exist for the header row.
Example file definitions

In this example, we've added definitions for the source file (pharma_data.csv) and a file for each of the output files for entities (Market Basket, Brand, Product) that we want created from the transformation queries.

If the output file is not used in the model map (for example, if the output file of the first transformation query is the input file for the next transformation query) the file definitions for the query output files are not mandatory.

If the file definitions for the output file are not defined, double quotes (") are used as the default text qualifier and comma (,) is used as the default delimiter.
Modelling & Normalization - Add a model map and any field normalizations. The transformation query output files are used in the model map; the original source file is not.

Example model map
Network Expression Rules - Add any NEX rules that you want to apply to the subscription job.
Match Configuration - Define any match rules by country for this subscription.
Transformation Queries section - Click Add Query to add a transformation query to the subscription. Only inbound queries display in the list.

You can apply multiple queries to each subscription. The queries will always run in sequence as they are listed in the subscription. Change the order of the queries using the Handle icon.

Example with .csv and custom table output

Note: If the custom table name appends a timestamp or job ID, a placeholder is added to the filename until the job runs.
Job Trigger Configuration - Optionally add a schedule and any emails or jobs that should be triggered for this subscription. For example, you can trigger a target subscription to export the data to your downstream system.
Advanced properties (optional) - Transformation queries make all column headers lower case. For example, if your source file has a column "AMS_id", then this column will change to "ams_id" when processed and outputted by a transformation query. This can be relevant for key columns since they are case sensitive by default. To avoid data loading issues, click Advanced Mode and add the following advanced property to force all incoming column headers to be lower case:
```
"parser.attribute.rules": "network:force_lowercase_attributes"
```
Save your changes.

If you ran the job in test mode or simulation mode, review the output. When you are confident that the transformation query and job are making the expected changes, run the job with the Save Changes to the Database setting on to commit the changes to the database

Data Flow View

After you save a source subscription, the Data Flow View is added to the Details section to visualize all of the steps of the job.

Click the Data Flow View thumbnail to open the view.

These stages of the job are defined:

Start - Indicates if the job has been triggered by another job.
Files in Inbound Folder - The file path of the inbound folder. Click the link to open the File Explorer in a new tab.
Apply Transformation Queries - Details about each query that is applied to the subscription. If there are multiple queries, they are listed in the order that they run. Click the query name to open the transformation query configuration. Click View Query to see the query. This step does not display if transformation queries are not applied.
Import Files - The file definitions for each file that is loaded. The Import Options section defines the options that are set in the source subscription configuration.
End - Indicates if this job triggers email notification or another job.

To return to the job configuration, click Back to Subscription Page.

Unsaved changes

If you have made changes to the source subscription configuration, the Data Flow View does not reflect those changes until the subscription is saved. A message displays if you open the view before saving the subscription.

Job details

After a source subscription completes, review the Job Details page.

Files Loaded Summary

This section displays the files that were processed during the job. The query output files are listed in this section but they are not added to the inbound FTP folder with the original source file. You can download the output files in the Transformation Queries section on the Job Details page.

Example

Transformation Queries

This section displays if one or more queries was applied to the job. If multiple transformation queries were applied, they display in the sequence that they ran.

Details

Query - The transformation query name.
Description - A description of the query.
Output File - The file created by the query. Click the file name to download it.
Number of Records - The number of records that were returned by the transformation query.
Query Duration - The runtime of the query during the job. For example, 2s means that the query runtime was 2 seconds.
View Query - Click to display a snapshot of the query that was applied when this job ran. It might be different than the query that is currently saved in Network. Viewing the query as it was at run-time can help you to troubleshoot any issues that might have occurred.

Job errors

Source subscriptions fail if the transformation query creates a runtime error because of an invalid or incorrect SQL statement or if the query input file does not comply with the naming conventions for file and column names. An error is logged to help you to troubleshoot.

Example

If the source file names have unsupported characters or duplicate column names, an Unknown column error occurs.

Troubleshooting errors

The following issues in source files and the subscription configuration can cause job errors.

Source files

A column name contains unsupported characters or words.
A source file name contains unsupported characters.
A source file contains duplicate column headers.

Source subscription configuration

The input file for the transformation query was not defined.
An incorrect delimiter was defined.

Saved transformation queries

After you save a transformation query, it is listed on the Transformation Queries page. Each query row contains the name, description, type (Inbound or Outbound), and the modified and created dates by user.

Edit transformation queries

Click the transformation query to open it and view the details or make changes. The Query Type and Code cannot be edited.

The Subscriptions section displays any source subscriptions that use that transformation query.

Delete transformation queries

If the query is no longer needed or was created by mistake, click the Delete icon to remove it from your Network instance.

The dialog displays any linked subscriptions . Click Delete to confirm.

When a query is deleted, any related source subscriptions are also updated to remove the link to the query.

Exporting configurations

Administrators can move transformation queries to target environments using export packages (Settings > Configuration Export). For example, you might create and test transformation queries in your Sandbox instance and then move them to your Production instance.

When you create a package, expand the Transformation Queries category and choose the queries that you want to add to the package.