Grouping data for normalization
DM
Data sources can be in a de-normalized format, consisting of a small number of potentially wide files (containing a high number of columns). De-normalized files contain redundant data.
Group aliases enable you to create groups to effectively normalize the files. This is similar to using an SQL GROUP BY operation. You define groups by selecting attributes that arrange identical records into those groups.
For example, orange_book.csv contains Organization, Brand, and Product.
Organization | Brand | Product |
---|---|---|
Verteo | Cholecap | Cholecap Tab 10 mg |
Verteo | Cholecap | Cholecap Tab 20 mg |
Verteo | Restolar | Restolar Tab 1 mg |
WagnaMed | Acetaminophen | Acetaminophen Tab 10 mg |
Delbian | Acetaminophen | Acetaminophen Tab 10 mg |
To load the Organization and Brand entities from this feed, the file must be normalized into separate files for the organization and brands. Before Network Product Master 3.5.1, ETL pre-processing was required to normalize files into the Network data model prior to loading.
Normalization process
The following diagram illustrates the normalization process.
Module properties
The following module properties enable you to define the alias, attributes, and primary key for a group.
Module property | Description |
---|---|
"native.group.[File Alias].alias.[Group Alias]": "<attribute1, attribute2, ... >"
|
|
"native.key.[Group Alias]": "<attribute1, attribute2, ... >"
|
|
For example:
{ "feed.item.alias.ORANGE": "orange_book", "native.key.ORANGE": "Organization, Brand, Product", "native.group.ORANGE.alias.ORGANIZATION": "Organization", "native.key.ORGANIZATION": "Organization", "native.group.ORANGE.alias.BRAND": "Organization, Brand", "native.key.BRAND": "Organization, Brand" }
Model map
The process for modeling groups is similar to that for modeling individual files, with the following considerations:
- Groups appear as a pseudo-table.
- Attributes defined in the group alias are available as any other attributes.
[{ "entity": "ORGANIZATION", "from": "ORGANIZATION", "attributes": [ "ORGANIZATION.* AS *", "ORGANIZATION.VDM_* AS *" ], "customkeys": [{ "value": "ORGANIZATION.Applicant" }] }, { "entity": "BRAND", "from": "BRAND", "attributes": [ "BRAND.* AS *", "BRAND.VDM_* AS *" ], "customkeys": [{ "value": "CONCAT(BRAND.Applicant, BRAND.trade_Name)" }] }]
Modeling relationship objects
Modeling of parent relationships follows the same pattern with groups as with modeling them from files. There are, however, a few considerations to facilitate the loading process.
With respect to custom key definition:
- It is usually a concatenation of the keys for the parent and child entities.
- The custom key of a child entity (such as Brand) might also be a concatenation of the parent and child key.
- The use of a prefix or suffix in the custom key definition differentiates the custom key of the relationship object from that of the child entity.
Object | Custom key value definition |
---|---|
Brand | "value": "CONCAT(BRAND.Organization, BRAND.Brand)"
|
ParentOrganization | "value": "CONCAT('ParentOrganization', BRAND.Organization, BRAND.Brand)"
|
With respect to attributes:
- Objects are not loaded if no attributes exist in the result set of the model map (not including PK or FK).
- Most Product Master relationship objects do not have attributes to load beyond the parent and child definition.
To bypass the attribute check, add a dummy attribute. For example:
"attributes": [ "this_attribute_is_intentionally_blank__i = 'blank'", "BRAND.VDM_* AS *", ...
The following example shows a model map for the ParentOrganization object. It contains the relationship between the Organization and the Brand.
{ "entity": "PARENTORGANIZATION", "from": "BRAND join ORGANIZATION PARENT on BRAND.Applicant = PARENT.Applicant", "attributes": [ "this_attribute_is_intentionally_blank__i = 'blank'", "BRAND.VDM_* AS *", "BRAND.VDM_ENTITY_ID AS VDM_FKENTITY_ID", "PARENT.VDM_ENTITY_ID AS VDM_FKENTITY2_ID", "VDM_PARENT_ENTITY_TYPE = 'ORGANIZATION'", "VDM_ENTITY_ID = @NEW_ENTITY_ID" ], "parentkey": { "value": "PARENT.Applicant", "item": "ORGANIZATION" }, "customkeys": [{ "value": "CONCAT('ParentOrganization', BRAND.Applicant, BRAND.trade_Name)" }] }