Grouping data for normalization

AD
DM

Data sources can be in a de-normalized format, consisting of a small number of potentially wide files (containing a high number of columns). De-normalized files contain redundant data.

Group aliases enable you to create groups to effectively normalize the files. This is similar to using an SQL GROUP BY operation. You define groups by selecting attributes that arrange identical records into those groups.

For example, orange_book.csv contains Organization, Brand, and Product.

Organization Brand Product
Verteo Cholecap Cholecap Tab 10 mg
Verteo Cholecap Cholecap Tab 20 mg
Verteo Restolar Restolar Tab 1 mg
WagnaMed Acetaminophen Acetaminophen Tab 10 mg
Delbian Acetaminophen Acetaminophen Tab 10 mg

To load the Organization and Brand entities from this feed, the file must be normalized into separate files for the organization and brands. Before Network Product Master 3.5.1, ETL pre-processing was required to normalize files into the Network data model prior to loading.

Normalization process

The following diagram illustrates the normalization process.

Module properties

The following module properties enable you to define the alias, attributes, and primary key for a group.

Module property Description
"native.group.[File Alias].alias.[Group Alias]": "<attribute1, attribute2, ... >"
  • Defines the name of the group alias
  • Defines the attributes that uniquely identify the alias
"native.key.[Group Alias]": "<attribute1, attribute2, ... >"
  • Defines the primary key of the group

For example:

{
   "feed.item.alias.ORANGE": "orange_book",
   "native.key.ORANGE": "Organization, Brand, Product",

   "native.group.ORANGE.alias.ORGANIZATION": "Organization",
   "native.key.ORGANIZATION": "Organization",
			
   "native.group.ORANGE.alias.BRAND": "Organization, Brand",
   "native.key.BRAND": "Organization, Brand"
}

Model map

The process for modeling groups is similar to that for modeling individual files, with the following considerations:

  • Groups appear as a pseudo-table.
  • Attributes defined in the group alias are available as any other attributes.
[{
   "entity": "ORGANIZATION",
   "from": "ORGANIZATION",
   "attributes": [
      "ORGANIZATION.* AS *",
      "ORGANIZATION.VDM_* AS *"
   ],
   "customkeys": [{
      "value": "ORGANIZATION.Applicant"
   }]
}, {
   "entity": "BRAND",
   "from": "BRAND",
   "attributes": [
      "BRAND.* AS *",
      "BRAND.VDM_* AS *"
   ],
   "customkeys": [{
      "value": "CONCAT(BRAND.Applicant, BRAND.trade_Name)"
   }]
}]

Modeling relationship objects

Modeling of parent relationships follows the same pattern with groups as with modeling them from files. There are, however, a few considerations to facilitate the loading process.

With respect to custom key definition:

  • It is usually a concatenation of the keys for the parent and child entities.
  • The custom key of a child entity (such as Brand) might also be a concatenation of the parent and child key.
  • The use of a prefix or suffix in the custom key definition differentiates the custom key of the relationship object from that of the child entity.
Object Custom key value definition
Brand "value": "CONCAT(BRAND.Organization, BRAND.Brand)"
ParentOrganization "value": "CONCAT('ParentOrganization', BRAND.Organization, BRAND.Brand)"

With respect to attributes:

  • Objects are not loaded if no attributes exist in the result set of the model map (not including PK or FK).
  • Most Product Master relationship objects do not have attributes to load beyond the parent and child definition.

To bypass the attribute check, add a dummy attribute. For example:

"attributes": [
   "this_attribute_is_intentionally_blank__i = 'blank'",
   "BRAND.VDM_* AS *", ...

The following example shows a model map for the ParentOrganization object. It contains the relationship between the Organization and the Brand.

{
   "entity": "PARENTORGANIZATION",
   "from": "BRAND join ORGANIZATION PARENT on BRAND.Applicant = PARENT.Applicant",
   "attributes": [
      "this_attribute_is_intentionally_blank__i = 'blank'",
      "BRAND.VDM_* AS *",
      "BRAND.VDM_ENTITY_ID AS VDM_FKENTITY_ID",
      "PARENT.VDM_ENTITY_ID AS VDM_FKENTITY2_ID",
      "VDM_PARENT_ENTITY_TYPE = 'ORGANIZATION'",
      "VDM_ENTITY_ID = @NEW_ENTITY_ID"
   ],
   "parentkey": {
      "value": "PARENT.Applicant",
      "item": "ORGANIZATION"
   },
   "customkeys": [{
      "value": "CONCAT('ParentOrganization', BRAND.Applicant, BRAND.trade_Name)"
   }]
}