Defining features
DM
Features determine how individual field pairs are matched and can be uniquely assigned a match algorithm, match filter, confidence level, and null matching treatment. Multiple features make up a feature set.
Define features
To add a feature:
- Click the Match Rules tab to view the current match rules configuration.
- To the right of the match rules header, click the Entity drop-down list and select HCP or HCO to work with the match rules for that entity type.
- Click Add Feature at the bottom of the list (or the + at the top right of the list) to add a feature.
- Type a descriptive name for the new feature.
- In the Fields list, select the fields to include in the feature. As you type, auto-complete options appear.
To match on the parent HCO fields, select Compare fields from Parent HCO records. For example, the parent HCOs of John Smith can be compared during loading. This provides more flexibility for matching records.
- In the Apply Filters section, click +Add Filter to include or exclude specific field values from the match rule. For example, a license match rule can exclude DEA type licenses, or an HCO match rule can include a specific list of HCO types.
  - Expand the Field list and select a data model field.
  - Expand the Value list and select the specific values for the condition.
  - In the Function list, choose either Include or Exclude.
  - To create additional filters, click +Add Filter again and configure the filter.
  Filters are supported only for the Direct field comparison method. Filters created for any other comparison method are ignored.
  For more information, see Conditional matching.
- In the Comparison method drop-down list, select the comparison method to use for the feature. You can hover over the Help ? icon to see examples.
- In the Null value options drop-down list, select the method of treatment to use for null values. For more information, see Configuring null matching in features.
- In the Success criteria section, select an algorithm to use for the matching process. You can click Add Algorithm at the bottom of this section to add more. For more information about the available algorithms, see Leveraging algorithms for comparison.
- Click the Enabled checkbox to include this feature in the matching process.
- Type descriptive comments for the feature and click Done to add the new feature.
Note: If a feature does not appear in any feature set defined in the previous section, an information icon appears to the left of the feature name.
Create features in Advanced mode
In Advanced mode, features are defined in XML. For example, a feature that matches on first name and last name is defined as follows:
<feature>
  <name>names are identical</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <direct>
      <field>first_name__v</field>
      <field>last_name__v</field>
      <nullMatching>STRICT</nullMatching>
      <jaroWinklerComparison>
        <usingWinklerExtention>false</usingWinklerExtention>
        <usingLargeStringTolerance>false</usingLargeStringTolerance>
        <threshold>0.82</threshold>
      </jaroWinklerComparison>
    </direct>
  </collate>
</feature>
Feature set definitions can contain the following elements (additional elements are described in the following samples):
- <feature> and </feature> - These elements must be included as the first and last lines respectively.
- <name> and </name> - These elements must be included to denote the name of the feature. The name must match what is referenced from corresponding feature sets. It can be anything, but is referenced throughout matching, so should be logical.
- <jaroWinklerComparison> and </jaroWinklerComparison> - This feature uses Jaro-Winkler to compare the strings, and includes the following options:
  - <usingWinklerExtention> and <usingLargeStringTolerance> - These elements loosen the match comparison and should be set to false for HCP matching.
  - <threshold> and </threshold> - These elements determine the confidence level for the comparison. This feature has a threshold of 0.82, meaning any string pairing with a score of 0.82 or higher is considered a match.
- <collate> and </collate> - These elements are required when matching should consider the combined results of multiple fields.
- <direct> and </direct> - These elements indicate that the incoming records must be compared by field, for example, first name to first name, and last name to last name.
- <field> and </field> - These elements surround field names used for the comparison; for example, <field>first_name__v</field>. In this example, last_name__v is included because comparing first names independently doesn't follow best practice for matching. The collated results of the first name and last name comparisons should be used.
- <nullMatching> and </nullMatching> - These elements determine how null values should be treated when records are compared.
Example: Names are similar
The following example uses a similar definition, but with a lower match threshold. Unlike the previous feature, this feature would find strings that are less closely matched. In this situation, lower confidence matches are not automatically merged, but become suspect matches.
This example looks like this in Advanced mode:
<feature>
  <name>names are similar</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <direct>
      <field>first_name__v</field>
      <field>last_name__v</field>
      <nullMatching>STRICT</nullMatching>
      <jaroWinklerComparison>
        <usingWinklerExtention>false</usingWinklerExtention>
        <usingLargeStringTolerance>false</usingLargeStringTolerance>
        <threshold>0.77</threshold>
      </jaroWinklerComparison>
    </direct>
  </collate>
</feature>
Example: Licenses match
This example is similar to the previous one, but uses a set collation instead of direct, because the collation is being used to compare a sub-object: licenses, addresses, or parent HCOs. It uses Jaro-Winkler, but with a high threshold of 0.9, to eliminate over-matching.
This example looks like this in Advanced mode:
<feature>
  <name>licenses match</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <set>
      <field>licenses__v</field>
      <setIntersectionComparison>
        <collate>
          <direct>
            <field>license_number__v</field>
            <nullMatching>IGNORE</nullMatching>
            <jaroWinklerComparison>
              <usingWinklerExtention>false</usingWinklerExtention>
              <usingLargeStringTolerance>false</usingLargeStringTolerance>
              <threshold>0.9</threshold>
            </jaroWinklerComparison>
          </direct>
        </collate>
      </setIntersectionComparison>
    </set>
  </collate>
</feature>
- <set> and </set> - These elements are used for the collation of a sub-object.
- Next, the particular sub-object is specified, in this case licenses__v. (The other options are addresses__v or parenthcos__v.)
- <setIntersectionComparison> - This element begins the section that identifies the fields from the sub-object that are to be compared and how. In this example, only the license_number__v field is being compared, so a direct collation is used.
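The same set collation could, in principle, be pointed at one of the other sub-objects. The following is a minimal sketch only, assuming a parent HCO comparison. It reuses the structure of the license feature, but the feature name, the corporate_name__v field, the null matching value, and the threshold are illustrative assumptions rather than a recommended configuration. Confirm the field names against your own data model before adapting it.

```xml
<feature>
  <!-- Hypothetical feature name, chosen for illustration -->
  <name>parent HCOs match</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <set>
      <!-- Sub-object to collate; parenthcos__v is one of the listed options -->
      <field>parenthcos__v</field>
      <setIntersectionComparison>
        <collate>
          <direct>
            <!-- Assumed field name; replace with a field from your data model -->
            <field>corporate_name__v</field>
            <nullMatching>STRICT</nullMatching>
            <jaroWinklerComparison>
              <usingWinklerExtention>false</usingWinklerExtention>
              <usingLargeStringTolerance>false</usingLargeStringTolerance>
              <!-- Threshold copied from the license example, not tuned -->
              <threshold>0.9</threshold>
            </jaroWinklerComparison>
          </direct>
        </collate>
      </setIntersectionComparison>
    </set>
  </collate>
</feature>
```

As with the other features, the name would need to be referenced from a feature set to take part in matching.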
Example: Address matches
This example is similar to the license feature, but includes more fields for the direct comparison.
This example looks like this in Advanced mode:
<feature>
  <name>address matches</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <set>
      <field>addresses__v</field>
      <setIntersectionComparison>
        <collate>
          <direct>
            <field>premise__v</field>
            <field>thoroughfare__v</field>
            <field>locality__v</field>
            <nullMatching>STRICT</nullMatching>
            <jaroWinklerComparison>
              <usingWinklerExtention>false</usingWinklerExtention>
              <usingLargeStringTolerance>false</usingLargeStringTolerance>
              <threshold>0.77</threshold>
            </jaroWinklerComparison>
          </direct>
        </collate>
      </setIntersectionComparison>
    </set>
  </collate>
</feature>
Example: Comparing sets of fields
This feature uses a Cartesian collation. This collation is used when comparing sets of fields: specialties, credentials, emails, faxes, and so on. A Cartesian comparison compares all fields in the set to each other, instead of comparing just field to field.
Because specialty fields are reference fields, their values are drawn from a fixed list of values. Equal comparison is used because you want an exact match. Other comparison methods would result in incorrect matching because you don't want similar entries to be considered the same.
In Advanced mode, a Cartesian collation is indicated by the <cartesian> and </cartesian> elements, following the <collate> element.
<feature>
  <name>Specialties are identical</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <cartesian>
      <field>specialty_1__v</field>
      <field>specialty_2__v</field>
      <field>specialty_3__v</field>
      <field>specialty_4__v</field>
      <field>specialty_5__v</field>
      <field>specialty_6__v</field>
      <field>specialty_7__v</field>
      <field>specialty_8__v</field>
      <field>specialty_9__v</field>
      <field>specialty_10__v</field>
      <nullMatching>IGNORE</nullMatching>
      <equalComparison/>
    </cartesian>
  </collate>
</feature>
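The same Cartesian structure could plausibly be applied to the other sets of fields mentioned earlier, such as emails. The following is a minimal sketch under assumptions: the feature name and the email_1__v through email_3__v field names are hypothetical placeholders, and the IGNORE null treatment and equal comparison are simply carried over from the specialty example. Whether an exact comparison suits your email data is a judgment call for your own configuration.

```xml
<feature>
  <!-- Hypothetical feature name, chosen for illustration -->
  <name>emails are identical</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <cartesian>
      <!-- Assumed field names; check them against your data model -->
      <field>email_1__v</field>
      <field>email_2__v</field>
      <field>email_3__v</field>
      <!-- Null treatment and comparison copied from the specialty example -->
      <nullMatching>IGNORE</nullMatching>
      <equalComparison/>
    </cartesian>
  </collate>
</feature>
```

Because every listed field is compared with every other listed field, a value stored in the first email field on one record can be compared with the same value stored in the third email field on another record.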