Defining feature sets

AD
DM

Feature sets group features together to determine which comparisons Network should make for each pair of records. Feature sets also include a confidence level that is distinct from the confidence defined for features.

Confidence levels for feature sets determine whether records are considered a match based on the definition of the feature set. If the confidence level produced by the match comparison is equal to or greater than the level specified in the feature set, the record pair is considered a match. Otherwise, the record pair fails and is not considered a match for that feature set.
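
The rule above can be sketched as a single comparison. This is a minimal illustration only, not Network's implementation; the function and parameter names are hypothetical.

```python
def is_match(comparison_confidence: float, feature_set_confidence: float) -> bool:
    """Return True when a record pair passes a feature set.

    Hypothetical helper: the pair matches when the confidence produced by
    the comparison is equal to or greater than the feature set's level.
    """
    return comparison_confidence >= feature_set_confidence
```

For example, a comparison that produces 0.94 passes a feature set defined at 0.94, while one that produces 0.89 fails a feature set defined at 0.9.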

Define feature sets

Administrators define feature sets from within a source subscription, ad hoc match configuration, or default match configuration. When you define feature sets, you select previously defined features.

  1. Click the Match Rules tab to view the current match rules configuration.
  2. To the right of the match rules header, click the Entity drop-down list and select HCP or HCO to work with the match rules for that entity type.
  3. Click the + Add Feature Set link at the bottom of the list (or the + at the top right of the list) to add more feature sets.
  4. Type a descriptive name for the new feature set.
  5. Type feature names to include in the feature set. As you type, auto-complete options appear.
  6. Click the Enabled checkbox to include this feature set in the matching process.
  7. Type optional descriptive comments for the feature set and click the Done button to add the new feature set.
  8. Click the handle to the left of the feature set and drag it to the position and category (ACT or ASK) that it belongs to. This position indicates its treatment order within the list of feature sets.
  9. Click Save at the top of the page to save these changes to the subscription.

Define feature sets in Advanced mode

In Advanced mode, you would define the same feature sets as follows. Note that in Basic mode, the confidence levels are determined by the order of the feature sets in the Feature Sets sections.

<featureSet>
   <name>names are identical and licenses match</name>
   <confidence>0.94</confidence>
   <feature>names are identical</feature>
   <feature>licenses match</feature>
</featureSet>
<featureSet>
   <name>ME is identical</name>
   <confidence>0.98</confidence>
   <feature>ME is identical</feature>
</featureSet>
<featureSet>
   <name>names are similar and address line 1 is the same</name>
   <confidence>0.87</confidence>
   <feature>names are similar</feature>
   <feature>address line 1 is the same</feature>
</featureSet>

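To inspect definitions like the ones above outside of Network, the fragment could be read with Python's standard XML parser. This is a sketch under stated assumptions: the `FeatureSet` class, the `parse_feature_sets` function, and the wrapping `<root>` element are hypothetical and are not part of Network. The wrapper is needed only because the fragment has several top-level elements.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Sample copied from the Advanced mode definitions in this topic.
SAMPLE = """
<featureSet>
   <name>names are identical and licenses match</name>
   <confidence>0.94</confidence>
   <feature>names are identical</feature>
   <feature>licenses match</feature>
</featureSet>
<featureSet>
   <name>ME is identical</name>
   <confidence>0.98</confidence>
   <feature>ME is identical</feature>
</featureSet>
<featureSet>
   <name>names are similar and address line 1 is the same</name>
   <confidence>0.87</confidence>
   <feature>names are similar</feature>
   <feature>address line 1 is the same</feature>
</featureSet>
"""


@dataclass
class FeatureSet:
    """Hypothetical in-memory view of one <featureSet> definition."""
    name: str
    confidence: float
    features: list[str]


def parse_feature_sets(xml_fragment: str) -> list[FeatureSet]:
    # Wrap the fragment in a hypothetical <root> element so the parser
    # receives a well-formed document with a single top-level element.
    root = ET.fromstring(f"<root>{xml_fragment}</root>")
    feature_sets = []
    for node in root.findall("featureSet"):
        feature_sets.append(
            FeatureSet(
                name=(node.findtext("name") or "").strip(),
                # Raises TypeError if <confidence> is missing.
                confidence=float(node.findtext("confidence")),
                features=[(f.text or "").strip() for f in node.findall("feature")],
            )
        )
    return feature_sets


if __name__ == "__main__":
    for fs in parse_feature_sets(SAMPLE):
        print(fs.name, fs.confidence, fs.features)
```
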
Feature set definitions must contain the following elements:

  • <featureSet> and </featureSet> - These elements must be included as the first and last lines respectively.
  • <name> and </name> - These elements denote the name of the feature set. The name should describe what is being compared and how. It can be anything, but it is referenced throughout matching, so it should be logical.
  • <confidence> and </confidence> - These elements indicate the confidence level you want to assign to the feature set. A value of 0.9 or greater means that any record pair satisfying the feature set rules is automatically merged by Network; it is an ACT pair. A value less than 0.9 but equal to or greater than 0.8 means that any record pair satisfying the feature set rules becomes a suspect match and is sent to data stewards for review; it is an ASK pair. If no feature set has a confidence level of 0.9 or greater, no automatic merges take place during the load.
  • <feature> and </feature> - These elements reference a previously defined feature name. This name is case sensitive. Feature sets can contain multiple features, each listed within separate <feature> elements.
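
The confidence thresholds described above can be sketched as follows. This is an illustration only; the constant and function names are hypothetical. Because this topic does not describe what happens to values below 0.8, the sketch returns `None` for them rather than guessing.

```python
from typing import Iterable, Optional

# Thresholds as stated in this topic.
ACT_THRESHOLD = 0.9
ASK_THRESHOLD = 0.8


def classify(confidence: float) -> Optional[str]:
    """Map a feature set confidence level to its category (hypothetical helper)."""
    if confidence >= ACT_THRESHOLD:
        return "ACT"  # record pairs are merged automatically
    if confidence >= ASK_THRESHOLD:
        return "ASK"  # record pairs are sent to data stewards as suspect matches
    return None  # behavior below 0.8 is not described in this topic


def load_can_auto_merge(confidences: Iterable[float]) -> bool:
    """True if at least one feature set has a confidence level of 0.9 or greater."""
    return any(c >= ACT_THRESHOLD for c in confidences)
```

With the three sample feature sets (0.94, 0.98, and 0.87), the first two fall in the ACT category and the third in ASK, so automatic merges are possible during the load.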