Defining features

AD
DM

Features determine how individual field pairs are matched. Each feature can be assigned its own match algorithm, confidence level, and null matching treatment. Multiple features make up a feature set.

Define features

Administrators define features from within a source subscription, ad hoc match configuration, or default match configuration.

  1. Click the Match Rules tab to view the current match rules configuration.
  2. To the right of the match rules header, click the Entity drop-down list and select HCP or HCO to work with the match rules for that entity type.
  3. Click the Add Feature link at the bottom of the list (or the + at the top right of the list) to add more features.
  4. Type a descriptive name for the new feature.
  5. In the Fields list, select the fields to include in the feature. As you type, auto-complete options appear.

    To match on the parent HCO fields, select Compare fields from Parent HCO records. For example, the parent HCOs of John Smith can be compared during loading. This provides more flexibility for matching records.

  6. In the Apply Filters section, click +Add Filter to include or exclude records from the match rule based on the values in specific fields. For example, a license match rule can exclude DEA type licenses, or an HCO match rule can include a specific list of HCO types.
    1. Expand the Field list and select a data model field.
    2. Expand the Value list and select the specific values for the condition.
    3. In the Function list, choose either Include or Exclude.
    4. To create additional filters, click +Add Filter again and configure the filter.

    For more information, see Conditional matching.

  7. In the Comparison method drop-down list, select the comparison method to use for the feature. You can hover over the Help ? icon to see examples.
  8. In the Null value options drop-down list, select the method of treatment to use for null values. For more information, see Configuring null matching in features.
  9. In the Success criteria section, select an algorithm to use for the matching process. You can click Add Algorithm at the bottom of this section to add more. For more information about the available algorithms, see Leveraging algorithms for comparison.
  10. Click the Enabled checkbox to include this feature in the matching process.
  11. Type descriptive comments for the feature and click Done to add the new feature.

Note: If a feature does not appear in any feature set defined in the previous section, an information icon appears to the left of the feature name.

Create features in Advanced mode

In Advanced mode, a feature that compares first and last names is defined as follows.

<feature>
    <name>names are identical</name>
    <enabled>true</enabled>
    <comments></comments>
    <collate>
        <direct>
            <field>first_name__v</field>
            <field>last_name__v</field>
            <nullMatching>STRICT</nullMatching>
            <jaroWinklerComparison>
                <usingWinklerExtention>false</usingWinklerExtention>
                <usingLargeStringTolerance>false</usingLargeStringTolerance>
                <threshold>0.82</threshold>
            </jaroWinklerComparison>
        </direct>
    </collate>
</feature>

Feature definitions can contain the following elements (additional elements are described in the samples that follow):

  • <feature> and </feature> - These elements must be included as the first and last lines respectively.
  • <name> and </name> - These elements must be included to denote the name of the feature. Feature sets refer to the feature by this name, so it must exactly match the name used in the corresponding feature sets. Any name is allowed, but because it is referenced throughout matching, choose one that describes what the feature does.
  • <jaroWinklerComparison> and </jaroWinklerComparison> - This feature uses Jaro-Winkler to compare the strings, and includes the following options:
    •  <usingWinklerExtention> and <usingLargeStringTolerance> - These elements loosen the match comparison and should be set to false for HCP matching.
    •  <threshold> and </threshold> - These elements determine the confidence level for the comparison. This feature has a threshold of 0.82, meaning any string pairing with a score of 0.82 or higher is considered a match.
  • <collate> and </collate> - These elements are required when matching should consider the combined results of multiple fields.
  • <direct> and </direct> - These elements indicate that the incoming records must be compared by field, for example, first name to first name, and last name to last name.
  • <field> and </field> - These elements surround field names used for the comparison; for example, <field>first_name__v</field>. In this example, last_name__v is included because comparing first names independently doesn't follow best practice for matching. The collated results of the first name and last name comparisons should be used.
  • <nullMatching> and </nullMatching> - These elements determine how null values should be treated when records are compared.

Example: Names are similar

The following example uses a similar definition, but with a lower match threshold (0.77 instead of 0.82). Unlike the previous feature, this feature also accepts string pairs that are less closely matched. In this situation, lower confidence matches are not automatically merged, but become suspect matches.

In Advanced mode, this example looks like the following:

<feature>
    <name>names are similar</name>
    <enabled>true</enabled>
    <comments></comments>
    <collate>
        <direct>
            <field>first_name__v</field>
            <field>last_name__v</field>
            <nullMatching>STRICT</nullMatching>
            <jaroWinklerComparison>
                <usingWinklerExtention>false</usingWinklerExtention>
                <usingLargeStringTolerance>false</usingLargeStringTolerance>
                <threshold>0.77</threshold>
            </jaroWinklerComparison>
        </direct>
    </collate>
</feature>

Example: Licenses match

This example is similar to the previous one, but uses a set collation instead of a direct collation, because the feature compares a sub-object (licenses, addresses, or parent HCOs). It uses Jaro-Winkler with a high threshold of 0.9 to reduce over-matching.

In Advanced mode, this example looks like the following:

<feature>
    <name>licenses match</name>
    <enabled>true</enabled>
    <comments></comments>
    <collate>
        <set>
            <field>licenses__v</field>
            <setIntersectionComparison>
                <collate>
                    <direct>
                        <field>license_number__v</field>
                        <nullMatching>IGNORE</nullMatching>
                        <jaroWinklerComparison>
                            <usingWinklerExtention>false</usingWinklerExtention>
                            <usingLargeStringTolerance>false</usingLargeStringTolerance>
                            <threshold>0.9</threshold>
                        </jaroWinklerComparison>
                    </direct>
                </collate>
            </setIntersectionComparison>
        </set>
    </collate>
</feature>

  • <set> and </set> - These elements are used for the collation of a sub-object.
  • <field> and </field> - Directly inside <set>, these elements specify the sub-object to compare, in this case licenses__v. (The other options are addresses__v or parenthcos__v.)
  • <setIntersectionComparison> - This element begins the section that identifies the fields from the sub-object that are to be compared and how. In this example, only the license_number__v field is being compared, so a direct collation is used.

Example: Address matches

This example is similar to the license feature, but includes more fields for the direct comparison.

In Advanced mode, this example looks like the following:

<feature>
    <name>address matches</name>
    <enabled>true</enabled>
    <comments></comments>
    <collate>
        <set>
            <field>addresses__v</field>
            <setIntersectionComparison>
                <collate>
                    <direct>
                        <field>premise__v</field>
                        <field>thoroughfare__v</field>
                        <field>locality__v</field>
                        <nullMatching>STRICT</nullMatching>
                        <jaroWinklerComparison>
                            <usingWinklerExtention>false</usingWinklerExtention>
                            <usingLargeStringTolerance>false</usingLargeStringTolerance>
                            <threshold>0.77</threshold>
                        </jaroWinklerComparison>
                    </direct>
                </collate>
            </setIntersectionComparison>
        </set>
    </collate>
</feature>

Example: Comparing sets of fields

This feature uses a Cartesian collation. This collation is used when comparing sets of fields: specialties, credentials, emails, faxes, and so on. A Cartesian comparison compares every field in the set on one record with every field in the set on the other record, instead of comparing only field to field (for example, specialty_1__v to specialty_1__v).

Because specialty fields are reference fields, their values are drawn from a fixed list of values. Equal comparison is used because you want an exact match. Other comparison methods would result in incorrect matching because you don't want similar entries to be considered the same.

In Advanced mode, a Cartesian collation is indicated by the <cartesian> and </cartesian> elements, nested inside the <collate> element.

<feature>
  <name>Specialties are identical</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <cartesian>
      <field>specialty_1__v</field>
      <field>specialty_2__v</field>
      <field>specialty_3__v</field>
      <field>specialty_4__v</field>
      <field>specialty_5__v</field>
      <field>specialty_6__v</field>
      <field>specialty_7__v</field>
      <field>specialty_8__v</field>
      <field>specialty_9__v</field>
      <field>specialty_10__v</field>
      <nullMatching>IGNORE</nullMatching>
      <equalComparison/>
    </cartesian>
  </collate>
</feature>
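
The same pattern applies to the other field sets mentioned above. As a minimal sketch, a Cartesian feature for email addresses might look like the following. The feature name and the email_1__v to email_3__v field names are assumptions used for illustration only; confirm the actual field names in your data model before using them. The XML comment is an annotation for the reader and can be removed.

<feature>
  <name>Emails are identical</name>
  <enabled>true</enabled>
  <comments></comments>
  <collate>
    <cartesian>
      <!-- Assumption: hypothetical email field names; every listed field is compared with every other listed field -->
      <field>email_1__v</field>
      <field>email_2__v</field>
      <field>email_3__v</field>
      <nullMatching>IGNORE</nullMatching>
      <equalComparison/>
    </cartesian>
  </collate>
</feature>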

Conditional matching

Within match rules, you can use conditional matching to include or exclude records based on the values in specific fields. For example, a license match rule can exclude DEA type licenses, or an HCO match rule can include a specific list of HCO types. When using the Include filter, only records with that value in the specified field are compared to each other using that rule. When using the Exclude filter, records with that value in the specified field are excluded from the rule.

The Include or Exclude filter is applied to both incoming records and records in the instance and then the records are compared by the match rule. This functionality can be used for entities (HCPs and HCOs) and sub-objects (licenses, addresses, and parent HCOs).

The matching process uses exact match for defined filters, not similar match; for example, if the filter specifies John for a first name, the filter portion uses only exact matches for John.

Cartesian and concatenation collations cannot be used with conditional matching. Conditional matching can only be used for set collations and direct collations.

Conditional matching can be used in the basic match UI or in advanced mode.

Basic match UI

In the example below, a filter is applied to the "licenses match" rule so licenses that have the status Inactive, Status Unknown, or Unknown are excluded from the match rule.

Advanced mode

In Advanced mode, the filter type (INCLUDE or EXCLUDE) must be in uppercase. If the <type> line is omitted, the filter defaults to INCLUDE.
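
As a minimal sketch of this default, the following filter omits the <type> line, so it is treated as an INCLUDE filter. The field and values are borrowed from the corporate name example under Limiting criteria. The fragment is illustrative only and belongs inside a <direct> or <set> collation; the XML comment is an annotation for the reader and can be removed.

<filter>
   <field>hco_type__v</field>
   <value>4:1</value>
   <value>4:15</value>
   <!-- No <type> line here: the filter is treated as INCLUDE -->
</filter>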

Limiting criteria

The following examples illustrate how to create a match rule so it only applies to records that meet the criteria.

This example matches corporate names but only for records where values in the HCO type field equal 4:1 or 4:15.

<feature>
   <name>Corporate Names are the same for selected HCO types</name>
   <collate>
      <direct>
         <field>corporate_name__v</field>
         <filter>
            <field>hco_type__v</field>
            <value>4:1</value>
            <value>4:15</value>
            <type>INCLUDE</type>
         </filter>
         <nullMatching>IGNORE</nullMatching>
         <nGramComparison>
            <gramLength>2</gramLength>
            <threshold>0.7</threshold>
         </nGramComparison>
      </direct>
   </collate>
</feature>

The following example matches licenses where the type value is DEA:

<feature>
   <name>DEA licenses match</name>
   <collate>
      <set>
         <field>licenses__v</field>
         <filter>
            <field>type_value__v</field>
            <value>DEA</value>
            <type>INCLUDE</type>
         </filter>
         <setIntersectionComparison>
            <collate>
               <direct>
                  <field>license_number__v</field>
                  <equalComparison>
                     <compareAs>INTEGERS</compareAs>
                  </equalComparison>
               </direct>
            </collate>
         </setIntersectionComparison>
      </set>
   </collate>
</feature>

Excluding criteria

The following examples illustrate how to create a match rule to exclude records that meet the criteria. This example matches on the corporate name but only for records where the values in the HCO type field are not equal to 4:1 or 4:15.

<feature>
   <name>Corporate Names are the same for HCO types not equal to 4:1 or 4:15</name>
   <collate>
      <direct>
         <field>corporate_name__v</field>
         <filter>
            <field>hco_type__v</field>
            <value>4:1</value>
            <value>4:15</value>
            <type>EXCLUDE</type>
         </filter>
         <nullMatching>IGNORE</nullMatching>
         <nGramComparison>
            <gramLength>2</gramLength>
            <threshold>0.7</threshold>
         </nGramComparison>
      </direct>
   </collate>
</feature>

This example excludes Puerto Rican addresses within this address matching rule:

<feature>
  <name>address matches but excludes Puerto Rico</name>
  <collate>
    <set>
      <field>addresses__v</field>
      <filter>
        <field>country__v</field>
        <value>PR</value>
        <type>EXCLUDE</type>
      </filter>
      <setIntersectionComparison>
        <collate>
          <direct>
            <field>premise__v</field>
            <field>thoroughfare__v</field>
            <field>locality__v</field>
            <jaroWinklerComparison>
              <threshold>0.80</threshold>
              <usingWinklerExtention>false</usingWinklerExtention>
              <usingLargeStringTolerance>false</usingLargeStringTolerance>
            </jaroWinklerComparison>
          </direct>
        </collate>
      </setIntersectionComparison>
    </set>
  </collate>
</feature>
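
The DEA exclusion mentioned at the start of this topic could be sketched in the same way. This is an illustrative variation of the earlier "DEA licenses match" feature with the filter type switched to EXCLUDE. The feature name and the Jaro-Winkler settings (borrowed from the "licenses match" example) are assumptions, not a recommended configuration.

<feature>
  <name>non-DEA licenses match</name>
  <collate>
    <set>
      <field>licenses__v</field>
      <!-- Assumption: same filter as the DEA example, with the type switched to EXCLUDE -->
      <filter>
        <field>type_value__v</field>
        <value>DEA</value>
        <type>EXCLUDE</type>
      </filter>
      <setIntersectionComparison>
        <collate>
          <direct>
            <field>license_number__v</field>
            <nullMatching>IGNORE</nullMatching>
            <jaroWinklerComparison>
              <usingWinklerExtention>false</usingWinklerExtention>
              <usingLargeStringTolerance>false</usingLargeStringTolerance>
              <threshold>0.9</threshold>
            </jaroWinklerComparison>
          </direct>
        </collate>
      </setIntersectionComparison>
    </set>
  </collate>
</feature>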

Using multiple filters

More than one filter can be used in conditional matching. If multiple filters are used, they are all considered AND conditions, not OR conditions.

In the example below, two filters are applied. Records are considered only if the HCO type field (hco_type__v) equals any of the specified values (4:15, 11:2, 11:3, 13:1, or 13:3) AND the specialty field (specialty_1__v) does not equal IM or ACU.

<field>corporate_name__v</field>
<filter>
   <field>hco_type__v</field>
   <value>4:15</value>
   <value>11:2</value>
   <value>11:3</value>
   <value>13:1</value>
   <value>13:3</value>
   <type>INCLUDE</type>
</filter>
<filter>
   <field>specialty_1__v</field>
   <value>ACU</value>
   <value>IM</value>
   <type>EXCLUDE</type>
</filter>
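
For context, the fragment above might sit inside a complete feature as sketched below. Only the two filters come from this example; the feature name, the null matching setting, and the n-gram comparison are assumptions borrowed from the earlier corporate name examples. The XML comments are annotations for the reader and can be removed.

<feature>
   <name>Corporate Names are the same for selected HCO types and specialties</name>
   <collate>
      <direct>
         <field>corporate_name__v</field>
         <!-- Filter 1: only records whose HCO type is one of these values -->
         <filter>
            <field>hco_type__v</field>
            <value>4:15</value>
            <value>11:2</value>
            <value>11:3</value>
            <value>13:1</value>
            <value>13:3</value>
            <type>INCLUDE</type>
         </filter>
         <!-- Filter 2, combined with filter 1 as an AND condition: skip records whose first specialty is ACU or IM -->
         <filter>
            <field>specialty_1__v</field>
            <value>ACU</value>
            <value>IM</value>
            <type>EXCLUDE</type>
         </filter>
         <!-- Assumption: null handling and comparison copied from the earlier corporate name example -->
         <nullMatching>IGNORE</nullMatching>
         <nGramComparison>
            <gramLength>2</gramLength>
            <threshold>0.7</threshold>
         </nGramComparison>
      </direct>
   </collate>
</feature>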