Common

Objects H


Use this page to register objects.

To access this page:

  1. Click Common > Analyze in the Navigation pane.
  2. Click the Duplicates icon for a data source.

Field

Description

Columns

Click to specify columns to be analyzed for duplicate records.

Results

Click to view duplicate results for the object.

BDD

Click to open the Bulk Duplicate Detection page in System Administration to view details about the search tables used in duplicate detection.

NOTE: A user must have access to System Administration to access this page.

Object

Displays name of object being analyzed for duplicate records.

Build

Click to find duplicate records.

Record Count

Displays number of records returned by the duplicate detection process.

Candidates

Displays count of potential duplicate records in the data source.

Unresolved Candidates

Displays count of duplicate candidates that are not yet resolved. Click to resolve duplicate candidates.

Resolved Candidates

Displays count of resolved duplicates in the data source.

Percent Resolved

Displays percent of total duplicate candidates that have been resolved.
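Percent Resolved is described as the share of all duplicate candidates that have been resolved. The arithmetic can be sketched as follows; the helper name is hypothetical, and the product's rounding and its handling of objects with no candidates are assumptions.

```python
def percent_resolved(candidates: int, resolved: int) -> float:
    """Share of duplicate candidates that have been resolved, as a percentage.

    Hypothetical helper; the product's own rounding rules are not documented here.
    """
    if candidates == 0:
        return 0.0  # assumption: an object with no candidates shows 0%
    return 100.0 * resolved / candidates


# Example: 40 candidates, 30 resolved (so 10 are still unresolved).
print(percent_resolved(40, 30))  # 75.0
```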

Objects V


Use this page to register objects.

This page contains the following tabs:

General tab

Field

Description

Dictionaries

Click to open the Dictionaries page in System Administration to add, edit and delete dictionary entries. Users must have access to System Administration to manage dictionaries.

Stop List

Click to open the Stop Lists page in System Administration to add, edit and delete stop list entries. Users must have access to System Administration to manage stop lists.

BDD

Click to open the Bulk Duplicate Detection page in System Administration to run the bulk duplicate detection administration process for the object.

NOTE: Users must have access to System Administration to run duplicate detection.

Object Settings

Object

Displays name of object being analyzed for duplicate records.

View Name

Displays name of object. Click to view details about the object.

View

Click to open the Objects page to view the object’s data.

Advanced Settings

Search ID

Displays ID of the search table that controls which record pairs in the source data are stored as duplicates. The Stewardship Tier is delivered with a default search table, DSPCommon.ttDuplicate, that has been set up for the BDD process.

NOTE: If a search table other than DSPCommon.ttDuplicate is to be used, it must be set up in System Administration.

Non Searchable Characters

Displays characters excluded from the duplicate detection search.

Stop List ID

Displays ID for list of words ignored during the duplicate detection search. Default value is managed in Configuration > Modules > Parameters-Duplicates.
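Non Searchable Characters and the Stop List both narrow what the search compares. A minimal sketch of that idea, assuming excluded characters are simply deleted (not replaced by spaces), matching is case-insensitive, and the stop list is a set of uppercase words; the helper name, character set, and stop words are all hypothetical examples, not the product's defaults.

```python
from __future__ import annotations


def search_words(value: str, non_searchable: str, stop_list: set[str]) -> list[str]:
    """Split a value into the words a duplicate search would compare (illustrative only)."""
    # Delete every non-searchable character (assumption: deleted, not replaced by a space).
    cleaned = value.translate(str.maketrans("", "", non_searchable))
    # Compare case-insensitively (assumption) and drop any word on the stop list.
    return [word for word in cleaned.upper().split() if word not in stop_list]


print(search_words("Acme, Inc. & Sons", ",.&", {"INC", "AND"}))  # ['ACME', 'SONS']
```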

Search Threshold

Displays the threshold level used to screen out false positives in search results.

Duplicate Detection Threshold

Displays weight percent of the calculated value for matched words. Words that match carry more weight than words that sound alike. Default value is managed in Configuration > Modules > Parameters-Duplicates.

Synonym Weight

Displays weight value of synonym matches.

Sound Ex Weight

Displays percentage of combined calculated value for words found within the search (number of words found plus the number of words that sound alike divided by the total number of words). Words that match carry more weight than words that sound alike.
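The parenthetical above defines the combined value as exact matches plus sound-alike matches, divided by the total number of words. A sketch of that calculation, assuming the two counts do not overlap; the function name is hypothetical, and how the Sound Ex Weight is then applied to this value is not shown.

```python
def combined_match_percent(words_found: int, words_sound_alike: int, total_words: int) -> float:
    """Percent of words matched either exactly or by sound (hypothetical helper).

    Assumes words_found and words_sound_alike count disjoint sets of words.
    """
    if total_words <= 0:
        return 0.0
    return 100.0 * (words_found + words_sound_alike) / total_words


# 3 exact matches and 1 sound-alike match out of 5 words.
print(combined_match_percent(3, 1, 5))  # 80.0
```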

Custom Sound Ex Function ID

Displays ID for custom SQL Server Sound Ex function. Selecting a custom function improves the accuracy of duplicate detection but decreases performance.
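To illustrate what "words that sound alike" means, the sketch below implements the classic American Soundex code, which maps similar-sounding words to the same four-character key. SQL Server also provides a built-in SOUNDEX function based on this algorithm; the sketch is neither that function nor a custom function registered in the product, and the name `soundex` is illustrative.

```python
# Digit codes for consonants in classic American Soundex.
# Vowels, Y, H, and W have no code.
_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    for letter in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex key for a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]  # the first letter is kept as-is
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        # Adjacent letters with the same code are coded once.
        if code and code != prev:
            result += code
        # H and W do not separate equal codes; vowels do (they reset prev).
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]  # pad with zeros or truncate to 4 characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```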

Index Batch Size

Displays number of records to process in one pass through the data. Default value is 1000.

Duplicate Detection Batch Size

Displays number of records queued up in the duplicate detection process. This setting allows large files to be processed in subsets while limiting the resources required.
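Both batch size settings bound how much data is handled at once. As a rough sketch only (not the product's implementation), splitting a record stream into fixed-size batches, using the documented Index Batch Size default of 1000, looks like this; the helper name `batches` is hypothetical.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], size: int = 1000) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` records (hypothetical helper)."""
    it = iter(records)
    # Take up to `size` records per pass until the input is exhausted.
    while batch := list(islice(it, size)):
        yield batch


print([len(b) for b in batches(range(2500))])  # [1000, 1000, 500]
```

Smaller batches lower the memory needed per pass at the cost of more passes, which is the trade-off the setting describes.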

Word Ratio Threshold

Displays the word-count ratio threshold for each duplicate pair. A ratio less than 50% marks the pair for removal. For example, if A has 10 words and B has 1 word, which matches one of A’s words, then A-B matches 10%, but B-A matches 100%. This 100% match is a false positive; the word ratio check removes it as a potential match. Default value is managed in Configuration > Modules > Parameters-Duplicates.
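The A/B example above can be worked through in code. This sketch assumes the word ratio is the shorter record's word count as a percentage of the longer one's, and uses the 50% figure from the text as the threshold; the function names are hypothetical, and the actual default comes from Parameters-Duplicates.

```python
def match_percent(matched_words: int, source_words: int) -> float:
    """Percent of the source record's words that matched the other record."""
    return 100.0 * matched_words / source_words


def word_ratio(words_a: int, words_b: int) -> float:
    """Shorter record's word count as a percent of the longer one's (assumed definition)."""
    return 100.0 * min(words_a, words_b) / max(words_a, words_b)


def remove_as_false_positive(words_a: int, words_b: int, threshold: float = 50.0) -> bool:
    """True when the pair's word ratio falls below the threshold."""
    return word_ratio(words_a, words_b) < threshold


# A has 10 words; B has 1 word that matches one of A's words.
print(match_percent(1, 10))             # 10.0  (A-B)
print(match_percent(1, 1))              # 100.0 (B-A, the false positive)
print(word_ratio(10, 1))                # 10.0
print(remove_as_false_positive(10, 1))  # True
```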

Remove Blank Lines

Select this check box to remove blank lines from the HTML-formatted output on the Candidates page. This makes each object block smaller because white space is removed. However, when objects with multiple lines, such as address data, are compared, removing blank lines may cause the data to not line up on the page. If this occurs, clear the Remove Blank Lines check box and rebuild the object. Default value is managed in Configuration > Modules > Parameters-Duplicates.

Unicode Separate Characters

If enabled, Unicode characters (double-byte) are included in the duplicate detection process.

Actions tab

Field

Description

Build

Click to find duplicates.

History

Click to open the Object History page to view details on previous builds (i.e., duplicate searches) for the object.

Reset Status

Click to change the status of the build from Processing to Procedures Completed. Reset the status only if the build process was aborted and the status still shows Processing.

Reset Results

Click to remove all previous duplicate and non-duplicate results for the object. This button is available only for objects that returned duplicates (which display on the Results page). This action cannot be undone; it is recommended that you back up the drResultsDuplicate table before resetting the results.

Post Process

Click to continue running a stopped process.

Duplicate Detection Results tab

Field

Description

Duplicate Detection Status

Displays current status of the duplicate detection build process.

Duplicate Detection Records

Displays number of records to be processed.

Duplicate Detection Processed

Displays actual number of records processed at this point in time; value changes during processing.

Duplicate Detection Queued

Displays number of records that still need to be processed.

Duplicate Detection Duplicates

Displays number of records marked as duplicates.

Duplicate Detection Execution Time

Displays time to run the duplicate detection process.
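The Records, Processed, and Queued counters above are related. Assuming Queued is simply the records still outstanding (Records minus Processed), the relationship can be sketched as below; the class and field names are hypothetical and do not reflect how the product stores these values.

```python
from dataclasses import dataclass


@dataclass
class DuplicateDetectionProgress:
    """Hypothetical model of the Duplicate Detection Results counters."""

    records: int    # Duplicate Detection Records: total to be processed
    processed: int  # Duplicate Detection Processed: handled so far

    @property
    def queued(self) -> int:
        # Assumption: Queued = Records - Processed
        return self.records - self.processed


progress = DuplicateDetectionProgress(records=5000, processed=3000)
print(progress.queued)  # 2000
```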