Common
Objects H
Use this page to register objects for duplicate detection.
To access this page:
Field | Description |
Columns | Click to specify columns to be analyzed for duplicate records. |
Results | Click to view duplicate results for the object. |
BDD | Click to open the Bulk Duplicate Detection page in System Administration to view details about the search tables used in duplicate detection. NOTE: A user must have access to System Administration to access this page. |
OBJECT | Displays name of object being analyzed for duplicate records. |
Build | Click to find duplicate records. |
RECORD COUNT | Displays number of records returned by the duplicate detection process. |
CANDIDATES | Displays count of potential duplicate records in the data source. |
UNRESOLVED CANDIDATES | Displays count of duplicate candidates that are not yet resolved. Click to resolve duplicate candidates. |
RESOLVED CANDIDATES | Displays count of resolved duplicates in the data source. |
PERCENT RESOLVED | Displays percent of total duplicate candidates that have been resolved. |
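The candidate counts above relate to each other in a simple way, sketched below. This is a minimal sketch that assumes UNRESOLVED CANDIDATES equals CANDIDATES minus RESOLVED CANDIDATES, and that PERCENT RESOLVED is resolved candidates divided by total candidates. The function name `candidate_summary` and the choice to show 0% when there are no candidates are hypothetical, not product behavior.

```python
def candidate_summary(candidates: int, resolved: int) -> dict:
    """Derive the unresolved count and percent resolved from candidate counts.

    Assumption: every candidate is either resolved or unresolved,
    so unresolved = candidates - resolved.
    """
    if not 0 <= resolved <= candidates:
        raise ValueError("resolved must be between 0 and candidates")
    unresolved = candidates - resolved
    # Assumption: an object with no candidates is reported as 0% resolved.
    percent_resolved = 100.0 * resolved / candidates if candidates else 0.0
    return {
        "candidates": candidates,
        "unresolved": unresolved,
        "resolved": resolved,
        "percent_resolved": percent_resolved,
    }


# Example: 40 candidates, 30 of them resolved.
summary = candidate_summary(40, 30)
print(summary["unresolved"], summary["percent_resolved"])  # 10 75.0
```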
Objects V
Use this page to register objects for duplicate detection.
This page contains the following tabs:
General tab
Field | Description |
Dictionaries | Click to open the Dictionaries page in System Administration to add, edit and delete dictionary entries. Users must have access to System Administration to manage dictionaries. |
Stop List | Click to open the Stop Lists page in System Administration to add, edit and delete stop list entries. Users must have access to System Administration to manage stop lists. |
BDD | Click to open the Bulk Duplicate Detection page in System Administration to run the bulk duplicate detection administration process for the object. NOTE: Users must have access to System Administration to run duplicate detection. |
Object Settings | |
Object | Displays name of object being analyzed for duplicate records. |
View Name | Displays name of object. Click to view details about the object. |
View | Click to open the Objects page to view the object’s data. |
Advanced Settings | |
Search ID | Displays ID of the search table that controls which record pairs in the source data are stored as duplicates. The Stewardship Tier is delivered with a default search table, DSPCommon.ttDuplicate, that has been set up for the BDD process. NOTE: If a search table other than DSPCommon.ttDuplicate is to be used, it must be set up in System Administration. |
Non Searchable Characters | Displays characters excluded from the duplicate detection search. |
Stop List ID | Displays ID of the list of words ignored during the duplicate detection search. Default value is managed in Configuration > Modules > Parameters-Duplicates. |
Search Threshold | Displays the threshold level used to filter out false positives. |
Duplicate Detection Threshold | Displays weight percent of the calculated value for matched words. Words that match carry more weight than words that sound alike. Default value is managed in Configuration > Modules > Parameters-Duplicates. |
Synonym Weight | Displays weight value of synonym matches. |
Sound Ex Weight | Displays percentage of the combined calculated value for words found within the search: the number of words found plus the number of words that sound alike, divided by the total number of words. Words that match carry more weight than words that sound alike. |
Custom Sound Ex Function ID | Displays ID of a custom SQL Server Sound Ex function. Selecting a custom function improves the accuracy of duplicate detection but decreases performance. |
Index Batch Size | Displays number of records to process in one pass through the data. Default value is 1000. |
Duplicate Detection Batch Size | Displays number of records queued up in the duplicate detection process. This setting allows large files to be processed in subsets while limiting the resources required. |
Word Ratio Threshold | Displays the threshold for the ratio of word counts between the two records in each duplicate pair. A pair with a ratio less than 50% is marked for removal. For example, if A has 10 words and B has 1 word that matches one of A’s words, then A-B matches 10% but B-A matches 100%. This 100% match is a false positive; the Word Ratio removes the pair as a potential match. Default value is managed in Configuration > Modules > Parameters-Duplicates. |
Remove Blank Lines | Select to remove blank lines from the HTML-formatted output on the Candidates page. Removing white space makes each object block smaller. When comparing objects with multiple lines, such as address data, the lines may not line up on the page; if needed, clear the Remove Blank Lines check box and rebuild the object. Default value is managed in Configuration > Modules > Parameters-Duplicates. |
Unicode Separate Characters | If enabled, Unicode (double-byte) characters are included in the duplicate detection process. |
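The Sound Ex Weight and Word Ratio Threshold descriptions can be illustrated with a short sketch. It assumes classic American Soundex as the "sounds alike" test (the product uses a SQL Server Sound Ex function, or a custom one, whose exact behavior may differ), computes the combined value as (words found + words that sound alike) / total words as described for Sound Ex Weight, and treats the word ratio as the shorter record's word count divided by the longer one's, with a 50% cutoff. All function names are hypothetical and this is not the product's implementation.

```python
# Hypothetical sketch: classic American Soundex plus the match percentages
# described for Sound Ex Weight and Word Ratio Threshold.

# Soundex digit for each consonant group; vowels, y, h and w have no digit.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code ("" if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit is kept
    return code.ljust(4, "0")


def match_percent(source: list[str], target: list[str]) -> float:
    """Share of source words found exactly (case-insensitive) in target."""
    if not source:
        return 0.0
    target_words = {w.lower() for w in target}
    found = sum(1 for w in source if w.lower() in target_words)
    return found / len(source)


def combined_percent(source: list[str], target: list[str]) -> float:
    """(words found + words that only sound alike) / total source words."""
    if not source:
        return 0.0
    target_words = {w.lower() for w in target}
    target_sounds = {soundex(w) for w in target} - {""}
    found = sound_alike = 0
    for w in source:
        if w.lower() in target_words:
            found += 1
        elif soundex(w) in target_sounds:
            sound_alike += 1
    return (found + sound_alike) / len(source)


def word_ratio(a: list[str], b: list[str]) -> float:
    """Word count of the shorter record divided by that of the longer one."""
    longer = max(len(a), len(b))
    return min(len(a), len(b)) / longer if longer else 0.0


def keep_pair(a: list[str], b: list[str], threshold: float = 0.5) -> bool:
    """Keep a candidate pair only if its word ratio reaches the threshold."""
    return word_ratio(a, b) >= threshold


# The A/B example: A has 10 words, B has 1 word that matches one of A's.
a = "one two three four five six seven eight nine ten".split()
b = ["three"]
print(match_percent(a, b), match_percent(b, a))  # 0.1 1.0
print(word_ratio(a, b), keep_pair(a, b))         # 0.1 False
print(soundex("Smith"), soundex("Smyth"))        # S530 S530
print(combined_percent(["Smyth", "Street"], ["Smith", "Street"]))  # 1.0
```

The directional percentages show why the word ratio is needed: B-A scores 100% even though B shares only one of A's ten words, and the unbalanced word counts are what flag the pair as a false positive.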
Actions tab
Field | Description |
Build | Click to find duplicates. |
History | Click to open the Object History page to view details on previous builds (i.e., duplicate searches) for the object. |
Reset Status | Click to change the status of the build from Processing to Procedures Completed. Reset the status only if the build process was aborted and the status still shows Processing. |
Reset Results | Click to remove all previous duplicate and non-duplicate results for the object. This button is available only for objects that returned duplicates (which display on the Results page). This action is not reversible; it is recommended to back up the drResultsDuplicate table before the results are reset. |
Post Process | Click to continue running a stopped process. |
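The recommendation to back up drResultsDuplicate before clicking Reset Results can be sketched as below. The sketch uses Python's built-in sqlite3 module and an in-memory database purely for illustration; the real table lives in the product's SQL Server database, where `SELECT * INTO <backup table> FROM drResultsDuplicate` is the usual T-SQL equivalent. The column names, the backup table name and the helper `backup_table` are hypothetical.

```python
import sqlite3


def backup_table(conn: sqlite3.Connection, table: str, backup: str) -> int:
    """Copy every row of `table` into a fresh `backup` table; return its row count.

    Table names are interpolated into SQL, so only pass trusted names.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(f'DROP TABLE IF EXISTS "{backup}"')
        conn.execute(f'CREATE TABLE "{backup}" AS SELECT * FROM "{table}"')
    return conn.execute(f'SELECT COUNT(*) FROM "{backup}"').fetchone()[0]


# Demo with an in-memory database and hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drResultsDuplicate "
    "(ObjectID TEXT, RecordA TEXT, RecordB TEXT, IsDuplicate INTEGER)"
)
conn.executemany(
    "INSERT INTO drResultsDuplicate VALUES (?, ?, ?, ?)",
    [("CUST", "1", "2", 1), ("CUST", "3", "4", 0), ("CUST", "5", "6", 1)],
)
conn.commit()
copied = backup_table(conn, "drResultsDuplicate", "drResultsDuplicate_backup")
print(copied)  # 3
```

Only after the copy succeeds would Reset Results be clicked; the source table is left untouched by the sketch.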
Duplicate Detection Results tab
Field | Description |
Duplicate Detection Status | Displays current status of the duplicate detection build process. |
Duplicate Detection Records | Displays number of records to be processed. |
Duplicate Detection Processed | Displays actual number of records processed at this point in time; value changes during processing. |
Duplicate Detection Queued | Displays number of records that still need to be processed. |
Duplicate Detection Duplicates | Displays number of records marked as duplicates. |
Duplicate Detection Execution Time | Displays time to run the duplicate detection process. |
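How these result fields might evolve during a build can be sketched as a batch loop. This is a minimal sketch, not the product's process: it assumes Queued equals Records minus Processed, uses a batch size loosely modelled on Index Batch Size (default 1000), and assumes a finished build ends in the Procedures Completed status mentioned under Reset Status. `BuildProgress`, `run_build` and the toy duplicate rule are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class BuildProgress:
    """Hypothetical mirror of the Duplicate Detection Results fields."""
    records: int              # Duplicate Detection Records
    processed: int = 0        # Duplicate Detection Processed
    duplicates: int = 0       # Duplicate Detection Duplicates
    status: str = "Processing"
    execution_seconds: float = 0.0  # Duplicate Detection Execution Time

    @property
    def queued(self) -> int:
        """Duplicate Detection Queued: records still waiting to be processed."""
        return self.records - self.processed


def run_build(record_ids, is_duplicate, batch_size=1000):
    """Process records in batches; return final progress and the
    (processed, queued) counts observed after each batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    started = time.perf_counter()
    progress = BuildProgress(records=len(record_ids))
    snapshots = []
    for start in range(0, len(record_ids), batch_size):
        batch = record_ids[start:start + batch_size]
        progress.duplicates += sum(1 for r in batch if is_duplicate(r))
        progress.processed += len(batch)
        snapshots.append((progress.processed, progress.queued))
    progress.status = "Procedures Completed"
    progress.execution_seconds = time.perf_counter() - started
    return progress, snapshots


# 2,500 records with a toy rule: every tenth record ID counts as a duplicate.
progress, snapshots = run_build(list(range(2500)), lambda r: r % 10 == 0)
print(snapshots)                              # [(1000, 1500), (2000, 500), (2500, 0)]
print(progress.duplicates, progress.status)  # 250 Procedures Completed
```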