Distinct Record Check
The Distinct Record Check rule verifies that a column contains the expected number of unique values.
This rule is commonly used to:
- Validate expected cardinality in categorical or lookup fields
- Ensure proper distribution of distinct values in key columns
- Maintain data quality standards for reporting and analysis
Example Usage:
- Check that a
Country
column contains exactly 25 unique country codes - Verify that a
Product Category
column has between 5 and 10 distinct values
Configuration Fields
Success Criteria Configuration
This section defines how the rule’s outcome is measured against expected thresholds.
Field Name | Description | Required | Options / Format |
---|---|---|---|
Operator | Comparison operation for distinct value count | Yes | GreaterThan , LessThan , EqualTo , Between |
Threshold Value | Value for comparison (single value for most operators) | Conditional | Number |
Threshold Min | Minimum value (for Between operator) | Conditional | Number |
Threshold Max | Maximum value (for Between operator) | Conditional | Number |
Is Percentage | Whether the threshold represents a percentage of total rows | No | true / false (default: false ) |
Allow Nulls | Whether null values should be counted as distinct values | No | true / false (default: false ) |
Sample Input Data
ProductID | Category | Region |
---|---|---|
1001 | Electronics | East |
1002 | Furniture | West |
1003 | Electronics | East |
1004 | Apparel | North |
1005 | Electronics | NULL |
Sample Configurations
Example 1: Exact Distinct Count
Configuration Field | Value |
---|---|
Column | Category |
Operator | EqualTo |
Threshold Value | 3 |
Is Percentage | false |
Allow Nulls | false |
Explanation:
Validates that the Category
column contains exactly 3 distinct values (Electronics
, Furniture
, Apparel
).
Example 2: Range-Based Distinct Count
Configuration Field | Value |
---|---|
Column | Region |
Operator | Between |
Threshold Min | 2 |
Threshold Max | 4 |
Is Percentage | false |
Allow Nulls | true |
Explanation:
Validates that the Region
column contains between 2 and 4 distinct values, counting NULL
as a distinct value if present.
Sample Output
Column Name | Rule Name | Success Count | Failure Count | Null Count | Within Threshold |
---|---|---|---|---|---|
Category | Distinct Record Check | 3 | 2 | 0 | Yes |
Region | Distinct Record Check | 3 | 1 | 1 | No |