Distinct Record Check
The Distinct Record Check rule verifies that a column contains the expected number of unique values.
This rule is commonly used to:
- Validate expected cardinality in categorical or lookup fields
- Ensure proper distribution of distinct values in key columns
- Maintain data quality standards for reporting and analysis
Example Usage:
- Check that a
Countrycolumn contains exactly 25 unique country codes - Verify that a
Product Categorycolumn has between 5 and 10 distinct values
Configuration Fields
Success Criteria Configuration
This section defines how the rule’s outcome is measured against expected thresholds.
| Field Name | Description | Required | Options / Format |
|---|---|---|---|
| Operator | Comparison operation for distinct value count | Yes | GreaterThan, LessThan, EqualTo, Between |
| Threshold Value | Value for comparison (single value for most operators) | Conditional | Number |
| Threshold Min | Minimum value (for Between operator) | Conditional | Number |
| Threshold Max | Maximum value (for Between operator) | Conditional | Number |
| Is Percentage | Whether the threshold represents a percentage of total rows | No | true / false (default: false) |
| Allow Nulls | Whether null values should be counted as distinct values | No | true / false (default: false) |
Sample Input Data
| ProductID | Category | Region |
|---|---|---|
| 1001 | Electronics | East |
| 1002 | Furniture | West |
| 1003 | Electronics | East |
| 1004 | Apparel | North |
| 1005 | Electronics | NULL |
Sample Configurations
Example 1: Exact Distinct Count
| Configuration Field | Value |
|---|---|
| Column | Category |
| Operator | EqualTo |
| Threshold Value | 3 |
| Is Percentage | false |
| Allow Nulls | false |
Explanation:
Validates that the Category column contains exactly 3 distinct values (Electronics, Furniture, Apparel).
Example 2: Range-Based Distinct Count
| Configuration Field | Value |
|---|---|
| Column | Region |
| Operator | Between |
| Threshold Min | 2 |
| Threshold Max | 4 |
| Is Percentage | false |
| Allow Nulls | true |
Explanation:
Validates that the Region column contains between 2 and 4 distinct values, counting NULL as a distinct value if present.
Sample Output
| Column Name | Rule Name | Success Count | Failure Count | Null Count | Within Threshold |
|---|---|---|---|---|---|
| Category | Distinct Record Check | 3 | 2 | 0 | Yes |
| Region | Distinct Record Check | 3 | 1 | 1 | No |