Column Uniqueness Check
The Column Uniqueness Check rule ensures that all values in a specified column are distinct within a dataset.
This rule is commonly used to:
- Validate primary key or identifier columns
- Ensure unique fields like email addresses, SKUs, or serial numbers are not duplicated
- Maintain data integrity for critical business attributes
Example Usage:
- Ensure all ProductCode values are distinct
- Verify SerialNumber is unique for each product entry
- Confirm Email addresses have no duplicates in a user database
Configuration Fields
Success Criteria Configuration
This section defines how the rule’s outcome is measured against expected thresholds.
Field Name | Description | Required | Options / Format |
---|---|---|---|
Operator | Comparison operation applied to the unique value count | Yes | GreaterThan, LessThan, EqualTo, Between |
Threshold Value | Single value to compare against (for the GreaterThan, LessThan, and EqualTo operators) | Conditional | Number |
Threshold Min | Minimum value (for the Between operator) | Conditional | Number |
Threshold Max | Maximum value (for the Between operator) | Conditional | Number |
Is Percentage | Whether the threshold represents a percentage of total rows | No | true / false (default: false) |
Allow Nulls | Whether null values should count as unique | No | true / false (default: false) |
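
The comparison these fields describe can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function name `within_threshold` and its parameter names are hypothetical, and treating Between as inclusive of both bounds is an assumption that the table does not confirm.

```python
from typing import Optional

VALID_OPERATORS = {"GreaterThan", "LessThan", "EqualTo", "Between"}


def within_threshold(
    measured: float,
    operator: str,
    threshold_value: Optional[float] = None,
    threshold_min: Optional[float] = None,
    threshold_max: Optional[float] = None,
) -> bool:
    """Compare a measured value against the configured threshold.

    Hypothetical helper; operator names mirror the Operator field.
    """
    if operator not in VALID_OPERATORS:
        raise ValueError(f"Unknown operator: {operator}")

    if operator == "Between":
        if threshold_min is None or threshold_max is None:
            raise ValueError("Between requires Threshold Min and Threshold Max")
        # Assumption: both bounds are inclusive.
        return threshold_min <= measured <= threshold_max

    # The remaining operators all need the single Threshold Value.
    if threshold_value is None:
        raise ValueError(f"{operator} requires Threshold Value")
    if operator == "GreaterThan":
        return measured > threshold_value
    if operator == "LessThan":
        return measured < threshold_value
    return measured == threshold_value  # EqualTo
```

Raising an error for a missing conditional field is one possible design choice; it reflects the "Conditional" entries in the Required column rather than any documented behavior.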
Sample Input Data
ID | ProductCode | SerialNumber |
---|---|---|
1 | PC-100 | SN-001 |
2 | PC-101 | SN-001 |
3 | PC-100 | SN-002 |
4 | PC-102 | NULL |
5 | PC-103 | NULL |
6 | NULL | SN-003 |
Sample Configurations
Example 1: Strict Uniqueness Check
Configuration Field | Value |
---|---|
Column | ProductCode |
Operator | EqualTo |
Threshold Value | 4 |
Is Percentage | false |
Allow Nulls | false |
Explanation:
Validates that the ProductCode column contains exactly 4 unique values (PC-100, PC-101, PC-102, PC-103). Because Allow Nulls is false, null values are not counted as unique.
Example 2: Percentage-Based Uniqueness Check
Configuration Field | Value |
---|---|
Column | SerialNumber |
Operator | GreaterThan |
Threshold Value | 50 |
Is Percentage | true |
Allow Nulls | true |
Explanation:
Ensures that more than 50% of SerialNumber values are unique, with null values counted as unique.
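
When Is Percentage is true, the threshold is read as a percentage of total rows, so the unique count has to be converted before it is compared. The arithmetic can be sketched as below, using the 6 rows of the sample input and the Success Count of 4 that the sample output reports for SerialNumber. The function name `measured_value` is hypothetical, and returning 0 for an empty dataset is an assumption.

```python
def measured_value(unique_count: int, total_rows: int, is_percentage: bool) -> float:
    """Return the figure that is compared against the threshold."""
    if not is_percentage:
        # Absolute mode: the raw unique count is compared directly.
        return float(unique_count)
    if total_rows == 0:
        # Assumption: an empty dataset yields 0% rather than an error.
        return 0.0
    return 100.0 * unique_count / total_rows


share = measured_value(unique_count=4, total_rows=6, is_percentage=True)
print(round(share, 1))  # 66.7
print(share > 50)       # True
```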
Sample Output
Column Name | Rule Name | Success Count | Failure Count | Null Count | Within Threshold |
---|---|---|---|---|---|
ProductCode | Column Uniqueness Check | 3 | 2 | 1 | No |
SerialNumber | Column Uniqueness Check | 4 | 2 | 0 | Yes |
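
The counts in this table are consistent with one particular reading of the rule: a row succeeds when its value occurs exactly once in the column, every row sharing a repeated value fails, and null rows are either reported separately (Allow Nulls false) or counted as successes (Allow Nulls true). The sketch below applies that reading to the sample input. The reading is inferred from the sample data rather than documented behavior, and the function name `uniqueness_counts` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def uniqueness_counts(values: List[Optional[str]], allow_nulls: bool) -> Dict[str, int]:
    """Classify each row as success (unique), failure (duplicated), or null."""
    # Count how often each non-null value occurs in the column.
    occurrences = Counter(v for v in values if v is not None)
    success = failure = nulls = 0
    for v in values:
        if v is None:
            if allow_nulls:
                success += 1  # assumption: each null row counts as unique
            else:
                nulls += 1    # assumption: null rows are reported separately
        elif occurrences[v] == 1:
            success += 1
        else:
            failure += 1      # every row sharing a repeated value fails
    return {"success": success, "failure": failure, "null": nulls}


# Columns from the sample input table; None stands in for NULL.
product_code = ["PC-100", "PC-101", "PC-100", "PC-102", "PC-103", None]
serial_number = ["SN-001", "SN-001", "SN-002", None, None, "SN-003"]

print(uniqueness_counts(product_code, allow_nulls=False))
# {'success': 3, 'failure': 2, 'null': 1}
print(uniqueness_counts(serial_number, allow_nulls=True))
# {'success': 4, 'failure': 2, 'null': 0}
```

Under this reading, a Success Count of 3 is not EqualTo 4, so ProductCode falls outside its threshold, while 4 of 6 SerialNumber rows (about 66.7%) is GreaterThan 50, which agrees with the Within Threshold column.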