
Distinct Value Set

The Distinct Value Set rule in data quality ensures that a specified data field contains only unique values. It verifies that each entry in the field is distinct, maintaining data accuracy and integrity. This rule helps prevent redundancy and ensures the dataset remains reliable.

Rule configurations

A value is marked as a success when it matches the expected distinct set and complies with the match type. If the value is unique and fits within the defined set, the rule is considered passed.

Match type This configuration refers to the method used to compare values in a dataset. It defines how strict or flexible the comparison should be and helps determine uniqueness and data matching rules. The available options are:

  • Contained by

  • Contains

  • Equals

Expected distinct set This configuration specifies the set of distinct values that the data is expected to contain, helping to identify any discrepancies or missing values in the dataset.
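The match types above can be pictured as set comparisons. The sketch below is illustrative only: this page does not define the exact semantics of each option, so the reading used here (Contained by = the column's distinct values are a subset of the expected set, Contains = a superset, Equals = identical) is an assumption, and the function name `matches` and its parameters are hypothetical rather than part of the product.

```python
def matches(observed_values, expected_set, match_type, case_sensitive=True):
    """Compare a column's distinct values with an expected set.

    Assumed semantics (not confirmed by the documentation):
      contained_by -> observed distinct values are a subset of the expected set
      contains     -> observed distinct values are a superset of the expected set
      equals       -> both sets are identical
    """
    def normalize(value):
        # Fold case only when the comparison is case-insensitive.
        return value if case_sensitive else value.casefold()

    # Nulls are ignored here; null handling is a separate setting.
    observed = {normalize(v) for v in observed_values if v is not None}
    expected = {normalize(v) for v in expected_set}

    if match_type == "contained_by":
        return observed <= expected
    if match_type == "contains":
        return observed >= expected
    if match_type == "equals":
        return observed == expected
    raise ValueError(f"Unknown match type: {match_type}")


customers = ["Fallon", "Franklyn", "Kathleen", "Judie", "Etta"]
expected_customers = {"Fallon", "Kathleen", "Judie"}
print(matches(customers, expected_customers, "contains"))
print(matches(customers, expected_customers, "contained_by"))
```

Under these assumed semantics, the Customer column contains every expected name but also holds names outside the set, so only the Contains comparison holds.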

Success criteria

The success criteria for the Distinct Value Set check are met when the number of distinct, non-duplicate values in the specified column satisfies the configured threshold for the Expected Distinct Set. When the count meets the success criteria, the column is flagged as “withinThreshold”, indicating that the data complies with the rule’s requirements.

  • The success condition depends on how the Match Type is configured.

  • For example, if “Alice” must be distinct, no two names in the database can share the value “Alice”.

Configuration fields

  • Operator Defines the comparison operation. The options are:

    Greater than

    Less than

    Equal to

    Between (requires specifying a start and end range)

  • Value The threshold value used for success criteria. Required for Greater than, Less than, and Equal to operators.

  • Value range Required only when the Between operator is selected, specifying the start and end range.

  • Threshold type Indicates whether the Value or Value range is interpreted as a percentage or as an absolute count.

  • Allow null values Determines if null values are permitted.
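The configuration fields above can be read as a single threshold comparison. The following is a minimal sketch, not the product's implementation: the function name `within_threshold`, the lowercase option strings, and the choice to treat Between as inclusive of both ends are all assumptions made for illustration.

```python
def within_threshold(success_count, total_count, operator,
                     value=None, value_range=None,
                     threshold_type="percentage"):
    """Return True when the success count satisfies the configured threshold.

    Hypothetical helper: option names and inclusive 'between' are assumptions.
    """
    if threshold_type == "percentage":
        if total_count == 0:
            return False  # nothing to measure
        measured = 100.0 * success_count / total_count
    elif threshold_type == "absolute_count":
        measured = success_count
    else:
        raise ValueError(f"Unknown threshold type: {threshold_type}")

    if operator == "between":
        # Value range is required only for the Between operator.
        if value_range is None:
            raise ValueError("'between' requires a value range")
        start, end = value_range
        return start <= measured <= end

    # Value is required for the three single-value operators.
    if value is None:
        raise ValueError(f"'{operator}' requires a value")
    if operator == "greater_than":
        return measured > value
    if operator == "less_than":
        return measured < value
    if operator == "equal_to":
        return measured == value
    raise ValueError(f"Unknown operator: {operator}")
```

For instance, `within_threshold(3, 5, "greater_than", value=50)` compares 60% against a 50% threshold, while passing `threshold_type="absolute_count"` would compare the raw count 3 against the value instead.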

Sample Input

| ID | Customer | Country |
|----|----------|---------|
| 1 | Fallon | Great Britain |
| 2 | Franklyn | France |
| 3 | Kathleen | United States |
| 4 | Judie | France |
| 5 | Etta | |

Sample rule configuration

  • Match type Contains
  • Expected distinct set Customer = Fallon, Kathleen, Judie; Country = Brazil, United States
  • Case sensitive True

Sample success criteria configuration

  • Operator Greater than
  • Value 50%
  • Threshold type Percentage


Sample output

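| Column Name | Rule Name | Success Count | Failure Count | Within Threshold | Null Count |
|-------------|-----------|---------------|---------------|------------------|------------|
| Customer | Distinct Value Set check | 3 | 2 | Yes | 0 |
| Country | Distinct Value Set check | 1 | 4 | No | 1 |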
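The counts in the sample output can be reproduced with a short script. This is a sketch under stated assumptions, not the product's logic: it assumes each row succeeds when its value is a case-sensitive member of the expected set, that a null value counts as a failure and is also tallied separately, and that the 50% value is read as a percentage of all rows. The function name `evaluate` is hypothetical.

```python
# Sample input from the table above; the missing Country is modelled as None.
rows = [
    {"ID": 1, "Customer": "Fallon", "Country": "Great Britain"},
    {"ID": 2, "Customer": "Franklyn", "Country": "France"},
    {"ID": 3, "Customer": "Kathleen", "Country": "United States"},
    {"ID": 4, "Customer": "Judie", "Country": "France"},
    {"ID": 5, "Customer": "Etta", "Country": None},
]

expected = {
    "Customer": {"Fallon", "Kathleen", "Judie"},
    "Country": {"Brazil", "United States"},
}


def evaluate(rows, column, expected_set, threshold_pct=50.0):
    """Count successes, failures, and nulls for one column (assumed logic)."""
    values = [row[column] for row in rows]
    null_count = sum(v is None for v in values)
    # Case-sensitive membership test; nulls never match.
    success = sum(v in expected_set for v in values if v is not None)
    failure = len(values) - success  # nulls are counted as failures here
    success_pct = 100.0 * success / len(values)
    return {
        "success": success,
        "failure": failure,
        "null": null_count,
        "within_threshold": success_pct > threshold_pct,  # "Greater than 50%"
    }


for column, expected_set in expected.items():
    print(column, evaluate(rows, column, expected_set))
```

With these assumptions, Customer yields 3 successes out of 5 rows (60%, above the threshold) and Country yields 1 success out of 5 (20%, below it), which agrees with the sample output table.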