Skip to content

SQL Pattern Matching

The SQL pattern matching in data quality involves using SQL queries to identify and validate specific patterns in data, such as phone numbers, emails, or dates. By using operators like LIKE, REGEXP, or SIMILAR TO, it ensures data accuracy, consistency, and integrity. This technique helps detect errors or inconsistencies, playing a key role in data cleansing to maintain high-quality datasets for analysis and decision-making.

Rule configurations

These rules are used to identify issues such as inconsistencies, inaccuracies, or missing data. One common type of rule is the SQL LIKE pattern, which specifies patterns. This rule helps check whether data entries conform to expected formats, such as matching specific text patterns, numbers, or structures, ensuring that the data meets predefined standards and improves overall data quality.

SQL like patterns The SQL LIKE pattern is an operator used in SQL queries to search for a specified pattern in data. It uses wildcards like % (any sequence of characters) and _ (a single character) to find partial matches in string values.

Success criteria

  • The success condition depends on how the Sql like pattern is configured.
  • For example, using LIKE 'a%' would find all entries starting with the letter “a,” while LIKE '%@gmail.com' would match all email addresses ending with “@gmail.com.”

Configuration fields

  • Operator options

    Greater than

    Less than

    Equal to

    Between (requires specifying a start and end range)

  • Operator Defines the comparison operation (Greater Than, Less Than, Equal To, or Between).

  • Value The threshold value used for success criteria. Required for Greater than, Less than, and Equal to operators.

  • Value range Required only when the Between operator is selected, specifying the start and end range.

  • Threshold type Indicates whether the Value or Value Range to be considered as percentage or an absolute count.

  • Allow null values Determines if null values are permitted.

  • Check for match Determines if data values align with predefined standards, formats, or reference values to ensure accuracy, consistency, and integrity

Sample Input

IDEmailName
1alice@gmail.comAlice
2bob@yahoo.comBob
3charlie@gmail.comCharlie
4dave@hotmail.comDave
5eve@gmail.comEve

Sample rule configuration

  • Sql like Pattern
    SELECT * FROM users WHERE Email LIKE ’%@gmail.com’;

Sample success criteria configuration

  • Operator Greater than
  • Value 50%
  • Threshold type Absolute Count
  • Allow null values True
  • Check for match False

alt text

Sample output

Column NameRule NameSuccess CountFailure CountWithin ThresholdNull Count
Emailsql pattern matching check32Yes0
Namesql pattern matching check05No0