SQL Pattern Matching
The SQL pattern matching in data quality involves using SQL queries to identify and validate specific patterns in data, such as phone numbers, emails, or dates. By using operators like LIKE
, REGEXP
, or SIMILAR TO
, it ensures data accuracy, consistency, and integrity. This technique helps detect errors or inconsistencies, playing a key role in data cleansing to maintain high-quality datasets for analysis and decision-making.
Rule configurations
These rules are used to identify issues such as inconsistencies, inaccuracies, or missing data. One common type of rule is the SQL LIKE pattern, which specifies patterns. This rule helps check whether data entries conform to expected formats, such as matching specific text patterns, numbers, or structures, ensuring that the data meets predefined standards and improves overall data quality.
SQL like patterns
The SQL LIKE pattern is an operator used in SQL queries to search for a specified pattern in data. It uses wildcards like %
(any sequence of characters) and _
(a single character) to find partial matches in string values.
Success criteria
- The success condition depends on how the
Sql like pattern
is configured. - For example, using
LIKE 'a%'
would find all entries starting with the letter “a,” whileLIKE '%@gmail.com'
would match all email addresses ending with “@gmail.com.”
Configuration fields
-
Operator options
Greater than
Less than
Equal to
Between
(requires specifying a start and end range) -
Operator Defines the comparison operation (Greater Than, Less Than, Equal To, or Between).
-
Value The threshold value used for success criteria. Required for
Greater than
,Less than
, andEqual to
operators. -
Value range Required only when the
Between
operator is selected, specifying thestart
andend
range. -
Threshold type Indicates whether the
Value
orValue Range
to be considered as percentage or an absolute count. -
Allow null values Determines if null values are permitted.
-
Check for match Determines if data values align with predefined standards, formats, or reference values to ensure accuracy, consistency, and integrity
Sample Input
ID | Name | |
---|---|---|
1 | alice@gmail.com | Alice |
2 | bob@yahoo.com | Bob |
3 | charlie@gmail.com | Charlie |
4 | dave@hotmail.com | Dave |
5 | eve@gmail.com | Eve |
Sample rule configuration
- Sql like Pattern
SELECT * FROM users WHERE Email LIKE ’%@gmail.com’;
Sample success criteria configuration
- Operator Greater than
- Value 50%
- Threshold type Absolute Count
- Allow null values True
- Check for match False
Sample output
Column Name | Rule Name | Success Count | Failure Count | Within Threshold | Null Count |
---|---|---|---|---|---|
sql pattern matching check | 3 | 2 | Yes | 0 | |
Name | sql pattern matching check | 0 | 5 | No | 0 |