Description
The Group Longtail Values activity helps streamline datasets by consolidating lesser-used, low-frequency, or non-priority values in a column into a single replacement value.
It is often used to reduce category fragmentation and simplify downstream analysis by focusing on the most relevant or allowed values and grouping the remaining entries under a common label (e.g., Others).
Use this activity to:
- Clean and normalize long-tail categorical data
- Replace values not in an allow-list with a defined label
- Focus analysis on key brands, categories, or terms
- Minimize noise from low-frequency entries in visualization or reporting
Use case:
A dataset contains numerous product brands, many of which appear only once or twice. To improve chart readability, apply this activity before visualizing brand performance: keep the top 3 brands (Apple, Samsung, Google) in the allow list and group all other brands as Others.
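One way to decide which values to keep in a case like this is to count how often each value occurs and retain the most frequent ones. The following is a minimal Python sketch of that idea, not part of the activity itself; the function name `top_n_allow_list` and the sample brand data are hypothetical.

```python
from collections import Counter


def top_n_allow_list(values, n):
    """Return the n most frequent values, most frequent first."""
    counts = Counter(values)
    return [value for value, _ in counts.most_common(n)]


# Hypothetical brand column with a long tail of rare entries.
brands = [
    "Apple", "Samsung", "Apple", "Google", "Samsung",
    "Apple", "Google", "Nokia", "Oppo",
]

allow_list = top_n_allow_list(brands, 3)
print(allow_list)  # the three most frequent brands
```

The resulting list can then be entered in the Allow List field, with Others as the Replacement Value.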
Input
| Input Type | Description |
|---|---|
| Data | Input dataset to transform |
Output
| Output Type | Format | Description |
|---|---|---|
| Data | Table | Transformed data with grouped values |
Configuration Fields
| Field Name | Description |
|---|---|
| Column Name | The name of the column where longtail values should be grouped. |
| Allow List | List of allowed values. Any value in the column not in this list will be replaced. |
| Replacement Value | The value used to replace entries not in the allow list (e.g., Others, Misc, Unknown). |
Sample Input
| product_id | product_name | brand_names |
|---|---|---|
| P001 | Smartphone | Apple, Samsung, Google |
| P002 | Laptop | Dell, HP, Lenovo |
| P003 | Headphones | Bose, Sony, Sennheiser |
| P004 | TV | LG, Samsung, Sony |
| P005 | Smartwatch | Fitbit, Garmin, Apple |
Sample Configuration
| Field | Value |
|---|---|
| columnName | product_name |
| allowList | Smartphone, Headphones |
| replacementValue | Others |
Sample Output
| product_id | product_name | brand_names |
|---|---|---|
| P001 | Smartphone | Apple, Samsung, Google |
| P002 | Others | Dell, HP, Lenovo |
| P003 | Headphones | Bose, Sony, Sennheiser |
| P004 | Others | LG, Samsung, Sony |
| P005 | Others | Fitbit, Garmin, Apple |
In the above example, only Smartphone and Headphones were part of the allow list. All other values in the product_name column were replaced with Others.
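The same transformation can be sketched in plain Python to make the replacement rule explicit. This is an illustration only, not the activity's actual implementation. It assumes exact, case-sensitive matching against the allow list, which this page does not specify, and the function name `group_longtail_values` is hypothetical.

```python
def group_longtail_values(rows, column_name, allow_list, replacement_value):
    """Return copies of rows where values outside allow_list are replaced."""
    allowed = set(allow_list)  # assumption: exact, case-sensitive match
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        if new_row[column_name] not in allowed:
            new_row[column_name] = replacement_value
        result.append(new_row)
    return result


# Sample input from the table above.
rows = [
    {"product_id": "P001", "product_name": "Smartphone", "brand_names": "Apple, Samsung, Google"},
    {"product_id": "P002", "product_name": "Laptop", "brand_names": "Dell, HP, Lenovo"},
    {"product_id": "P003", "product_name": "Headphones", "brand_names": "Bose, Sony, Sennheiser"},
    {"product_id": "P004", "product_name": "TV", "brand_names": "LG, Samsung, Sony"},
    {"product_id": "P005", "product_name": "Smartwatch", "brand_names": "Fitbit, Garmin, Apple"},
]

grouped = group_longtail_values(
    rows,
    column_name="product_name",
    allow_list=["Smartphone", "Headphones"],
    replacement_value="Others",
)

for row in grouped:
    print(row["product_id"], row["product_name"])
```

Only the configured column is modified; `product_id` and `brand_names` pass through untouched, as in the sample output.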