Mutation Detection
Current Document Location
This document is the second step in the detection rule configuration process. After completing the configuration, please return to the main document to continue with the third step: Event Notification.
Mutation detection compares the absolute change or relative percentage change of the same metric across two time periods to determine whether an anomaly has occurred. It is commonly used to track metric peaks or sudden fluctuations. When an anomaly is detected, an event record is generated for subsequent analysis and handling.
It is suited to monitoring short-term changes, or rates of change, relative to long-term data. For example, you can configure the MySQL connection count metric to detect a Difference Percentage greater than 500% between the average of the last 15 minutes and the average of the past day. The alert then fires when the last-15-minute average exceeds the past-day average by more than 500%, that is, when it is more than six times the past-day average.
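Using the Difference Percentage formula from the Time Period Configuration table, with A as the last-15-minute average and B as the past-day average (assuming B > 0), the 500% threshold in this example reduces to a simple ratio:

```latex
\frac{A - B}{B} \times 100\% > 500\%
\iff A - B > 5B
\iff A > 6B \qquad (B > 0)
```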
It is recommended to use statistical functions such as average (avg), maximum (max), minimum (min) to calculate these metrics, rather than the last value (last) function, to reduce the impact of anomalous data and improve monitoring accuracy.
Detection Metric
Define the detection data source and aggregation method using DQL. ❗️ Avoid selecting high-cardinality fields as detection dimensions; combined with overly lenient trigger conditions, an improper configuration can lead to frequent alerts. The current query returns at most 100,000 records.
Detect mutation anomalies by comparing metric data from two time periods:
Result = [Difference/Difference Percentage] of the detection metric between [Time Period A] and [Time Period B]
Configuration Elements
| Configuration Item | Description |
|---|---|
| Workspace | Defaults to the current workspace and can be switched to other authorized workspaces. After authorization, you can use detection metrics from other workspaces under the current account to create monitors; once the rule is created, cross-workspace alerting takes effect. Note that when you select another workspace, the data type dropdown in the detection metric list only displays data types that have been authorized for use in the current workspace. |
| Data Source Type | Metrics, Logs, APM, RUM, and other data types |
| Query Method | Simple Query, Expression Query |
| Filter Conditions | Filter the data of the detection metric based on its tags to limit the data scope for detection; supports adding one or multiple tag filters; supports fuzzy match and fuzzy not-match filter conditions. |
| Aggregation Algorithm | Avg by (take average), Min by (take minimum), Max by (take maximum), Sum by (summation), Last (take last value), First by (take first value), Count by (count data points), Count_distinct by (count distinct values), p50 (take median value), p75 (take value at 75th percentile), p90 (take value at 90th percentile), p99 (take value at 99th percentile) |
| Detection Dimension | Any string-type (keyword) field in the configured data can be selected as a detection dimension; currently, up to three fields can be selected. Combining multiple detection dimension fields identifies a specific detection object. The system checks whether the statistical metric of each detection object meets the threshold of the trigger condition and generates an event if it does. (For example, with detection dimensions host and host_ip, a detection object could be {host: host1, host_ip: 127.0.0.1}.) |
| Alias | Custom detection metric name |
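To illustrate how the aggregation algorithm and detection dimensions combine, here is a minimal Python sketch. It assumes hypothetical raw points held as dicts and covers only a few aggregators from the standard library (p50 is approximated with `statistics.median`); it is not how DQL evaluates queries, only an illustration of "one aggregated value per detection object".

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical raw data points: tag fields plus a numeric value.
points = [
    {"host": "host1", "host_ip": "127.0.0.1", "value": 10.0},
    {"host": "host1", "host_ip": "127.0.0.1", "value": 30.0},
    {"host": "host2", "host_ip": "127.0.0.2", "value": 5.0},
]

# A subset of the aggregation algorithms, mapped to stdlib functions.
AGGREGATORS = {"avg": mean, "max": max, "min": min, "sum": sum, "count": len, "p50": median}

def aggregate_by(points, dimensions, algorithm="avg"):
    """Group points by the detection dimension fields and aggregate each group."""
    if len(dimensions) > 3:
        raise ValueError("at most three detection dimensions can be selected")
    groups = defaultdict(list)
    for p in points:
        # Each distinct combination of dimension values is one detection object.
        key = tuple(p[d] for d in dimensions)
        groups[key].append(p["value"])
    agg = AGGREGATORS[algorithm]
    return {key: agg(values) for key, values in groups.items()}

result = aggregate_by(points, ["host", "host_ip"], "avg")
print(result)  # {('host1', '127.0.0.1'): 20.0, ('host2', '127.0.0.2'): 5.0}
```

Each resulting value is what the trigger conditions would then be checked against, one detection object at a time.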
Time Period Configuration
| Configuration Item | Description |
|---|---|
| Time Period A | The recent time period whose data is evaluated |
| Time Period B | The historical time period used as the comparison baseline |
| Comparison Method | Difference: the absolute difference between the two time periods, (A - B); Difference Percentage: the relative change, ((A - B) / B × 100%) |
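The two comparison methods in the table above can be sketched in Python as follows. This is an illustration, not the product's implementation; in particular, returning `None` when B is 0 is an assumption, because the documentation does not state how a zero baseline is handled.

```python
from typing import Optional

def difference(a: float, b: float) -> float:
    """Difference: absolute change from period B to period A."""
    return a - b

def difference_percentage(a: float, b: float) -> Optional[float]:
    """Difference Percentage: (A - B) / B x 100.

    Returns None when B is 0 because the ratio is undefined there;
    this zero-baseline behavior is an assumption of the sketch.
    """
    if b == 0:
        return None
    return (a - b) * 100 / b

# Hypothetical values: A = last-15-minute average, B = past-day average.
a_avg, b_avg = 600.0, 100.0
print(difference(a_avg, b_avg))             # 500.0
print(difference_percentage(a_avg, b_avg))  # 500.0
```

A negative result means the metric dropped relative to the baseline, which is why the Mutation Direction setting matters for percentage-based rules.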
- Available time periods:
| Time Period Type | Options |
|---|---|
| Historical Same Period | Last Month, Last Week, Yesterday, 1 Hour Ago, Compared to Previous Period, Last 15 Minutes, Last 30 Minutes, Last 1 Hour, Last 4 Hours, Last 12 Hours, Last 1 Day |
| Recent Time | Last 1 Minute, Last 5 Minutes, Last 15 Minutes, Last 30 Minutes, Last 1 Hour, Last 4 Hours, Last 12 Hours, Last 1 Day |
Note
For the "Yesterday" and "1 Hour Ago" options, the Difference or Difference Percentage of the detection metric is calculated over the same time range; the other options compare the metric between two different time periods.
For more information, see Query Method Details.
Detection Frequency
The execution frequency of the detection rule. It automatically matches the larger of the time ranges of the two selected detection intervals.
- Default selection: 5 minutes
Trigger Conditions
Configure trigger conditions and severity levels. When the query result contains multiple values, an event is generated if any value meets the trigger condition.
Four threshold levels can be configured (Fatal, Severe, Important, and Warning), as well as a Normal recovery condition.
| Level | Configuration | Description |
|---|---|---|
| Fatal | When Result >= [Value] | Highest-level alert; requires immediate handling |
| Severe | When Result >= [Value] | High-level alert; requires priority handling |
| Important | When Result >= [Value] | Medium-level alert; requires attention |
| Warning | When Result >= [Value] | Low-level alert; requires notice |
| Normal | No events generated for [N] consecutive detections | After the detection rule takes effect, if a detection result that was abnormal (Fatal, Severe, Important, or Warning) stays normal for the configured number of consecutive detections, a recovery event is generated. ❗️ Recovery events are not subject to Alert Silence. If the number of detections for recovery is not set, the alert event never recovers and remains in the Events > Unrecovered Events List. |
For more details, refer to Event Level Description.
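A minimal Python sketch of the level table above, under stated assumptions: the threshold values are hypothetical, levels are checked from most to least severe so the highest matching level wins, and recovery is modeled as N consecutive detections without an abnormal level following an abnormal one. The product's internal logic may differ.

```python
from typing import Optional

# Hypothetical thresholds, ordered from most to least severe.
LEVELS = [("fatal", 500.0), ("severe", 300.0), ("important", 200.0), ("warning", 100.0)]

def classify(result: float, levels=LEVELS) -> Optional[str]:
    """Return the most severe level whose 'Result >= value' condition holds."""
    for name, threshold in levels:
        if result >= threshold:
            return name
    return None

class RecoveryTracker:
    """Signal recovery after N consecutive normal detections following an abnormal one."""

    def __init__(self, n: int):
        self.n = n
        self.abnormal = False
        self.quiet_runs = 0

    def observe(self, level: Optional[str]) -> bool:
        if level is not None:
            # Abnormal detection: remember it and restart the count.
            self.abnormal = True
            self.quiet_runs = 0
            return False
        if not self.abnormal:
            # Nothing to recover from.
            return False
        self.quiet_runs += 1
        if self.quiet_runs >= self.n:
            self.abnormal = False
            self.quiet_runs = 0
            return True  # emit a recovery event
        return False
```

Leaving N unset corresponds to never calling the recovery branch, which matches the note that such events stay in the unrecovered list.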
Trigger Precondition
Enabled by default. The precondition acts as an entry threshold for mutation detection: the mutation detection rule is evaluated only when the detection value meets the threshold set by the precondition.
- Configuration format: Perform the following judgment only when the detection value for [Time Period] [Operator] [Threshold]
- Supported operators: >, >=, <, <= (default: >)
- After this configuration is disabled, the system evaluates the mutation detection rule directly, without an entry threshold.
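The precondition acts as a simple gate in front of the mutation check. The Python sketch below uses the four supported operators with ">" as the default; the function name and the example threshold of 50 are hypothetical.

```python
import operator

# The four supported operators; ">" is the default.
OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def passes_precondition(value: float, op: str = ">", threshold: float = 0.0,
                        enabled: bool = True) -> bool:
    """Return True if mutation detection should proceed for this value."""
    if not enabled:
        # Disabled precondition: no entry threshold, always evaluate.
        return True
    return OPERATORS[op](value, threshold)

# Hypothetical rule: only evaluate when the Time Period A average exceeds 50.
print(passes_precondition(60.0, ">", 50.0))  # True
print(passes_precondition(40.0, ">", 50.0))  # False
```

A precondition like this keeps low-volume series, where small absolute changes produce large percentages, from triggering mutation alerts.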
Mutation Direction
Trigger an event when the mutation direction is [Direction]:
- Upward: Detects mutations where data increases (rises)
- Downward: Detects mutations where data decreases (drops)
- Upward or Downward: Detects mutations with bidirectional fluctuations (increase or decrease)
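The direction options can be sketched as a filter on the sign of the change from Time Period B (history) to Time Period A (recent). This Python sketch, with hypothetical option names "up", "down", and "both", only decides whether the direction qualifies; how the product combines direction with the threshold comparison is not described here.

```python
def matches_direction(a: float, b: float, direction: str) -> bool:
    """Return True if the change from B (history) to A (recent) fits the direction."""
    if direction == "up":
        return a > b  # Upward: data increases
    if direction == "down":
        return a < b  # Downward: data decreases
    if direction == "both":
        return a != b  # Upward or Downward: any change
    raise ValueError(f"unknown direction: {direction!r}")

print(matches_direction(600.0, 100.0, "up"))    # True
print(matches_direction(600.0, 100.0, "down"))  # False
```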
Bulk Alert Protection
Enabled by default in the system.
When the number of alerts generated in a single detection exceeds a preset threshold, the system automatically switches to a status summary strategy: Instead of processing each alert object individually, it generates a small number of summary alerts based on event status and pushes them.
This ensures notification timeliness while significantly reducing alert noise and avoiding timeout risks caused by processing too many alerts.
When this switch is enabled, the Event Details of events subsequently generated by this monitor will not display historical records or related events.
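The switch between per-object alerts and status summaries can be sketched as below. This is a minimal Python illustration, assuming alerts are (detection object, status) pairs and using a hypothetical threshold, since the actual preset value is not documented here.

```python
from collections import Counter

def build_notifications(alerts, threshold):
    """alerts: (detection_object, status) pairs produced by one detection run.

    threshold is hypothetical; the real preset value is not documented here.
    """
    if len(alerts) <= threshold:
        # Normal path: one notification per alert object.
        return [f"{status}: {obj}" for obj, status in alerts]
    # Protection path: one summary notification per event status.
    counts = Counter(status for _, status in alerts)
    return [f"{status}: {n} object(s)" for status, n in sorted(counts.items())]

alerts = [("host1", "warning"), ("host2", "warning"), ("host3", "fatal")]
print(build_notifications(alerts, 2))  # ['fatal: 1 object(s)', 'warning: 2 object(s)']
```

The summary path keeps the number of pushed notifications bounded by the number of statuses rather than the number of detection objects.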
Data Gap
Handling strategy when the query result for the detection metric is empty within the detection interval:
| Option | Description |
|---|---|
| Do Not Trigger Event (Default) | Based on the time range of the detection interval: whether to generate an event is determined from the query results of the detection metric within that interval. Suitable for scenarios where data gaps are acceptable. |
| Treat Query Result as 0 | Based on the time range of the detection interval: the query result of the detection metric within that interval is treated as 0 and compared again with the thresholds configured in Trigger Conditions above to determine whether to trigger an abnormal event. |
| Custom Fill and Trigger Event | Lets you define a custom data gap time and trigger one of the following event types: Data Gap Event, Fatal Event, Severe Event, Important Event, Warning Event, or Recovery Event. ❗️ With this strategy, it is recommended that the custom data gap time be ≥ the detection interval. If it is shorter than the detection interval, a detection may satisfy both the data gap and the anomaly conditions; in that case, the data gap handling result takes precedence. |
When Trigger Conditions, Data Gap, and Information Generation are all configured, they are evaluated in this priority order: Data Gap > Trigger Conditions > Information Event Generation.
That is: first judge whether there is a data gap, then judge whether the threshold is triggered, and finally judge whether to generate an information event.
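The evaluation order above can be sketched as a single function. In this Python illustration an empty query result is modeled as `None`, the strategy names ("no_event", "treat_as_zero", "custom") and the `warning_only` trigger condition are hypothetical, and returning no event at all for the default data gap strategy is an assumption.

```python
from typing import Callable, Optional

def warning_only(result: float) -> Optional[str]:
    """Hypothetical trigger condition: Warning when Result >= 100."""
    return "warning" if result >= 100 else None

def evaluate(result: Optional[float],
             classify: Callable[[float], Optional[str]],
             gap_strategy: str = "no_event",
             custom_event: str = "data_gap",
             info_enabled: bool = False) -> Optional[str]:
    """Return the event type for one detection, or None if no event is written."""
    # 1. Data gap: an empty query result is handled first.
    if result is None:
        if gap_strategy == "no_event":
            return None
        if gap_strategy == "custom":
            return custom_event
        if gap_strategy == "treat_as_zero":
            result = 0.0
        else:
            raise ValueError(f"unknown gap strategy: {gap_strategy!r}")
    # 2. Trigger conditions.
    level = classify(result)
    if level is not None:
        return level
    # 3. Information generation: results that match no trigger condition.
    return "info" if info_enabled else None

print(evaluate(150.0, warning_only, info_enabled=True))  # warning
print(evaluate(None, warning_only, gap_strategy="custom"))  # data_gap
```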
Information Generation
After this option is enabled, the system writes every detection result that does not match the trigger conditions above as an "Information" event.
This is useful when you need to record normal status changes or low-priority information.
Subsequent Configuration
After completing the above detection configuration, please continue to configure:
- Event Notification: define the event title, content, notification members, data gap handling, and associated incidents;
- Alert Configuration: select alert strategies and set notification targets and mute periods;
- Association: associate dashboards for quick navigation to the data;
- Permissions: set operation permissions to control who can edit or delete this monitor.