Mutation Detection


Mutation detection compares the absolute change or the relative percentage change of the same metric across two time periods to determine whether an anomaly has occurred. This method is often used to track peaks or fluctuations in metrics. When an anomaly is detected, an event record is generated for subsequent analysis and processing.

Use Cases

Mutation detection is suitable for monitoring short-term changes, or rates of change, relative to long-term data. For example, suppose the MySQL connection count metric is configured with a percentage difference greater than 500% between the last 15 minutes and the past day's average. If the average connection count in the last 15 minutes then exceeds 5 times the average connection count of the past day, the system triggers an alert.
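The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not TrueWatch's implementation: the function names `absolute_change` and `percent_change` and the sample values are hypothetical, and the percentage formula (change relative to the baseline average) is an assumption, since the exact formula is not stated here.

```python
from statistics import mean


def absolute_change(recent, baseline):
    """Difference between the recent-window average and the baseline average."""
    return mean(recent) - mean(baseline)


def percent_change(recent, baseline):
    """Relative change of the recent average against the baseline average, in percent.

    Assumed formula: (recent_avg - baseline_avg) / baseline_avg * 100.
    """
    baseline_avg = mean(baseline)
    if baseline_avg == 0:
        raise ValueError("baseline average is zero; percentage change is undefined")
    return (mean(recent) - baseline_avg) / baseline_avg * 100


# Hypothetical MySQL connection counts.
past_day = [20, 22, 18, 20]        # baseline window
last_15_minutes = [130, 150, 140]  # recent window

change = percent_change(last_15_minutes, past_day)
alert = change > 500  # threshold taken from the example above
```

If the product defines the percentage as the ratio of the two averages rather than the relative change, the threshold arithmetic shifts accordingly, so the formula is worth confirming before choosing a threshold.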

It is recommended to calculate these metrics with statistical functions such as average (AVG), maximum (MAX), or minimum (MIN) rather than the last value (LAST) function. This reduces the impact of isolated abnormal data points and improves monitoring accuracy.

Detection Configuration

Detection Metrics

The metric data being monitored. The rule compares the difference or percentage difference of this metric between two time periods.

Field: Description
Data Type: The data type of the current detection rule.
Measurement: The measurement where the current detection metric is located.
Metric: The metric targeted by the current detection.
Aggregation Algorithm: Includes Avg by (average), Min by (minimum), Max by (maximum), Sum by (sum), Last (last value), First by (first value), Count by (data point count), Count_distinct by (unique data point count), p50 (median value), p75 (value at the 75% position), p90 (value at the 90% position), p99 (value at the 99% position).
Detection Dimension: Any string type (keyword) field in the configuration data can be selected as a detection dimension. Currently, up to three fields can be selected. Combining multiple detection dimension fields identifies a specific detection object. TrueWatch then checks whether the statistical metric of that detection object meets the threshold of the trigger condition; if it does, an event is generated. (For example, with detection dimensions host and host_ip, a detection object can be {host: host1, host_ip: 127.0.0.1}.)
Filter Condition: Filters the data of the detection metric based on the metric's tags, limiting the data range of the detection. One or more tag filters can be added, and fuzzy matching and fuzzy non-matching filter conditions are supported.
Alias: Custom detection metric name.
Query Method: Supports simple query and expression query.
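To illustrate how detection dimensions identify a detection object, the sketch below groups hypothetical data points by `host` and `host_ip` and computes an average per group. The field names, sample values, and the `group_by_dimensions` helper are assumptions made for illustration; they do not describe how TrueWatch evaluates queries internally.

```python
from collections import defaultdict
from statistics import mean


def group_by_dimensions(points, dimensions, field):
    """Collect the values of `field` per unique combination of dimension values."""
    groups = defaultdict(list)
    for point in points:
        key = tuple(point[d] for d in dimensions)
        groups[key].append(point[field])
    return dict(groups)


# Hypothetical data points for a connection-count metric.
points = [
    {"host": "host1", "host_ip": "127.0.0.1", "connections": 10},
    {"host": "host1", "host_ip": "127.0.0.1", "connections": 30},
    {"host": "host2", "host_ip": "127.0.0.2", "connections": 5},
]

groups = group_by_dimensions(points, ("host", "host_ip"), "connections")
# One aggregated value per detection object (an average grouped by host, host_ip).
averages = {key: mean(values) for key, values in groups.items()}
```

Each key in `averages` corresponds to one detection object, so a threshold check would run once per key.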

The time periods that can be selected for detection intervals include last month, last week, yesterday, 1 hour ago, compared to the previous period, last 15 minutes, last 30 minutes, last 1 hour, last 4 hours, last 12 hours, and last 1 day.

Note

The detection intervals "yesterday" and "1 hour ago" compare the difference or percentage difference of the detection metric over the same time range, whereas the other detection intervals compare it across two different time periods.
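One possible reading of this distinction is sketched below. It assumes that "yesterday" means the same window shifted back by one day, and that an interval such as "last 1 day" means a longer period ending now. Both interpretations, the `window` helper, and the timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta


def window(end, length):
    """Return the (start, end) pair of a window of `length` ending at `end`."""
    return (end - length, end)


now = datetime(2024, 1, 2, 12, 0)  # hypothetical evaluation time
length = timedelta(minutes=15)

current = window(now, length)
# "Yesterday": the same 15-minute range, shifted back by one day (assumed reading).
same_range_yesterday = window(now - timedelta(days=1), length)
# "Last 1 day": a longer period ending now, compared against the current window.
last_day = window(now, timedelta(days=1))
```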

Cross-Workspace Query Metrics

After authorization, detection metrics from other workspaces under the current account can be selected. Once the monitor rule is created, alerts can be configured across workspaces.

Note

After selecting another workspace, the detection metric dropdown options will only display the data types that have been authorized in the current workspace.

Detection Frequency

The execution frequency of the detection rule. It automatically matches the larger of the two detection interval time ranges selected by the user. The default is 5 minutes.

Trigger Conditions

Set the trigger conditions for each alert level. You can configure any of the following trigger conditions: emergency, important, warning, data gap, and information.

  1. Trigger precondition configuration: enabled by default. When the detection value meets the threshold set by the trigger precondition (supported operators are >, >=, <, and <=; the default is >), evaluation continues with the mutation detection rule. When this configuration is disabled, only the mutation detection rule is evaluated.

  2. Mutation rule configuration: the mutation detection rule compares the data in one of three ways: upward (data increase), downward (data decrease), or upward or downward.

Configure the trigger conditions and severity level. When the query returns multiple values, any value that meets the trigger condition generates an event.
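The two-stage evaluation can be sketched as follows. This is a hedged illustration rather than the product's logic: the `should_trigger` function, its parameters, the choice to check the precondition against the current value, and the use of `>=` for the mutation threshold are all assumptions.

```python
import operator

# Operators listed for the trigger precondition in the text above.
PRECONDITION_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def should_trigger(current, baseline, threshold, direction="up", precondition=None):
    """Return True when the change between two periods meets the mutation rule.

    direction: "up" (increase), "down" (decrease), or "both".
    precondition: optional (operator, value) pair checked against `current`;
                  None models the precondition being disabled.
    """
    if precondition is not None:
        op, value = precondition
        if not PRECONDITION_OPS[op](current, value):
            return False  # precondition not met: the mutation rule is skipped
    change = current - baseline
    if direction == "up":
        return change >= threshold
    if direction == "down":
        return -change >= threshold
    if direction == "both":
        return abs(change) >= threshold
    raise ValueError(f"unknown direction: {direction}")
```

For example, `should_trigger(140, 20, 100)` evaluates an upward change of 120 against a threshold of 100, while passing `precondition=(">", 200)` would stop the evaluation before the mutation rule is checked.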

For more details, refer to Event Level Description.

Alert Levels
  1. Alert levels Emergency (red), Important (orange), and Warning (yellow): determined by the configured condition operators.

  2. Alert level Normal (green): determined by the configured detection count, as follows:

    • Each execution of a detection task counts as 1 detection. For example, if the detection frequency is 5 minutes, then 1 detection = 5 minutes;
    • The detection count can be customized. For example, if the detection frequency is 5 minutes, then 3 detections = 15 minutes.
    Level: Description
    Normal: After the detection rule takes effect, if an emergency, important, or warning abnormal event has been generated and the detection result returns to normal within the configured detection count, a recovery alert event is generated.
    ⚠ Recovery alert events are not subject to Alert Silence restrictions. If no detection count is set for recovery alert events, the alert event never recovers and remains in the Events > Unrecovered Events List.
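The relationship between detection frequency, detection count, and recovery can be sketched as below. The helper names are hypothetical, and the recovery rule shown (the most recent N detections are all normal) is one assumed reading of "returns to normal within the configured detection count".

```python
from datetime import timedelta


def recovery_window(frequency, detection_count):
    """Time span covered by `detection_count` detections at the given frequency."""
    return frequency * detection_count


def recovered(results, detection_count):
    """True if the most recent `detection_count` results are all normal.

    `results` is a list of booleans, oldest first; True marks an abnormal detection.
    """
    if detection_count <= 0 or len(results) < detection_count:
        return False
    return not any(results[-detection_count:])


# 3 detections at a 5-minute frequency cover 15 minutes.
span = recovery_window(timedelta(minutes=5), 3)
```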

Data Gap

For the data gap status, seven strategies can be configured.

  1. Linked to the detection interval time range: evaluate the query result of the detection metric for the recent minutes, and do not trigger an event.

  2. Linked to the detection interval time range: evaluate the query result of the detection metric for the recent minutes, and treat the query result as 0. The result is then compared again with the thresholds configured in Trigger Conditions above to determine whether to trigger an abnormal event.

  3. Custom fill of the detection interval value, with one of the following actions: trigger a data gap event, trigger an emergency event, trigger an important event, trigger a warning event, or trigger a recovery event. For this type of strategy, it is recommended to set the custom data gap time >= the detection interval. If the configured time <= the detection interval, both the data gap and abnormal conditions may be met at the same time, in which case only the data gap processing result is applied.

Information Generation

Enabling this option will generate "information" events for detection results that do not match the above trigger conditions.

Note

If trigger conditions, data gap, and information generation are configured simultaneously, the priority of judgment is as follows: data gap > trigger conditions > information event generation.
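The priority order can be expressed as a simple chain of checks. The `resolve_event` function and its return values are hypothetical names used only to illustrate the ordering; the actual event produced for a data gap depends on the strategy configured above.

```python
def resolve_event(data_gap, triggered_level, information_enabled):
    """Pick the outcome following: data gap > trigger conditions > information.

    data_gap: True when the data gap condition is met.
    triggered_level: "emergency", "important", "warning", or None if no
                     trigger condition matched.
    information_enabled: True when information generation is turned on.
    """
    if data_gap:
        return "data_gap"       # the configured data gap strategy applies
    if triggered_level is not None:
        return triggered_level  # a trigger condition matched
    if information_enabled:
        return "information"    # no condition matched; emit an information event
    return None                 # nothing to generate
```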

Other Configurations

For more details, refer to Rule Configuration.