
Detection Rules


The system provides a rich set of built-in detection rules that cover a wide range of data monitoring needs and help reduce both false alarms and missed alarms.

Configuration Process

Creating a monitor requires completing the configuration in the following order:

  1. Select Rule Type: Determines the data scope and algorithm logic for detection configuration.
  2. Detection Configuration: Different types correspond to different configuration items.
  3. Event Notification: Defines event title, content, related information, and data gap handling.
  4. Alert Configuration: Sets notification strategies and silence periods.
  5. Association: Configures dashboard association.
  6. Permission: Sets operation permissions.

Except for "Detection Configuration," which varies with the rule type, the configuration logic for event notification, alert configuration, association, and permission is consistent across all rule types.

Rule Types

You can choose the appropriate detection logic based on your monitoring targets:

| Rule Name | Data Scope | Basic Description |
| --- | --- | --- |
| Threshold Detection | All | Performs anomaly detection on metric data based on configured thresholds. |
| Mutation Detection | Metrics (M) | Detects sudden anomalies in metrics by comparing them against historical data; often suitable for business data and short time windows. |
| Interval Detection | Metrics (M) | Detects abnormal data points in metrics based on dynamic threshold ranges; often suitable for time series with stable trends. |
| Interval Detection V2 | Metrics (M), Traces (T), RUM Data (R) | Detects abnormal data points based on dynamic threshold ranges; often suitable for time series with stable trends. |
| Outlier Detection | Metrics (M) | Detects whether the metrics or statistics of detection objects within specific groups deviate as outliers. |
| Log Detection | Logs (L) | Performs anomaly detection for business applications based on log data. |
| Process Anomaly Detection | Process Objects (O::host_processes) | Periodically checks process data to identify process anomalies. |
| Infrastructure Liveness Detection V2 | Objects (O) | Sets liveness conditions based on infrastructure object data to monitor infrastructure stability. |
| Application Performance Metrics Detection | Traces (T) | Sets threshold rules based on APM data to detect anomalies. |
| Real User Metrics Detection | RUM Data (R) | Sets threshold rules based on RUM data to detect anomalies. |
| Composite Detection | All | Combines the results of multiple monitors into one monitor via expressions and alerts based on the combined result. |
| Synthetic Testing Anomaly Detection | Synthetic Data (D::type) | Sets threshold rules based on synthetic testing data to detect anomalies. |
| Network Data Detection | Network (N) | Sets threshold rules based on network data to monitor network performance stability. |
| Third-party Event Detection | Others | Generates events from abnormal events or records that third-party systems send to a specified URL via HTTP POST requests. |
| Infrastructure Change Detection | Objects (O) | Monitors change behaviors by tracking the infrastructure lifecycle, identifying abnormal conditions such as configuration drift and illegal operations. |
| Programmable Detection | All | Writes detection rules as scripts; often suitable for monitoring scenarios with complex, frequently changing rules. |

After selecting the rule type, the optional parameters in the "Detection Configuration" module will change accordingly. Event notification and subsequent configurations remain consistent across all types.

Detection Configuration

You can configure the corresponding detection frequency, detection interval, and detection metrics based on different detection rule types.

Detection configurations vary significantly across different rule types. Please refer to the detailed configuration documentation for the corresponding type.

Event Notification

Defines the event title, content, notified members, and related handling when the monitor triggers.

Event Title

Defines the event name for the alert trigger condition. Predefined template variables can be used.

Note

In the latest version, the monitor name is generated automatically from the event title as you enter it. In older monitors, the monitor name and the event title may differ; we recommend updating these monitors to the latest version.

Event Content

Write the event notification content. When the trigger condition is met, the system sends this content externally. The content can include related links, template variables, and DQL-based data context, as described below.

Click + Link to have the system automatically generate a jump link based on the current detection metrics. The link address includes the current domain, the workspace ID, the detection time range ({{df_check_range_start}} ~ {{df_check_range_end}}), and dynamic filter conditions.

| Link Type | Description | Configuration Requirements |
| --- | --- | --- |
| Custom Link | Supports any URL; template variables can be used. | Requires manual entry of the complete link address. |
| View Related Logs | Jumps to the Log Explorer. | Automatically generated; filter conditions and time range can be adjusted after insertion. |
| View Related Traces | Jumps to the Trace Explorer. | Automatically generated; automatically associates the current trace_id or service name. |
| View Related Profile | Jumps to the Profile Explorer. | Automatically generated; automatically fills the service name and time range. |
| View Related Container | Jumps to the Container Object Details. | Automatically generated; automatically matches the container name and host tags. |
| View Related Pod | Jumps to the Pod Object Details. | Automatically generated; automatically fills the Pod name and namespace. |
| View Related Process | Jumps to the Process Object Details. | Automatically generated; automatically matches the host and process name. |
| View Related Session | Jumps to RUM Session Replay. | Automatically generated; automatically fills the Session ID. |
| View Related View | Jumps to the RUM View Explorer. | Automatically generated; automatically fills the view path. |
| View Related Error | Jumps to the RUM Error Explorer. | Automatically generated; automatically fills the error type and time range. |
| View Related Resource | Jumps to the RUM Resource Explorer. | Automatically generated; automatically fills the resource path. |
| View Related Synthetic Tests | Jumps to the Synthetic Test Task Details. | Automatically generated; automatically associates the synthetic test task name. |
| View Related Dashboard | Jumps to the specified dashboard. | Requires manually adding the Dashboard ID and name; supports adjusting view variables and time range. |

Link Format Examples:

Log Explorer: [View Related Logs](<{{STUDIO_CONSOLE_BASE_URL}}/logIndi/log/all?time={{df_check_range_start}},{{df_check_range_end}}&w={{df_workspace_uuid}}>)

Trace Explorer: [View Related Traces](<{{STUDIO_CONSOLE_BASE_URL}}/tracing/link/all?time={{df_check_range_start}},{{df_check_range_end}}&w={{df_workspace_uuid}}>)
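The link formats above are plain URL strings whose placeholders are filled in when the event fires. The following is a minimal sketch of how the Log Explorer link is composed; the base URL, workspace ID, and range boundaries are hypothetical values standing in for {{STUDIO_CONSOLE_BASE_URL}}, {{df_workspace_uuid}}, {{df_check_range_start}}, and {{df_check_range_end}}, and the timestamps are treated as opaque numbers.

```python
def build_log_explorer_link(base_url: str, start: int, end: int, workspace_uuid: str) -> str:
    """Compose a Log Explorer jump link following the documented format.

    The arguments are stand-ins for the template variables
    {{STUDIO_CONSOLE_BASE_URL}}, {{df_check_range_start}},
    {{df_check_range_end}} and {{df_workspace_uuid}}.
    """
    return f"{base_url.rstrip('/')}/logIndi/log/all?time={start},{end}&w={workspace_uuid}"


# Hypothetical values, for illustration only.
link = build_log_explorer_link("https://console.example.com", 1700000000, 1700000300, "wksp_demo")
markdown = f"[View Related Logs](<{link}>)"
print(markdown)
```

The Trace Explorer link differs only in its path (/tracing/link/all), so the same pattern applies.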

Template Variables

Click + Variable to insert predefined template variables. Variables are dynamically replaced with actual values when the event triggers:

| Variable | Description |
| --- | --- |
| {{df_dimension}} | Detection dimension object. |
| {{df_monitor_checker_name}} | Current monitor name. |
| {{df_monitor_name}} | Associated alert strategy name. |
| {{Result}} | Detection result value. |
| {{df_status}} | Event status (error/warning/ok). |
| {{df_event_id}} | Event unique identifier. |
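To illustrate the replacement step, here is a minimal Python sketch that substitutes {{name}} placeholders from a dictionary of values. It only shows the idea: the platform's own template engine is richer (the Advanced Settings below use {% set %} blocks and filters), and the choice to leave unknown placeholders untouched is an assumption of this sketch. The sample values are hypothetical.

```python
import re

# Matches {{name}} or {{ name }}, where name consists of word characters.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace each {{name}} with values[name]; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return _PLACEHOLDER.sub(substitute, template)


# Hypothetical event values.
title = render(
    "[{{df_status}}] {{df_monitor_checker_name}}: {{Result}}",
    {"df_status": "error", "df_monitor_checker_name": "CPU usage", "Result": 92.5},
)
print(title)  # [error] CPU usage: 92.5
```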

For the full list of supported template variables, refer to the template variables documentation.

Advanced Settings

Use "Advanced Settings" to embed related data context into events via DQL.

1. Add Related Logs

Click to automatically generate the template:

{% set dql_data = DQL("L::RE(`.*`):(`message`) { `index` = 'default' } LIMIT 1") %}
{{ dql_data.message | limit_lines(10) }}

Configuration Points:

  • Replace 'default' in { `index` = 'default' } with the actual index name.
  • RE(`.*`) supports regular expression matching, e.g., RE(`error|exception`).
  • limit_lines(10) limits the output lines to avoid overly long notifications.
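The configuration points above can be sketched in Python: one helper rebuilds the DQL string of the related-logs template with a chosen index and regular expression, and another approximates limit_lines. The assumption that limit_lines keeps the first n lines is ours, and the index name "app-logs" is hypothetical.

```python
def build_log_query(index: str, pattern: str = ".*") -> str:
    """Rebuild the DQL string used by the related-logs template.

    index replaces 'default'; pattern is the regular expression passed to RE().
    """
    return f"L::RE(`{pattern}`):(`message`) {{ `index` = '{index}' }} LIMIT 1"


def limit_lines(text: str, n: int) -> str:
    """Approximation of the limit_lines filter, assuming it keeps the first n lines."""
    return "\n".join(text.splitlines()[:n])


print(build_log_query("default"))
print(build_log_query("app-logs", "error|exception"))  # hypothetical index name
```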

2. Add Related Error Stack

Click to automatically generate the template:

{% set dql_data = DQL("T::re(`.*`):(`error_message`,`error_stack`){ (`source` NOT IN ['service_map', 'tracing_stat', 'service_list_1m', 'service_list_1d', 'service_list_1h', 'profile']) AND (`error_stack` = exists()) } LIMIT 1") %}
{{ dql_data.error_message | limit_lines(10) }}
{{ dql_data.error_stack | limit_lines(10) }}

Configuration Points:

  • `source` NOT IN [...] excludes statistical and aggregated data, retaining only original traces.
  • (`error_stack` = exists()) ensures that only errors with stack information are returned.
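The filter in this template can be mirrored locally to see which records it keeps. This minimal Python sketch is only a stand-in for the DQL query, not how the platform evaluates it; the sample records and the source name "my_service" are hypothetical.

```python
# Sources excluded by the template's NOT IN clause (statistical/aggregated data).
EXCLUDED_SOURCES = {
    "service_map", "tracing_stat", "service_list_1m",
    "service_list_1d", "service_list_1h", "profile",
}


def select_error_spans(records, limit=1):
    """Keep records whose source is not excluded and that have an error_stack
    field (mirroring `error_stack` = exists()), then apply LIMIT."""
    matches = [
        r for r in records
        if r.get("source") not in EXCLUDED_SOURCES and "error_stack" in r
    ]
    return matches[:limit]


# Hypothetical sample records.
records = [
    {"source": "tracing_stat", "error_stack": "aggregated"},
    {"source": "my_service", "error_message": "timeout"},
    {"source": "my_service", "error_message": "NullPointerException",
     "error_stack": "at Foo.bar(Foo.java:42)"},
]
print(select_error_spans(records))
```

Only the third record survives: the first comes from an excluded statistical source, and the second has no stack.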

Notification Members (@)

Click to select workspace members.

Effect Logic:

  1. The @ members take effect only when Incident Association is enabled; in that case, the event content is sent to the specified members.
  2. This configuration is independent of the notification objects in Alert Configuration and does not affect the alert notification scope.

Custom Notification Content

By default, the system uses the Event Content as the alert notification content. If you need to customize the actual notification sent externally, you can enable the switch here and enter the notification information.

  • An independent editor is expanded, allowing separate definition of the notification content sent externally.
  • The original event content is still retained within the platform for event detail display.
  • The independent editor also supports Markdown, template variables, related links, and advanced configuration.
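The split between external and in-platform content can be summarized in a small Python sketch. The function and parameter names are hypothetical and only illustrate the rule described above.

```python
from typing import Tuple


def resolve_contents(event_content: str, custom_enabled: bool,
                     custom_content: str = "") -> Tuple[str, str]:
    """Return (content sent externally, content shown in the event details).

    With the switch off, the event content is used for both. With it on,
    the custom content goes out externally while the event content is
    still kept for display within the platform.
    """
    external = custom_content if custom_enabled else event_content
    return external, event_content


print(resolve_contents("CPU usage above 90%", True, "Host CPU alert, see dashboard"))
```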

Data Gap Events

Customize the notification content sent when a data gap (no data reported) occurs. You can configure the title, content, and other information that these events carry when they are sent externally.

If not customized, the system uses the official default template to send gap alerts.

Incident Association

When enabled, an Incident is created automatically whenever this monitor generates an abnormal event.

Configuration Items
  1. Add labels to automatically created incidents for easy classification and filtering in the Incident Center.
  2. Configure the mapping relationship between event severity and incident severity, supporting the addition of multiple rules.
    • For example, when an event of severity Fatal/Severe/Important/Warning/Data Gap occurs, a new incident of the mapped severity P0/P1/P2/P3/... is created automatically.
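The severity mapping can be pictured as a lookup table with one entry per configured rule. In this minimal Python sketch, the specific pairs in SEVERITY_RULES are hypothetical examples, and returning None for an unmapped severity is an assumption of the sketch, not documented platform behavior.

```python
from typing import Dict, List, Optional

# Hypothetical mapping rules: one entry per rule configured on the monitor.
SEVERITY_RULES: Dict[str, str] = {
    "Fatal": "P0",
    "Severe": "P1",
    "Important": "P2",
    "Warning": "P3",
}


def create_incident(event_severity: str, labels: List[str],
                    rules: Dict[str, str]) -> Optional[dict]:
    """Build an incident record with the mapped severity and configured labels.

    Assumption: if no rule matches the event severity, no incident is created.
    """
    incident_severity = rules.get(event_severity)
    if incident_severity is None:
        return None
    return {"severity": incident_severity, "labels": list(labels)}


print(create_incident("Fatal", ["team:sre"], SEVERITY_RULES))  # {'severity': 'P0', 'labels': ['team:sre']}
```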

Incidents created this way can be viewed in the Incident Center (❗️ these incidents include label filter conditions).

Linkage Mechanism
  1. When an event triggers, an incident record is automatically created in the Incident Center, and the incident description automatically synchronizes the event content.
  2. Notifications for new incidents are sent based on the @ member list in the event content.
  3. Incident details can be viewed in the Incident Center, where the system automatically associates and displays full-link data related to the incident (performance metrics, error logs, call traces, infrastructure topology, etc.).
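Steps 1 and 2 of the linkage can be sketched in Python as follows. The Incident class is hypothetical, and the assumption that @ members appear as "@name" mentions in the event content is made only for this illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    title: str
    description: str
    notified: List[str] = field(default_factory=list)


def on_event(title: str, event_content: str) -> Incident:
    """Create an incident whose description mirrors the event content,
    then record the @-mentioned members as notification targets."""
    incident = Incident(title=title, description=event_content)
    # Assumption: mentions appear as "@name" in the event content.
    incident.notified.extend(re.findall(r"@(\w+)", event_content))
    return incident


incident = on_event("High CPU on host-01", "CPU above 90% for 5 minutes, please check @alice @bob")
print(incident.notified)  # ['alice', 'bob']
```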

Alert Configuration

When the monitoring trigger condition is met, alert messages are immediately sent to the specified notification objects.

Alert Strategies

Select one or more existing alert strategies. Click a strategy name to expand its details, or click Edit Alert Strategy to modify its configuration:

| Configuration Item | Description |
| --- | --- |
| Notification Configuration | Displays the notification object groups bound to this strategy (e.g., All, etc.). |
| Repeat Alert | For the same event, duplicate alert notifications are not sent within a specified time (e.g., 10 minutes). |
| Alert Aggregation | Aggregation method, e.g., AI aggregation. |
| Aggregation Period | New events within a specified time (e.g., 5 minutes) are aggregated into one alert notification. New events beyond the period are aggregated into a new alert notification. |
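Repeat Alert and Aggregation Period are both time-window rules. The following Python sketch shows one plausible reading of each; the window semantics (repeat suppression measured from the last notification actually sent, and an aggregation batch opening at its first event) are assumptions of the sketch, and the timestamps are in seconds.

```python
from typing import Dict, List


def should_send(event_key: str, now: float, last_sent: Dict[str, float],
                repeat_window: float) -> bool:
    """Repeat Alert: suppress a notification for the same event if one was
    sent less than repeat_window seconds ago."""
    previous = last_sent.get(event_key)
    if previous is not None and now - previous < repeat_window:
        return False
    last_sent[event_key] = now
    return True


def aggregate(timestamps: List[float], period: float) -> List[List[float]]:
    """Aggregation Period: events arriving within `period` seconds of a batch's
    first event join that batch; later events start a new batch."""
    batches: List[List[float]] = []
    for ts in sorted(timestamps):
        if batches and ts - batches[-1][0] < period:
            batches[-1].append(ts)
        else:
            batches.append([ts])
    return batches


print(aggregate([0, 60, 299, 300, 420], 300))  # [[0, 60, 299], [300, 420]]
```

With a 300-second (5-minute) period, each inner list corresponds to one alert notification.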

Association

Select created dashboards to establish an association relationship between the monitor and dashboards, enabling quick jumps and visual monitoring data viewing.

Permission

Set operation permissions for the monitor to ensure different users perform compliant operations based on their roles and permission levels (❗️ The Owner role of the current workspace is not affected by the operation permission configuration here).

  • Do not enable this configuration: Follows the default permissions for "Monitor Configuration Management."
  • Enable this configuration and select custom permission objects: Only the creator and the objects granted permissions can enable/disable, edit, and delete this monitor.
  • Enable this configuration but do not select custom permission objects: Only the creator has the permission to enable/disable, edit, and delete this monitor.
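The three cases above, plus the Owner exemption, can be condensed into one decision function. This minimal Python sketch uses hypothetical parameter names; has_default_permission stands in for whatever the user's role grants under "Monitor Configuration Management."

```python
from typing import Optional, Set


def can_modify(
    user: str,
    *,
    creator: str,
    user_is_owner: bool,
    custom_enabled: bool,
    granted: Optional[Set[str]] = None,
    has_default_permission: bool = False,
) -> bool:
    """Whether `user` may enable/disable, edit, or delete the monitor."""
    if user_is_owner:
        # The workspace Owner is not affected by this setting.
        return True
    if not custom_enabled:
        # Setting off: default "Monitor Configuration Management" permissions apply.
        return has_default_permission
    # Setting on: the creator plus any selected objects (none selected means creator only).
    return user == creator or user in (granted or set())


print(can_modify("bob", creator="alice", user_is_owner=False, custom_enabled=True))
```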

Trigger Detection Now

After the rule configuration is completed, click Trigger Detection Now to run the rule once manually and verify the overall effect of the current configuration. This test run does not generate actual alert notifications.

Alert Cache Protection Mechanism

While a monitor is running, the system applies the following protection mechanisms to prevent high-cardinality aggregation (an excessive number of detection objects) from putting pressure on the system:

| Phase | Trigger Condition | System Behavior |
| --- | --- | --- |
| Threshold Warning | The number of detection objects reaches 80% of the system limit (80,000). | Triggers a system notification (at most once per day) reminding you to check query conditions and grouping settings. |
| Over-limit Protection | The number of detection objects reaches the system upper limit of 100,000. | Automatically pauses the monitor and sends a notification. The monitor stops running while paused. |

The alert cache upper limit is 100,000, with a warning ratio of 80%.
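The two phases follow directly from these numbers (100,000 × 80% = 80,000). A minimal Python sketch of the classification, where the returned state labels are hypothetical:

```python
ALERT_CACHE_LIMIT = 100_000                         # documented upper limit
WARNING_THRESHOLD = ALERT_CACHE_LIMIT * 80 // 100   # documented 80% warning ratio -> 80,000


def cache_state(detection_objects: int) -> str:
    """Classify the number of detection objects against the alert cache limits."""
    if detection_objects >= ALERT_CACHE_LIMIT:
        return "paused"    # Over-limit Protection: monitor is paused
    if detection_objects >= WARNING_THRESHOLD:
        return "warning"   # Threshold Warning: system notification
    return "ok"


print([cache_state(n) for n in (79_999, 80_000, 100_000)])  # ['ok', 'warning', 'paused']
```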

Pause Recovery Mechanism

After a monitor is paused by the system due to alert cache over-limit, performing any of the following operations will automatically restore it:

  • Modify the query conditions and re-save the monitor.
  • Re-save the monitor without any changes.

The system will automatically clear the alert cache mark and restore the monitor to normal operation, requiring no additional steps.
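The pause and recovery behavior amounts to a small state change. The MonitorState class below is a hypothetical Python sketch: a save clears the over-limit mark and resumes the monitor only if the system had paused it, whether or not the query conditions were modified.

```python
from dataclasses import dataclass


@dataclass
class MonitorState:
    running: bool = True
    cache_over_limit: bool = False

    def pause_for_cache_limit(self) -> None:
        """System action when detection objects reach the alert cache limit."""
        self.cache_over_limit = True
        self.running = False

    def save(self) -> None:
        """Re-saving (with or without query changes) clears the mark and resumes."""
        if self.cache_over_limit:
            self.cache_over_limit = False
            self.running = True


monitor = MonitorState()
monitor.pause_for_cache_limit()
monitor.save()
print(monitor)
```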

Further Reading

After the monitor rule is successfully created, you may need to: