Detection Rules¶
The system ships with a set of built-in detection rules that cover a wide range of data monitoring needs and help reduce both false alarms and missed alarms.
Configuration Process¶
Creating a monitor requires completing the configuration in the following order:
- Select Rule Type: Determines the data scope and algorithm logic for detection configuration.
- Detection Configuration: Different types correspond to different configuration items.
- Event Notification: Defines event title, content, related information, and data gap handling.
- Alert Configuration: Sets notification strategies and silence periods.
- Association: Configures dashboard association.
- Permission: Sets operation permissions.
Except for "Detection Configuration," which varies with the rule type, the configuration logic for event notification, alert configuration, association, and permission is consistent across all rule types.
Rule Types¶
You can choose the appropriate detection logic based on your monitoring targets:
| Rule Name | Data Scope | Basic Description |
|---|---|---|
| Threshold Detection | All | Performs anomaly detection on metric data based on set thresholds. |
| Mutation Detection | Metrics (M) | Detects anomalies in metrics based on sudden abnormal performance against historical data, often suitable for business data and short time window scenarios. |
| Interval Detection | Metrics (M) | Detects abnormal data points in metrics based on dynamic threshold ranges, often suitable for stable trend time series. |
| Interval Detection V2 | Metrics (M), Traces (T), RUM Data (R) | Detects abnormal data points in metrics based on dynamic threshold ranges, often suitable for stable trend time series. |
| Outlier Detection | Metrics (M) | Detects whether metrics/statistical data of detection objects under specific groups have outlier deviations. |
| Log Detection | Logs (L) | Performs anomaly detection for business applications based on log data. |
| Process Anomaly Detection | Process Objects (O::host_processes) | Periodically detects process data to understand process anomalies. |
| Infrastructure Liveness Detection V2 | Objects (O) | Sets liveness conditions based on infrastructure object data to monitor infrastructure stability. |
| Application Performance Metrics Detection | Traces (T) | Sets threshold rules based on APM data to detect anomalies. |
| Real User Metrics Detection | RUM Data (R) | Sets threshold rules based on RUM data to detect anomalies. |
| Composite Detection | All | Combines results from multiple monitors into one monitor via expressions and alerts based on the combined result. |
| Synthetic Testing Anomaly Detection | Synthetic Data (D::type) | Sets threshold rules based on synthetic testing data to detect anomalies. |
| Network Data Detection | Network (N) | Sets threshold rules based on network data to detect network performance stability. |
| Third-party Event Detection | Others | Generates event data from abnormal events or records that third-party systems send via HTTP POST requests to a specified URL. |
| Infrastructure Change Detection | Objects (O) | Monitors various change behaviors based on tracking the infrastructure lifecycle, accurately identifying abnormal conditions such as configuration drift and illegal operations. |
| Programmable Detection | All | Writes detection rules via scripts, often suitable for monitoring scenarios with complex and frequently changing rules. |
After selecting the rule type, the optional parameters in the "Detection Configuration" module will change accordingly. Event notification and subsequent configurations remain consistent across all types.
Detection Configuration¶
You can configure the corresponding detection frequency, detection interval, and detection metrics based on different detection rule types.
Detection configurations vary significantly across different rule types. Please refer to the detailed configuration documentation for the corresponding type.
Event Notification¶
Defines the event title, content, notified members, and related handling when the monitor triggers.
Event Title¶
Defines the event name for the alert trigger condition. Predefined template variables can be used.
Note
In the latest version, the monitor name will be generated synchronously after entering the event title. In older monitors, there may be inconsistencies between the monitor name and the event title. It is recommended to synchronize to the latest version.
Event Content¶
Write the event notification content. When the trigger condition is met, the system sends this content externally. It generally includes the following:
- Body text in Markdown format.
- Related links and template variables.
- Related logs or error information, added through Advanced Settings.
- The notification members (@) to whom the event content is sent.
Related Links¶
Click + Link, and the system automatically generates jump links based on the current detection metrics. The link address includes the current domain, workspace ID, detection time range ({{df_check_range_start}} ~ {{df_check_range_end}}), and dynamic filter conditions.
| Link Type | Description | Configuration Requirements |
|---|---|---|
| Custom Link | Supports any URL, template variables can be used. | Requires manual entry of the complete link address. |
| View Related Logs | Jumps to the Log Explorer. | Automatically generated, filter conditions and time range can be adjusted after insertion. |
| View Related Traces | Jumps to the Trace Explorer. | Automatically generated, automatically associates the current trace_id or service name. |
| View Related Profile | Jumps to the Profile Explorer. | Automatically generated, automatically fills service name and time range. |
| View Related Container | Jumps to the Container Object Details. | Automatically generated, automatically matches container name and host tags. |
| View Related Pod | Jumps to the Pod Object Details. | Automatically generated, automatically fills Pod name and namespace. |
| View Related Process | Jumps to the Process Object Details. | Automatically generated, automatically matches host and process name. |
| View Related Session | Jumps to RUM Session Replay. | Automatically generated, automatically fills Session ID. |
| View Related View | Jumps to the RUM View Explorer. | Automatically generated, automatically fills view path. |
| View Related Error | Jumps to the RUM Error Explorer. | Automatically generated, automatically fills error type and time range. |
| View Related Resource | Jumps to the RUM Resource Explorer. | Automatically generated, automatically fills resource path. |
| View Related Synthetic Tests | Jumps to the Synthetic Test Task Details. | Automatically generated, automatically associates the synthetic test task name. |
| View Related Dashboard | Jumps to the specified dashboard. | Requires manual addition of Dashboard ID and name, supports adjusting view variables and time range. |
Link Format Examples:
Log Explorer: [View Related Logs](<{{STUDIO_CONSOLE_BASE_URL}}/logIndi/log/all?time={{df_check_range_start}},{{df_check_range_end}}&w={{df_workspace_uuid}}>)
Trace Explorer: [View Related Traces](<{{STUDIO_CONSOLE_BASE_URL}}/tracing/link/all?time={{df_check_range_start}},{{df_check_range_end}}&w={{df_workspace_uuid}}>)
Template Variables¶
Click + Variable to insert predefined template variables. Variables are dynamically replaced with actual values when the event triggers:
| Variable | Description |
|---|---|
| {{df_dimension}} | Detection dimension object. |
| {{df_monitor_checker_name}} | Current monitor name. |
| {{df_monitor_name}} | Associated alert strategy name. |
| {{Result}} | Detection result value. |
| {{df_status}} | Event status (error/warning/ok). |
| {{df_event_id}} | Event unique identifier. |
Click to view all supported template variables.
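As a rough illustration of how placeholder replacement works, the following Python sketch substitutes `{{name}}` placeholders with sample values. It is not the platform's rendering engine: the `render` helper and every sample value are hypothetical, and only the variable names come from the table above.

```python
import re

# Hypothetical sample values; the platform supplies the real ones when an event triggers.
SAMPLE_VARIABLES = {
    "df_monitor_checker_name": "CPU usage too high",
    "df_status": "error",
    "Result": "95.2",
    "df_event_id": "event-0001",
}

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its value; unknown names are left untouched."""
    def substitute(match):
        return str(variables.get(match.group(1), match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


title = render("[{{df_status}}] {{df_monitor_checker_name}}: {{Result}}", SAMPLE_VARIABLES)
print(title)  # [error] CPU usage too high: 95.2
```

Leaving unknown placeholders untouched is a choice made for this sketch so that typos remain visible in the output. How the platform treats undefined variables is not covered here.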
Advanced Settings¶
Use "Advanced Settings" to embed related data context into events via DQL.
1. Add Related Logs
Click to automatically generate the template:
```
{% set dql_data = DQL("L::RE(`.*`):(`message`) { `index` = 'default' } LIMIT 1") %}
{{ dql_data.message | limit_lines(10) }}
```
Configuration Points:
- Replace `` { `index` = 'default' } `` with the actual index name.
- ``RE(`.*`)`` supports regular expression matching, e.g., ``RE(`error|exception`)``.
- `limit_lines(10)` limits the number of output lines to avoid overly long notifications.
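To make the effect of `limit_lines(10)` concrete, here is a minimal Python sketch of a line-limiting helper. It assumes the filter simply keeps the first N lines; the platform's actual behavior (for example, whether it appends a truncation marker) is not documented here, so treat this as an approximation.

```python
def limit_lines(text: str, max_lines: int = 10) -> str:
    """Keep at most max_lines lines of text (assumed behavior of the template filter)."""
    lines = text.splitlines()
    return "\n".join(lines[:max_lines])


# A hypothetical 25-line log message.
message = "\n".join(f"line {i}" for i in range(1, 26))
shortened = limit_lines(message, 10)
print(len(shortened.splitlines()))  # 10
```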
2. Add Related Error Stack
Click to automatically generate the template:
```
{% set dql_data = DQL("T::re(`.*`):(`error_message`,`error_stack`){ (`source` NOT IN ['service_map', 'tracing_stat', 'service_list_1m', 'service_list_1d', 'service_list_1h', 'profile']) AND (`error_stack` = exists()) } LIMIT 1") %}
{{ dql_data.error_message | limit_lines(10) }}
{{ dql_data.error_stack | limit_lines(10) }}
```
Configuration Points:
- `source NOT IN [...]` excludes statistical aggregated data, retaining only original traces.
- ``(`error_stack` = exists())`` ensures that only errors with stack information are returned.
Notification Members (@)¶
Click to select workspace members.
Effect Logic:
- The `@ member` configuration only takes effect, and sends the event content here to the specified members, when Incident Association is enabled.
- This configuration is independent of the notification objects in Alert Configuration and does not affect the alert notification scope.
Custom Notification Content¶
By default, the system uses the Event Content as the alert notification content. If you need to customize the actual notification sent externally, you can enable the switch here and enter the notification information.
- An independent editor is expanded, allowing separate definition of the notification content sent externally.
- The original event content is still retained within the platform for event detail display.
- The independent editor also supports Markdown, template variables, related links, and advanced configuration.
Data Gap Events¶
Customizes the notification content sent when a data gap (no data reported) occurs. You can configure the title, content, and other information used when such events are sent externally.
If not customized, the system uses the official default template to send gap alerts.
Incident Association¶
When enabled, if an abnormal event is generated under this monitor, an Incident will be created synchronously.
Configuration Items¶
- Add labels to automatically created incidents for easy classification and filtering in the Incident Center.
- Configure the mapping relationship between event severity and incident severity, supporting the addition of multiple rules.
- When events of severity Fatal/Severe/Important/Warning/Data Gap occur, a new incident of severity P0/P1/P2/P3/... is created synchronously.
Incidents generated here can be viewed in the Incident Center (❗️ Such incidents include label filter conditions).
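The severity mapping can be pictured as a lookup table. The Python sketch below uses a hypothetical set of rules: the level names come from the text above, but the pairing shown is only an example, since the real mapping is whatever you configure on the monitor. Data Gap is left out because its target level is not stated here.

```python
from typing import Optional

# Hypothetical mapping rules; in the product these are configured per monitor.
SEVERITY_RULES = {
    "Fatal": "P0",
    "Severe": "P1",
    "Important": "P2",
    "Warning": "P3",
}


def incident_severity(event_severity: str, rules: dict = SEVERITY_RULES) -> Optional[str]:
    """Return the incident severity configured for an event severity, or None if no rule matches."""
    return rules.get(event_severity)


print(incident_severity("Severe"))  # P1
```

Returning `None` for an unmapped severity is a choice made for this sketch; how the platform handles events without a matching rule is not described in this section.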
Linkage Mechanism¶
- When an event triggers, an incident record is automatically created in the Incident Center, and the incident description automatically synchronizes the event content.
- Notifications for new incidents are sent based on the @ member list in the event content.
- Incident details can be viewed in the Incident Center, where the system automatically associates and displays full-link data related to the incident (performance metrics, error logs, call traces, infrastructure topology, etc.).
Alert Configuration¶
When the monitoring trigger condition is met, alert messages are immediately sent to the specified notification objects.
Alert Strategies¶
Select one or more existing alert strategies. Click Strategy Name to expand and view details, or click Edit Alert Strategy to modify the configuration:
| Configuration Item | Description |
|---|---|
| Notification Configuration | Displays the notification object groups bound to this strategy (e.g., All, etc.). |
| Repeat Alert | For the same event, duplicate alert notifications are not sent within a specified time (e.g., 10 minutes). |
| Alert Aggregation | Aggregation method, e.g., AI aggregation. |
| Aggregation Period | New events within a specified time (e.g., 5 minutes) are aggregated into one alert notification for sending. New events beyond the period will be aggregated into a new alert notification. |
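The Repeat Alert and Aggregation Period settings can be approximated with the Python sketch below. It is a simplified model, not the platform's implementation: the function names are hypothetical, each aggregation window is assumed to start at the first event of its batch, and AI aggregation is not modeled.

```python
from typing import Dict, List


def should_notify(event_key: str, now: float, last_sent: Dict[str, float],
                  silence_seconds: float = 600) -> bool:
    """Return True and record the send time unless the same event was notified within the silence window."""
    previous = last_sent.get(event_key)
    if previous is not None and now - previous < silence_seconds:
        return False
    last_sent[event_key] = now
    return True


def aggregate(event_times: List[float], period_seconds: float = 300) -> List[List[float]]:
    """Group event timestamps into batches; each batch spans one period starting at its first event."""
    batches: List[List[float]] = []
    for t in sorted(event_times):
        if batches and t - batches[-1][0] < period_seconds:
            batches[-1].append(t)
        else:
            batches.append([t])
    return batches


sent: Dict[str, float] = {}
print([should_notify("cpu@host-1", t, sent) for t in (0, 300, 600)])  # [True, False, True]
print(aggregate([0, 60, 299, 300, 420]))  # [[0, 60, 299], [300, 420]]
```

With a 10-minute silence window, the repeat at 300 seconds is suppressed. With a 5-minute aggregation period, the event at 300 seconds starts a new batch and therefore a new notification.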
Association¶
Select created dashboards to establish an association relationship between the monitor and dashboards, enabling quick jumps and visual monitoring data viewing.
Permission¶
Set operation permissions for the monitor to ensure different users perform compliant operations based on their roles and permission levels (❗️ The Owner role of the current workspace is not affected by the operation permission configuration here).
- Do not enable this configuration: Follows the default permissions for "Monitor Configuration Management."
- Enable this configuration and select custom permission objects: Only the creator and objects granted permissions can enable/disable, edit, and delete the rules set by this monitor.
- Enable this configuration but do not select custom permission objects: Only the creator has the permission to enable/disable, edit, and delete this monitor.
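The three cases above, plus the Owner exception, can be summarized in a short Python sketch. The `MonitorPermission` type and `can_manage` function are hypothetical, and permission objects are simplified to plain user names, whereas the product may also allow other kinds of permission objects.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class MonitorPermission:
    creator: str
    enabled: bool = False                            # whether the Permission switch is on
    allowed: Set[str] = field(default_factory=set)   # custom permission objects (simplified to user names)


def can_manage(user: str, is_owner: bool, has_default_permission: bool,
               perm: MonitorPermission) -> bool:
    """Decide whether a user may enable/disable, edit, or delete the monitor."""
    if is_owner:
        return True  # the workspace Owner is not affected by this configuration
    if not perm.enabled:
        return has_default_permission  # falls back to "Monitor Configuration Management"
    # Enabled: the creator plus any granted objects; an empty set means creator only.
    return user == perm.creator or user in perm.allowed


perm = MonitorPermission(creator="alice", enabled=True, allowed={"bob"})
print(can_manage("bob", False, False, perm))    # True
print(can_manage("carol", False, True, perm))   # False
```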
Trigger Detection Now¶
After the rule configuration is completed, click Trigger Detection Now to manually execute a test once, verifying the overall effect of the current rule configuration. The test execution will not generate actual alert notifications.
Alert Cache Protection Mechanism¶
Once configured, the monitor applies the following protection strategies while running. They prevent an excessive number of detection objects, caused by high-cardinality aggregation, from putting pressure on the system:
| Phase | Trigger Condition | System Behavior |
|---|---|---|
| Threshold Warning | The number of detection objects reaches 80% of the system limit (80,000). | Triggers a system notification (maximum once per day), reminding to check query conditions and grouping settings. |
| Over-limit Protection | The number of detection objects reaches the system upper limit of 100,000. | Automatically pauses the monitor and sends a notification. The monitor stops running during the pause. |
The alert cache upper limit is 100,000, with a warning ratio of 80%.
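As a worked example of these thresholds, the Python sketch below classifies a detection-object count into the phases from the table. The limit and warning ratio come from the text above; the function name and phase labels are made up for illustration, and the once-per-day notification limit is not modeled.

```python
ALERT_CACHE_LIMIT = 100_000   # upper limit stated above
WARNING_PERCENT = 80          # warning ratio stated above
WARNING_THRESHOLD = ALERT_CACHE_LIMIT * WARNING_PERCENT // 100  # 80,000


def cache_phase(object_count: int) -> str:
    """Classify a detection-object count as 'normal', 'warning', or 'paused'."""
    if object_count >= ALERT_CACHE_LIMIT:
        return "paused"    # over-limit protection: the monitor is paused
    if object_count >= WARNING_THRESHOLD:
        return "warning"   # threshold warning: a system notification is sent
    return "normal"


print(cache_phase(79_999), cache_phase(80_000), cache_phase(100_000))  # normal warning paused
```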
Pause Recovery Mechanism¶
After a monitor is paused by the system due to alert cache over-limit, performing any of the following operations will automatically restore it:
- Modify the query conditions and re-save the monitor.
- Directly re-save the monitor.
The system will automatically clear the alert cache mark and restore the monitor to normal operation, requiring no additional steps.