Infrastructure Liveness Detection V2
Current Document Location
This document is the second step in the detection rule configuration process. After completing the configuration, please return to the main document to continue with the third step: Event Notification.
Data Scope: Object (O). Used to monitor the stability of data reporting for key objects (such as hosts, containers, Pods, etc.) in the infrastructure. By setting detection conditions and alert levels, anomalies can be discovered and handled promptly to ensure stable infrastructure operation.
Detection Configuration
Detection Frequency
Set how often the detection runs.
- Preset Options: 5 minutes, 15 minutes, 30 minutes, 1 hour, 6 hours, 12 hours, 24 hours
- Crontab Mode: Click "Switch to Crontab Mode" to configure a custom cycle
Note
Object data is updated every 5 minutes, so the detection frequency should be no shorter than 5 minutes and no longer than 1 day (the range covered by the preset options). This avoids false positives and detection delays.
Detection Interval
Set the data time range queried for each detection (must be ≥ detection frequency).
| Detection Frequency | Detection Interval (Dropdown Options) |
|---|---|
| 5m | Last 10 minutes / 15 minutes / 30 minutes / 1 hour / 6 hours / 12 hours / 24 hours |
| 15m | Last 15 minutes / 30 minutes / 1 hour / 6 hours / 12 hours / 24 hours |
| 30m | Last 30 minutes / 1 hour / 6 hours / 12 hours / 24 hours |
| 1h | Last 1 hour / 6 hours / 12 hours / 24 hours |
| 6h | Last 6 hours / 12 hours / 24 hours |
| 12h | Last 12 hours / 24 hours |
| 24h | Last 24 hours |
- Custom Format: Custom values are also supported, for example 20m (last 20 minutes), 2h (last 2 hours), 1d (last 1 day).
Detection Metrics
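The relationship between a custom interval and the detection frequency can be sketched as follows. This is a minimal illustration, not the product's implementation: the helper names `parse_duration` and `interval_covers_frequency` are hypothetical, and the only unit suffixes handled are the ones shown in the examples above (m, h, d).

```python
import re

# Unit suffixes assumed from the examples in this section (m, h, d).
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a value such as '20m', '2h' or '1d' into seconds."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def interval_covers_frequency(frequency: str, interval: str) -> bool:
    """Return True when the detection interval is >= the detection frequency."""
    return parse_duration(interval) >= parse_duration(frequency)
```

For example, a 15m frequency with a 20m interval satisfies the rule, while a 1h frequency with a 30m interval does not.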
Monitor the data reporting status based on infrastructure object data.
| Configuration Item | Description |
|---|---|
| Infrastructure Type | Select the type of object to monitor: HOST, Process, CONTAINERS, Pod, Service, Deployment, Node, ReplicaSet, Job, CronJob |
| Detection Target | Select the detection scope: |
| Additional Information | After selecting fields, the system performs additional queries to enrich event content, but these do not participate in trigger condition judgment. If multiple matching values are detected, one record is returned randomly. Supports fields such as: unicast_ip, Scheck, instance_id, region, etc. |
Custom Detection Target Configuration
After selecting "Custom", the following filtering methods are supported:
- Wildcard Matching: Enter a wildcard expression for fuzzy matching (e.g., web-*). If the expression contains a backslash (\), the backslash must be escaped to take effect.
- Label Filtering: Filter precisely by labels (such as operating system, tags, etc.) and by additional fields (datakit_ver, zone_id, cloud_provider, etc.).
Trigger Conditions
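To make the wildcard and escaping behavior concrete, here is a small sketch of one possible interpretation. The matching rules are assumptions, not confirmed product semantics: `*` is treated as the only wildcard and a backslash makes the following character literal. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard expression into a regular expression.

    Assumed rules: '*' matches any run of characters, and a backslash
    makes the character that follows it literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if char == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
        elif char == "*":
            parts.append(".*")  # wildcard: any run of characters
            i += 1
        else:
            parts.append(re.escape(char))  # ordinary character
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, name: str) -> bool:
    """Return True when the whole object name matches the expression."""
    return wildcard_to_regex(pattern).fullmatch(name) is not None
```

Under these assumptions, `matches("web-*", "web-01")` is true, and a literal backslash in a name has to be written as two backslashes in the expression.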
Configure trigger conditions for each alert level (Fatal, Severe, Important, Warning), as well as normal recovery conditions.
| Level | Configuration | Description |
|---|---|---|
| Fatal | Detection target has not reported data for [N] consecutive minutes | Highest-level alert, requires immediate action |
| Severe | Detection target has not reported data for [N] consecutive minutes | High-level alert, requires priority handling |
| Important | Detection target has not reported data for [N] consecutive minutes | Medium-level alert, requires attention |
| Warning | Detection target has not reported data for [N] consecutive minutes | Low-level alert, requires notice |
| Normal | No events generated for [N] consecutive detections | After an abnormal event occurs, if no further anomalies are triggered for N consecutive detections, a recovery event (normal event) is generated |
Input Value Range
Values for Fatal, Severe, Important, and Warning range from 5 to 999 minutes. If the value you enter is less than 5 minutes, adjust the detection frequency or interval instead to avoid false positives.
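The Normal (recovery) condition in the table above can be illustrated with a short simulation. This is a sketch under stated assumptions: each detection is reduced to a boolean flag, every abnormal detection is assumed to produce an alert event, and the function name `replay_detections` is hypothetical.

```python
def replay_detections(abnormal_flags: list[bool], recover_after: int) -> list[str]:
    """Replay a sequence of detections and report the event each one yields.

    A recovery event is emitted once `recover_after` consecutive detections
    pass without an anomaly, counted from the most recent abnormal detection.
    """
    events = []
    in_alert = False   # True after an abnormal event, until recovery
    clean_streak = 0   # consecutive detections without an anomaly
    for abnormal in abnormal_flags:
        if abnormal:
            in_alert = True
            clean_streak = 0
            events.append("alert")
        elif in_alert:
            clean_streak += 1
            if clean_streak >= recover_after:
                in_alert = False
                clean_streak = 0
                events.append("recovery")
            else:
                events.append("none")
        else:
            events.append("none")
    return events
```

With N set to 3, an anomaly that interrupts a clean streak resets the count, so recovery is only reported after three clean detections in a row.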
Multi-Object Detection Logic
When the query result returns multiple objects, if any one of them meets the set conditions, an event of the corresponding level is triggered.
For more details, refer to Event Level Description.
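The multi-object logic can be sketched as a per-object evaluation. This is an illustrative model only: the data shapes, the function names, and the choice to report the highest matching level for each object are assumptions, not documented behavior.

```python
from datetime import datetime, timedelta
from typing import Optional

# Alert levels ordered from highest to lowest severity.
LEVELS = ("fatal", "severe", "important", "warning")


def level_for(silent_minutes: float, thresholds: dict[str, int]) -> Optional[str]:
    """Return the highest level whose threshold the silence has reached."""
    for level in LEVELS:
        limit = thresholds.get(level)
        if limit is not None and silent_minutes >= limit:
            return level
    return None


def evaluate(last_reported: dict[str, datetime],
             thresholds: dict[str, int],
             now: datetime) -> dict[str, str]:
    """Map each object that meets a condition to the level it triggers."""
    for limit in thresholds.values():
        if not 5 <= limit <= 999:  # input range stated in this section
            raise ValueError(f"threshold out of range: {limit}")
    events = {}
    for name, seen_at in last_reported.items():
        silent_minutes = (now - seen_at).total_seconds() / 60
        level = level_for(silent_minutes, thresholds)
        if level is not None:
            events[name] = level
    return events
```

Any object that appears in the result triggers an event at its level; objects that reported recently enough are simply absent from the result.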
Subsequent Configuration
After completing the above detection configuration, please continue to configure:
- Event Notification: Define event title, content, notification members, data gap handling, and associated faults.
- Alert Configuration: Select alert strategies and set notification targets and mute periods.
- Association: Associate dashboards so you can jump to the related data quickly.
- Permissions: Set operation permissions to control who can edit/delete this monitor.