Infrastructure Liveness Detection V2


Current Document Location

This document is the second step in the detection rule configuration process. After completing the configuration, please return to the main document to continue with the third step: Event Notification.

Data Scope: Object (O). This rule monitors whether key infrastructure objects (such as hosts, containers, and Pods) are reporting data reliably. By setting detection conditions and alert levels, you can discover and handle anomalies promptly and keep the infrastructure running stably.

Detection Configuration

Detection Frequency

Set the time cycle for executing detection.

  • Preset Options: 5 minutes, 15 minutes, 30 minutes, 1 hour, 6 hours, 12 hours, 24 hours

  • Crontab Mode: Click "Switch to Crontab Mode" to configure a custom cycle

Note

Since object data is updated every 5 minutes, the detection frequency should be at least 5 minutes and no more than 1 day to avoid false positives or detection delays.

Detection Interval

Set the data time range queried for each detection (must be ≥ detection frequency).

Detection Frequency | Detection Interval (Dropdown Options)
5m | Last 10 minutes / 15 minutes / 30 minutes / 1 hour / 6 hours / 12 hours / 24 hours
15m | Last 15 minutes / 30 minutes / 1 hour / 6 hours / 12 hours / 24 hours
30m | Last 30 minutes / 1 hour / 6 hours / 12 hours / 24 hours
1h | Last 1 hour / 6 hours / 12 hours / 24 hours
6h | Last 6 hours / 12 hours / 24 hours
12h | Last 12 hours / 24 hours
24h | Last 24 hours
  • Custom Format: Custom values are also supported, for example 20m (last 20 minutes), 2h (last 2 hours), or 1d (last 1 day)
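
The rule that the detection interval must be at least as long as the detection frequency can be checked before saving a configuration. The following is a minimal sketch, not the product's own validation: the function names parse_duration and interval_is_valid are hypothetical, and it assumes only the m, h, and d suffixes shown in the custom format above.

```python
import re

# Seconds per unit for the suffixes used in the custom format (assumed: m, h, d only).
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a value such as '20m', '2h', or '1d' to seconds."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def interval_is_valid(frequency: str, interval: str) -> bool:
    """Return True when the queried interval is at least one detection cycle long."""
    return parse_duration(interval) >= parse_duration(frequency)
```

With this sketch, interval_is_valid("5m", "10m") is True, while interval_is_valid("15m", "10m") is False because a 10-minute window is shorter than a 15-minute detection cycle.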

Detection Metrics

Monitor the data reporting status based on infrastructure object data.

Configuration Item | Description
Infrastructure Type | Select the type of object to monitor: HOST, Process, CONTAINERS, Pod, Service, Deployment, Node, ReplicaSet, Job, CronJob
Detection Target | Select the detection scope: All (detect all objects of this type within the workspace) or Custom (limit the detection scope through wildcard fuzzy matching or label filtering)
Additional Information | After you select fields, the system runs additional queries to enrich the event content. These fields do not take part in evaluating the trigger conditions. If a field matches multiple values, one of them is returned at random. Supported fields include unicast_ip, Scheck, instance_id, region, etc.

    Custom Detection Target Configuration

    After selecting "Custom", the following filtering methods are supported:

    • Wildcard Matching: Enter a wildcard expression for fuzzy matching (e.g., web-*). If the expression contains a backslash (\), the backslash must be escaped to take effect.
    • Label Filtering: Filter precisely by labels (such as operating system or tags) and by additional fields (datakit_ver, zone_id, cloud_provider, etc.).
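
To preview which objects a wildcard expression such as web-* would select, the matching can be approximated locally. This is a rough sketch that uses Python's standard fnmatch module as a stand-in. The product's own matcher may behave differently, and its backslash-escaping rule in particular is not reproduced here. The helper name select_targets and the host names are made up for illustration.

```python
from fnmatch import fnmatchcase


def select_targets(names, pattern):
    """Return the names that match a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical host names, for illustration only.
hosts = ["web-01", "web-02", "db-01", "api-web"]
matched = select_targets(hosts, "web-*")
```

Here matched contains web-01 and web-02. The name api-web is excluded because the pattern anchors web- at the start of the name.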

    Trigger Conditions

    Configure trigger conditions for each alert level (Fatal, Severe, Important, Warning), as well as normal recovery conditions.

    Level | Configuration | Description
    Fatal | Detection target has not reported data for [N] consecutive minutes | Highest-level alert, requires immediate action
    Severe | Detection target has not reported data for [N] consecutive minutes | High-level alert, requires priority handling
    Important | Detection target has not reported data for [N] consecutive minutes | Medium-level alert, requires attention
    Warning | Detection target has not reported data for [N] consecutive minutes | Low-level alert, requires notice
    Normal | No events generated for [N] consecutive detections | After an abnormal event occurs, if no further anomalies are triggered for N consecutive detections, a recovery event (normal event) is generated

    Input Value Range

    The value of [N] for Fatal, Severe, Important, and Warning must be between 5 and 999 minutes. If a value below 5 minutes is needed, adjust the detection frequency or interval instead to avoid false positives.
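
The per-level thresholds can be pictured as a lookup from "minutes without data" to an alert level. The sketch below is illustrative only. It assumes that when several levels are configured the most severe matching level is reported, which the text above does not state, and the names validate_thresholds and evaluate_level are hypothetical.

```python
# Alert levels ordered from most to least severe.
LEVELS = ("fatal", "severe", "important", "warning")


def validate_thresholds(thresholds):
    """Reject unknown levels and values outside the documented 5 to 999 minute range."""
    for level, minutes in thresholds.items():
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        if not 5 <= minutes <= 999:
            raise ValueError(f"{level} threshold must be 5 to 999 minutes, got {minutes}")


def evaluate_level(minutes_silent, thresholds):
    """Return the most severe level whose threshold is reached, or None.

    Assumption: the highest matching level wins when several match.
    """
    validate_thresholds(thresholds)
    for level in LEVELS:
        limit = thresholds.get(level)
        if limit is not None and minutes_silent >= limit:
            return level
    return None
```

For example, with thresholds {"fatal": 60, "severe": 30, "warning": 10}, an object that has been silent for 45 minutes evaluates to severe, and one that has been silent for 7 minutes evaluates to None.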

    Multi-Object Detection Logic

    When the query result returns multiple objects, if any one of them meets the set conditions, an event of the corresponding level is triggered.

    For more details, refer to Event Level Description.
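
The multi-object rule amounts to an "any" check over the objects returned by the query. A minimal sketch follows, assuming each object carries a last-reported timestamp. The function names, the last_seen mapping, and the host names are illustrative and are not part of the product.

```python
from datetime import datetime, timedelta


def silent_objects(last_seen, now, threshold_minutes):
    """Return the sorted names of objects with no data for at least threshold_minutes."""
    limit = timedelta(minutes=threshold_minutes)
    return sorted(name for name, seen in last_seen.items() if now - seen >= limit)


def should_trigger(last_seen, now, threshold_minutes):
    """An event is triggered as soon as any single object meets the condition."""
    return bool(silent_objects(last_seen, now, threshold_minutes))


# Hypothetical data: two hosts and the time each one last reported.
now = datetime(2024, 1, 1, 12, 0)
last_seen = {
    "web-01": now - timedelta(minutes=3),
    "web-02": now - timedelta(minutes=20),
}
```

With a 10-minute threshold, should_trigger(last_seen, now, 10) is True because web-02 alone meets the condition. With a 30-minute threshold no object qualifies and the result is False.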

    Subsequent Configuration

    After completing the above detection configuration, please continue to configure:

    1. Event Notification: Define event title, content, notification members, data gap handling, and associated faults.
    2. Alert Configuration: Select alert strategies, set notification targets, and mute periods.
    3. Association: Associate dashboards so you can jump to the related data quickly.
    4. Permissions: Set operation permissions to control who can edit/delete this monitor.