
Unrecovered Events


The Unrecovered Events Explorer displays all alert-level events in the current workspace. It gives users the full context of each alert, speeds up event comprehension, and reduces alert fatigue by linking events to their monitors and alerting strategies.

The unrecovered events data source aggregates event data using df_fault_id as the unique identifier and shows the most recent result for each identifier. The explorer visualizes key data points ranging from event severity to threshold baselines. Event severity, duration, alert notifications, monitors, event content, and historical trigger trend graphs together form a comprehensive view, so you can analyze events from different perspectives and make better-informed response decisions.
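As a rough illustration of this aggregation, the sketch below groups event records by df_fault_id and keeps only the latest one per identifier. The record layout and the `date` timestamp field are assumptions made for the example, not the explorer's documented schema.

```python
from typing import Iterable


def latest_per_fault(events: Iterable[dict]) -> dict[str, dict]:
    """Keep only the most recent event for each df_fault_id."""
    latest: dict[str, dict] = {}
    for event in events:
        fault_id = event["df_fault_id"]
        # "date" is an assumed timestamp field (epoch milliseconds).
        if fault_id not in latest or event["date"] > latest[fault_id]["date"]:
            latest[fault_id] = event
    return latest


events = [
    {"df_fault_id": "f1", "df_status": "warning", "date": 1000},
    {"df_fault_id": "f1", "df_status": "critical", "date": 2000},
    {"df_fault_id": "f2", "df_status": "error", "date": 1500},
]
result = latest_per_fault(events)
print(result["f1"]["df_status"])  # critical
```

Each fault then appears once, carrying the state of its most recent event.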

Event Card

Event Severity

Based on the configuration of monitor trigger conditions, the following status statistics are generated: Unrecovered (df_status != ok), Critical (critical), Error (error), Warning (warning), and No Data (nodata).
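The statistics above can be sketched as a simple count over df_status values. The function name and input shape are hypothetical; only the status values come from the description above.

```python
from collections import Counter


def severity_stats(statuses: list[str]) -> dict[str, int]:
    """Count events per status; "unrecovered" is everything where df_status != ok."""
    counts = Counter(statuses)
    return {
        "unrecovered": sum(n for status, n in counts.items() if status != "ok"),
        "critical": counts["critical"],
        "error": counts["error"],
        "warning": counts["warning"],
        "nodata": counts["nodata"],
    }


stats = severity_stats(["critical", "warning", "warning", "nodata", "ok"])
print(stats["unrecovered"])  # 4
```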

In the Unrecovered Events Explorer, the severity of each event is defined as the severity of the last triggered event for that monitored object.

For more details, please refer to Event Severity Description.

Event Title

The event title displayed in the Unrecovered Events Explorer comes directly from the title configured in the monitor rule. It is the title used when the monitored object last triggered an event.

Duration

Indicates the time span from when the monitored object first triggered an anomaly until the end time of the current time widget, e.g., 5 minutes (08/20 17:53:00 ~ 17:57:38).
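The example span can be checked with a small calculation. This is a minimal sketch: the year is assumed, and the rule the card uses to round the span for display (here, to "5 minutes") is not stated, so the raw value is left as is.

```python
from datetime import datetime, timedelta


def event_duration(first_triggered: datetime, widget_end: datetime) -> timedelta:
    """Span from the first anomaly to the end time of the time widget."""
    return widget_end - first_triggered


start = datetime(2024, 8, 20, 17, 53, 0)  # year is an assumption
end = datetime(2024, 8, 20, 17, 57, 38)
duration = event_duration(start, end)
print(duration)  # 0:04:38
```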

Alert Notification

Represents the alert notification status of the monitored object's last triggered event. It mainly includes the following three states:

  • Muted: Indicates that the current event is affected by mute rules and no alert notification has been sent externally;
  • Notification target identifier: the targets the notification was actually sent to, such as a DingTalk bot, WeCom bot, or Lark bot;
  • -: No external alert notification was triggered.
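The three states can be sketched as a small decision function. The `muted` flag and `targets` list are hypothetical inputs chosen for illustration, not fields of the actual event record.

```python
def notification_label(muted: bool, targets: list[str]) -> str:
    """Return the text shown for the alert notification state."""
    if muted:
        # A mute rule applied, so nothing was sent externally.
        return "Muted"
    if targets:
        # Show the targets the notification was actually sent to.
        return ", ".join(targets)
    # No external alert notification was triggered.
    return "-"


print(notification_label(False, ["DingTalk bot", "Lark bot"]))  # DingTalk bot, Lark bot
```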

Monitor Detection Type

Refers to the monitor type.

Monitored Object

If the detection metric in the monitor rule uses a by (grouping) query, the resulting filter condition is displayed on the event card, for example source:kodo-servicemap.

Event Content

The content of the last event triggered by the monitored object. It comes from the content pre-configured in the monitor rule.

Historical Trigger Trend Graph

This trend graph uses window functions to display the historical trend of detection result values across the last 60 checks.

The graph shows the historical abnormal trend of the current unrecovered event, based on its detection result values. The trigger threshold from the configured monitor detection rule is drawn as a reference line, and the detection result of the last triggered event is specially marked. A vertical line in the graph lets you locate the exact time the event was triggered, and the detection interval for that result is displayed alongside it. Together these help you assess how the event developed and what impact it had.
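A minimal sketch of the data behind such a graph: keep the last 60 detection values and find the most recent one that crossed the threshold. The comparison `value >= threshold` is an assumption; real monitor rules can use other operators.

```python
def trend_window(values, threshold, window=60):
    """Return the last `window` values and the index of the last triggered check."""
    recent = list(values)[-window:]
    # Assumed trigger condition: the value reaches or exceeds the threshold.
    triggered = [i for i, value in enumerate(recent) if value >= threshold]
    last_triggered = triggered[-1] if triggered else None
    return recent, last_triggered


recent, last_triggered = trend_window(range(100), threshold=90)
print(len(recent), last_triggered)  # 60 59
```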

Manage Card

Display Options

The unrecovered events list supports the following display styles:

  • Standard: Displays event title, detection dimension, and event content
  • Extended: In addition to standard information, also displays the historical trend of detection result values for unrecovered events
  • List: Displays event data in list format.

Show Only Associated Issue Events

After checking this option, the event list shows only the events that are associated with Issues.

For an individual event with an existing association, click the icon on the right side of the event data to go directly to the related Issue.

Issue & Create Issue

Create an Issue for unrecovered events to notify relevant team members for timely handling.

You can create an Issue from any of the following places:

  • List mode
  • Standard/Extended mode
  • Event details

Mute Event

In large-scale monitoring scenarios, handling numerous similar alerts by hand is tedious, slow, and prone to omissions. To avoid this, you can mute rules directly on the current page.

  1. Hover over a single event and click Mute on the right;
  2. Select mute time type;
  3. Confirm.

Mute Time Type

You can set custom start and end times for the mute, or quickly choose a preset of 1 hour, 6 hours, 12 hours, 1 day, or 1 week.

For a recurring mute:

  1. Select the start time and duration of the mute period;
  2. Choose the recurrence cycle starting from a specific time;
  3. Select the expiration time for the mute. You can choose to repeat indefinitely or until a specific time.
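The quick presets can be read as fixed offsets from the mute start time. The sketch below shows that arithmetic only; the preset keys and the function name are hypothetical and do not reflect an actual API.

```python
from datetime import datetime, timedelta

# Hypothetical keys for the quick presets listed above.
PRESETS = {
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "12h": timedelta(hours=12),
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
}


def mute_window(start: datetime, preset: str) -> tuple[datetime, datetime]:
    """Return the (start, end) of a one-off mute for the given preset."""
    return start, start + PRESETS[preset]


window = mute_window(datetime(2024, 1, 1, 12, 0), "6h")
print(window[1])  # 2024-01-01 18:00:00
```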

Recover Event

An event is considered recovered when its status becomes normal (df_sub_status = ok).

  • To recover a single rule, click the button on the right side of the rule, go to the Monitor settings, or recover it manually.

  • Clicking "Recover All" recovers all abnormal events under the current list, with options to associate them with Issues or not.

There are four types of event recovery:

Name | df_status | Description
Recovered | ok | An anomaly previously detected as "critical", "error", or "warning" is considered recovered if it is not retriggered after N checks.
No Data Recovered | ok | If data stops reporting and then resumes, the event is judged as recovered.
No Data Treated as Recovered | ok | If data stops reporting, the event is treated as normal.
Manual Recovery | ok | Users can recover events manually, either one at a time or in batches.
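The "Recovered" rule (no retrigger after N checks) can be sketched as follows. This is an illustrative reading of the rule under stated assumptions, not the monitor's actual implementation: the input is assumed to be a list of per-check statuses ordered from oldest to newest.

```python
ABNORMAL = {"critical", "error", "warning"}


def is_recovered(check_results: list[str], n: int) -> bool:
    """True if an earlier anomaly was followed by n consecutive "ok" checks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not any(status in ABNORMAL for status in check_results):
        return False  # nothing abnormal was ever detected
    last_n = check_results[-n:]
    return len(last_n) == n and all(status == "ok" for status in last_n)


print(is_recovered(["critical", "ok", "ok", "ok"], 3))  # True
print(is_recovered(["critical", "ok", "warning", "ok"], 3))  # False
```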

Further Reading