Intelligent Monitoring
Intelligent Monitoring quickly locates abnormal nodes for business analysis, user behavior analysis, and root cause analysis of failures. It suits business-related metrics and highly volatile metrics. By constructing analysis scenarios, it identifies the key dimensions of multi-dimensional metrics; once the affected dimension range of the business is known, it analyzes abnormalities in the service calls and resource dependencies of microservices.
Monitoring is configured through intelligent detection rules. Set the detection scope and notifiers, and the intelligent detection algorithms identify abnormal data and predict future trends.
Note
Unlike traditional monitoring, intelligent monitoring does not require you to configure detection thresholds or trigger rules. Set the detection scope and notifiers to enable monitoring with one click. Intelligent algorithms identify and locate abnormalities, and support analysis and reporting of abnormal intervals.
Usage Notes
Data Storage
- Because data must be archived, enabling log or application intelligent detection generates new time series. The number of new time series equals the number of detection dimensions matched by the monitor's filter conditions (Service, Source) multiplied by the number of detection metrics (only metrics with valid values count).
  Intelligent Monitoring Detection Metrics:
    - Log Intelligent Detection: Error Log Count (`error_log_count`), Log Count (`log_count`);
    - Application Intelligent Detection: P90 Latency (`p90`), Error Request Count (`error_request_count`), Request Count (`request_count`).
- To reduce overhead, time series archiving for log and application intelligent detection uses a minimal storage logic: it retains only the detection dimensions, measurement names, and detection metrics, and does not store the monitor's filter conditions. As a result, modifying the filter conditions of a monitor generates new time series, which may result in duplicate billing for the day of the modification. The changes take effect immediately.
- To improve algorithm accuracy and achieve the best detection results, please set the metric storage period to a maximum of 30 days (default is 7 days) before enabling intelligent monitoring.
- To view the archived metric data (Metric) for log and application intelligent detection, go to the current monitoring alert event > Extended Fields > `df_event_report` > Report Content > `smart_monitor_metric:smart_apm_ff5cf0ea792f4bac72ca1afdcd431c82`.
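The time series count described above (detection dimensions multiplied by detection metrics) can be sketched as a small calculation. This is only an illustration: the helper name `estimate_new_series` and the example dimension counts are hypothetical, and the result is an upper bound because only metrics with valid values produce a series.

```python
from typing import Sequence

# Detection metrics listed in this document.
LOG_METRICS = ("error_log_count", "log_count")
APM_METRICS = ("p90", "error_request_count", "request_count")


def estimate_new_series(dimension_count: int, metrics: Sequence[str]) -> int:
    """Hypothetical helper: detection dimensions * detection metrics.

    Gives an upper bound, since a metric without valid values
    does not produce a time series.
    """
    if dimension_count < 0:
        raise ValueError("dimension_count must be non-negative")
    return dimension_count * len(metrics)


# Assumed example: the monitor's filter matches 12 services / 5 log sources.
apm_series = estimate_new_series(12, APM_METRICS)  # 12 * 3
log_series = estimate_new_series(5, LOG_METRICS)   # 5 * 2
print(apm_series, log_series)
```

Because the filter conditions are not stored with the archived series, changing a filter would restart this count with newly generated series for the modified monitor.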
Algorithm Explanation: Intelligent Monitoring uses a time-series anomaly detection algorithm based on ADTK.
The system compares each time series value with the values in its preceding time window. If a value changes unusually sharply relative to the mean or median of that window, the time point is flagged as an anomaly. The system also calculates an expected normal range for the current detection dimension from historical data, taking the time of day and the day of the week into account, and uses this range to verify whether a detected anomaly is genuine.
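The two steps described above (comparison against the previous window, then verification against an expected range per time of day and day of week) can be illustrated with a minimal standard-library sketch. It is not the product's implementation and does not call ADTK; the function names, the median-absolute-deviation rule, and the `window`, `factor`, `min_spread`, and `tolerance` parameters are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def window_shift_anomalies(values, window=6, factor=3.0, min_spread=1.0):
    """Flag indices whose value deviates from the median of the previous
    `window` points by more than `factor` times that window's median
    absolute deviation (floored at `min_spread`)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center = median(history)
        mad = median(abs(v - center) for v in history)
        spread = max(mad, min_spread)  # avoid a zero spread on flat windows
        if abs(values[i] - center) > factor * spread:
            anomalies.append(i)
    return anomalies


def expected_ranges(samples, tolerance=0.5):
    """Build {(weekday, hour): (low, high)} from (datetime, value) history,
    using the slot median plus/minus a relative tolerance (an assumption)."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {
        key: (median(vals) * (1 - tolerance), median(vals) * (1 + tolerance))
        for key, vals in buckets.items()
    }


def confirm_anomaly(ts, value, ranges):
    """A candidate is confirmed only if it lies outside the expected range
    of its (weekday, hour) slot; slots without history are not confirmed."""
    bounds = ranges.get((ts.weekday(), ts.hour))
    if bounds is None:
        return False
    low, high = bounds
    return not (low <= value <= high)


values = [100, 102, 99, 101, 100, 103, 250, 101]
candidates = window_shift_anomalies(values)  # flags index 6, the spike to 250

# Three past Mondays at 09:00 form the history for that slot.
history = [
    (datetime(2024, 1, 1, 9), 100),
    (datetime(2024, 1, 8, 9), 110),
    (datetime(2024, 1, 15, 9), 90),
]
ranges = expected_ranges(history)
print(candidates, confirm_anomaly(datetime(2024, 1, 22, 9), 250, ranges))
```

Using the median rather than the mean keeps a single earlier spike from distorting the baseline for the points that follow it.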
Rule Types
Currently, TrueWatch supports various intelligent detection rules, with different rules covering different data ranges.
Rule Name | Data Range | Description |
---|---|---|
Host Intelligent Detection | Metrics(M) | Automatically detects host abnormalities using intelligent algorithms, identifying issues with host CPU and memory. |
Log Intelligent Detection | Logs(L) | Automatically detects anomalies in logs using intelligent algorithms, including log count and error log count. |
Application Intelligent Detection | Traces(T) | Automatically detects application abnormalities using intelligent algorithms, including application request count, error request count, and request latency. |
RUM Intelligent Detection | RUM Data(R) | Automatically detects anomalies in websites and apps using intelligent algorithms, including page performance analysis and error analysis, with related detection metrics such as LCP, FID, CLS, Loading Time, etc. |
Kubernetes Intelligent Detection | Metrics(M) | Automatically detects Kubernetes abnormalities using intelligent algorithms, including Pod count, Pod restarts, API QPS, etc. |
Cloud Bill Intelligent Monitoring | Cloud Bills(B) | Automatically detects anomalies in account billing costs across different cloud providers using intelligent algorithms. |
Configuration
- Set the detection conditions that correspond to the chosen detection rule;
- Fill in the event notification content as needed;
- Configure alert strategies;
- Set operation permissions;
- Click Save.
Billing Explanation
Host, log, and application intelligent detection runs every 10 minutes, and each execution counts as 10 trigger costs. For RUM intelligent detection, each execution counts as 100 trigger costs.
For more details, see Triggers.
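As a rough worked example of the billing rule for host, log, and application intelligent detection, the following sketch computes the daily trigger cost. It assumes a single monitor that runs for the whole day; the helper name `daily_trigger_cost` is hypothetical. The execution frequency of RUM intelligent detection is not stated here, so it is left out of the example.

```python
MINUTES_PER_DAY = 24 * 60


def daily_trigger_cost(cost_per_execution: int, interval_minutes: int = 10) -> int:
    """Hypothetical helper: executions per day * trigger cost per execution."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    executions_per_day = MINUTES_PER_DAY // interval_minutes
    return executions_per_day * cost_per_execution


# Host, log, or application detection: every 10 minutes, 10 per execution.
per_monitor_per_day = daily_trigger_cost(10)  # 144 executions * 10
print(per_monitor_per_day)
```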