Intelligent Monitoring
Intelligent Monitoring quickly locates abnormal nodes during business analysis, user behavior analysis, and root cause analysis of failures. It is suited to business Metrics and to Metrics with strong volatility. It first identifies the key dimensions of multi-dimensional Metrics for the scenario under analysis; once the business dimension range is known, it locates and analyzes abnormalities in the service calls and resource dependencies of microservices.
Monitoring is configured through intelligent detection rules: set the detection range and the notifiers, and the intelligent detection algorithms identify abnormal data and predict future trends.
Note
Unlike traditional monitoring, Intelligent Monitoring does not require detection thresholds or trigger rules. Set only the detection range and the notifiers to enable monitoring with one click. Intelligent algorithms then identify and locate abnormalities, and support the analysis and reporting of abnormal intervals.
Usage Notes
Data Storage
- Because data must be archived, enabling Logs and Application Intelligent Detection generates new Time Series. The number generated is the number of detection dimensions matched by the monitor's filter conditions (Service, Source) * the number of detection Metrics (provided the Metrics have valid values). Intelligent Monitoring detection Metrics:
    - Logs Intelligent Detection: Error Log Count (`error_log_count`), Log Count (`log_count`);
    - Application Intelligent Detection: P90 Latency (`p90`), Error Request Count (`error_request_count`), Request Count (`request_count`).
- To reduce overhead, the Time Series archived by Logs and Application Intelligent Detection use a minimal storage logic: only the detection dimensions, Measurement names, and detection Metrics are retained, and the monitor's filter conditions are not stored. Because of this archiving logic, modifying the monitor's filter conditions takes effect immediately and generates new Time Series, so Time Series may be billed twice on the day the filter conditions are modified.
- To improve algorithm accuracy and achieve the best detection effect, please set the Metric storage period to a maximum of 30 days (default configuration is 7 days) before enabling Intelligent Monitoring.
- To view the Metrics data archived by Logs and Application Intelligent Detection, go to the current monitoring alert event > Extended Fields > `df_event_report` > Report Content > `smart_monitor_metric:smart_apm_ff5cf0ea792f4bac72ca1afdcd431c82`.
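The Time Series count described above is a simple product, which the following minimal sketch illustrates. The function name `archived_series_count` and the service and source names are hypothetical; only the Metric names come from this page.

```python
# Detection Metrics listed on this page.
LOG_METRICS = ["error_log_count", "log_count"]
APM_METRICS = ["p90", "error_request_count", "request_count"]


def archived_series_count(dimension_values, metrics):
    """Estimate new Time Series: detection dimensions * valid detection Metrics.

    `dimension_values` are the Service or Source values matched by the
    monitor's filter conditions (hypothetical input for illustration).
    """
    return len(set(dimension_values)) * len(metrics)


# Hypothetical example: an Application monitor whose filter matches 4 services.
services = ["checkout", "cart", "search", "auth"]
apm_series = archived_series_count(services, APM_METRICS)  # 4 * 3

# Hypothetical example: a Logs monitor whose filter matches 2 sources.
sources = ["nginx", "app"]
log_series = archived_series_count(sources, LOG_METRICS)  # 2 * 2

print(apm_series, log_series)
```

Narrowing the filter conditions reduces the number of matched dimensions and therefore the number of archived Time Series.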
Algorithm Explanation: Intelligent Monitoring uses an algorithm based on ADTK, a time series anomaly detection library.
The monitoring system compares each time series value with the values in the previous time window. If a value changes abnormally relative to the previous average or median, that time point is identified as an anomaly. The system also calculates the expected normal range for the current detection dimension from past data; this range depends on the time of day and the day of the week. The system uses it to verify whether the anomalies detected in the data are real and valid.
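The two steps described above can be sketched in plain Python. This is an illustrative approximation only, not ADTK's code and not the product's actual implementation: the function names, the window size, the tolerance factor, and the min/max range rule are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def detect_anomalies(values, window=6, factor=3.0, min_spread=1.0):
    """Flag indices whose value deviates strongly from the previous window's median.

    window, factor, and min_spread are assumed values for illustration.
    """
    anomalies = []
    for i in range(window, len(values)):
        previous = values[i - window:i]
        baseline = median(previous)
        # Median absolute deviation of the previous window as a robust spread,
        # floored so that a perfectly flat window does not flag tiny changes.
        spread = max(median([abs(v - baseline) for v in previous]), min_spread)
        if abs(values[i] - baseline) > factor * spread:
            anomalies.append(i)
    return anomalies


def expected_ranges(history):
    """Build an expected (low, high) range per (weekday, hour) from past data."""
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {key: (min(vals), max(vals)) for key, vals in buckets.items()}


def is_confirmed(ts, value, ranges):
    """Confirm an anomaly only if the value falls outside the expected range."""
    key = (ts.weekday(), ts.hour)
    if key not in ranges:
        return True  # no history for this slot, so the anomaly cannot be ruled out
    low, high = ranges[key]
    return not (low <= value <= high)


series = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
flagged = detect_anomalies(series)

# Two past Mondays at 09:00 define the expected range for that slot.
history = [(datetime(2024, 1, 1, 9), 100), (datetime(2024, 1, 8, 9), 120)]
ranges = expected_ranges(history)
print(flagged, is_confirmed(datetime(2024, 1, 15, 9), 500, ranges))
```

Using the median rather than the mean keeps a single earlier spike from distorting the baseline for the points that follow it.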
Rule Types
Currently, TrueWatch supports various intelligent detection rules, with different rules covering different data ranges.
Rule Name | Data Range | Basic Description |
---|---|---|
Host Intelligent Detection | Metrics(M) | Automatically detects host abnormalities through intelligent algorithms, identifying CPU and memory anomalies. |
Logs Intelligent Detection | Logs(L) | Automatically detects anomalies in logs through intelligent algorithms, including Log Count and Error Log Count. |
Application Intelligent Detection | Traces(T) | Automatically detects anomalies in applications through intelligent algorithms, including Application Request Count, Error Request Count, and Request Latency. |
RUM Intelligent Detection | RUM Data(R) | Automatically detects anomalies in websites/apps through intelligent algorithms, including Page Performance Analysis and Error Analysis, with related detection Metrics such as LCP, FID, CLS, and Loading Time. |
Kubernetes Intelligent Detection | Metrics(M) | Automatically detects anomalies in Kubernetes through intelligent algorithms, including Pod Count, Pod Restarts, API QPS, etc. |
Cloud Bill Intelligent Monitoring | Cloud Bill(B) | Automatically detects abnormal billing costs in different cloud providers through intelligent algorithms, including Billing Costs. |
Configuration
- Set corresponding detection conditions for different detection rules;
- Fill in the event notification content as needed;
- Configure alert strategies;
- Set operation permissions;
- Click Save.
Billing Explanation
Host, Logs, and Application Intelligent Detection are executed every 10 minutes, and each execution is calculated as 10 trigger costs; RUM Intelligent Detection is calculated as 100 trigger costs per execution.
For more details, please refer to Triggers.
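As a rough worked example of the figures above, the sketch below estimates the daily trigger cost of one Host, Logs, or Application Intelligent Detection monitor. The function names are hypothetical. The RUM execution interval is not stated on this page, so the RUM helper takes the number of executions as an input rather than assuming one.

```python
MINUTES_PER_DAY = 24 * 60


def daily_trigger_cost(interval_minutes=10, cost_per_execution=10):
    """Daily trigger cost for a monitor that runs every `interval_minutes`."""
    executions_per_day = MINUTES_PER_DAY // interval_minutes
    return executions_per_day * cost_per_execution


def rum_trigger_cost(executions, cost_per_execution=100):
    """Trigger cost for a given number of RUM Intelligent Detection executions."""
    return executions * cost_per_execution


# Host, Logs, or Application: 144 executions per day * 10 trigger costs each.
host_daily = daily_trigger_cost()
# RUM: 3 executions * 100 trigger costs each.
rum_three_runs = rum_trigger_cost(3)

print(host_daily, rum_three_runs)
```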