SLO
TrueWatch SLO monitoring uses Metrics to check whether the availability of system services meets the target requirements.
Concepts
Term | Description |
---|---|
SLA | Service Level Agreement: the service commitment made by the provider of a system service to its customers. You can score the provider's service quality against the SLA and monitor the service's compliance rate in real time. |
SLI | Service Level Indicator: the Metrics selected to measure the stability of the system. A TrueWatch SLI can be set to one or more Metrics based on monitors. |
SLO | Service Level Objective: the smallest unit for SLA scoring in TrueWatch. It is the target for cumulative successful SLIs within a time window. An SLO is often converted into an error budget, which expresses the tolerable amount of error. The duration of abnormal events in each detection cycle is deducted from the tolerable error time. |
As shown in the figure above, the system detects anomalies every 5 minutes and calculates the coverage time of each anomaly event from its actual start and end points: the start point is taken from the detection time window, and the end point is the event's start time plus its duration. The deduction is the total coverage time of all anomaly events after merging, so overlapping periods are counted only once.
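
The deduction rule described above can be illustrated with a minimal Python sketch. It is not TrueWatch's implementation: the function names `error_budget` and `merged_coverage`, the 99% target, and the 7-day window are all assumptions chosen for illustration. The sketch converts an SLO target into a tolerable error time, merges overlapping anomaly intervals so that shared periods count once, and subtracts the merged coverage from the budget.

```python
from datetime import datetime, timedelta


def error_budget(target_pct: float, window: timedelta) -> timedelta:
    """Tolerable error time for a target such as 99.0 (%) over a time window.

    Hypothetical helper: assumes the budget is the share of the window
    that the target leaves uncovered.
    """
    return window * ((100 - target_pct) / 100)


def merged_coverage(events) -> timedelta:
    """Total time covered by (start, end) intervals, overlaps counted once."""
    total = timedelta(0)
    cur_start = cur_end = None
    for start, end in sorted(events):
        if cur_end is None or start > cur_end:
            # Gap before this event: close the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap (or touching): extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Illustrative data only: each end point is the event start plus its duration.
t0 = datetime(2024, 1, 1, 0, 0)
events = [
    (t0, t0 + timedelta(minutes=10)),
    (t0 + timedelta(minutes=5), t0 + timedelta(minutes=20)),   # overlaps the first
    (t0 + timedelta(minutes=60), t0 + timedelta(minutes=65)),  # separate event
]

budget = error_budget(99.0, timedelta(days=7))
deduction = merged_coverage(events)
remaining = budget - deduction
print(f"budget={budget}, deduction={deduction}, remaining={remaining}")
```

In this example the first two events overlap between minute 5 and minute 10, so they merge into a single 20-minute interval. With the separate 5-minute event, the deduction is 25 minutes rather than the 30 minutes that a plain sum of durations would give.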