APM

APM is a full-stack performance analysis solution built by TrueWatch with distributed tracing at its core. It adheres to standardized protocols such as OpenTracing. By deploying a unified collection agent at the host layer, it enables platform-level correlation analysis of tracing data, infrastructure metrics, and application logs, achieving end-to-end observability from code to resources.

Core Architecture: APM uses a single-host, single-agent architecture in which DataKit is deployed as the unified data collector on each application server.

Features

Service Observability

Service List: The core interface for managing all connected application services, providing an overview of service assets and status. Supports one-click navigation from a service entry to its associated dashboards for in-depth analysis, including the service overview, resource invocations, infrastructure dependencies, traces, and log queries.
Service Map: Visualize the invocation relationships between services on a topology map. Supports viewing key metrics such as request count, error rate, average response time, P99/P95 response time, and maximum response time.
Service Details: View a service's upstream and downstream dependencies, its overview, and its associated logs and traces. Displays anomaly trend charts for metrics (request response time, error request distribution) and logs (error log count).
Performance Metrics: Quickly filter service performance based on service type, environment, version, project, and service name.
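
As a rough illustration of how the Service Map metrics listed above relate to raw request data, the following Python sketch aggregates per-service figures. The `Request` record, its field names, and the nearest-rank percentile method are assumptions made for illustration; they are not TrueWatch's actual data model or calculation.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical record: one entry per handled request.
    service: str
    duration_ms: float
    is_error: bool


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is an integer 1..100."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def service_metrics(requests):
    """Group requests by service and compute the metrics named in the Service Map."""
    by_service = {}
    for request in requests:
        by_service.setdefault(request.service, []).append(request)

    result = {}
    for service, items in by_service.items():
        durations = sorted(r.duration_ms for r in items)
        result[service] = {
            "request_count": len(items),
            "error_rate": sum(r.is_error for r in items) / len(items),
            "avg_ms": sum(durations) / len(durations),
            "p95_ms": percentile(durations, 95),
            "p99_ms": percentile(durations, 99),
            "max_ms": durations[-1],
        }
    return result
```

Percentiles such as P95 and P99 are less sensitive to a handful of fast requests than the average is, which is why they are usually read alongside it when looking for slow services.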

Tracing

Explorer: Search, filter, and export trace data. Filter and view trace data from any time period to quickly identify anomalous traces.
Trace Details Page: Analyze trace performance in depth using flame graphs, Span lists, and waterfall charts. Performance data is tracked in detail for both synchronous and asynchronous calls.
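
To make the waterfall view more concrete, the sketch below arranges the Spans of one trace into rows of depth, start offset, and duration. The `Span` fields (`span_id`, `parent_id`, `start_ms`, `duration_ms`) are hypothetical names chosen for illustration and do not reflect the product's storage schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical fields for a single Span within one trace.
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: int
    duration_ms: int


def waterfall_rows(spans):
    """Return (depth, offset_ms, duration_ms, name) rows in depth-first order."""
    if not spans:
        return []

    known_ids = {s.span_id for s in spans}
    children = {}
    roots = []
    for span in spans:
        # A Span whose parent is missing from the trace is treated as a root.
        if span.parent_id is None or span.parent_id not in known_ids:
            roots.append(span)
        else:
            children.setdefault(span.parent_id, []).append(span)

    trace_start = min(s.start_ms for s in spans)
    rows = []

    def visit(span, depth):
        rows.append((depth, span.start_ms - trace_start, span.duration_ms, span.name))
        for child in sorted(children.get(span.span_id, []), key=lambda c: c.start_ms):
            visit(child, depth + 1)

    for root in sorted(roots, key=lambda s: s.start_ms):
        visit(root, 0)
    return rows
```

The sketch does not assume that a child Span ends before its parent does, so an asynchronous call that outlives its caller is still placed under the Span that started it.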

Error Tracking

Incident Explorer: Aggregates and tracks the errors generated in distributed traces. Supports viewing the historical trend of a specific error type and its distribution across services, interfaces, or instances.
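
The kind of aggregation described above can be sketched as follows. The dictionary keys (`error_type`, `service`, `timestamp`) and the hourly bucket size are assumptions for illustration only, not the fields the Incident Explorer actually uses.

```python
from collections import Counter


def aggregate_errors(error_spans, bucket_seconds=3600):
    """Count errors per (type, service) and per (type, time bucket).

    Each item in error_spans is a dict with hypothetical keys:
    "error_type", "service", and "timestamp" (Unix seconds).
    """
    by_type_and_service = Counter()
    trend = Counter()
    for span in error_spans:
        by_type_and_service[(span["error_type"], span["service"])] += 1
        # Round the timestamp down to the start of its bucket.
        bucket_start = span["timestamp"] - span["timestamp"] % bucket_seconds
        trend[(span["error_type"], bucket_start)] += 1
    return by_type_and_service, trend
```

The first counter answers "where does this error type occur", and the second gives the points of a trend chart for each error type.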

Profiling

Profiling Explorer: Visualize and analyze application runtime performance, such as CPU usage and method execution time, using deep performance profiling tools like flame graphs.

Trace-related Profiling: Correlate application-level performance bottlenecks (e.g., slow calls, high-latency methods) with underlying infrastructure resource consumption for analysis.

Analysis Dashboard

Analysis Dashboard: Aggregates and displays core analysis data for application performance. It primarily includes trace statistics (Span and request volumes and error counts), related anomalies (error logs), deep performance analysis (response latency, call count, service request distribution, etc.), and resource-anomaly correlation.

Monitoring and Alerting

APM Anomaly Detection: Performs rule-based matching and filtering on performance data from traces. You define specific detection conditions (e.g., response time exceeding a threshold, occurrence of specific errors), and the requests that meet these anomaly criteria are identified and selected from the full volume of trace data.
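
A minimal sketch of rule-based matching of this kind is shown below. The `Rule` class, the request fields, and the 500 ms threshold are all hypothetical; the actual detection rules are configured in the product and are not expressed as Python code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    # Hypothetical rule: a name plus a predicate over one request record.
    name: str
    matches: Callable[[dict], bool]


def detect(requests, rules):
    """Return (trace_id, [matched rule names]) for each request that hits a rule."""
    hits = []
    for request in requests:
        matched = [rule.name for rule in rules if rule.matches(request)]
        if matched:
            hits.append((request["trace_id"], matched))
    return hits


# Example conditions mirroring the two cases mentioned in the text.
EXAMPLE_RULES = [
    Rule("slow_request", lambda r: r["duration_ms"] > 500),
    Rule("timeout_error", lambda r: r.get("error_type") == "TimeoutError"),
]
```

Requests that match no rule are simply omitted from the result, so the output is the anomalous subset of the input rather than a modified copy of it.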

Storage and Billing

Billing is based on the number of trace_id values counted in the current workspace and uses tiered pricing.
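
The arithmetic behind tiered pricing can be illustrated with the sketch below. It rests on several assumptions that this page does not confirm: that each distinct trace_id is counted once, that tiers are graduated (each tier's price applies only to the traces falling inside that tier), and that the tier boundaries and unit prices are the made-up values passed in by the caller. Refer to the billing documentation for the actual rules.

```python
def count_traces(spans):
    """Count distinct trace_id values (assumption: one billable unit per trace)."""
    return len({span["trace_id"] for span in spans})


def tiered_cost(trace_count, tiers):
    """Graduated tiered cost.

    tiers is an ascending list of (upper_bound, unit_price) pairs, where the
    last upper_bound may be None to mean "no limit". All values are
    hypothetical and supplied by the caller.
    """
    cost = 0
    lower = 0
    for upper, unit_price in tiers:
        if trace_count <= lower:
            break
        top = trace_count if upper is None else min(trace_count, upper)
        cost += (top - lower) * unit_price
        lower = top
    return cost
```

With illustrative tiers of `[(100, 3), (1000, 2), (None, 1)]`, the first 100 traces are charged at 3 units each, the next 900 at 2 units each, and any further traces at 1 unit each.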

Specific billing rules and data storage policies (e.g., retention period) can be configured separately. Refer to Data Storage Policies.

For more billing rules, refer to Billing Methods.