LLM Monitoring
A Large Language Model (LLM) is a deep-learning-based AI technology that can understand and generate natural language text. LLM monitoring associates each LLM request with the full application chain, tracing the complete process of every conversation and precisely measuring the number of tokens consumed by each generation task.
In day-to-day use of the LLM monitoring service, you can:
- View the complete chain of a single request: see the entire flow from receiving a user's question, through intermediate processing (such as database queries), to calling the LLM and returning the answer.
- Analyze performance bottlenecks: measure the time consumed at each stage (such as model calls and data retrieval) and promptly identify sources of latency (see the tracing sketch after this list).
- Correlate upstream and downstream services: link LLM requests with related application and infrastructure metrics for comprehensive root cause analysis.
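To make the request chain concrete, here is a minimal sketch of how such a request might be instrumented with OpenTelemetry, a common way to produce the traces and spans this view is built on. The span names, the `retrieve_context` and `call_llm` stand-in functions, and the console exporter are illustrative assumptions, not part of the product; the sketch requires the `opentelemetry-sdk` package.

```python
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print spans to the console for this sketch; a real deployment would
# export them to the monitoring backend instead.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm-app")


def retrieve_context(question: str) -> str:
    time.sleep(0.05)  # stand-in for a database / vector-store query
    return "retrieved documents"


def call_llm(prompt: str) -> str:
    time.sleep(0.2)  # stand-in for the actual model call
    return "model answer"


def handle_request(question: str) -> str:
    # The parent span covers the whole request; child spans mark each
    # stage, so per-stage latency shows up on the trace timeline.
    with tracer.start_as_current_span("handle_request"):
        with tracer.start_as_current_span("retrieval"):
            context = retrieve_context(question)
        with tracer.start_as_current_span("llm_call"):
            answer = call_llm(f"{context}\n{question}")
        return answer


print(handle_request("What is LLM monitoring?"))
```

Each child span carries its own start and end time, so the trace timeline directly shows how much of a request was spent in retrieval versus in the model call.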
Core Capabilities
The most critical part of LLM observability is establishing a quantifiable correlation between the input (prompt), the output (completion), and system behavior. Its core capabilities span three dimensions:
- Full-chain tracking: within the LLM calling framework, precisely track each request end to end through traces and spans to locate latency bottlenecks.
- Output quality evaluation: assess and optimize generated content using rule engines and AI-based evaluation (a rule-based sketch follows this list).
- Cost measurement: automatically collect and associate the token consumption (with input/output breakdown), model type, and call parameters for each request, enabling cost allocation across multiple business dimensions (a cost-allocation sketch also follows this list).
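As one way to picture the rule-engine side of output evaluation, the sketch below runs a fixed set of pass/fail checks over a completion. The rule names and the regular expression are illustrative assumptions; a real evaluation pipeline would be configurable and would add AI-based scoring alongside the rules.

```python
import re

# Illustrative rules only; a production rule engine would load these
# from configuration rather than hard-coding them.
RULES = [
    ("non_empty", lambda text: bool(text.strip())),
    ("no_api_keys_leaked", lambda text: not re.search(r"sk-[A-Za-z0-9]{20,}", text)),
    ("within_length_limit", lambda text: len(text) <= 4000),
]


def evaluate_output(completion: str) -> dict[str, bool]:
    """Run each rule against the model output and report pass/fail."""
    return {name: check(completion) for name, check in RULES}


result = evaluate_output("The capital of France is Paris.")
print(result)  # all three checks pass for this completion
```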
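To illustrate cost measurement, the following sketch aggregates per-request token consumption into a per-team cost breakdown. The model name, the prices, and the `team` dimension are hypothetical: actual prices depend on the provider, and the allocation dimension could be any business tag attached to a request.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical (input, output) USD prices per 1K tokens.
PRICING = {"gpt-4o": (0.005, 0.015)}


@dataclass
class LlmCallRecord:
    model: str
    team: str  # business dimension used for cost allocation
    input_tokens: int
    output_tokens: int

    @property
    def cost(self) -> float:
        in_price, out_price = PRICING[self.model]
        return (self.input_tokens / 1000) * in_price + (self.output_tokens / 1000) * out_price


def allocate_costs(records: list[LlmCallRecord]) -> dict[str, float]:
    """Aggregate spend per business dimension (here: team)."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.team] += record.cost
    return dict(totals)


records = [
    LlmCallRecord("gpt-4o", "support", input_tokens=1200, output_tokens=300),
    LlmCallRecord("gpt-4o", "search", input_tokens=800, output_tokens=150),
]
print(allocate_costs(records))  # {'support': 0.0105, 'search': 0.00625}
```

Because every record keeps the input/output split separately, the same data also answers questions like which team's prompts are disproportionately long.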
Getting Started