OpenLLMetry
OpenLLMetry is developed and maintained by the Traceloop team under the Apache 2.0 license. It extends OpenTelemetry with specialized monitoring and debugging capabilities for LLM applications, reporting key performance metrics and trace data in OpenTelemetry's standardized telemetry format.
Configuration¶
Before sending APM data to DataKit using OTEL, ensure that the collector (the DataKit opentelemetry input) is configured. In addition, add the LLM-related attributes to customer_tags in its configuration file, as shown below:
[[inputs.opentelemetry]]
## customer_tags acts as a whitelist: only the tags listed here are sent to the data center.
## Every "." in a tag key is replaced with "_", e.g. gen_ai.request.model becomes gen_ai_request_model.
customer_tags = ["llm.request.type","traceloop.entity.path","llm.is_streaming","gen_ai.openai.api_base","gen_ai.prompt.1.content","gen_ai.response.model","gen_ai.completion.0.content","gen_ai.request.model","gen_ai.request.temperature","gen_ai.system","traceloop.workflow.name"]
...
After making the adjustments, restart DataKit.
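For example, on a Linux host where DataKit is installed as a system service, it can typically be restarted with one of the following commands (the exact command depends on your installation):
datakit service -R
# or, via systemd
systemctl restart datakit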
Install OpenTelemetry SDK¶
pip install opentelemetry-api opentelemetry-instrumentation
pip install opentelemetry-instrumentation-flask
Install OpenLLMetry SDK¶
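The OpenLLMetry SDK itself is published on PyPI as the traceloop-sdk package:
pip install traceloop-sdk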
Initialize OpenLLMetry in the Application¶
from traceloop.sdk import Traceloop
# Initialize OpenLLMetry
# Traceloop.init()
Traceloop.init(app_name="kimi_openllmetry_stream_flask")
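Besides automatic instrumentation of LLM client libraries, the SDK's workflow and task decorators (imported and used in the example below) can wrap your own functions so they appear as spans. A minimal sketch, with hypothetical function and span names:
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow, task

Traceloop.init(app_name="decorator_demo")  # hypothetical app name

@task(name="build_prompt")
def build_prompt(question: str) -> str:
    # Recorded as a child span of the enclosing workflow
    return f"Answer briefly: {question}"

@workflow(name="qa_workflow")
def qa_workflow(question: str) -> str:
    # Recorded as a top-level workflow span
    return build_prompt(question)

print(qa_workflow("What is OpenLLMetry?"))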
OpenLLMetry Example Code¶
import os
import httpx
from flask import Flask, request, Response, jsonify, stream_with_context
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow, task
from openai import OpenAI
from opentelemetry.instrumentation.flask import FlaskInstrumentor

app = Flask(__name__)
# Use FlaskInstrumentor to automatically instrument the Flask application
FlaskInstrumentor().instrument_app(app)

# Initialize OpenLLMetry
Traceloop.init(app_name="kimi_openllmetry_stream_flask")

# Get the API Key from the environment variable
api_key = os.getenv("MOONSHOT_API_KEY")
if not api_key:
    raise ValueError("Please set the MOONSHOT_API_KEY environment variable")

client = OpenAI(
    api_key=api_key,
    base_url="https://api.moonshot.cn/v1",
)

def estimate_token_count(input_messages) -> int:
    """
    Calculate the number of tokens in the input messages.
    """
    try:
        header = {
            "Authorization": f"Bearer {api_key}",
        }
        data = {
            "model": "moonshot-v1-128k",
            "messages": input_messages,
        }
        # Use a dedicated httpx client (named http_client to avoid shadowing the global OpenAI client)
        with httpx.Client() as http_client:
            print("Requesting interface")
            r = http_client.post("https://api.moonshot.cn/v1/tokenizers/estimate-token-count", headers=header, json=data)
            r.raise_for_status()
            response_data = r.json()
            print(response_data["data"]["total_tokens"])
            return response_data["data"]["total_tokens"]
    except httpx.RequestError as e:
        print(f"Request failed: {e}")
        raise
    except (KeyError, ValueError) as e:
        print(f"Failed to parse response: {e}")
        raise

def select_model(input_messages, max_tokens=1024) -> str:
    """
    Select the appropriate model based on the input context messages and the expected max_tokens value.
    """
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")
    prompt_tokens = estimate_token_count(input_messages)
    total_tokens = prompt_tokens + max_tokens
    if total_tokens <= 8 * 1024:
        return "moonshot-v1-8k"
    elif total_tokens <= 32 * 1024:
        return "moonshot-v1-32k"
    elif total_tokens <= 128 * 1024:
        return "moonshot-v1-128k"
    else:
        raise ValueError("Token count exceeds the limit 😢")

@app.route('/ask', methods=['POST'])
@workflow(name="ask_workflow")
def ask():
    data = request.json
    messages = data.get('messages')
    max_tokens = data.get('max_tokens', 2048)
    if not messages:
        return jsonify({"error": "The messages field cannot be empty"}), 400
    try:
        model = select_model(messages, max_tokens)
        completion = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=0.3,
            stream=True  # Enable streaming generation
        )

        def generate():
            for chunk in completion:
                # yield chunk.choices[0].delta.content or ''
                delta = chunk.choices[0].delta
                if delta.content:
                    print(delta.content, end="")
                    yield delta.content or ''

        return Response(stream_with_context(generate()), content_type='text/event-stream')
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True, port=5001)
Configure the environment so that the application sends data to DataKit via OpenTelemetry, as shown below.
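A minimal sketch, assuming DataKit is reachable at localhost:9529 with the opentelemetry input's default HTTP route, and that the example above is saved as app.py; TRACELOOP_BASE_URL is the OpenLLMetry setting that points its OTLP/HTTP exporter at the collector. Replace the address and file name with your own:
export TRACELOOP_BASE_URL="http://localhost:9529/otel"   # assumed DataKit address and OTEL route
python app.py                                            # assumed file name for the example above
Once the service is running, the /ask endpoint can be exercised with a request such as the following (the prompt is illustrative):
curl -X POST http://127.0.0.1:5001/ask \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, please introduce yourself."}], "max_tokens": 512}'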
Metric Details¶
Metric Name | Description | Unit |
---|---|---|
gen_ai.client.generation.choices | Number of choices generated by the client | count |
gen_ai.client.operation.duration_bucket | Histogram bucket for the duration of client operations | milliseconds |
gen_ai.client.operation.duration_count | Total number of client operations | count |
gen_ai.client.operation.duration_max | Maximum duration of client operations | milliseconds |
gen_ai.client.operation.duration_min | Minimum duration of client operations | milliseconds |
gen_ai.client.operation.duration_sum | Total duration of client operations | milliseconds |
llm.openai.chat_completions.streaming_time_to_first_token_bucket | Histogram bucket for the time to first token in OpenAI chat completions streaming | milliseconds |
llm.openai.chat_completions.streaming_time_to_first_token_count | Total number of times a first token was generated in OpenAI chat completions streaming | count |
llm.openai.chat_completions.streaming_time_to_first_token_max | Maximum time to first token in OpenAI chat completions streaming | milliseconds |
llm.openai.chat_completions.streaming_time_to_first_token_min | Minimum time to first token in OpenAI chat completions streaming | milliseconds |
llm.openai.chat_completions.streaming_time_to_first_token_sum | Total time to first token in OpenAI chat completions streaming | milliseconds |
llm.openai.chat_completions.streaming_time_to_generate_bucket | Histogram bucket for the total time to generate content in OpenAI chat completions streaming | milliseconds |
llm.openai.chat_completions.streaming_time_to_generate_count | Total number of times content was generated in OpenAI chat completions streaming | count |
llm.openai.chat_completions.streaming_time_to_generate_max | Maximum time to generate content in OpenAI chat completions streaming | milliseconds |
llm.openai.chat_completions.streaming_time_to_generate_min | Minimum time to generate content in OpenAI chat completions streaming | milliseconds |
llm.openai.chat_completions.streaming_time_to_generate_sum | Total time to generate content in OpenAI chat completions streaming | milliseconds |