
OpenLLMetry


OpenLLMetry is developed and maintained by the Traceloop team under the Apache 2.0 license. It extends OpenTelemetry with specialized monitoring and debugging capabilities for LLM applications, using OpenTelemetry's standardized telemetry format to report key performance metrics and trace data.

Configuration

Before sending APM data to DataKit over OTEL, make sure the DataKit OpenTelemetry collector input is configured. Then add the customer_tags whitelist to the collector configuration file, as shown below:

[[inputs.opentelemetry]]
  ## customer_tags acts as a whitelist: only the tags listed here are sent to the data center.
  ## Every . in a tag key is replaced with _, for example gen_ai.request.model becomes gen_ai_request_model.
  customer_tags = ["llm.request.type","traceloop.entity.path","llm.is_streaming","gen_ai.openai.api_base","gen_ai.prompt.1.content","gen_ai.response.model","gen_ai.completion.0.content","gen_ai.request.model","gen_ai.request.temperature","gen_ai.system","traceloop.workflow.name"]
  ...

After making the adjustments, restart DataKit.
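As noted in the configuration comment, DataKit replaces every . in a tag key with _ before the tag reaches the data center. A small sketch of the resulting tag names (illustrative only; the renaming is done by DataKit, not by your code):

tags = [
    "llm.request.type", "traceloop.entity.path", "llm.is_streaming",
    "gen_ai.request.model", "gen_ai.response.model", "gen_ai.system",
]

# e.g. "gen_ai.request.model" -> "gen_ai_request_model"
for tag in tags:
    print(tag, "->", tag.replace(".", "_"))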

Install OpenTelemetry SDK

pip install opentelemetry-api opentelemetry-instrumentation
pip install opentelemetry-instrumentation-flask

Install OpenLLMetry SDK

pip install traceloop-sdk

Initialize OpenLLMetry in the Application

from traceloop.sdk import Traceloop

# Initialize OpenLLMetry (Traceloop); app_name identifies this service in the trace data

Traceloop.init(app_name="kimi_openllmetry_stream_flask")
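The SDK also provides workflow and task decorators (imported in the example below) for grouping spans around your own functions. A minimal sketch, assuming hypothetical helper functions of your own:

from traceloop.sdk.decorators import workflow, task

@task(name="summarize_chunk")
def summarize_chunk(text: str) -> str:
    # Each call is recorded as a task span nested under the enclosing workflow span.
    return text[:100]

@workflow(name="summarize_document")
def summarize_document(doc: str) -> str:
    # The workflow span wraps all task spans created inside it.
    return summarize_chunk(doc)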

OpenLLMetry Example Code

import os
import httpx
from flask import Flask, request, Response, jsonify, stream_with_context
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow, task
from openai import OpenAI

from opentelemetry.instrumentation.flask import FlaskInstrumentor


app = Flask(__name__)
# Use FlaskInstrumentor to automatically instrument the Flask application
FlaskInstrumentor().instrument_app(app)

# Initialize OpenLLMetry (Traceloop)
Traceloop.init(app_name="kimi_openllmetry_stream_flask")

# Get the API Key from the environment variable
api_key = os.getenv("MOONSHOT_API_KEY")
if not api_key:
    raise ValueError("Please set the MOONSHOT_API_KEY environment variable")

client = OpenAI(
    api_key=api_key,
    base_url="https://api.moonshot.cn/v1",
)

def estimate_token_count(input_messages) -> int:
    """
    Calculate the number of tokens in the input messages.
    """
    try:
        header = {
            "Authorization": f"Bearer {api_key}",
        }
        data = {
            "model": "moonshot-v1-128k",
            "messages": input_messages,
        }
        # Use a separate name for the HTTP client so it does not shadow the OpenAI client above
        with httpx.Client() as http_client:
            print("Requesting the token-count endpoint")
            r = http_client.post("https://api.moonshot.cn/v1/tokenizers/estimate-token-count", headers=header, json=data)
            r.raise_for_status()
            response_data = r.json()
            print(response_data["data"]["total_tokens"])
            return response_data["data"]["total_tokens"]
    except httpx.RequestError as e:
        print(f"Request failed: {e}")
        raise
    except (KeyError, ValueError) as e:
        print(f"Failed to parse response: {e}")
        raise

def select_model(input_messages, max_tokens=1024) -> str:
    """
    Select the appropriate model based on the input context messages and the expected max_tokens value.
    """
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        raise ValueError("max_tokens must be a positive integer")

    prompt_tokens = estimate_token_count(input_messages)
    total_tokens = prompt_tokens + max_tokens

    if total_tokens <= 8 * 1024:
        return "moonshot-v1-8k"
    elif total_tokens <= 32 * 1024:
        return "moonshot-v1-32k"
    elif total_tokens <= 128 * 1024:
        return "moonshot-v1-128k"
    else:
        raise ValueError("Token count exceeds the limit 😢")

@app.route('/ask', methods=['POST'])
@workflow(name="ask_workflow")
def ask():
    data = request.json
    messages = data.get('messages')
    max_tokens = data.get('max_tokens', 2048)

    if not messages:
        return jsonify({"error": "The messages field cannot be empty"}), 400

    try:
        model = select_model(messages, max_tokens)

        completion = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=0.3,
            stream=True  # Enable streaming generation
        )

        def generate():
            for chunk in completion:
                # yield chunk.choices[0].delta.content or ''
                delta = chunk.choices[0].delta
                if delta.content:
                    print(delta.content, end="")
                    yield delta.content or ''

        return Response(stream_with_context(generate()), content_type='text/event-stream')
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True, port=5001)
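Once the app is running, the streaming endpoint can be exercised with a small client. A minimal sketch using httpx (the prompt content, max_tokens value, and port are illustrative):

import httpx

payload = {
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "max_tokens": 512,
}

# Stream the response chunks from /ask as they arrive.
with httpx.stream("POST", "http://localhost:5001/ask", json=payload, timeout=None) as r:
    r.raise_for_status()
    for chunk in r.iter_text():
        print(chunk, end="", flush=True)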

Configure the Environment to Send Data to DataKit via OpenTelemetry

export TRACELOOP_BASE_URL=http://localhost:9529/otel
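Since the SDK reads TRACELOOP_BASE_URL from the environment, the endpoint can also be set from within the application instead of the shell, as long as it is set before Traceloop.init() runs. A minimal sketch (assumes DataKit is listening on the default port 9529):

import os

# Must be set before Traceloop.init() so the exporter targets the local DataKit OTEL endpoint.
os.environ.setdefault("TRACELOOP_BASE_URL", "http://localhost:9529/otel")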

Metric Details

| Metric Name | Description | Unit |
| --- | --- | --- |
| gen_ai.client.generation.choices | Number of choices generated by the client | count |
| gen_ai.client.operation.duration_bucket | Histogram bucket for the duration of client operations | milliseconds |
| gen_ai.client.operation.duration_count | Total number of client operations | count |
| gen_ai.client.operation.duration_max | Maximum duration of client operations | milliseconds |
| gen_ai.client.operation.duration_min | Minimum duration of client operations | milliseconds |
| gen_ai.client.operation.duration_sum | Total duration of client operations | milliseconds |
| llm.openai.chat_completions.streaming_time_to_first_token_bucket | Histogram bucket for the time to first token in OpenAI chat completions streaming | milliseconds |
| llm.openai.chat_completions.streaming_time_to_first_token_count | Total number of times the first token is generated in OpenAI chat completions streaming | count |
| llm.openai.chat_completions.streaming_time_to_first_token_max | Maximum time to first token in OpenAI chat completions streaming | milliseconds |
| llm.openai.chat_completions.streaming_time_to_first_token_min | Minimum time to first token in OpenAI chat completions streaming | milliseconds |
| llm.openai.chat_completions.streaming_time_to_first_token_sum | Total time to first token in OpenAI chat completions streaming | milliseconds |
| llm.openai.chat_completions.streaming_time_to_generate_bucket | Histogram bucket for the total time to generate content in OpenAI chat completions streaming | milliseconds |
| llm.openai.chat_completions.streaming_time_to_generate_count | Total number of times content is generated in OpenAI chat completions streaming | count |
| llm.openai.chat_completions.streaming_time_to_generate_max | Maximum time to generate content in OpenAI chat completions streaming | milliseconds |
| llm.openai.chat_completions.streaming_time_to_generate_min | Minimum time to generate content in OpenAI chat completions streaming | milliseconds |
| llm.openai.chat_completions.streaming_time_to_generate_sum | Total time to generate content in OpenAI chat completions streaming | milliseconds |

References