Flameshot
Flameshot is a lightweight automated profiling tool running in Sidecar mode. It monitors the resource usage (CPU/Memory) of target processes and automatically triggers underlying Profilers (such as async-profiler) when preset thresholds are reached, enabling non-intrusive on-site snapshot collection.
Core Concepts
Operating Mode
Flameshot is deployed using the Sidecar Container pattern. It must run in the same Pod as the main business container (Main Container) and have PID namespace sharing enabled.
- Monitor: Flameshot continuously polls the resource levels of target processes within the main container.
- Trigger: When thresholds are met (e.g., CPU > 80%) or an HTTP API request is received, a collection task is triggered.
- Execute: Based on the configured language type (currently supporting Java), it invokes the corresponding Profiler tool to attach to the target process.
- Collect: The generated Profile files (e.g., .jfr) are stored in a shared volume and subsequently uploaded to the data observability center.
- Timed: When FLAMESHOT_AUTO_PROFILING is configured, Flameshot periodically collects profiling data for all matched processes. The sample duration defaults to 30 seconds and can be adjusted through FLAMESHOT_AUTO_PROFILING_DURATION.
- OOM Summary: When an increment in the container's oom_kill counter is detected, Flameshot tries to parse -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=... from the target Java process arguments. If the dump file is generated inside the shared volume, Flameshot locates the corresponding .hprof file and uploads a summary log.
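The OOM Summary step starts from noticing that the container's oom_kill counter went up. The following is a minimal Python sketch of that detection, assuming cgroup v2, where the counter appears as an "oom_kill N" line in the cgroup's memory.events file. The file path and the names parse_memory_events and OomKillWatcher are illustrative assumptions, not Flameshot's actual implementation.

```python
from typing import Dict, Optional


def parse_memory_events(text: str) -> Dict[str, int]:
    """Parse cgroup v2 memory.events content: one "<key> <count>" pair per line."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


class OomKillWatcher:
    """Reports how many OOM kills happened between two consecutive polls."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def observe(self, memory_events_text: str) -> int:
        current = parse_memory_events(memory_events_text).get("oom_kill", 0)
        if self._last is None:
            # The first poll only records a baseline; earlier kills are not reported.
            self._last = current
            return 0
        delta = max(0, current - self._last)  # never report a negative increment
        self._last = current
        return delta


if __name__ == "__main__":
    # Hypothetical path: the memory.events file of the cgroup being watched.
    with open("/sys/fs/cgroup/memory.events") as f:
        print(parse_memory_events(f.read()))
```

A real monitor would call observe() once per polling interval and start the .hprof lookup whenever it returns a positive value.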
Use Cases
- Production Safety Net: Automatically preserve on-site evidence before a service crashes due to CPU spikes or memory leaks.
- Performance Stress Test Analysis: Work alongside stress-testing platforms to automatically collect performance hotspots under high load.
Configuration
All Flameshot behavior is controlled via environment variables. Configuration is divided into Global Settings and Profiling Policies.
Global Environment Variables
These variables control the basic behavior of the Sidecar container.
| Variable Name | Required | Default Value | Description |
|---|---|---|---|
| FLAMESHOT_DATAKIT_ADDR | Yes | - | Address of DataKit's Profiling data receiving endpoint. |
| FLAMESHOT_PROFILING_PATH | Yes | /data | Shared directory path used to store tools and generated temporary files. Must match the mount path in the main container. |
| FLAMESHOT_MONITOR_INTERVAL | No | 1 | Monitoring polling interval (seconds). |
| FLAMESHOT_LOG_LEVEL | No | info | Log level. Options: debug, info, warn, error. |
| FLAMESHOT_AUTO_PROFILING | No | - | Interval at which Profiling data is collected from all matched processes. Must be at least one minute, e.g. "5m" (five minutes) or "1h" (one hour). |
| FLAMESHOT_AUTO_PROFILING_DURATION | No | 30s | Sample duration for timed profiling mode. |
| FLAMESHOT_OOM_HPROF_ENABLED | No | false | Enables the post-OOM .hprof summary recovery path (Java only). The target JVM must explicitly enable -XX:+HeapDumpOnOutOfMemoryError and set -XX:HeapDumpPath=... to a location inside a shared volume. Setting this explicitly in deployment manifests is recommended. |
| FLAMESHOT_OOM_HPROF_MATCH_WINDOW | No | 2m | Time window used to match an OOM event with the .hprof file modification time. |
| FLAMESHOT_POD_MEM_LIMIT | No | - | Pod memory limit in Mi. When configured, Flameshot computes memory percentages against the Pod limit instead of host memory. |
| FLAMESHOT_POD_CPU_LIMIT | No | - | Pod CPU limit in millicores. |
| FLAMESHOT_HTTP_LOCAL_IP | Yes | - | Listening host (IP) of the Sidecar's own HTTP service. |
| FLAMESHOT_HTTP_LOCAL_PORT | Yes | 8089 | Listening port of the Sidecar's own HTTP service. |
| FLAMESHOT_SERVICE | No | - | Overrides the service field in FLAMESHOT_PROCESSES. |
| FLAMESHOT_TAGS | No | - | Additional tags. Including host, pod_name, and pod_namespace is recommended, e.g. "host:host_name,pod_name:pod_a". |
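FLAMESHOT_AUTO_PROFILING and FLAMESHOT_AUTO_PROFILING_DURATION take duration strings such as "5m", "1h", and "30s". The Python sketch below shows one way to validate them, including the one-minute minimum for the interval. It assumes the values are Go-style durations limited to the s, m, and h units; the accepted grammar and the function names are illustrative assumptions, not Flameshot's parser.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_TOKEN = re.compile(r"(\d+)([smh])")
MIN_AUTO_PROFILING_SECONDS = 60  # the interval must not be less than one minute


def parse_duration(value: str) -> int:
    """Convert strings such as "30s", "5m", "1h" or "1h30m" to seconds."""
    value = value.strip()
    if not re.fullmatch(r"(?:\d+[smh])+", value):
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in _TOKEN.findall(value))


def auto_profiling_interval(value: str) -> int:
    """Validate FLAMESHOT_AUTO_PROFILING and return the interval in seconds."""
    seconds = parse_duration(value)
    if seconds < MIN_AUTO_PROFILING_SECONDS:
        raise ValueError(f"FLAMESHOT_AUTO_PROFILING must be at least 1m, got {value!r}")
    return seconds


def sample_duration(value: Optional[str]) -> int:
    """FLAMESHOT_AUTO_PROFILING_DURATION, falling back to the documented 30s default."""
    return parse_duration(value or "30s")
```

In practice the inputs would come from os.environ.get("FLAMESHOT_AUTO_PROFILING") and os.environ.get("FLAMESHOT_AUTO_PROFILING_DURATION"). Rejecting "30s" as an interval at startup surfaces a misconfiguration immediately instead of silently clamping it.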
Profiling Policy Configuration
Target monitoring rules are defined via the FLAMESHOT_PROCESSES environment variable. The value must be a JSON array string.
To keep Kubernetes YAML readable, it is strongly recommended to write the JSON configuration using YAML's block scalar syntax (|), as shown below:
env:
  # ... other environment variables ...
  - name: FLAMESHOT_PROCESSES
    value: |
      [
        {
          "service": "user-service",
          "language": "java",
          "command": "^java.*user-service\\.jar$",
          "duration": "60s",
          "events": "cpu,alloc",
          "cpu_usage_percent": 80,
          "mem_usage_percent": 80,
          "mem_usage_mb": 1024,
          "mem_usage_percent_emergency": 92,
          "mem_usage_mb_emergency": 1536,
          "emergency_duration": "10s",
          "tags": [
            "env:prod",
            "version:v1.2"
          ]
        }
      ]
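Because the value is plain JSON, a policy list can be checked before deployment. The Python sketch below parses a FLAMESHOT_PROCESSES string, compiles each command regex, and matches it against a process command line read from /proc/<pid>/cmdline. It assumes that only command is mandatory and that the first matching policy wins; both are assumptions for illustration, as are the helper names. Note that "\\." in the JSON text decodes to the regex "\.", a literal dot.

```python
import json
import re
from typing import Any, Dict, List, Optional


def load_policies(raw: str) -> List[Dict[str, Any]]:
    """Parse FLAMESHOT_PROCESSES and pre-compile each policy's command regex."""
    policies = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(policies, list):
        raise ValueError("FLAMESHOT_PROCESSES must be a JSON array")
    for policy in policies:
        if not isinstance(policy, dict) or "command" not in policy:
            raise ValueError(f"each policy must be an object with a 'command': {policy!r}")
        # Compiling up front makes an invalid regex fail at startup, not at match time.
        policy["_command_re"] = re.compile(policy["command"])
    return policies


def read_cmdline(pid: int) -> str:
    """Read /proc/<pid>/cmdline, whose arguments are NUL-separated, as one string."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return f.read().replace(b"\0", b" ").decode(errors="replace").strip()


def match_policy(policies: List[Dict[str, Any]], cmdline: str) -> Optional[Dict[str, Any]]:
    """Return the first policy whose command regex matches the command line."""
    for policy in policies:
        if policy["_command_re"].search(cmdline):
            return policy
    return None
```

With the example above, "java -Xmx1g -jar /app/user-service.jar" matches, while a command line with arguments after the jar name would not, because the pattern ends with $.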
Common Field Descriptions:
- service (String): Service name reported to the observability center.
- language (String): Target process language. Currently supports java.
- command (String): Regular expression matched against the process command line.
- duration (String): Duration of a single collection (e.g., 30s, 1m). To avoid execution timeouts, keeping it under 5 minutes is recommended.
- emergency_duration (String): Shorter profiling duration used after an emergency memory threshold is hit. 10s or 15s is recommended.
- tags (List): Custom tags; including meta-information such as env and version is recommended.
- cpu_usage_percent (Int): CPU trigger threshold in percent. Values may exceed 100 in multi-core environments.
- mem_usage_percent (Int): Memory-percentage threshold (0-100), compared against the average of the latest 5 samples.
- mem_usage_mb (Int): RSS threshold in MB, compared against the average of the latest 5 samples.
- mem_usage_percent_emergency (Int): Instant emergency memory-percentage threshold (0-100). A single hit triggers collection immediately.
- mem_usage_mb_emergency (Int): Instant emergency RSS threshold in MB. A single hit triggers collection immediately.
- cpu_usage_percent, mem_usage_percent, and mem_usage_mb skip their threshold checks when omitted or set to 0.
- When FLAMESHOT_POD_MEM_LIMIT is configured, mem_usage_percent and mem_usage_percent_emergency are computed against the Pod memory limit instead of host memory.
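The memory fields combine a 5-sample average with single-sample emergency checks. The Python sketch below illustrates those rules under stated assumptions: a threshold counts as hit when the value is greater than or equal to it, a threshold of 0 disables the check (the field list states this for the average thresholds; applying it to the emergency thresholds is an assumption), and average thresholds are evaluated only once 5 samples exist. MemThresholds, MemTrigger, and mem_percent are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional

WINDOW = 5  # average thresholds use the latest 5 samples


@dataclass
class MemThresholds:
    # 0 means "not configured"; the corresponding check is skipped.
    mem_usage_percent: float = 0
    mem_usage_mb: float = 0
    mem_usage_percent_emergency: float = 0
    mem_usage_mb_emergency: float = 0


def mem_percent(rss_bytes: int, host_total_bytes: int, pod_mem_limit_mi: Optional[int] = None) -> float:
    """Memory percentage, preferring FLAMESHOT_POD_MEM_LIMIT (in Mi) over host memory."""
    base = pod_mem_limit_mi * 1024 * 1024 if pod_mem_limit_mi else host_total_bytes
    return rss_bytes * 100.0 / base


def _hit(threshold: float, value: float) -> bool:
    return threshold > 0 and value >= threshold


class MemTrigger:
    def __init__(self, thresholds: MemThresholds) -> None:
        self.t = thresholds
        self.percents: Deque[float] = deque(maxlen=WINDOW)
        self.rss_mb: Deque[float] = deque(maxlen=WINDOW)

    def check(self, percent: float, rss_mb: float) -> Optional[str]:
        """Feed one sample; return "emergency", "average", or None."""
        self.percents.append(percent)
        self.rss_mb.append(rss_mb)
        # Emergency thresholds: a single sample is enough.
        if _hit(self.t.mem_usage_percent_emergency, percent) or _hit(self.t.mem_usage_mb_emergency, rss_mb):
            return "emergency"
        # Average thresholds: only evaluated once the window holds 5 samples.
        if len(self.percents) == WINDOW:
            avg_percent = sum(self.percents) / WINDOW
            avg_mb = sum(self.rss_mb) / WINDOW
            if _hit(self.t.mem_usage_percent, avg_percent) or _hit(self.t.mem_usage_mb, avg_mb):
                return "average"
        return None
```

Averaging smooths out short spikes, while the emergency path lets a fast-growing process be profiled (with the shorter emergency_duration) before it is killed.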
Language Specifics
Flameshot invokes different underlying tools depending on the technology stack of the monitored application.
Java Profiling
For Java applications, Flameshot bundles async-profiler (supporting linux-amd64 / linux-arm64).
Key Configuration Fields (FLAMESHOT_PROCESSES):
- language: Must be set to java.
- events: Supports cpu (CPU time), alloc (memory allocation), lock (lock contention), cache-misses, and nativemem. Defaults to all.
- jdk_version: (Optional) JDK version used for metadata display.
Notes:
- async-profiler does not rely on JVM safepoints, so samples are free of safepoint bias and overhead is very low.
- If you want Flameshot to automatically discover and upload a post-OOM .hprof summary, the JVM must explicitly enable both -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=... in its startup options. Setting FLAMESHOT_OOM_HPROF_ENABLED=true alone does not modify the target JVM startup options.
- HeapDumpPath must point to a directory or file path inside a volume shared by both the application container and the Flameshot Sidecar. A stable path that distinguishes between processes is recommended.
- Declare the .hprof summary recovery feature flags explicitly in deployment manifests instead of relying on implicit defaults.
Go Profiling
Planned: Integration with the pprof toolchain.
Expected Features:
- Support for goroutine blocking analysis.
- Support for heap memory snapshots.
Python Profiling
Planned: Integration with non-intrusive tools such as py-spy.
Deployment
Kubernetes Sidecar Deployment
For Flameshot to work correctly, the Pod configuration must meet the following three conditions:
- Shared process namespace (shareProcessNamespace: true).
- Shared storage volume (emptyDir).
- System capabilities (SYS_PTRACE for the Sidecar).
YAML Example:
apiVersion: v1
kind: Pod
metadata:
  name: java-app-profiled
spec:
  # 1. [Core] Enable PID sharing so the Sidecar can see the Java process
  shareProcessNamespace: true
  volumes:
    - name: shared-data
      emptyDir: {}
  containers:
    # Business Container
    - name: my-app
      image: my-app:latest
      volumeMounts:
        - name: shared-data
          mountPath: /data # Must match the Sidecar configuration
    # Flameshot Sidecar
    - name: flameshot
      image: pubrepo.jiagouyun.com/datakit/flameshot:latest
      env:
        - name: FLAMESHOT_PROFILING_PATH
          value: "/data"
        # ... other environment variables ...
      # 2. [Core] Grant ptrace capability
      securityContext:
        capabilities:
          add: ["SYS_PTRACE"]
      # 3. [Core] Mount the same directory
      volumeMounts:
        - name: shared-data
          mountPath: /data
OOM HProf Summary Requirements
If you want Flameshot to automatically recover .hprof summary information after a Java OOM, all of the following conditions must be met:
- Enable -XX:+HeapDumpOnOutOfMemoryError in the application JVM arguments.
- Configure -XX:HeapDumpPath=/data/... in the JVM arguments, and make sure the path is inside the shared volume.
- Set FLAMESHOT_OOM_HPROF_ENABLED=true for the Flameshot Sidecar.
- (Recommended) Set FLAMESHOT_OOM_HPROF_MATCH_WINDOW explicitly so the matching window is operationally unambiguous.
For example:
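a minimal Python sketch that checks the two JVM-side conditions against a process argument list (for example, the NUL-separated contents of /proc/<pid>/cmdline). It assumes the shared volume is mounted at /data, and the path /data/dumps/ used in the usage example is hypothetical. Options passed through JAVA_TOOL_OPTIONS do not appear in the command line, so this sketch would not see them.

```python
import os
from typing import List, Optional, Tuple

_PATH_PREFIX = "-XX:HeapDumpPath="


def heap_dump_settings(args: List[str]) -> Tuple[bool, Optional[str]]:
    """Return (HeapDumpOnOutOfMemoryError enabled, HeapDumpPath) from JVM arguments."""
    enabled, path = False, None
    for arg in args:
        # Later occurrences override earlier ones, as the JVM generally does.
        if arg == "-XX:+HeapDumpOnOutOfMemoryError":
            enabled = True
        elif arg == "-XX:-HeapDumpOnOutOfMemoryError":
            enabled = False
        elif arg.startswith(_PATH_PREFIX):
            path = arg[len(_PATH_PREFIX):]
    return enabled, path


def is_inside(path: str, shared_root: str) -> bool:
    """True if an absolute path lies under shared_root (relative paths are not handled)."""
    if not os.path.isabs(path):
        return False  # a relative path depends on the JVM's working directory
    root = os.path.normpath(shared_root)
    return os.path.commonpath([root, os.path.normpath(path)]) == root


def check_oom_hprof_ready(args: List[str], shared_root: str = "/data") -> List[str]:
    """List the unmet JVM-side requirements; an empty list means both are met."""
    problems = []
    enabled, path = heap_dump_settings(args)
    if not enabled:
        problems.append("-XX:+HeapDumpOnOutOfMemoryError is not set")
    if path is None:
        problems.append("-XX:HeapDumpPath is not set")
    elif not is_inside(path, shared_root):
        problems.append(f"HeapDumpPath {path!r} is outside {shared_root!r}")
    return problems
```

For instance, ["java", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/data/dumps/", "-jar", "app.jar"] produces no problems, while a HeapDumpPath under /tmp is reported as outside the shared volume.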
Notes:
- Flameshot now discovers HeapDumpPath directly from the target Java process arguments. There is no separate configuration item for the .hprof path.
- FLAMESHOT_OOM_HPROF_ENABLED only enables the Flameshot-side recovery workflow; it does not inject HeapDump-related flags into the target JVM.
- If the target process does not enable HeapDumpOnOutOfMemoryError, or if HeapDumpPath is not inside a shared volume, Flameshot can only record the OOM event and cannot locate the .hprof file.
- If the container is terminated before the dump is fully written, the .hprof file may still be unavailable.
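Matching an OOM event to a dump file relies on FLAMESHOT_OOM_HPROF_MATCH_WINDOW (default 2m). The Python sketch below shows one plausible way to do it: list the candidates under HeapDumpPath (a file, or a directory in which the JVM names dumps java_pid<pid>.hprof by default) and pick the .hprof whose modification time is closest to the OOM time within the window. Applying the window symmetrically and preferring the closest file are assumptions made for illustration; Flameshot's exact matching rule is not documented here.

```python
import os
from typing import Iterable, Iterator, Optional, Tuple


def list_candidates(heap_dump_path: str) -> Iterator[Tuple[str, float]]:
    """Yield (path, mtime) for HeapDumpPath, which may be a file or a directory."""
    if os.path.isdir(heap_dump_path):
        for entry in os.scandir(heap_dump_path):
            if entry.is_file():
                yield entry.path, entry.stat().st_mtime
    elif os.path.isfile(heap_dump_path):
        yield heap_dump_path, os.path.getmtime(heap_dump_path)


def pick_hprof(candidates: Iterable[Tuple[str, float]], oom_time: float,
               window_seconds: float = 120.0) -> Optional[str]:
    """Return the .hprof whose mtime is closest to oom_time, or None if none is in the window."""
    best: Optional[str] = None
    best_gap: Optional[float] = None
    for path, mtime in candidates:
        if not path.endswith(".hprof"):
            continue
        gap = abs(mtime - oom_time)
        if gap <= window_seconds and (best_gap is None or gap < best_gap):
            best, best_gap = path, gap
    return best
```

A window that is too short can miss large dumps that take a while to write; one that is too long risks picking up a stale dump from an earlier OOM, which is why setting it explicitly is recommended.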
Docker Local Testing
To test in a local Docker environment, use the following command to start Flameshot and monitor the target container.
Prerequisites:
- Share the target container's PID namespace (--pid="container:<target_id>") and its volumes (for example, --volumes-from), and grant the SYS_PTRACE capability, mirroring the Kubernetes requirements.
Test Image: pubrepo.jiagouyun.com/datakit/flameshot:1.85.1-testing_testing-iss-2876
Startup Command Example:
docker run -d \
  --name flameshot-debug \
  --pid="container:<YOUR_JAVA_APP_CONTAINER>" \
  --cap-add SYS_PTRACE \
  --volumes-from <YOUR_JAVA_APP_CONTAINER> \
  -e FLAMESHOT_DATAKIT_ADDR="http://datakit:9529/profiling/v1/input" \
  -e FLAMESHOT_PROCESSES='[{"service":"local-test","command":"java","language":"java","cpu_usage_percent":10}]' \
  pubrepo.jiagouyun.com/datakit/flameshot:1.85.1-testing_testing-iss-2876
API Reference
Flameshot provides an HTTP interface that lets users or automated operations (O&M) scripts trigger collection tasks manually.
Manual Triggering
Endpoint: GET /v1/profile
Note: This endpoint generates a Profile dataset on demand; it does not return monitoring metrics.
Request Parameters:
| Parameter | Required | Description | Example |
|---|---|---|---|
| pid | One of pid / command | Target process ID. Takes precedence over command. | 1234 |
| command | One of pid / command | Regular expression matched against the target process name. | ^java.*app.jar$ |
| duration | No | Collection duration. Defaults to 30s. | 30s |
| events | No | Collection event types. Defaults to all. | cpu,alloc |
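Usage Examples:
- Trigger collection by PID: GET /v1/profile?pid=1234&duration=30s
- Trigger collection by process name regex: GET /v1/profile?command=^java.*app.jar$&events=cpu,alloc (URL-encode the regex when sending the request)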
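For scripted use, the Python sketch below builds a GET /v1/profile request URL from the documented parameters (handling URL-encoding of the regex) and sends it with the standard library. The host and port stand in for FLAMESHOT_HTTP_LOCAL_IP and FLAMESHOT_HTTP_LOCAL_PORT; the response body format is not documented here, so the sketch only returns the HTTP status. The 120-second timeout assumes the request may block for the collection duration, which is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_profile_url(host: str, port: int = 8089, pid: Optional[int] = None,
                      command: Optional[str] = None, duration: Optional[str] = None,
                      events: Optional[str] = None) -> str:
    """Build a GET /v1/profile URL; pid or command must be given."""
    if pid is None and command is None:
        raise ValueError("either pid or command is required")
    params = {}
    if pid is not None:
        params["pid"] = str(pid)  # the server gives pid precedence over command
    if command is not None:
        params["command"] = command  # urlencode escapes ^, * and $ for us
    if duration:
        params["duration"] = duration
    if events:
        params["events"] = events
    return f"http://{host}:{port}/v1/profile?{urlencode(params)}"


def trigger_profile(url: str, timeout: float = 120.0) -> int:
    """Send the request and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status
```

For example, trigger_profile(build_profile_url("127.0.0.1", command="^java.*app.jar$", events="cpu,alloc")) requests a CPU and allocation profile for the matching process.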
JFR format
async-profiler event types:
| Event Type | Command Flag | Mechanism | Best Use Case | Key Note |
|---|---|---|---|---|
| CPU Time | cpu | Uses kernel sampling (perf_events) or itimer to see which code is currently on the CPU. | Performance tuning: finding hotspots in calculation-heavy logic or algorithms. | Only tracks time when the thread is actively running on a CPU. |
| Wall-clock | wall | Samples all threads at fixed intervals regardless of their state (running, sleeping, blocked). | Latency diagnosis: finding delays in I/O, database calls, or external network requests. | Shows what threads are doing while they are "waiting". |
| Allocation | alloc | Samples TLAB (Thread Local Allocation Buffer) refills and large object allocations. | Memory optimization: reducing GC pressure by finding code that creates excessive temporary objects. | Measures the rate of allocation, not current heap usage or liveness. |
| Lock | lock | Tracks contention and wait time on JVM monitors (synchronized) and java.util.concurrent locks. | Concurrency bottlenecks: identifying lock contention or threads blocked by synchronization. | Usually filtered to record only events exceeding a certain duration threshold. |
| Cache Misses | cache-misses | Uses hardware performance counters (PMU) to track CPU cache misses. | Low-level tuning: optimizing data structures for CPU cache friendliness (e.g., avoiding false sharing). | Requires Linux perf_events support and access to hardware counters. |
| Context Switch | cs | Tracks how often the OS scheduler swaps threads on and off the CPU. | Resource scaling: identifying whether there are too many active threads for the CPU core count. | High context switching wastes CPU cycles on scheduling overhead. |
| Interval Timer | itimer | Timer-based sampling provided by the OS kernel. | Compatibility mode: used when perf_events is unavailable (e.g., in some restricted Docker/K8s environments). | Good fallback for CPU profiling, though slightly less precise than perf_events-based sampling. |
Troubleshooting
- Cannot collect data?
  - Check that shareProcessNamespace: true is enabled in the Pod.
  - Check that the Sidecar has the SYS_PTRACE capability.
- File not uploaded?
  - Check that FLAMESHOT_PROFILING_PATH is correctly mounted in both containers.
  - The system manages file life cycles automatically and attempts to delete temporary files after collection completes.
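The first two checks can be run from inside the Sidecar container. The Python sketch below reads the effective capability set from /proc/self/status (CAP_SYS_PTRACE is capability number 19 in the Linux headers), lists visible Java processes (without a shared process namespace the Sidecar only sees its own processes), and confirms that FLAMESHOT_PROFILING_PATH exists. It is a diagnostic aid under these assumptions, not a Flameshot tool; identifying Java processes by an argv[0] basename of "java" is a simplification.

```python
import os
import re
from typing import List

CAP_SYS_PTRACE = 19  # capability number from <linux/capability.h>


def has_capability(status_text: str, cap: int) -> bool:
    """Check a capability bit in the hex CapEff line of /proc/<pid>/status."""
    match = re.search(r"^CapEff:\s*([0-9a-fA-F]+)", status_text, re.MULTILINE)
    if match is None:
        return False
    return bool((int(match.group(1), 16) >> cap) & 1)


def visible_java_pids(proc_root: str = "/proc") -> List[int]:
    """List PIDs whose argv[0] basename is 'java', as seen from this container."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0", 1)[0]
        except OSError:
            continue  # the process exited or its cmdline is not readable
        if os.path.basename(argv0) == b"java":
            pids.append(int(name))
    return sorted(pids)


def diagnose() -> None:
    with open("/proc/self/status") as f:
        print("SYS_PTRACE in effective set:", has_capability(f.read(), CAP_SYS_PTRACE))
    # An empty result usually means shareProcessNamespace is not enabled.
    print("Visible java PIDs:", visible_java_pids() or "none")
    shared = os.environ.get("FLAMESHOT_PROFILING_PATH", "/data")
    print(f"Shared path {shared!r} exists:", os.path.isdir(shared))


if __name__ == "__main__":
    diagnose()
```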
Changelog
0.2.2 (2026-5-12)
Bug Fixes
- Fix
  - Fixed duplicated host, env, version, service, and other tags in uploaded Profiling tags_profiler metadata.
- Change
  - Removed high-watermark jcmd snapshot support and related configuration items.
  - Added cgroup memory-pressure diagnostic fields to make the threshold, current cgroup memory, and cgroup limit easier to verify.
0.2.1 (2026-2-11)
Optimize
- Optimize
  - In container environments, the configured resource size is used as the base value for threshold calculation.
0.2.0 (2026-2-4)
New Features
- Add config
  - Support configuring scheduled Profiling execution via the FLAMESHOT_AUTO_PROFILING environment variable.
- Optimize
  - Optimized the threshold processing logic for configuration.
0.1.0 (2025-12-17)
The first official release of Flameshot, focusing on automated profiling for Java applications in containerized environments.
New Features
- Core Architecture:
  - Support for Kubernetes Sidecar mode deployment, using shared PID namespaces for non-intrusive monitoring.
  - Support for Linux AMD64 and ARM64 multi-architecture execution.
- Language Support:
  - Java: Deep integration with async-profiler, supporting collection of events such as CPU, Alloc, and Lock.
  - Automatic detection of and adaptation to the target container's JDK environment.
- Trigger Mechanism:
  - Threshold Trigger: Automatic triggering based on CPU usage (cpu_usage_percent) and memory usage/amount (mem_usage_percent / mem_usage_mb).
  - API Trigger: The HTTP interface GET /v1/profile, supporting manual triggering by PID or process-name regex.
- Data Integration:
  - Automatic reporting of generated .jfr or flame graph data to DataKit.
  - Flexible multi-process monitoring policies and tags via the FLAMESHOT_PROCESSES environment variable.