How to Enable Application Performance Monitoring¶
Application Performance Monitoring (APM) shows the overall operating status and health of a system, its external API and database calls, and the resource consumption or errors that originate in its own code and in the resources it invokes. It helps teams trace problems to their root cause and keep applications performant and stable.
TrueWatch's Application Performance Monitoring supports APM tools based on the OpenTracing protocol, such as ddtrace, SkyWalking, Zipkin, and Jaeger. Enable the corresponding collector in DataKit and add the relevant monitoring files to the application code you want to monitor; once configuration is complete, the reported trace data appears in your TrueWatch workspace. Trace data can also be correlated with infrastructure, logs, and RUM data, which helps you locate and resolve faults quickly and improve the user experience.
Prerequisites¶
Before you start, create a TrueWatch account and install DataKit on your host.
Method/Steps¶
Step 1: Enable and Configure the ddtrace.conf Collector¶
Go to the conf.d/ddtrace/ directory under the DataKit installation directory, copy ddtrace.conf.sample, and rename the copy to ddtrace.conf. Open ddtrace.conf; the inputs are enabled by default and do not require modification.
## Enter the ddtrace directory
cd /usr/local/datakit/conf.d/ddtrace/
## Copy the ddtrace configuration file
cp ddtrace.conf.sample ddtrace.conf
## Open and edit the ddtrace configuration file
vim ddtrace.conf
## After configuration is complete, restart DataKit so the configuration takes effect (run whichever command applies to your system)
datakit --restart
## or: service datakit restart
## or: systemctl restart datakit
Note: The endpoints are enabled by default; do not modify them.
TrueWatch supports custom tags for application performance monitoring, which you can use in correlated queries. You can either inject the tags as environment variables on the command line, or enable inputs.ddtrace.tags in ddtrace.conf and add your tags there. For detailed configuration, see the document ddtrace Environment Variable Settings.
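As a rough illustration of the environment-variable route, the sketch below builds a comma-separated key:value string of the kind ddtrace reads from its DD_TAGS variable. The helper name format_dd_tags and the example tag keys are hypothetical; the ddtrace Environment Variable Settings document remains the reference for what your DataKit version expects.

```python
def format_dd_tags(tags):
    """Join a dict of tags into a 'key:value,key2:value2' string.

    Hypothetical helper: the result is meant to be exported as DD_TAGS
    before the application is started.
    """
    parts = []
    for key, value in tags.items():
        key, value = str(key).strip(), str(value).strip()
        # Separators inside a key or value would make the string ambiguous.
        if not key or any(ch in key for ch in ",: "):
            raise ValueError(f"invalid tag key: {key!r}")
        if "," in value:
            raise ValueError(f"invalid tag value: {value!r}")
        parts.append(f"{key}:{value}")
    return ",".join(parts)


# Example tag set (illustrative values only).
dd_tags = format_dd_tags({"project": "todoism", "env": "test"})
print(f'DD_TAGS="{dd_tags}"')
```

The printed line can be pasted in front of the application start command, in the same way as the other environment variables used in this article.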
Step 2: Install ddtrace¶
To collect trace data through ddtrace, install the ddtrace library that matches the language of the application you want to monitor. This article uses a Python application as an example; for Java or other languages, see the document Best Practices for Distributed Tracing (APM).
Run pip install ddtrace in the terminal to install ddtrace.
Step 3: Configure Application Startup Script¶
Method One: Configure the DataKit Service Address in the Application's Initialization Configuration File¶
1) Configure the DataKit service address. You need to enter the application's initialization configuration file and add the following configuration content:
2) Configure the application service startup script file. You need to add the command to start the script file, for example:
For details, see the document Python Example.
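A minimal sketch of what Method One could look like in a Python initialization file, assuming the tracer picks up the DataKit address from the DD_AGENT_HOST and DD_TRACE_AGENT_PORT environment variables (names taken from recent ddtrace releases; older releases used other spellings). The function names are hypothetical, and this is not necessarily the configuration shown in the Python Example document.

```python
import os


def point_tracer_at_datakit(host="localhost", port=9529, environ=None):
    """Record the DataKit address where the tracer is assumed to look for it.

    Values already present in the environment are left untouched, so a
    startup script can still override them.
    """
    environ = os.environ if environ is None else environ
    environ.setdefault("DD_AGENT_HOST", host)
    environ.setdefault("DD_TRACE_AGENT_PORT", str(port))
    return environ["DD_AGENT_HOST"], environ["DD_TRACE_AGENT_PORT"]


def init_tracing(host="localhost", port=9529):
    """Set the address first, then enable automatic instrumentation."""
    point_tracer_at_datakit(host, port)
    try:
        import ddtrace.auto  # noqa: F401  (requires `pip install ddtrace`)
    except ImportError:
        return False  # ddtrace is not installed; the app runs untraced
    return True
```

Calling init_tracing() at the very top of the initialization file, before the web framework is imported, is the assumed usage: the address has to be in place before ddtrace is loaded.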
Method Two: Directly Configure the DataKit Service Address in the Startup Script File¶
TrueWatch supports configuring the DataKit service address directly in the startup script file, without modifying your application code. This example uses a Python application named "todoism": go to that application's script directory and run its startup script. In practice, use the directory and script of your own application.
1) Configure the execution command in the startup script file (inject environment variables)
The schematic diagram is as follows:
2) Go to the directory that contains the startup script and run ./boot.sh to start the Python application. The schematic diagram is as follows:
Note: For security reasons, DataKit's HTTP service is bound to localhost:9529 by default. To allow external access, edit conf.d/datakit.conf and change listen to 0.0.0.0:9529 (the port is optional). The access address for ddtrace is then http://<datakit-ip>:9529. If the trace data originates on the DataKit host itself, there is no need to modify the listen configuration; simply use http://localhost:9529.
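The startup-script approach of Method Two can be sketched in Python as a small launcher that injects the DataKit address and then hands over to ddtrace-run. The helper names build_launch and launch are hypothetical, your_app.py stands in for your own entry point, and DD_TRACE_AGENT_PORT is assumed to be the port variable your ddtrace version reads.

```python
import os
import subprocess


def build_launch(app_argv, datakit_ip=None, port=9529, base_env=None):
    """Return (argv, env) for starting an application under ddtrace-run.

    datakit_ip=None means the application runs on the DataKit host, so
    the default localhost binding is enough. Pass the DataKit IP only
    after `listen` has been opened up in conf.d/datakit.conf.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DD_AGENT_HOST"] = datakit_ip or "localhost"
    env["DD_TRACE_AGENT_PORT"] = str(port)  # assumed variable name
    return ["ddtrace-run", *app_argv], env


def launch(app_argv, **kwargs):
    """Start the application; raises if ddtrace-run exits with an error."""
    argv, env = build_launch(app_argv, **kwargs)
    return subprocess.run(argv, env=env, check=True)


# Usage (not executed here): launch(["python", "your_app.py"])
```

Keeping the command construction separate from the process start makes it easy to print or log the exact command and environment before running it.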
Step 4: Analyze Data in TrueWatch Explorer¶
After starting the script file, access the Python application a few times, then view and analyze the trace data in the "Application Performance Monitoring" section of the TrueWatch workspace.
1) Under "Application Performance Monitoring" - "Services", you can view the two collected services. The list includes each service's type, request count, response time, and so on.
2) Under "Application Performance Monitoring" - "Traces", you can view the trace creation time, status, duration, etc., of the flask service.
3) Click on a trace to view detailed information, including flame graphs, Span lists, service call relationships, and associated logs and hosts, which helps you quickly identify issues, ensure system stability, and improve user experience.
Key terms related to traces are explained below. For more about traces, see the document Trace Analysis.
Keyword | Definition |
---|---|
Service | That is, service_name, which can be customized when adding Trace monitoring |
Resource | The entry point of an individual request handled by the Application |
Duration | The response time, measured from the moment the Application receives the request until it returns the response |
Status | Either OK or ERROR; errors are reported as an error rate and a number of errors |
Span | The entire method call process of a single operation constitutes a Trace path, and a Trace consists of multiple Span units |
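To make the relationship between these terms concrete, here is a small illustrative data model. It is not TrueWatch's storage format: the field names, the rule that a trace is ERROR as soon as any span failed, and the example service "sqlite" are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]  # None marks the root span of the trace
    service: str              # the service_name reported by the tracer
    resource: str             # entry point handled by this span
    duration_us: int          # time spent in this span, in microseconds
    error: bool = False


@dataclass
class Trace:
    trace_id: int
    spans: List[Span] = field(default_factory=list)

    def root(self) -> Span:
        return next(s for s in self.spans if s.parent_id is None)

    def duration_us(self) -> int:
        # The root span covers the request from receipt to response.
        return self.root().duration_us

    def status(self) -> str:
        # Assumption: one failed span marks the whole trace as ERROR.
        return "ERROR" if any(s.error for s in self.spans) else "OK"

    def services(self) -> List[str]:
        return sorted({s.service for s in self.spans})


example = Trace(
    trace_id=1,
    spans=[
        Span(1, None, "flask", "GET /todos", 1200),
        Span(2, 1, "sqlite", "SELECT", 300, error=True),
    ],
)
print(example.status(), example.duration_us(), example.services())
```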
Advanced Reference¶
Configure Associated Logs¶
1) Go to the conf.d/log directory under the DataKit installation directory /usr/local/datakit, copy logging.conf.sample, and name the copy logging.conf. Edit logging.conf: set logfiles to the storage path of your application's service logs and source to the log source name, then save the file and restart DataKit. For more details about log collectors and log pipeline splitting, see the document Logs.
## Enter the log directory
cd /usr/local/datakit/conf.d/log/
## Copy the logging configuration file
cp logging.conf.sample logging.conf
## Open and edit the logging configuration file
vim logging.conf
## After configuration is complete, restart DataKit so the configuration takes effect (run whichever command applies to your system)
datakit --restart
## or: service datakit restart
## or: systemctl restart datakit
2) Configure the execution command in the startup script file (inject environment variables to associate trace logs). For more detailed configuration, please refer to the document Associate Logs with Application Performance Monitoring.
DD_LOGS_INJECTION="true" DD_AGENT_HOST=localhost DATADOG_TRACE_AGENT_PORT=9529 ddtrace-run python your_app.py
The schematic diagram is as follows:
3) Start the script file, try accessing the Python application, and then you can view the trace flame graph and span list in the log details of the TrueWatch workspace, and view related logs in the application performance monitoring details, helping you quickly perform data correlation analysis. The schematic diagrams are as follows:
- Log Details
- Application Performance Details
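For the association to work, the trace identifiers also have to appear in the log lines that DataKit collects. The sketch below shows one way a Python application might format its logs, assuming ddtrace's log injection adds dd.trace_id and dd.span_id attributes to each log record. The filter class is a hypothetical safety net that fills in "0" when the application runs without ddtrace, so the format string never fails.

```python
import logging

# Format with the trace fields that log injection is assumed to provide.
LOG_FORMAT = (
    "%(asctime)s %(levelname)s [%(name)s] "
    "[dd.trace_id=%(dd.trace_id)s dd.span_id=%(dd.span_id)s] "
    "- %(message)s"
)


class TraceFieldDefaults(logging.Filter):
    """Fill in placeholder trace fields when ddtrace did not set them."""

    def filter(self, record):
        for name in ("dd.trace_id", "dd.span_id"):
            if not hasattr(record, name):
                setattr(record, name, "0")
        return True


def build_logger(name, stream=None):
    """Create a logger whose lines carry the trace and span IDs."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    handler.addFilter(TraceFieldDefaults())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

In a real deployment the handler would write to the file listed under logfiles in logging.conf rather than to a stream, and a pipeline would extract the trace_id field from each line.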
Configure Associated Web Applications (User Access Monitoring)¶
User access monitoring can track a web application's complete request data, from front end to back end, through the ddtrace and RUM collectors. Combining the user access data from the front end with the trace_id injected into the back end lets you locate call stacks quickly and troubleshoot more efficiently.
1) In the initialization file of the Python application, add the following configuration to set the header white list for the target server to allow tracking of front-end request responses. For more details, please refer to the document Associating Web Application Access.
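@app.after_request
def after_request(response):
    # ... (other response handling, elided in the original)
    # Allow the browser to send the tracing headers on cross-origin requests
    response.headers.add('Access-Control-Allow-Headers', 'x-datadog-parent-id,x-datadog-sampled,x-datadog-sampling-priority,x-datadog-trace-id')
    # ...
    return response
# ...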
The schematic diagram is as follows:
2) Add the following user access configuration in the head of the front-end page index.html (obtained by creating an application in the user access monitoring of TrueWatch workspace).
<script src="https://static.truewatch.com/browser-sdk/v2/dataflux-rum.js" type="text/javascript"></script>
<script>
  window.DATAFLUX_RUM &&
    window.DATAFLUX_RUM.init({
      applicationId: 'appid_68fa6ec4f56f4b78xxxxxxxxxxxxxxxx',
      datakitOrigin: '<DATAKIT ORIGIN>', // Protocol (including: //), domain name (or IP address) [and port number]
      env: 'production',
      version: '1.0.0',
      trackInteractions: true,
      allowedTracingOrigins: ["https://api.example.com", /https:\/\/.*\.my-api-domain\.com/]
    })
</script>
Among these, allowedTracingOrigins is the configuration item that connects the front end and the back end (RUM and APM). Set it to the domain names or IP addresses of the back-end servers that the front-end page interacts with. The other configuration items are used to collect user access data. For more user access monitoring configuration, see the document Web Application Monitoring (RUM) Best Practices.
The schematic diagram is as follows:
3) After completing the configuration, start the script file, try accessing the Python application, and then you can view the associated traces in the user access monitoring explorer details of TrueWatch workspace, helping you quickly perform data correlation analysis. The schematic diagram is as follows:
Configure Sampling¶
TrueWatch's "Application Performance Monitoring" feature supports analyzing and managing trace data collected by collectors such as ddtrace that conform to the OpenTracing protocol. By default, application performance data is collected in full, meaning that every call generates data. Without limits, the volume of collected data can grow large and consume excessive storage space. Configuring sampling lets you collect only a portion of the application performance data, which reduces storage usage and lowers costs. For more configuration details, see the document How to Configure Application Performance Monitoring Sampling.
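To show what a sampling rate means in practice, the sketch below makes a deterministic keep-or-drop decision per trace ID. It illustrates the general idea only and is not DataKit's implementation: the function name keep_trace and the hash-based bucketing are assumptions chosen so that every span of the same trace gets the same decision.

```python
import hashlib


def keep_trace(trace_id: int, rate: float) -> bool:
    """Decide whether to keep a trace at the given sampling rate (0.0 to 1.0).

    The trace ID is hashed into a 64-bit bucket, so the decision is
    stable for a given trace and roughly `rate` of all traces are kept.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(str(trace_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # 0 .. 2**64 - 1
    return bucket < rate * 2**64


# With a rate of 0.1, roughly one trace in ten is kept.
kept = sum(keep_trace(trace_id, 0.1) for trace_id in range(10_000))
print(f"kept {kept} of 10000 traces")
```

Deciding per trace rather than per span keeps sampled traces complete, so the flame graph of a kept trace is not missing individual spans.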