Alibaba Cloud SAE¶
Collect Metrics, Logs, and Tracing information from Alibaba Cloud SAE (Serverless App Engine).
Configuration¶
Applications deployed on SAE can integrate Tracing, Metrics, and Logs data through the following process:
- Applications report Trace data to DataKit by integrating APM
- Log data from applications can be collected via KafkaMQ and then consumed by DataKit
- Metrics data from application containers are collected using Alibaba Cloud's monitoring API and reported to TrueWatch through the Function platform (DataFlux.f(x))
- DataKit collects the corresponding data and uniformly processes and reports it to TrueWatch
Note: Deploying DataKit on SAE can save bandwidth.
Create DataKit Application¶
Create a DataKit application on SAE
- Go to SAE, click Application List - Create Application.
- Fill in the application information
- Application Name
- Select a namespace; create one if none is available
- Select a VPC; create one if none is available
- Select a security group and vSwitch; the vSwitch must match the one used by the NAT gateway
- Adjust the number of instances as needed
- CPU 1 core, memory 1G
- Click Next after completion
- Add image: pubrepo.truewatch.com/datakit/datakit:1.31.0
- Add environment variables, configure as follows:
{
"ENV_DATAWAY": "https://openway.truewatch.com?token=tkn_xxx",
"KAFKAMQ": "# {\"version\": \"1.22.7-1510\", \"desc\": \"do NOT edit this line\"}\n\n[[inputs.kafkamq]]\n # addrs = [\"alikafka-serverless-cn-8ex3y7ciq02-1000.alikafka.aliyuncs.com:9093\",\"alikafka-serverless-cn-8ex3y7ciq02-2000.alikafka.aliyuncs.com:9093\",\"alikafka-serverless-cn-8ex3y7ciq02-3000.alikafka.aliyuncs.com:9093\"]\n addrs = [\"alikafka-serverless-cn-8ex3y7ciq02-1000-vpc.alikafka.aliyuncs.com:9092\",\"alikafka-serverless-cn-8ex3y7ciq02-2000-vpc.alikafka.aliyuncs.com:9092\",\"alikafka-serverless-cn-8ex3y7ciq02-3000-vpc.alikafka.aliyuncs.com:9092\"]\n # your kafka version:0.8.2 ~ 3.2.0\n kafka_version = \"3.3.1\"\n group_id = \"datakit-group\"\n # consumer group partition assignment strategy (range, roundrobin, sticky)\n assignor = \"roundrobin\"\n\n ## kafka tls config\n tls_enable = false\n\n ## -1:Offset Newest, -2:Offset Oldest\n offsets=-1\n\n\n ## user custom message with PL script.\n [inputs.kafkamq.custom]\n #spilt_json_body = true\n ## spilt_topic_map determines whether to enable log splitting for specific topic based on the values in the spilt_topic_map[topic].\n #[inputs.kafkamq.custom.spilt_topic_map]\n # \"log_topic\"=true\n # \"log01\"=false\n [inputs.kafkamq.custom.log_topic_map]\n \"springboot-server_log\"=\"springboot_log.p\"\n #[inputs.kafkamq.custom.metric_topic_map]\n # \"metric_topic\"=\"metric.p\"\n # \"metric01\"=\"rum_apm.p\"\n #[inputs.kafkamq.custom.rum_topic_map]\n # \"rum_topic\"=\"rum_01.p\"\n # \"rum_02\"=\"rum_02.p\"\n",
"SPRINGBOOT_LOG_P": "abc = load_json(_)\n\nadd_key(file, abc[\"file\"])\n\nadd_key(message, abc[\"message\"])\nadd_key(host, abc[\"host\"])\nmsg = abc[\"message\"]\ngrok(msg, \"%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\\\[%{NOTSPACE:method_name},%{NUMBER:line}\\\\] %{DATA:service_name} %{DATA:trace_id} %{DATA:span_id} - %{GREEDYDATA:msg}\")\n\nadd_key(topic, abc[\"topic\"])\n\ndefault_time(time,\"Asia/Shanghai\")",
"ENV_GLOBAL_HOST_TAGS": "host=__datakit_hostname,host_ip=__datakit_ip",
"ENV_HTTP_LISTEN": "0.0.0.0:9529",
"ENV_DEFAULT_ENABLED_INPUTS": "dk,cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,container,ddtrace,statsd,profile"
}
Configuration Details:
- ENV_DATAWAY: Required, the gateway address for reporting to TrueWatch
- KAFKAMQ: Optional, KafkaMQ collector configuration, refer to: Kafka Collector Configuration File Introduction
- SPRINGBOOT_LOG_P: Optional, used in conjunction with KAFKAMQ, log pipeline script for splitting log data from Kafka
- ENV_GLOBAL_HOST_TAGS: Required, global tags for the collector
- ENV_HTTP_LISTEN: Required, the address and port DataKit listens on; the IP must be 0.0.0.0, otherwise other pods cannot reach DataKit
- ENV_DEFAULT_ENABLED_INPUTS: Required, default enabled collectors
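Hand-escaping a multi-line TOML config and a Pipeline script inside JSON is error-prone (quotes and backslashes are easy to get wrong). The following is a minimal sketch, not an official tool, that writes both as plain text and lets `json.dumps` do the escaping. The broker address is a hypothetical placeholder, and the KafkaMQ config is a trimmed version of the one shown above.

```python
import json

# Hypothetical placeholders: replace with your own token, brokers, and topic.
DATAWAY = "https://openway.truewatch.com?token=tkn_xxx"
BROKERS = ["kafka-broker-1.example.internal:9092"]  # assumed address, not a real endpoint
LOG_TOPIC = "springboot-server_log"

# KafkaMQ collector config (TOML), trimmed from the full example above.
KAFKAMQ_CONF = """[[inputs.kafkamq]]
  addrs = {addrs}
  kafka_version = "3.3.1"
  group_id = "datakit-group"
  assignor = "roundrobin"
  tls_enable = false
  offsets = -1

  [inputs.kafkamq.custom]
    [inputs.kafkamq.custom.log_topic_map]
      "{topic}" = "springboot_log.p"
""".format(addrs=json.dumps(BROKERS), topic=LOG_TOPIC)

# Pipeline script, kept as a raw string so backslashes are written literally.
SPRINGBOOT_LOG_P = r'''abc = load_json(_)

add_key(file, abc["file"])

add_key(message, abc["message"])
add_key(host, abc["host"])
msg = abc["message"]
grok(msg, "%{TIMESTAMP_ISO8601:time} %{NOTSPACE:thread_name} %{LOGLEVEL:status}%{SPACE}%{NOTSPACE:class_name} - \\[%{NOTSPACE:method_name},%{NUMBER:line}\\] %{DATA:service_name} %{DATA:trace_id} %{DATA:span_id} - %{GREEDYDATA:msg}")

add_key(topic, abc["topic"])

default_time(time,"Asia/Shanghai")'''


def build_env() -> dict:
    """Return the environment variables as a dict ready for JSON serialization."""
    return {
        "ENV_DATAWAY": DATAWAY,
        "KAFKAMQ": KAFKAMQ_CONF,
        "SPRINGBOOT_LOG_P": SPRINGBOOT_LOG_P,
        "ENV_GLOBAL_HOST_TAGS": "host=__datakit_hostname,host_ip=__datakit_ip",
        "ENV_HTTP_LISTEN": "0.0.0.0:9529",
        "ENV_DEFAULT_ENABLED_INPUTS": "dk,cpu,disk,diskio,mem,swap,system,hostobject,net,host_processes,container,ddtrace,statsd,profile",
    }


if __name__ == "__main__":
    # json.dumps escapes quotes, newlines, and backslashes consistently.
    print(json.dumps(build_env(), indent=2))
```

The printed JSON can then be pasted into the environment variable editor. Because the escaping is generated, the output always parses back to the same TOML and Pipeline text.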
Tracing¶
Applications deployed on Alibaba Cloud SAE need the APM agent available inside their containers:
- Upload the APM agent package to OSS, or add it to the application image through the application's Dockerfile
- Load the agent at application startup, following the same steps used to integrate APM in a regular environment.
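As an illustration of the startup step, here is a minimal sketch that assembles a JVM start command. It assumes the application is a Java service instrumented with `dd-java-agent` and that traces go to DataKit's `ddtrace` input on port 9529 (the port set in `ENV_HTTP_LISTEN`). The jar paths, service name, and DataKit host below are hypothetical; other languages and agents need different options.

```python
import shlex


def java_apm_command(app_jar: str, agent_jar: str, service: str,
                     datakit_host: str, datakit_port: int = 9529) -> str:
    """Build a JVM start command that loads the agent and points it at DataKit."""
    args = [
        "java",
        f"-javaagent:{agent_jar}",                # load the APM agent at startup
        f"-Ddd.service={service}",                # service name shown in traces
        f"-Ddd.agent.host={datakit_host}",        # DataKit address reachable from the app
        f"-Ddd.trace.agent.port={datakit_port}",  # DataKit HTTP port
        "-jar",
        app_jar,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # All values here are placeholders for illustration only.
    print(java_apm_command(
        app_jar="/app/app.jar",
        agent_jar="/opt/apm/dd-java-agent.jar",
        service="springboot-server",
        datakit_host="datakit.example.internal",
    ))
```

The resulting string can be used as the container start command, whether the agent jar was baked into the image or downloaded from OSS before startup.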
Metrics¶
Install Func¶
It is recommended to activate TrueWatch Integration - Extensions - DataFlux Func (Automata)
If deploying Func manually, refer to Deploy Func Manually
Activate Script¶
Note: Prepare an Alibaba Cloud AK in advance (for simplicity, you can grant it the global read-only permission ReadOnlyAccess).
Activate Script for Automata¶
- Log in to the TrueWatch console
- Click the 【Integration】 menu, select 【Cloud Account Management】
- Click 【Add Cloud Account】, select 【Alibaba Cloud】, and fill in the required information. Skip this step if the cloud account has already been configured
- Click 【Test】. If the test succeeds, click 【Save】; if it fails, check the configuration and test again
- In the cloud account list under 【Cloud Account Management】, find the account you added and click it to open its details page
- On the cloud account details page, click the 【Integration】 button, find Alibaba Cloud SAE in the Not Installed list, and click 【Install】 to complete the installation in the dialog that opens.
Activate Script Manually¶
- Log in to the Func console, click 【Script Market】, enter the TrueWatch script market, and search for integration_alibabacloud_sae_app and integration_alibabacloud_sae_instance
- Click 【Install】, then enter the corresponding parameters: Alibaba Cloud AK ID, AK Secret, and account name.
- Click 【Deploy Startup Script】. The system automatically creates a Startup script set and configures the corresponding startup script.
- After activation, the corresponding automatic trigger configuration appears in 「Management / Automatic Trigger Configuration」. Click 【Execute】 to run it immediately without waiting for the scheduled time. After a short wait, you can view the task execution records and the corresponding logs.
Verification¶
- In 「Management / Automatic Trigger Configuration」, confirm that the task has an automatic trigger configuration, and check the task records and logs for exceptions
- In TrueWatch, check whether there is asset information in 「Infrastructure / Custom」
- In TrueWatch, check whether there is corresponding monitoring data in 「Metrics」
Metrics Introduction¶
Metric | Unit | Dimensions | Description |
---|---|---|---|
cpu_Average | % | userId, appId | Application CPU |
diskIopsRead_Average | Count/Second | userId, appId | Application Disk IOPS Read |
diskIopsWrite_Average | Count/Second | userId, appId | Application Disk IOPS Write |
diskRead_Average | Byte/Second | userId, appId | Application Disk IO Throughput Read |
diskTotal_Average | Kilobyte | userId, appId | Application Disk Total |
diskUsed_Average | Kilobyte | userId, appId | Application Disk Used |
diskWrite_Average | Byte/Second | userId, appId | Application Disk IO Throughput Write |
instanceId_memoryUsed_Average | MB | userId, appId, instanceId | Instance Memory Used |
instance_cpu_Average | % | userId, appId, instanceId | Instance CPU |
instance_diskIopsRead_Average | Count/Second | userId, appId, instanceId | Instance Disk IOPS Read |
instance_diskIopsWrite_Average | Count/Second | userId, appId, instanceId | Instance Disk IOPS Write |
instance_diskRead_Average | Byte/Second | userId, appId, instanceId | Instance Disk IO Throughput Read |
instance_diskTotal_Average | Kilobyte | userId, appId, instanceId | Instance Disk Total |
instance_diskUsed_Average | Kilobyte | userId, appId, instanceId | Instance Disk Used |
instance_diskWrite_Average | Byte/Second | userId, appId, instanceId | Instance Disk IO Throughput Write |
instance_load_Average | min | userId, appId, instanceId | Instance Average Load |
instance_memoryTotal_Average | MB | userId, appId, instanceId | Instance Total Memory |
instance_memoryUsed_Average | MB | userId, appId, instanceId | Instance Memory Used |
instance_netRecv_Average | Byte/Second | userId, appId, instanceId | Instance Received Bytes |
instance_netRecvBytes_Average | Byte | userId, appId, instanceId | Instance Total Received Bytes |
instance_netRecvDrop_Average | Count/Second | userId, appId, instanceId | Instance Received Packet Drop |
instance_netRecvError_Average | Count/Second | userId, appId, instanceId | Instance Received Error Packets |
instance_netRecvPacket_Average | Count/Second | userId, appId, instanceId | Instance Received Packets |
instance_netTran_Average | Byte/Second | userId, appId, instanceId | Instance Sent Bytes |
instance_netTranBytes_Average | Byte | userId, appId, instanceId | Instance Total Sent Bytes |
instance_netTranDrop_Average | Count/Second | userId, appId, instanceId | Instance Sent Packet Drop |
instance_netTranError_Average | Count/Second | userId, appId, instanceId | Instance Sent Error Packets |
instance_netTranPacket_Average | Count/Second | userId, appId, instanceId | Instance Sent Packets |
instance_tcpActiveConn_Average | Count | userId, appId, instanceId | Instance Active TCP Connections |
instance_tcpInactiveConn_Average | Count | userId, appId, instanceId | Instance Inactive TCP Connections |
instance_tcpTotalConn_Average | Count | userId, appId, instanceId | Instance Total TCP Connections |
load_Average | min | userId, appId | Application Average Load |
memoryTotal_Average | MB | userId, appId | Application Total Memory |
memoryUsed_Average | MB | userId, appId | Application Memory Used |
netRecv_Average | Byte/Second | userId, appId | Application Received Bytes |
netRecvBytes_Average | Byte | userId, appId | Application Total Received Bytes |
netRecvDrop_Average | Count/Second | userId, appId | Application Received Packet Drop |
netRecvError_Average | Count/Second | userId, appId | Application Received Error Packets |
netRecvPacket_Average | Count/Second | userId, appId | Application Received Packets |
netTran_Average | Byte/Second | userId, appId | Application Sent Bytes |
netTranBytes_Average | Byte | userId, appId | Application Total Sent Bytes |
netTranDrop_Average | Count/Second | userId, appId | Application Sent Packet Drop |
netTranError_Average | Count/Second | userId, appId | Application Sent Error Packets |
netTranPacket_Average | Count/Second | userId, appId | Application Sent Packets |
tcpActiveConn_Average | Count | userId, appId | Application Active TCP Connections |
tcpInactiveConn_Average | Count | userId, appId | Application Inactive TCP Connections |
tcpTotalConn_Average | Count | userId, appId | Application Total TCP Connections |
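The metric names in the table follow a visible pattern: application-level metrics look like `<measure>_<statistic>`, while instance-level metrics carry an `instance` (or, in one row, `instanceId`) prefix and an extra `instanceId` dimension. The sketch below encodes that pattern as inferred from the table; it is not a documented naming rule, and `describe_metric` is a hypothetical helper.

```python
# Prefixes that mark instance-level metrics in the table above.
INSTANCE_PREFIXES = ("instance", "instanceId")


def describe_metric(name: str) -> dict:
    """Split a metric name into scope, measure, statistic, and expected dimensions."""
    parts = name.split("_")
    if len(parts) == 3 and parts[0] in INSTANCE_PREFIXES:
        scope, measure, statistic = "instance", parts[1], parts[2]
        dimensions = ["userId", "appId", "instanceId"]
    elif len(parts) == 2:
        scope, measure, statistic = "application", parts[0], parts[1]
        dimensions = ["userId", "appId"]
    else:
        raise ValueError(f"unrecognized metric name: {name}")
    return {
        "scope": scope,
        "measure": measure,
        "statistic": statistic,
        "dimensions": dimensions,
    }


if __name__ == "__main__":
    for metric in ("cpu_Average", "instance_netRecvBytes_Average"):
        print(metric, describe_metric(metric))
```

A helper like this can be handy when grouping queries or dashboards by application versus instance.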
Logs¶
Alibaba Cloud SAE can output application logs to Kafka, from which DataKit collects them and reports them to TrueWatch. The process is as follows:
- Enable Kafka log reporting for the SAE application
- Enable the KafkaMQ collector in DataKit and have it consume the Kafka topic that the application reports logs to
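To make the second step more concrete, the sketch below emulates in Python what the `SPRINGBOOT_LOG_P` Pipeline script from the DataKit configuration does to one Kafka message. It is only an approximation for local experimentation: the regular expression is a simplified stand-in for the grok patterns, `default_time` is not emulated, and the sample message (file path, host, log line) is made up.

```python
import json
import re

# Simplified regex equivalent of the grok pattern in SPRINGBOOT_LOG_P.
LINE_RE = re.compile(
    r"(?P<time>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?)"  # TIMESTAMP_ISO8601 (simplified)
    r" (?P<thread_name>\S+)"                                         # NOTSPACE
    r" (?P<status>[A-Za-z]+)\s*"                                     # LOGLEVEL + SPACE
    r"(?P<class_name>\S+) - "                                        # NOTSPACE
    r"\[(?P<method_name>\S+),(?P<line>\d+)\]"                        # [method,line]
    r" (?P<service_name>.*?) (?P<trace_id>.*?) (?P<span_id>.*?) - (?P<msg>.*)"
)


def run_pipeline(raw: str) -> dict:
    """Emulate the script: load the JSON body, copy keys, then split the message."""
    abc = json.loads(raw)
    fields = {key: abc.get(key) for key in ("file", "message", "host", "topic")}
    match = LINE_RE.match(abc["message"])
    if match:
        fields.update(match.groupdict())
    return fields


if __name__ == "__main__":
    # Hypothetical Kafka message, shaped after the keys the script reads.
    sample = json.dumps({
        "file": "/home/admin/logs/app.log",
        "host": "sae-instance-1",
        "topic": "springboot-server_log",
        "message": "2024-05-01 12:00:00.123 http-nio-8080-exec-1 INFO  "
                   "com.example.DemoController - [hello,42] springboot-server "
                   "1234567890 987654321 - request handled",
    })
    for key, value in run_pipeline(sample).items():
        print(f"{key}: {value}")
```

Extracting `trace_id` and `span_id` as separate fields is what lets a log line be correlated with the trace reported by the APM agent.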
Example: