Volcengine VKE
Collects metrics from Volcengine Kubernetes Engine (VKE), including Cluster, Container, Node, and Pod metrics.
Configuration
Install Func
We recommend enabling TrueWatch Integration - Extensions - DataFlux Func (Automata). All prerequisites are installed automatically, so you can proceed directly to the script installation.
If you want to deploy Func yourself, refer to Self-deploy Func.
Install Script
Note: Please prepare a Volcengine AK with the required permissions in advance (for simplicity, you can grant the global read-only permission `ReadOnlyAccess`).
To synchronize the monitoring data of VKE cloud resources, install the corresponding collection script: "TrueWatch Integration (Volcengine-VKE Collection)" (ID: `integration_volcengine_vke`).
After clicking 【Install】, enter the corresponding parameters: Volcengine AK, Volcengine account name.
Click 【Deploy Startup Script】, and the system will automatically create the Startup script set and configure the corresponding startup scripts.
After enabling, you can see the corresponding automatic trigger configuration in 「Manage / Automatic Trigger Configuration」. Click 【Execute】 to execute it immediately without waiting for the scheduled time. After a while, you can check the execution task records and corresponding logs.
If you want to collect corresponding logs, you also need to enable the corresponding log collection script. If you want to collect billing data, you need to enable the cloud billing collection script.
Verification
- In 「Manage / Automatic Trigger Configuration」, confirm that the corresponding task has an automatic trigger configuration. You can also check the task records and logs for exceptions.
- In TrueWatch, check if the asset information exists in 「Infrastructure / Custom」.
- In TrueWatch, check if there is corresponding monitoring data in 「Metrics」.
Metrics
After Volcengine Cloud Monitoring is configured, the default metrics are as follows. You can collect more metrics through configuration; see Volcengine Cloud Monitoring Metrics Details.
Note: You need to install the monitoring plugin in the Volcengine VKE console.
| MetricName | SubNamespace | Description | MetricUnit | Dimension |
|---|---|---|---|---|
| Cluster_MemoryUsed | Cluster | Cluster Memory Usage | Bytes(SI) | Cluster |
| Cluster_CPUUsage | Cluster | Cluster CPU Usage | Percent | Cluster |
| Cluster_MemoryUsage | Cluster | Cluster Memory Usage | Percent | Cluster |
| Cluster_NodeCount | Cluster | Cluster Node Count | Count | Cluster |
| Cluster_CPUUsed | Cluster | Cluster CPU Usage | Core | Cluster |
| Container_MemoryUsed | Container | Container Memory Usage | Bytes(SI) | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container |
| Container_CPUUsage | Container | Container CPU Usage (against limit) | Percent | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container |
| Container_MemoryUsage | Container | Container Memory Usage (against limit) | Percent | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container |
| Container_CPUUsed | Container | Container CPU Usage | Core | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container |
| Container_GPU_Memory_Free | Container | Container GPU Memory Free | Megabytes | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container,GPU |
| Container_GPU_Memory_Used | Container | Container GPU Memory Usage | Megabytes | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container,GPU |
| Container_GPU_Usage | Container | Container GPU Usage | Percent | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container,GPU |
| Container_GPU_Count | Container | Container GPU Count | Count | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container,GPU |
| Container_GPU_Memory_Usage | Container | Container GPU Memory Usage | Percent | Cluster,Namespace,Deployment,StatefulSet,DaemonSet,CronJob,Job,Pod,Container,GPU |
| CronJob_MemoryUsed | CronJob | CronJob Memory Usage | Bytes(SI) | Cluster,Namespace,CronJob |
| CronJob_CPUUsage | CronJob | CronJob CPU Usage (against limit) | Percent | Cluster,Namespace,CronJob |
| CronJob_MemoryUsage | CronJob | CronJob Memory Usage (against limit) | Percent | Cluster,Namespace,CronJob |
| CronJob_CPUUsed | CronJob | CronJob CPU Usage | Core | Cluster,Namespace,CronJob |
| CronJob_GPU_Memory_Free | CronJob | CronJob GPU Memory Free | Megabytes | Cluster,Namespace,CronJob,GPU |
| CronJob_GPU_Memory_Used | CronJob | CronJob GPU Memory Usage | Megabytes | Cluster,Namespace,CronJob,GPU |
| CronJob_GPU_Usage | CronJob | CronJob GPU Usage | Percent | Cluster,Namespace,CronJob,GPU |
| CronJob_GPU_Count | CronJob | CronJob GPU Count | Count | Cluster,Namespace,CronJob,GPU |
| CronJob_GPU_Memory_Usage | CronJob | CronJob GPU Memory Usage | Percent | Cluster,Namespace,CronJob,GPU |
| DaemonSet_MemoryUsed | DaemonSet | DaemonSet Memory Usage | Bytes(SI) | Cluster,Namespace,DaemonSet |
| DaemonSet_CPUUsage | DaemonSet | DaemonSet CPU Usage (against limit) | Percent | Cluster,Namespace,DaemonSet |
| DaemonSet_MemoryUsage | DaemonSet | DaemonSet Memory Usage (against limit) | Percent | Cluster,Namespace,DaemonSet |
| DaemonSet_CPUUsed | DaemonSet | DaemonSet CPU Usage | Core | Cluster,Namespace,DaemonSet |
| DaemonSet_GPU_Memory_Free | DaemonSet | DaemonSet GPU Memory Free | Megabytes | Cluster,Namespace,DaemonSet,GPU |
| DaemonSet_GPU_Memory_Used | DaemonSet | DaemonSet GPU Memory Usage | Megabytes | Cluster,Namespace,DaemonSet,GPU |
| DaemonSet_GPU_Usage | DaemonSet | DaemonSet GPU Usage | Percent | Cluster,Namespace,DaemonSet,GPU |
| DaemonSet_GPU_Count | DaemonSet | DaemonSet GPU Count | Count | Cluster,Namespace,DaemonSet,GPU |
| DaemonSet_GPU_Memory_Usage | DaemonSet | DaemonSet GPU Memory Usage | Percent | Cluster,Namespace,DaemonSet,GPU |
| Deployment_MemoryUsed | Deployment | Deployment Memory Usage | Bytes(SI) | Cluster,Namespace,Deployment |
| Deployment_CPUUsage | Deployment | Deployment CPU Usage (against limit) | Percent | Cluster,Namespace,Deployment |
| Deployment_MemoryUsage | Deployment | Deployment Memory Usage (against limit) | Percent | Cluster,Namespace,Deployment |
| Deployment_CPUUsed | Deployment | Deployment CPU Usage | Core | Cluster,Namespace,Deployment |
| Deployment_GPU_Memory_Free | Deployment | Deployment GPU Memory Free | Megabytes | Cluster,Namespace,Deployment,GPU |
| Deployment_GPU_Memory_Used | Deployment | Deployment GPU Memory Usage | Megabytes | Cluster,Namespace,Deployment,GPU |
| Deployment_GPU_Usage | Deployment | Deployment GPU Usage | Percent | Cluster,Namespace,Deployment,GPU |
| Deployment_GPU_Count | Deployment | Deployment GPU Count | Count | Cluster,Namespace,Deployment,GPU |
| Deployment_GPU_Memory_Usage | Deployment | Deployment GPU Memory Usage | Percent | Cluster,Namespace,Deployment,GPU |
| Job_CPUUsed | Job | Job CPU Usage | Core | Cluster,Namespace,Job |
| Job_MemoryUsed | Job | Job Memory Usage | Bytes(SI) | Cluster,Namespace,Job |
| Job_CPUUsage | Job | Job CPU Usage (against limit) | Percent | Cluster,Namespace,Job |
| Job_MemoryUsage | Job | Job Memory Usage (against limit) | Percent | Cluster,Namespace,Job |
| Job_GPU_Memory_Free | Job | Job GPU Memory Free | Megabytes | Cluster,Namespace,Job,GPU |
| Job_GPU_Memory_Used | Job | Job GPU Memory Usage | Megabytes | Cluster,Namespace,Job,GPU |
| Job_GPU_Usage | Job | Job GPU Usage | Percent | Cluster,Namespace,Job,GPU |
| Job_GPU_Count | Job | Job GPU Count | Count | Cluster,Namespace,Job,GPU |
| Job_GPU_Memory_Usage | Job | Job GPU Memory Usage | Percent | Cluster,Namespace,Job,GPU |
| Namespace_CPUUsed | Namespace | Namespace CPU Usage | Core | Cluster,Namespace |
| Namespace_MemoryUsed | Namespace | Namespace Memory Usage | Bytes(SI) | Cluster,Namespace |
| Node_PodCount | Node | Node Pod Count | Count | Cluster,Node |
| Node_CPURequestUsage | Node | Node CPU Allocation Rate (request) | Percent | Cluster,Node |
| Node_MemoryRequestUsage | Node | Node Memory Allocation Rate (request) | Percent | Cluster,Node |
| Node_CPULimitUsage | Node | Node CPU Allocation Rate (limit) | Percent | Cluster,Node |
| Node_MemoryLimitUsage | Node | Node Memory Allocation Rate (limit) | Percent | Cluster,Node |
| Node_CPUUsage | Node | Node CPU Usage | Percent | Cluster,Node |
| Node_MemoryUsage | Node | Node Memory Usage | Percent | Cluster,Node |
| PersistentVolumeClaim_VolumeUsage | PersistentVolumeClaim | Persistent Volume Claim Capacity Usage | Percent | Cluster,Namespace,PersistentVolumeClaim |
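In the table above, every MetricName appears to consist of its SubNamespace, an underscore, and a metric suffix (for example, Container_GPU_Memory_Free belongs to Container). The Python sketch below relies on that observation, which is read off the table rather than taken from a documented guarantee, to split a metric name into its parts. It also formats Bytes(SI) values under the assumption that "SI" means decimal (power-of-1000) prefixes. The helper names `split_metric_name` and `format_si_bytes` are hypothetical and are not part of any TrueWatch or Volcengine API.

```python
from typing import NamedTuple


class MetricRef(NamedTuple):
    """SubNamespace and suffix of a VKE metric name (hypothetical helper type)."""
    sub_namespace: str
    suffix: str


def split_metric_name(metric_name: str) -> MetricRef:
    """Split e.g. 'Container_GPU_Memory_Free' at the first underscore.

    Assumption: the part before the first underscore is the SubNamespace,
    as is the case for every row of the table above.
    """
    sub_namespace, sep, suffix = metric_name.partition("_")
    if not sep or not sub_namespace or not suffix:
        raise ValueError(f"unexpected metric name: {metric_name!r}")
    return MetricRef(sub_namespace, suffix)


def format_si_bytes(value: float) -> str:
    """Render a byte count with decimal prefixes (1 kB = 1000 B).

    Assumption: the 'Bytes(SI)' unit denotes power-of-1000 prefixes.
    """
    units = ("B", "kB", "MB", "GB", "TB", "PB")
    size = float(value)
    for unit in units[:-1]:
        if abs(size) < 1000:
            return f"{size:.2f} {unit}"
        size /= 1000
    return f"{size:.2f} {units[-1]}"


if __name__ == "__main__":
    print(split_metric_name("Container_GPU_Memory_Free"))
    print(format_si_bytes(2_000_000_000))
```

Splitting only at the first underscore keeps multi-part suffixes such as GPU_Memory_Free intact.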
Object
The structure of the collected Volcengine VKE object data is shown below. You can view the collected objects in 「Infrastructure / Custom」.
{
  "fields": {
    "ClusterConfig": {},
    "CreateTime": "2024-04-07T06:13:08Z",
    "KubernetesConfig": {},
    "PodsConfig": {},
    "message": {}
  },
  "measurement": "volcengine_vke",
  "tags": {
    "ChargeType": "PostPaid",
    "ClusterId": "cco93ispooc7b6ohg00b0",
    "ClusterName": "test",
    "KubernetesVersion": "v1.26.10-vke.14",
    "RegionId": "cn-shanghai",
    "Status": "Running",
    "name": "cco93ispooc7b6ohg00b0"
  }
}
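As an illustration of how such an object might be consumed, the following Python sketch parses the sample above and extracts a few tags. It assumes that objects follow exactly the sample's layout and that CreateTime is a UTC timestamp in the format shown. The function name `summarize_cluster` and the keys of its result are hypothetical, chosen for this example only.

```python
import json
from datetime import datetime, timezone

# The sample object from this section, embedded as a JSON string.
SAMPLE_OBJECT = """
{
  "fields": {
    "ClusterConfig": {},
    "CreateTime": "2024-04-07T06:13:08Z",
    "KubernetesConfig": {},
    "PodsConfig": {},
    "message": {}
  },
  "measurement": "volcengine_vke",
  "tags": {
    "ChargeType": "PostPaid",
    "ClusterId": "cco93ispooc7b6ohg00b0",
    "ClusterName": "test",
    "KubernetesVersion": "v1.26.10-vke.14",
    "RegionId": "cn-shanghai",
    "Status": "Running",
    "name": "cco93ispooc7b6ohg00b0"
  }
}
"""


def summarize_cluster(obj: dict) -> dict:
    """Pick out a few tags and parse CreateTime (assumed to be UTC)."""
    tags = obj["tags"]
    created_at = datetime.strptime(
        obj["fields"]["CreateTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "cluster_id": tags["ClusterId"],
        "cluster_name": tags["ClusterName"],
        "region": tags["RegionId"],
        "is_running": tags["Status"] == "Running",
        "created_at": created_at,
    }


if __name__ == "__main__":
    print(summarize_cluster(json.loads(SAMPLE_OBJECT)))
```

In the sample, the `name` tag carries the same value as `ClusterId`; the sketch reads `ClusterId` directly rather than relying on that.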