Huawei Cloud Search Service CSS for Elasticsearch
Collect monitoring metrics for Huawei Cloud Search Service CSS for Elasticsearch
Configuration¶
Install Func¶
We recommend activating the hosted version of Func under TrueWatch Integration - Extension: all prerequisites are installed automatically, so you can continue directly with the script installation.
If you want to deploy Func yourself, refer to Self-deploying Func.
Install Script¶
Note: Prepare a Huawei Cloud AK with the required permissions in advance (for simplicity, you can grant the global read-only permission `ReadOnlyAccess`).
- Log in to the Func console, click 【Script Market】, enter the TrueWatch script market, and search for `integration_huaweicloud_css`
- Click 【Install】, then enter the corresponding parameters: Huawei Cloud AK, SK, and account name
- Click 【Deploy Startup Script】; the system automatically creates the `Startup` script set and configures the corresponding startup script
- After enabling, you can see the corresponding automatic trigger configuration in 「Manage / Automatic Trigger Configuration」. Click 【Execute】 to run it once immediately without waiting for the scheduled time. After a while, you can check the execution task records and corresponding logs
Verification¶
- Confirm in 「Manage / Automatic Trigger Configuration」 whether the corresponding task has the automatic trigger configuration, and check the corresponding task records and logs for any exceptions
- In TrueWatch, check if there is asset information in 「Infrastructure - Resource Catalog」
- In TrueWatch, check whether the corresponding monitoring data is present in 「Metrics」
Metrics¶
Once Huawei Cloud CSS metric collection is configured, you can collect additional metrics through configuration; see Huawei Cloud CSS Metrics Details.
Instance Monitoring Metrics¶
Performance monitoring metrics for Huawei Cloud Search Service CSS for Elasticsearch instances are as follows. For more metrics, refer to Table 1
Metric ID | Metric Name | Metric Description | Value Range | Monitoring Period (Raw Metric) |
---|---|---|---|---|
`status` | Cluster Health Status | This metric is used to statistically measure the status of the monitored object. | 0, 1, 2, 3. 0: The cluster is 100% available. 1: Data is complete, but some replicas are missing. High availability is somewhat weakened, there is a risk, please pay attention to the cluster situation. 2: Data is missing, the cluster will be abnormal when used. 3: The cluster status is not obtained. | 1 minute |
`indices_count` | Number of Indices | The number of indices in the CSS cluster. | ≥ 0 | 1 minute |
`total_shards_count` | Number of Shards | The number of shards in the CSS cluster. | ≥ 0 | 1 minute |
`primary_shards_count` | Number of Primary Shards | The number of primary shards in the CSS cluster. | ≥ 0 | 1 minute |
`coordinating_nodes_count` | Number of Coordinating Nodes | The number of coordinating nodes in the CSS cluster. | ≥ 0 | 1 minute |
`data_nodes_count` | Number of Data Nodes | The number of data nodes in the CSS cluster. | ≥ 0 | 1 minute |
`SearchRate` | Average Query Rate | Query QPS, the average number of query operations per second in the cluster. | ≥ 0 | 1 minute |
`IndexingRate` | Average Indexing Rate | Indexing TPS, the average number of indexing operations per second in the cluster. | ≥ 0 | 1 minute |
`IndexingLatency` | Average Indexing Latency | The average time required for shards to complete indexing operations. | ≥ 0 ms | 1 minute |
`SearchLatency` | Average Query Latency | The average time required for shards to complete search operations. | ≥ 0 ms | 1 minute |
`avg_cpu_usage` | Average CPU Usage | The average CPU utilization of nodes in the CSS cluster. | 0-100% | 1 minute |
`avg_mem_used_percent` | Average Memory Usage Percentage | The average percentage of used memory of nodes in the CSS cluster. | 0-100% | 1 minute |
`disk_util` | Disk Usage | This metric is used to statistically measure the disk usage of the monitored object. | 0-100% | 1 minute |
`avg_load_average` | Average Node Load Value | The average value of the 1-minute average queued tasks in the operating system of nodes in the CSS cluster. | ≥ 0 | 1 minute |
`avg_jvm_heap_usage` | Average JVM Heap Usage | The average JVM heap memory usage of nodes in the CSS cluster. | 0-100% | 1 minute |
`sum_current_opened_http_count` | Total Current Opened HTTP Connections | The sum of opened and not yet closed HTTP connections on each node in the CSS cluster. | ≥ 0 | 1 minute |
`avg_thread_pool_write_queue` | Average Queued Tasks in Write Queue | The average number of queued tasks in the write thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
`avg_thread_pool_search_queue` | Average Queued Tasks in Search Queue | The average number of queued tasks in the search thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
`avg_thread_pool_force_merge_queue` | Average Queued Tasks in ForceMerge Queue | The average number of queued tasks in the force merge thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
`avg_thread_pool_write_rejected` | Average Rejected Tasks in Write Queue | The average number of rejected tasks in the write thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
`avg_jvm_old_gc_count` | Average JVM Old Generation GC Count | The average cumulative value of the number of "old generation" garbage collection runs on each node in the CSS cluster. | ≥ 0 | 1 minute |
`avg_jvm_old_gc_time` | Average JVM Old Generation GC Time | The average cumulative value of the time spent on "old generation" garbage collection on each node in the CSS cluster. | ≥ 0 ms | 1 minute |
`avg_jvm_young_gc_count` | Average JVM Young Generation GC Count | The average cumulative value of the number of "young generation" garbage collection runs on each node in the CSS cluster. | ≥ 0 | 1 minute |
`avg_jvm_young_gc_time` | Average JVM Young Generation GC Time | The average cumulative value of the time spent on "young generation" garbage collection on each node in the CSS cluster. | ≥ 0 ms | 1 minute |
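As a minimal sketch of how values from the table might be interpreted downstream, the following Python maps the `status` metric to the meanings listed above and derives a per-interval figure from a cumulative metric such as `avg_jvm_old_gc_count`. The helper names and sample values are hypothetical and are not part of any TrueWatch or Huawei Cloud API; treating a decreasing counter as a reset is an assumption of this sketch.

```python
from typing import Optional

# Meanings of the `status` metric, summarized from the metrics table.
CLUSTER_HEALTH = {
    0: "available",
    1: "replicas missing",
    2: "data missing",
    3: "status not obtained",
}


def describe_health(status: int) -> str:
    """Return a short label for a `status` metric value (hypothetical helper)."""
    return CLUSTER_HEALTH.get(status, f"unknown status {status}")


def counter_delta(previous: float, current: float) -> Optional[float]:
    """Difference between two consecutive samples of a cumulative metric.

    Returns None when the value went backwards, which this sketch
    assumes means the counter was reset (for example after a restart).
    """
    if current < previous:
        return None
    return current - previous


if __name__ == "__main__":
    # Sample values only; real values come from the collected metrics.
    print(describe_health(1))
    print(counter_delta(120.0, 123.0))
```

Because the GC count and GC time metrics are cumulative, comparing consecutive samples in this way gives the activity within one monitoring period rather than the running total.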
Objects¶
The collected data structure of Huawei Cloud Search Service CSS for Elasticsearch objects can be seen in 「Infrastructure - Resource Catalog」
{
  "measurement": "huaweicloud_css",
  "tags": {
    "RegionId" : "cn-north-4",
    "project_id" : "xxxxxxx",
    "enterpriseProjectId" : "",
    "instance_id" : "xxxxxxx-xxxxxxx-xxxxxxx-00001",
    "instance_name" : "css-3384",
    "publicIp" : "xxxxx",
    "status" : "100",
    "endpoint" : "192.168.0.100:9200"
  },
  "fields": {
    "vpc_id" : "3dda7d4b-aec0-4838-a91a-28xxxxxxxx",
    "subnetId" : "xxxxx",
    "securityGroupId" : "xxxxxxx",
    "datastore" : "{\"supportSecuritymode\": false, \"type\": \"elasticsearch\", \"version\": \"7.6.2\"}",
    "instances" : "[{\"azCode\": \"cn-east-3a\", \"id\": \"95f61e90-507b-48d4-8ac5-53dcefd155a3\", \"ip\": \"192.168.0.140\", \"name\": \"css-test-ess-esn-1-1\", \"specCode\": \"ess.spec-kc1.xlarge.2\", \"status\": \"200\", \"type\": \"ess\", \"volume\": {\"size\": 40, \"type\": \"HIGH\"}}]",
    "publicKibanaResp" : "xxxx",
    "elbWhiteList" : "xxxx",
    "updated" : "2023-06-27T07:35:29",
    "created" : "2023-06-27T07:35:29",
    "bandwidthSize" : "100",
    "actions" : "REBOOTING",
    "tags" : "xxxx",
    "period" : true
  }
}
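In the example object, `datastore` and `instances` are JSON documents stored as strings, so they need a second decoding step before their contents can be read. The following Python is a minimal sketch of that step using only the standard library; the `decode_fields` helper is hypothetical, and the sample is a trimmed copy of the example object rather than real data.

```python
import json

# Trimmed copy of the example object; values are sample data.
sample = {
    "measurement": "huaweicloud_css",
    "tags": {
        "instance_id": "xxxxxxx-xxxxxxx-xxxxxxx-00001",
        "instance_name": "css-3384",
    },
    "fields": {
        "datastore": '{"supportSecuritymode": false, "type": "elasticsearch", "version": "7.6.2"}',
        "instances": '[{"azCode": "cn-east-3a", "name": "css-test-ess-esn-1-1", '
                     '"status": "200", "volume": {"size": 40, "type": "HIGH"}}]',
    },
}


def decode_fields(obj: dict, keys=("datastore", "instances")) -> dict:
    """Return a copy of obj["fields"] with the listed string fields parsed as JSON."""
    fields = dict(obj["fields"])  # shallow copy, the input object is left unchanged
    for key in keys:
        value = fields.get(key)
        if isinstance(value, str):
            fields[key] = json.loads(value)
    return fields


decoded = decode_fields(sample)
print(decoded["datastore"]["version"])
print([node["name"] for node in decoded["instances"]])
```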
Partial parameter descriptions are as follows:
Parameter Name | Description |
---|---|
`status` | Cluster Status Value |
`updated` | Last modification time of the cluster, in ISO8601 format |
`bandwidthSize` | Public network bandwidth, unit: Mbit/s |
`actions` | Current actions of the cluster |
`period` | Whether it is a periodic billing cluster |
status (Cluster Status Value) meanings:
Value | Description |
---|---|
100 | Creating |
200 | Available |
303 | Unavailable |
actions (Current actions of the cluster) meanings:
Value | Description |
---|---|
REBOOTING | Restarting |
GROWING | Expanding |
RESTORING | Restoring cluster |
SNAPSHOTTING | Creating snapshot |
period meanings:
Value | Description |
---|---|
true | Periodic billing cluster |
false | Pay-as-you-go cluster |
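The three value tables above can be applied to a collected object with plain dictionary lookups. The following Python is a minimal sketch under that assumption; the `summarize` helper and its output format are hypothetical, and any value not listed in the tables is reported as unknown rather than guessed.

```python
# Lookup tables copied from the value tables in this section.
STATUS_MEANINGS = {"100": "Creating", "200": "Available", "303": "Unavailable"}
ACTION_MEANINGS = {
    "REBOOTING": "Restarting",
    "GROWING": "Expanding",
    "RESTORING": "Restoring cluster",
    "SNAPSHOTTING": "Creating snapshot",
}
BILLING_MEANINGS = {True: "Periodic billing cluster", False: "Pay-as-you-go cluster"}


def summarize(tags: dict, fields: dict) -> str:
    """Build a one-line, human-readable summary of a CSS object (hypothetical helper)."""
    status = STATUS_MEANINGS.get(tags.get("status"), "Unknown status")
    action = ACTION_MEANINGS.get(fields.get("actions"), "Unknown action")
    billing = BILLING_MEANINGS.get(fields.get("period"), "Unknown billing mode")
    name = tags.get("instance_name", "unnamed")
    return f"{name}: {status}; {action}; {billing}"


if __name__ == "__main__":
    # Values taken from the example object in this section.
    print(summarize(
        {"instance_name": "css-3384", "status": "100"},
        {"actions": "REBOOTING", "period": True},
    ))
```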
Note: Fields in `tags` and `fields` may change with subsequent updates.
Note: The value of `tags.instance_id` is the cluster ID, used as a unique identifier.