
Huawei Cloud Search Service CSS for Elasticsearch

Collect monitoring metrics for Huawei Cloud Search Service CSS for Elasticsearch

Configuration

Install Func

We recommend enabling TrueWatch Integration - Extension - Hosted Func: all prerequisites are installed automatically, so you can proceed directly to installing the script.

To deploy Func yourself, refer to Self-deploying Func.

Install Script

Note: Please prepare the Huawei Cloud AK with the required permissions in advance (for simplicity, you can grant the global read-only permission ReadOnlyAccess)

  1. Log in to the Func console, click 【Script Market】, enter the TrueWatch script market, and search for integration_huaweicloud_css

  2. Click 【Install】, then enter the corresponding parameters: Huawei Cloud AK, SK, and account name

  3. Click 【Deploy Startup Script】. The system automatically creates the Startup script set and configures the corresponding startup script

  4. After enabling, you can see the corresponding automatic trigger configuration in 「Manage / Automatic Trigger Configuration」. Click 【Execute】 to immediately execute it once without waiting for the scheduled time. After a while, you can check the execution task records and corresponding logs

Verification

  1. Confirm in 「Manage / Automatic Trigger Configuration」 whether the corresponding task has the automatic trigger configuration, and check the corresponding task records and logs for any exceptions
  2. In TrueWatch, check if there is asset information in 「Infrastructure - Resource Catalog」
  3. In TrueWatch, check if there is corresponding monitoring data in 「Metrics」

Metrics

The Huawei Cloud CSS metrics collected by default are listed below. More metrics can be collected through configuration; see Huawei Cloud CSS Metrics Details.

Instance Monitoring Metrics

Performance monitoring metrics for Huawei Cloud Search Service CSS for Elasticsearch instances are as follows. For more metrics, refer to Table 1

| Metric ID | Metric Name | Metric Description | Value Range | Monitoring Period (Raw Metric) |
| --- | --- | --- | --- | --- |
| status | Cluster Health Status | This metric is used to statistically measure the status of the monitored object. | 0, 1, 2, 3. 0: The cluster is 100% available. 1: Data is complete, but some replicas are missing. High availability is somewhat weakened, there is a risk, please pay attention to the cluster situation. 2: Data is missing, the cluster will be abnormal when used. 3: The cluster status is not obtained. | 1 minute |
| indices_count | Number of Indices | The number of indices in the CSS cluster. | ≥ 0 | 1 minute |
| total_shards_count | Number of Shards | The number of shards in the CSS cluster. | ≥ 0 | 1 minute |
| primary_shards_count | Number of Primary Shards | The number of primary shards in the CSS cluster. | ≥ 0 | 1 minute |
| coordinating_nodes_count | Number of Coordinating Nodes | The number of coordinating nodes in the CSS cluster. | ≥ 0 | 1 minute |
| data_nodes_count | Number of Data Nodes | The number of data nodes in the CSS cluster. | ≥ 0 | 1 minute |
| SearchRate | Average Query Rate | Query QPS, the average number of query operations per second in the cluster. | ≥ 0 | 1 minute |
| IndexingRate | Average Indexing Rate | Indexing TPS, the average number of indexing operations per second in the cluster. | ≥ 0 | 1 minute |
| IndexingLatency | Average Indexing Latency | The average time required for shards to complete indexing operations. | ≥ 0 ms | 1 minute |
| SearchLatency | Average Query Latency | The average time required for shards to complete search operations. | ≥ 0 ms | 1 minute |
| avg_cpu_usage | Average CPU Usage | The average CPU utilization of nodes in the CSS cluster. | 0-100% | 1 minute |
| avg_mem_used_percent | Average Memory Usage Percentage | The average percentage of used memory of nodes in the CSS cluster. | 0-100% | 1 minute |
| disk_util | Disk Usage | This metric is used to statistically measure the disk usage of the monitored object. | 0-100% | 1 minute |
| avg_load_average | Average Node Load Value | The average value of the 1-minute average queued tasks in the operating system of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| avg_jvm_heap_usage | Average JVM Heap Usage | The average JVM heap memory usage of nodes in the CSS cluster. | 0-100% | 1 minute |
| sum_current_opened_http_count | Total Current Opened HTTP Connections | The sum of opened and not yet closed HTTP connections on each node in the CSS cluster. | ≥ 0 | 1 minute |
| avg_thread_pool_write_queue | Average Queued Tasks in Write Queue | The average number of queued tasks in the write thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| avg_thread_pool_search_queue | Average Queued Tasks in Search Queue | The average number of queued tasks in the search thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| avg_thread_pool_force_merge_queue | Average Queued Tasks in ForceMerge Queue | The average number of queued tasks in the force merge thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| avg_thread_pool_write_rejected | Average Rejected Tasks in Write Queue | The average number of rejected tasks in the write thread pool of nodes in the CSS cluster. | ≥ 0 | 1 minute |
| avg_jvm_old_gc_count | Average JVM Old Generation GC Count | The average cumulative value of the number of "old generation" garbage collection runs on each node in the CSS cluster. | ≥ 0 | 1 minute |
| avg_jvm_old_gc_time | Average JVM Old Generation GC Time | The average cumulative value of the time spent on "old generation" garbage collection on each node in the CSS cluster. | ≥ 0 ms | 1 minute |
| avg_jvm_young_gc_count | Average JVM Young Generation GC Count | The average cumulative value of the number of "young generation" garbage collection runs on each node in the CSS cluster. | ≥ 0 | 1 minute |
| avg_jvm_young_gc_time | Average JVM Young Generation GC Time | The average cumulative value of the time spent on "young generation" garbage collection on each node in the CSS cluster. | ≥ 0 ms | 1 minute |
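
As a small illustration of how the `status` metric values listed above might be interpreted in a custom alerting script, the sketch below maps each documented value to a severity label. The function name `describe_cluster_health` and the severity labels are hypothetical and are not part of the integration; only the numeric values 0 to 3 and their meanings come from the metric description.

```python
# Hypothetical helper: the severity labels are illustrative assumptions.
# Only the values 0-3 and their meanings come from the metric description.
CLUSTER_HEALTH = {
    0: ("ok", "The cluster is 100% available."),
    1: ("warning", "Data is complete, but some replicas are missing."),
    2: ("critical", "Data is missing; the cluster will be abnormal when used."),
    3: ("unknown", "The cluster status is not obtained."),
}


def describe_cluster_health(status):
    """Return (severity, description) for a `status` metric value."""
    try:
        return CLUSTER_HEALTH[int(status)]
    except (KeyError, ValueError, TypeError):
        # Values outside 0-3, or non-numeric input, are treated as unknown.
        return ("unknown", f"Unrecognized status value: {status!r}")
```

Accepting both integers and numeric strings is a design choice made here because metric values may arrive in either form depending on how they are queried.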

Objects

The structure of the collected Huawei Cloud Search Service CSS for Elasticsearch object data is shown below. The objects can be viewed in 「Infrastructure - Resource Catalog」

{
  "measurement": "huaweicloud_css",
  "tags": {
    "RegionId"            : "cn-north-4",
    "project_id"          : "xxxxxxx",
    "enterpriseProjectId" : "",
    "instance_id"         : "xxxxxxx-xxxxxxx-xxxxxxx-00001",
    "instance_name"       : "css-3384",
    "publicIp"            : "xxxxx",
    "status"              : "100",
    "endpoint"            : "192.168.0.100:9200"
  },
  "fields": {
    "vpc_id"              : "3dda7d4b-aec0-4838-a91a-28xxxxxxxx",
    "subnetId"            : "xxxxx",
    "securityGroupId"     : "xxxxxxx",
    "datastore"           : "{\"supportSecuritymode\": false, \"type\": \"elasticsearch\", \"version\": \"7.6.2\"}",
    "instances"           : "[{\"azCode\": \"cn-east-3a\", \"id\": \"95f61e90-507b-48d4-8ac5-53dcefd155a3\", \"ip\": \"192.168.0.140\", \"name\": \"css-test-ess-esn-1-1\", \"specCode\": \"ess.spec-kc1.xlarge.2\", \"status\": \"200\", \"type\": \"ess\", \"volume\": {\"size\": 40, \"type\": \"HIGH\"}}]",
    "publicKibanaResp"    : "xxxx",
    "elbWhiteList"        : "xxxx",
    "updated"             : "2023-06-27T07:35:29",
    "created"             : "2023-06-27T07:35:29",
    "bandwidthSize"       : "100",
    "actions"             : "REBOOTING",
    "tags"                : "xxxx",
    "period"              : true
  }
}
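
In the sample above, `fields.datastore` and `fields.instances` are JSON documents stored as strings. The following minimal sketch shows one way such values could be decoded after exporting an object; the function name `decode_css_object` and the abbreviated `sample` dictionary are hypothetical, and masked placeholder values such as "xxxx" are left unchanged.

```python
import json


def decode_css_object(obj):
    """Return a copy of the object's fields with JSON-encoded strings parsed."""
    fields = dict(obj.get("fields", {}))
    for key in ("datastore", "instances"):
        raw = fields.get(key)
        if isinstance(raw, str):
            try:
                fields[key] = json.loads(raw)
            except json.JSONDecodeError:
                # Leave masked or malformed values (e.g. "xxxx") untouched.
                pass
    return fields


# Abbreviated, hypothetical sample based on the structure shown above.
sample = {
    "measurement": "huaweicloud_css",
    "tags": {"instance_id": "xxxxxxx-xxxxxxx-xxxxxxx-00001"},
    "fields": {
        "datastore": "{\"supportSecuritymode\": false, \"type\": \"elasticsearch\", \"version\": \"7.6.2\"}",
        "instances": "[{\"name\": \"css-test-ess-esn-1-1\", \"volume\": {\"size\": 40, \"type\": \"HIGH\"}}]",
    },
}

decoded = decode_css_object(sample)
print(decoded["datastore"]["version"])            # 7.6.2
print(decoded["instances"][0]["volume"]["size"])  # 40
```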

Some of the parameters are described below:

| Parameter Name | Description |
| --- | --- |
| status | Cluster Status Value |
| updated | Last modification time of the cluster, in ISO8601 format |
| bandwidthSize | Public network bandwidth, unit: Mbit/s |
| actions | Current actions of the cluster |
| period | Whether it is a periodic billing cluster |

status (Cluster Status Value) meanings:

| Value | Description |
| --- | --- |
| 100 | Creating |
| 200 | Available |
| 303 | Unavailable |

actions (Current actions of the cluster) meanings:

| Value | Description |
| --- | --- |
| REBOOTING | Restarting |
| GROWING | Expanding |
| RESTORING | Restoring cluster |
| SNAPSHOTTING | Creating snapshot |

period meanings:

| Value | Description |
| --- | --- |
| true | Periodic billing cluster |
| false | Pay-as-you-go cluster |
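
The value tables above can be applied to a collected object with a simple lookup. The sketch below is illustrative only: `summarize_css_object` is a hypothetical helper, and the fallback labels for unlisted or missing values are assumptions rather than documented behavior.

```python
# Mappings copied from the value tables above.
CLUSTER_STATUS = {"100": "Creating", "200": "Available", "303": "Unavailable"}
CLUSTER_ACTIONS = {
    "REBOOTING": "Restarting",
    "GROWING": "Expanding",
    "RESTORING": "Restoring cluster",
    "SNAPSHOTTING": "Creating snapshot",
}


def summarize_css_object(obj):
    """Translate status, actions, and period into readable labels."""
    tags = obj.get("tags", {})
    fields = obj.get("fields", {})
    status = str(tags.get("status", ""))
    action = str(fields.get("actions", ""))
    period = fields.get("period")
    if period is None:
        billing = "Unknown"  # assumption: a missing value is not interpreted
    else:
        billing = "Periodic billing" if period else "Pay-as-you-go"
    return {
        "cluster_id": tags.get("instance_id"),  # unique cluster identifier
        "status": CLUSTER_STATUS.get(status, f"Unknown ({status})"),
        "action": CLUSTER_ACTIONS.get(action, action or "None"),
        "billing": billing,
    }


summary = summarize_css_object({
    "tags": {"instance_id": "xxxxxxx-xxxxxxx-xxxxxxx-00001", "status": "100"},
    "fields": {"actions": "REBOOTING", "period": True},
})
print(summary["status"], summary["action"], summary["billing"])
```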

Note: The fields in tags and fields may change in subsequent updates

Note: The value of tags.instance_id is the cluster ID, used as a unique identifier