
AWS OpenSearch

Displays AWS OpenSearch metrics, including connections, requests, latency, slow queries, and more.

Configuration

Install Func

We recommend enabling 「Integration / Extensions / DataFlux Func (Automata)」 in TrueWatch. All prerequisites are installed automatically, so you can go straight to the script installation.

For self-deployed Func, refer to Self-deployed Func

Install Script

Note: Prepare an AWS access key (AK) with the required permissions in advance. For simplicity, you can grant the global read-only policy ReadOnlyAccess.
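If you prefer not to grant ReadOnlyAccess, you can use a narrower policy instead. The Python sketch below only prints an IAM policy document. The action list is an assumption about what the collector needs (listing and describing OpenSearch domains, reading CloudWatch metrics). It is not taken from the integration script, so check it against the script's actual calls before you rely on it.

```python
import json

# ASSUMPTION: a guess at the actions the collector needs (list/describe
# OpenSearch domains, read CloudWatch metrics). Not taken from the
# integration script; verify before relying on it.
ASSUMED_ACTIONS = [
    "es:ListDomainNames",
    "es:DescribeDomain",
    "es:DescribeDomains",
    "cloudwatch:ListMetrics",
    "cloudwatch:GetMetricData",
    "cloudwatch:GetMetricStatistics",
]


def build_read_only_policy(actions):
    """Return an IAM policy document (as a dict) that allows only `actions`."""
    return {
        "Version": "2012-10-17",  # current IAM policy language version
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # de-duplicated, stable order
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy so it can be pasted into the IAM console.
    print(json.dumps(build_read_only_policy(ASSUMED_ACTIONS), indent=2))
```

If collection fails with access-denied errors after you attach this policy, fall back to ReadOnlyAccess as described in the note.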

Managed Version Activation Script

  1. Log in to the TrueWatch console
  2. Click the 【Integration】 menu and select 【Cloud Account Management】
  3. Click 【Add Cloud Account】, select 【AWS】, and fill in the required information. If the cloud account has already been configured, skip this step
  4. Click 【Test】, and after the test succeeds, click 【Save】. If the test fails, check that the configuration information is correct and test again
  5. In the 【Cloud Account Management】 list, find the added cloud account and click it to open its details page
  6. On the cloud account details page, click the 【Integration】 button, find AWS OpenSearch in the Not Installed list, and click 【Install】 to open the installation dialog and complete the installation.

Manual Activation Script

  1. Log in to the Func console, click 【Script Market】 to enter the TrueWatch script market, and search for integration_aws_open_search

  2. Click 【Install】 and enter the corresponding parameters: AWS AK ID, AK Secret, and account name.

  3. Click 【Deploy Startup Script】. The system automatically creates a Startup script set and configures the corresponding startup scripts.

  4. Once the script is enabled, its automatic trigger configuration appears in 「Management / Automatic Trigger Configuration」. Click 【Execute】 to run it once immediately instead of waiting for the scheduled time. After a short wait, you can view the execution task records and the corresponding logs.

Verification

  1. In 「Management / Automatic Trigger Configuration」, confirm that the task has a corresponding automatic trigger configuration. You can also check the task records and logs for any exceptions
  2. In TrueWatch, check if there is asset information in 「Infrastructure / Custom」
  3. In TrueWatch, check if there is corresponding monitoring data in 「Metrics」

Metrics

After AWS OpenSearch is configured, the following metric sets are collected by default. You can collect more metrics by configuring AWS CloudWatch Metrics Details.

Cluster Metrics

Amazon OpenSearch Service provides the following metrics for clusters.

Metric Description
ClusterStatus.green A value of 1 indicates that all index shards are allocated to nodes in the cluster. Related statistics: Maximum
ClusterStatus.yellow A value of 1 indicates that all primary shards of the indices are allocated to nodes in the cluster, but the replica shards of at least one index are not. For more information, see Yellow Cluster Status. Related statistics: Maximum
ClusterStatus.red A value of 1 indicates that the primary and replica shards of at least one index are not allocated to nodes in the cluster. For more information, see Red Cluster Status. Related statistics: Maximum
Shards.active The total number of active primary and replica shards. Related statistics: Maximum, Sum
Shards.unassigned The number of shards not allocated to nodes in the cluster. Related statistics: Maximum, Sum
Shards.delayedUnassigned The number of shards whose node allocation is delayed due to timeout settings. Related statistics: Maximum, Sum
Shards.activePrimary The number of active primary shards. Related statistics: Maximum, Sum
Shards.initializing The number of shards being initialized. Related statistics: Sum
Shards.relocating The number of shards being relocated. Related statistics: Sum
Nodes The number of nodes in the OpenSearch Service cluster, including dedicated master nodes and UltraWarm nodes. For more information, see Changing Configuration in Amazon OpenSearch Service. Related statistics: Maximum
SearchableDocuments The total number of searchable documents across all data nodes in the cluster. Related statistics: Minimum, Maximum, Average
CPUUtilization The percentage of CPU utilization for data nodes in the cluster. Maximum shows the node with the highest CPU utilization. Average shows the average across all nodes in the cluster. This metric is also available for individual nodes. Related statistics: Maximum, Average
ClusterUsedSpace The total amount of used space in the cluster. You must leave the period at one minute to get an accurate value. The OpenSearch Service console displays this value in GiB; the Amazon CloudWatch console displays it in MiB. Related statistics: Minimum, Maximum
ClusterIndexWritesBlocked Indicates whether your cluster is accepting or blocking incoming write requests. A value of 0 indicates that the cluster is accepting requests. A value of 1 indicates that requests are being blocked. Some common factors include: FreeStorageSpace being too low or JVMMemoryPressure being too high. To mitigate this, consider increasing disk space or scaling the cluster. Related statistics: Maximum
FreeStorageSpace The available space on each data node in the cluster. Sum shows the total available space in the cluster, but you must leave the period at one minute to get an accurate value. Minimum and Maximum show the nodes with the smallest and largest available space, respectively. This metric is also available for individual nodes. OpenSearch Service throws a ClusterBlockException when this metric reaches 0. To recover, you must delete indices, add larger instances, or add EBS-based storage to existing instances. To learn more, see Insufficient Available Storage Space. The OpenSearch Service console displays this value in GiB; the Amazon CloudWatch console displays it in MiB.
JVMMemoryPressure The maximum percentage of the Java heap used for all data nodes in the cluster. OpenSearch Service uses half of the instance's RAM for the Java heap, up to a heap size of 32 GiB. You can scale instances vertically up to 64 GiB of RAM, at which point you can scale horizontally by adding instances. See Recommended CloudWatch Alarms for Amazon OpenSearch Service. Related statistics: Maximum. Note: The logic of this metric changed in service software R20220323. For more information, see Release Notes.
JVMGCYoungCollectionCount The number of times "young generation" garbage collection has run. A large and growing number of runs is normal for cluster operations. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
JVMGCOldCollectionTime The time spent by the cluster performing "old generation" garbage collection, in milliseconds. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
JVMGCYoungCollectionTime The time spent by the cluster performing "young generation" garbage collection, in milliseconds. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
JVMGCOldCollectionCount The number of times "old generation" garbage collection has run. In clusters with sufficient resources, this number should remain small and grow infrequently. This metric is also captured at the node level. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
IndexingLatency The difference in the total time (in milliseconds) spent on all indexing operations in a node between minute N and minute (N-1).
IndexingRate The number of indexing operations per minute.
SearchLatency The difference in the total time (in milliseconds) spent on all searches in a node between minute N and minute (N-1).
SearchRate The total number of search requests per minute across all shards on data nodes.
SegmentCount The number of segments on a data node. The more segments you have, the longer each search takes. OpenSearch occasionally merges smaller segments into larger ones. Related node statistics: Maximum, Average. Related cluster statistics: Sum, Maximum, Average
SysMemoryUtilization The percentage of instance memory in use. A high value for this metric is normal and usually does not indicate a problem with the cluster. For a better indication of potential performance and stability issues, see the JVMMemoryPressure metric. Related node statistics: Minimum, Maximum, Average. Related cluster statistics: Minimum, Maximum, Average
OpenSearchDashboardsConcurrentConnections The number of active concurrent connections to OpenSearch Dashboards. If this number is consistently high, consider scaling your cluster. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
OpenSearchDashboardsHeapTotal The amount of heap memory allocated to OpenSearch Dashboards, in MiB. The exact allocation can vary by EC2 instance type. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
OpenSearchDashboardsHeapUsed The absolute amount of heap memory used by OpenSearch Dashboards, in MiB. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
OpenSearchDashboardsHeapUtilization The maximum percentage of available heap memory used by OpenSearch Dashboards. If this value exceeds 80%, consider scaling your cluster. Related node statistics: Maximum. Related cluster statistics: Minimum, Maximum, Average
OpenSearchDashboardsResponseTimesMaxInMillis The maximum time (in milliseconds) taken by OpenSearch Dashboards to respond to a request. If requests consistently take a long time to return results, consider increasing the size of the instance type. Related node statistics: Maximum. Related cluster statistics: Maximum, Average
OpenSearchDashboardsOS1MinuteLoad The one-minute CPU load average for OpenSearch Dashboards. Ideally, CPU load should stay below 1.00. Temporary spikes are fine, but if this metric is consistently above 1.00, we recommend increasing the size of the instance type. Related node statistics: Average. Related cluster statistics: Average, Maximum
OpenSearchDashboardsRequestTotal The total number of HTTP requests made to OpenSearch Dashboards. If your system is slow or you see a large number of Dashboards requests, consider increasing the size of the instance type. Related node statistics: Sum. Related cluster statistics: Sum
ThreadpoolForce_mergeQueue The number of queued tasks in the force merge thread pool. If the queue size is consistently large, consider scaling your cluster. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
ThreadpoolForce_mergeRejected The number of rejected tasks in the force merge thread pool. If this number keeps growing, consider scaling your cluster. Related node statistics: Maximum. Related cluster statistics: Sum
ThreadpoolForce_mergeThreads The size of the force merge thread pool. Related node statistics: Maximum. Related cluster statistics: Average, Sum
ThreadpoolSearchQueue The number of queued tasks in the search thread pool. If the queue size is consistently large, consider scaling your cluster. The maximum size of the search queue is 1000. Related node statistics: Maximum. Related cluster statistics: Average, Sum
ThreadpoolSearchRejected The number of rejected tasks in the search thread pool. If this number keeps growing, consider scaling your cluster. Related node statistics: Maximum. Related cluster statistics: Sum
ThreadpoolSearchThreads The size of the search thread pool. Related node statistics: Maximum. Related cluster statistics: Average, Sum
Threadpoolsql-workerQueue The number of queued tasks in the SQL search thread pool. If the queue size is consistently large, consider scaling your cluster. Related node statistics: Maximum. Related cluster statistics: Sum, Maximum, Average
Threadpoolsql-workerRejected The number of rejected tasks in the SQL search thread pool. If this number keeps growing, consider scaling your cluster. Related node statistics: Maximum. Related cluster statistics: Sum
Threadpoolsql-workerThreads The size of the SQL search thread pool. Related node statistics: Maximum. Related cluster statistics: Average, Sum
ThreadpoolWriteQueue The number of queued tasks in the write thread pool. Related node statistics: Maximum. Related cluster statistics: Average, Sum
ThreadpoolWriteRejected The number of rejected tasks in the write thread pool. Related node statistics: Maximum. Related cluster statistics: Average, Sum
ThreadpoolWriteThreads The size of the write thread pool. Related node statistics: Maximum. Related cluster statistics: Average, Sum
CoordinatingWriteRejected The total number of rejections on coordinating nodes due to indexing pressure since the last OpenSearch Service process start. Related node statistics: Maximum. Related cluster statistics: Average, Sum. This metric is available in version 7.1 and later.
ReplicaWriteRejected The total number of rejections on replica shards due to indexing pressure since the last OpenSearch Service process start. Related node statistics: Maximum. Related cluster statistics: Average, Sum. This metric is available in version 7.1 and later.
PrimaryWriteRejected The total number of rejections on primary shards due to indexing pressure since the last OpenSearch Service process start. Related node statistics: Maximum. Related cluster statistics: Average, Sum. This metric is available in version 7.1 and later.
ReadLatency The latency of read operations on an EBS volume in seconds. This metric is also available for individual nodes. Related statistics: Minimum, Maximum, Average
ReadThroughput The throughput of read operations on an EBS volume in bytes per second. This metric is also available for individual nodes. Related statistics: Minimum, Maximum, Average
ReadIOPS The number of input and output (I/O) operations per second for read operations on an EBS volume. This metric is also available for individual nodes. Related statistics: Minimum, Maximum, Average
WriteIOPS The number of input and output (I/O) operations per second for write operations on an EBS volume. This metric is also available for individual nodes. Related statistics: Minimum, Maximum, Average
WriteLatency The latency of write operations on an EBS volume in seconds. This metric is also available for individual nodes. Related statistics: Minimum, Maximum, Average
BurstBalance The percentage of I/O credits remaining in the burst bucket of an EBS volume. A value of 100 indicates that the volume has accumulated the maximum number of credits. If this percentage is below 70%, see Low EBS Burst Balance. For domains with gp3 volume types and domains with gp2 volumes larger than 1000 GiB, the burst balance remains at 0. Related statistics: Minimum, Maximum, Average
CurrentPointInTime The number of active point in time (PIT) search contexts in a node.
TotalPointInTime The number of expired PIT search contexts since the node started.
HasActivePointInTime A value of 1 indicates that there have been active PIT contexts on the node since it started. A value of 0 indicates that there have not.
HasUsedPointInTime A value of 1 indicates that there have been expired PIT contexts on the node since it started. A value of 0 indicates that there have not.
AsynchronousSearchInitializedRate The number of asynchronous searches initialized in the past 1 minute.
AsynchronousSearchRunningCurrent The number of asynchronous searches currently running.
AsynchronousSearchCompletionRate The number of asynchronous searches successfully completed in the past 1 minute.
AsynchronousSearchFailureRate The number of asynchronous searches that completed and failed in the past 1 minute.
AsynchronousSearchPersistRate The number of asynchronous searches persisted in the past 1 minute.
AsynchronousSearchRejected The total number of asynchronous searches rejected since the node started.
AsynchronousSearchCancelled The total number of asynchronous searches cancelled since the node started.
SQLRequestCount The number of requests to the _sql API. Related statistics: Sum
SQLUnhealthy A value of 1 indicates that, in response to certain requests, the SQL plugin is returning 5xx response codes or passing invalid query DSL to OpenSearch. Other requests continue to succeed. A value of 0 indicates that there have been no recent failures. If you see a persistent value of 1, troubleshoot the requests your client is making to the plugin. Related statistics: Maximum
SQLDefaultCursorRequestCount Similar to SQLRequestCount, but only counts paginated requests. Related statistics: Sum
SQLFailedRequestCountByCusErr The number of requests to the _sql API that failed due to client issues. For example, a request might return HTTP status code 400 due to IndexNotFoundException. Related statistics: Sum
SQLFailedRequestCountBySysErr The number of requests to the _sql API that failed due to server issues or functional limitations. For example, a request might return HTTP status code 503 due to VerificationException. Related statistics: Sum
OldGenJVMMemoryPressure The maximum percentage of the Java heap used for the "old generation" on all data nodes in the cluster. This metric is also captured at the node level. Related statistics: Maximum
OpenSearchDashboardsHealthyNodes (formerly KibanaHealthyNodes) A health check for OpenSearch Dashboards. If the minimum, maximum, and average are all equal to 1, Dashboards is functioning normally. If you have 10 nodes with a maximum of 1, a minimum of 0, and an average of 0.7, then 7 nodes (70%) are healthy and 3 nodes (30%) are unhealthy. Related statistics: Minimum, Maximum, Average
InvalidHostHeaderRequests The number of HTTP requests to the OpenSearch cluster that contain an invalid (or missing) host header. Valid requests include the domain hostname as the host header value. OpenSearch Service rejects invalid requests to public access domains that do not have a restrictive access policy. We recommend applying a restrictive access policy to all domains. If you see a large value for this metric, confirm that your OpenSearch client includes the domain hostname (and not, for example, its IP address) in its requests. Related statistics: Sum
OpenSearchRequests (previously ElasticsearchRequests) The number of requests made to the OpenSearch cluster. Related statistics: Sum
2xx, 3xx, 4xx, 5xx The number of requests to the domain that resulted in the specified HTTP response code (2xx, 3xx, 4xx, 5xx). Related statistics: Sum
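The ClusterStatus.* rows and the OpenSearchDashboardsHealthyNodes example come down to simple arithmetic. Below is a minimal Python sketch. It assumes you already have the Maximum of each ClusterStatus.* metric and the Average of OpenSearchDashboardsHealthyNodes for one period. The function names are hypothetical helpers, not part of the integration script.

```python
def cluster_status(green_max, yellow_max, red_max):
    """Map the Maximum of ClusterStatus.green/.yellow/.red over one period
    to a single colour. The worst colour seen wins, so a period in which the
    cluster was briefly red is reported as red."""
    if red_max >= 1:
        return "red"
    if yellow_max >= 1:
        return "yellow"
    if green_max >= 1:
        return "green"
    return "unknown"  # no status metric reported 1 in this period


def healthy_dashboards_nodes(average, node_count):
    """Split node_count into (healthy, unhealthy) using the Average of
    OpenSearchDashboardsHealthyNodes, where each node reports 1 or 0."""
    healthy = round(average * node_count)
    return healthy, node_count - healthy


if __name__ == "__main__":
    print(cluster_status(green_max=0, yellow_max=1, red_max=0))
    print(healthy_dashboards_nodes(0.7, 10))
```

With the table's example of 10 nodes and an average of 0.7, healthy_dashboards_nodes returns (7, 3). Letting the worst colour win is a design choice for alerting. A dashboard that shows the latest state could use only the last datapoint instead.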
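Several descriptions in the table involve unit or sizing arithmetic:

- CloudWatch reports ClusterUsedSpace and FreeStorageSpace in MiB, while the console shows GiB.
- The Java heap is half of the instance RAM, capped at 32 GiB.
- IndexingLatency and SearchLatency are the difference between consecutive per-minute totals.

The Python sketch below restates those rules. The helper names and sample values are illustrative only. CloudWatch already reports the per-minute difference, so per_minute_deltas just shows how that value is defined.

```python
MIB_PER_GIB = 1024
MAX_HEAP_GIB = 32  # heap cap quoted in the JVMMemoryPressure row


def mib_to_gib(mib):
    """Convert a CloudWatch value in MiB to the GiB shown in the console."""
    return mib / MIB_PER_GIB


def jvm_heap_gib(instance_ram_gib):
    """Java heap size: half of the instance RAM, capped at MAX_HEAP_GIB."""
    return min(instance_ram_gib / 2, MAX_HEAP_GIB)


def per_minute_deltas(cumulative_totals):
    """Differences between consecutive one-minute samples of a cumulative
    total, i.e. value(N) - value(N-1), as in the IndexingLatency and
    SearchLatency descriptions."""
    return [cur - prev for prev, cur in zip(cumulative_totals, cumulative_totals[1:])]


if __name__ == "__main__":
    print(mib_to_gib(51200))  # a ClusterUsedSpace value converted to GiB
    print(jvm_heap_gib(16), jvm_heap_gib(128))
    # Hypothetical cumulative indexing time (ms), sampled once per minute.
    print(per_minute_deltas([1000, 1450, 2100]))
```

For example, mib_to_gib(51200) returns 50.0 and jvm_heap_gib(16) returns 8.0. Any instance with 64 GiB of RAM or more gets the 32 GiB heap cap.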
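Some rows quote concrete thresholds:

- BurstBalance below 70%
- OpenSearchDashboardsHeapUtilization above 80%
- OpenSearchDashboardsOS1MinuteLoad above 1.00
- A value of 1 for ClusterIndexWritesBlocked, ClusterStatus.red, or SQLUnhealthy

The Python sketch below checks a snapshot of the latest values against those thresholds. The rule table is assembled from the descriptions for illustration and is not an official alarm configuration. It also checks single values, whereas several descriptions refer to values that stay high consistently.

```python
# Illustrative rules built from thresholds quoted in the metric table;
# this is not an official alarm configuration.
RULES = {
    "ClusterIndexWritesBlocked": lambda v: v >= 1,            # 1 = writes blocked
    "ClusterStatus.red": lambda v: v >= 1,
    "SQLUnhealthy": lambda v: v >= 1,
    "BurstBalance": lambda v: v < 70,                         # percent
    "OpenSearchDashboardsHeapUtilization": lambda v: v > 80,  # percent
    "OpenSearchDashboardsOS1MinuteLoad": lambda v: v > 1.0,
}


def breached(latest, burst_balance_applies=True):
    """Return the sorted names of metrics in `latest` whose value breaks a rule.

    BurstBalance stays at 0 for gp3 volumes and for gp2 volumes larger than
    1000 GiB, so pass burst_balance_applies=False for such domains to avoid
    a permanent false alarm. Metrics without a rule are ignored.
    """
    hits = []
    for name, value in latest.items():
        if name == "BurstBalance" and not burst_balance_applies:
            continue
        rule = RULES.get(name)
        if rule is not None and rule(value):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical snapshot of the latest values for one gp3-backed domain.
    snapshot = {
        "ClusterIndexWritesBlocked": 0,
        "BurstBalance": 0,
        "OpenSearchDashboardsHeapUtilization": 85,
        "CPUUtilization": 40,
    }
    print(breached(snapshot, burst_balance_applies=False))
```

For real alerting, you would typically require several consecutive breaching periods before firing, in line with the "consistently" wording in the table.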

Objects

You can view the data structure of collected AWS OpenSearch objects in 「Infrastructure / Custom」:

{
  "measurement": "aws_opensearch",
  "tags": {
    "name"                  : "df-prd-es",
    "EngineVersion"         : "Elasticsearch_7.10",
    "DomainId"              : "5882XXXXX135/df-prd-es",
    "DomainName"            : "df-prd-es",
    "ClusterConfig"         : "{JSON data of instance types and instance counts in the domain}",
    "ServiceSoftwareOptions": "{JSON data of the current state of the service software}",
    "region"                : "cn-northwest-1",
    "RegionId"              : "cn-northwest-1"
  },
  "fields": {
    "EBSOptions": "{JSON data of the Elastic Block Store for the specified domain}",
    "Endpoints" : "{JSON data of the mapping of domain endpoints for submitting indexing and search requests}",
    "message"   : "{JSON data of the instance}"
  }
}

Note: Fields in tags and fields may change in subsequent updates.

Tip 1: The value of tags.name is the instance ID and serves as a unique identifier.

Tip 2: In this script, tags.name is taken from the DomainName field. When you use this script, make sure DomainName values are not duplicated across multiple AWS accounts.

Tip 3: tags.ClusterConfig, tags.Endpoint, tags.ServiceSoftwareOptions, fields.message, fields.EBSOptions, and fields.Endpoints are all JSON-serialized strings.
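Because several tags and fields are JSON-serialized strings, consumers have to decode them before use. The Python sketch below decodes the keys listed in Tip 3. It also flags DomainName values that appear in more than one account, which Tip 2 warns against. The objects_by_account input shape and the sample JSON values are hypothetical, since the placeholders in the sample object above are not real JSON.

```python
import json
from collections import defaultdict

# Keys that Tip 3 describes as JSON-serialized strings.
JSON_TAGS = ("ClusterConfig", "Endpoint", "ServiceSoftwareOptions")
JSON_FIELDS = ("message", "EBSOptions", "Endpoints")


def decode_object(obj):
    """Return a copy of an aws_opensearch object with its JSON strings parsed.

    The input object is not modified; keys that are absent are skipped.
    """
    tags = dict(obj.get("tags", {}))
    fields = dict(obj.get("fields", {}))
    for key in JSON_TAGS:
        if key in tags:
            tags[key] = json.loads(tags[key])
    for key in JSON_FIELDS:
        if key in fields:
            fields[key] = json.loads(fields[key])
    return {**obj, "tags": tags, "fields": fields}


def duplicate_domain_names(objects_by_account):
    """Find DomainName values that appear in more than one account.

    objects_by_account is a hypothetical {account_name: [object, ...]} mapping.
    Because tags.name is taken from DomainName (Tip 2), any name returned
    here would collide.
    """
    accounts_per_name = defaultdict(set)
    for account, objects in objects_by_account.items():
        for obj in objects:
            accounts_per_name[obj["tags"]["DomainName"]].add(account)
    return {
        name: sorted(accounts)
        for name, accounts in accounts_per_name.items()
        if len(accounts) > 1
    }


if __name__ == "__main__":
    # Illustrative object: the JSON values are made up, not real API output.
    sample = {
        "measurement": "aws_opensearch",
        "tags": {
            "name": "df-prd-es",
            "DomainName": "df-prd-es",
            "ClusterConfig": '{"InstanceType": "r6g.large.search", "InstanceCount": 3}',
        },
        "fields": {"EBSOptions": '{"EBSEnabled": true, "VolumeType": "gp3"}'},
    }
    decoded = decode_object(sample)
    print(decoded["tags"]["ClusterConfig"]["InstanceCount"])
    print(duplicate_domain_names({"account-a": [sample], "account-b": [sample]}))
```

Copying the tags and fields dicts keeps the raw object intact, so the original serialized strings are still available if you need to forward them unchanged.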