AWS MSK¶
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that allows you to build and run applications that use Apache Kafka to process streaming data.
Use the "Cloud Sync" series of script packages in the script market to synchronize cloud monitoring and cloud asset data to TrueWatch.
Configuration¶
Install Func¶
We recommend activating TrueWatch Integration - Extensions - DataFlux Func (Automata): all prerequisites are installed automatically, so you can proceed directly to the script installation.
If you are deploying Func yourself, refer to Self-Deploying Func.
Install Script¶
Note: Prepare the required Amazon AK in advance (for simplicity, you can directly grant the global read-only permission ReadOnlyAccess).
Activate Script for Automata¶
- Log in to the TrueWatch console.
- Click on the [Integration] menu, select [Cloud Account Management].
- Click [Add Cloud Account], select [AWS], and fill in the required information on the interface. If you have already configured the cloud account information, ignore this step.
- Click [Test], and if the test is successful, click [Save]. If the test fails, please check if the relevant configuration information is correct and retest.
- Click on the [Cloud Account Management] list to see the added cloud account, click on the corresponding cloud account to enter the details page.
- Click the [Integration] button on the cloud account details page, find AWS MSK under the Not Installed list, and click the [Install] button to open the installation interface and complete the installation.
Manually Activate Script¶
- Log in to the Func console, click [Script Market], enter the TrueWatch script market, and search for integration_aws_kafka.
- Click [Install], then enter the corresponding parameters: AWS AK ID, AK Secret, and account name.
- Click [Deploy Startup Script]. The system automatically creates a Startup script set and configures the corresponding startup script.
- After activation, you can see the corresponding automatic trigger configuration in "Management / Automatic Trigger Configuration". Click [Execute] to run the task once immediately without waiting for the scheduled time. After a moment, you can view the execution task records and the corresponding logs.
Some configurations are collected by default; see the Metrics section for details.
Verification¶
- In "Management / Automatic Trigger Configuration", confirm that the corresponding task has an automatic trigger configuration. You can also view the task records and logs to check for abnormalities.
- In TrueWatch, check if there is asset information in "Infrastructure / Custom".
- In TrueWatch, check if there is corresponding monitoring data in "Metrics".
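If the "Metrics" check comes up empty, one way to narrow down the cause is to request the same metric from Amazon CloudWatch directly. The following is a minimal sketch, not part of the TrueWatch scripts: it only builds the keyword arguments for the CloudWatch `get_metric_statistics` call, using the `AWS/Kafka` namespace and the dimension names listed in the tables on this page. The function name `build_msk_metric_query` and the cluster name `demo-cluster` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_msk_metric_query(cluster_name, metric_name, broker_id=None,
                           minutes=15, period=300):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Hypothetical helper for illustration; it performs no network calls.
    """
    # "Cluster Name" and "Broker ID" are the dimension names used by MSK metrics.
    dimensions = [{"Name": "Cluster Name", "Value": cluster_name}]
    if broker_id is not None:
        dimensions.append({"Name": "Broker ID", "Value": str(broker_id)})

    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Kafka",
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Average"],
    }


if __name__ == "__main__":
    # "demo-cluster" is a placeholder cluster name.
    query = build_msk_metric_query("demo-cluster", "CpuUser", broker_id=1)
    print(query["Namespace"], query["MetricName"], query["Dimensions"])
    # With boto3 installed and read-only credentials configured, the query
    # could be sent as:
    #   import boto3
    #   boto3.client("cloudwatch").get_metric_statistics(**query)
```

If CloudWatch returns datapoints but TrueWatch shows none, the task logs in "Management / Automatic Trigger Configuration" are the next place to look.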
Metrics¶
After Amazon cloud monitoring is configured, the default Measurements are as follows. More Metrics can be collected through configuration; see Amazon Cloud Monitoring Metrics Details.
DEFAULT Level Monitoring¶
The Metrics described in the table below are available at the DEFAULT monitoring level. These Metrics are free.
Metrics Available at DEFAULT Monitoring Level

Name | When Visible | Dimensions | Description |
---|---|---|---|
ActiveControllerCount | After the cluster enters the ACTIVE state. | Cluster Name | At any given time, only one controller can be active per cluster. |
BurstBalance | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The remaining balance of I/O burst credits for the EBS volumes in the cluster. Use it to investigate latency or throughput degradation. BurstBalance is not reported for EBS volumes when the baseline performance of the volume is higher than the maximum burst performance. For more information, see I/O Credits and Burst Performance. |
BytesInPerSec | After creating a topic. | Cluster Name, Broker ID, Topic | The number of bytes received per second from clients. This Metric is available per broker and also per topic. |
BytesOutPerSec | After creating a topic. | Cluster Name, Broker ID, Topic | The number of bytes sent per second to clients. This Metric is available per broker and also per topic. |
ClientConnectionCount | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID, Client Authentication | The number of authenticated active client connections. |
ConnectionCount | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of authenticated active connections, unauthenticated connections, and inter-broker connections. |
CPUCreditBalance | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | This Metric helps you monitor the CPU credit balance of brokers. If your CPU usage consistently exceeds the baseline utilization of 20%, you may deplete your CPU credit balance, which can negatively impact cluster performance. You can take steps to reduce CPU load. For example, you can reduce the number of client requests or update the broker type to M5. |
CpuIdle | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The percentage of CPU idle time. |
CpuIoWait | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The percentage of CPU idle time during pending disk operations. |
CpuSystem | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The percentage of CPU in kernel space. |
CpuUser | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The percentage of CPU in user space. |
GlobalPartitionCount | After the cluster enters the ACTIVE state. | Cluster Name | The number of partitions across all topics in the cluster (excluding replicas). Since GlobalPartitionCount excludes replicas, the sum of PartitionCount values may be higher than the GlobalPartitionCount value when the topic replication factor is greater than 1. |
GlobalTopicCount | After the cluster enters the ACTIVE state. | Cluster Name | The total number of topics across all brokers in the cluster. |
EstimatedMaxTimeLag | After a consumer group consumes a topic. | Consumer Group, Topic | The estimated time (in seconds) to deplete MaxOffsetLag. |
KafkaAppLogsDiskUsed | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The percentage of disk space used for application logs. |
KafkaDataLogsDiskUsed (Cluster Name, Broker ID Dimensions) | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The percentage of disk space used for data logs. |
KafkaDataLogsDiskUsed (Cluster Name Dimensions) | After the cluster enters the ACTIVE state. | Cluster Name | The percentage of disk space used for data logs. |
LeaderCount | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The total number of partition leaders per broker, excluding replicas. |
MaxOffsetLag | After a consumer group consumes a topic. | Consumer Group, Topic | The maximum offset lag across all partitions in the topic. |
MemoryBuffered | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The size of buffered memory (in bytes) for the broker. |
MemoryCached | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The size of cached memory (in bytes) for the broker. |
MemoryFree | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The size of free memory (in bytes) available to the broker. |
HeapMemoryAfterGC | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The percentage of heap memory used after garbage collection. |
MemoryUsed | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The size of memory (in bytes) used by the broker. |
MessagesInPerSec | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of incoming messages per second for the broker. |
NetworkRxDropped | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of received packets dropped. |
NetworkRxErrors | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of network receive errors for the broker. |
NetworkRxPackets | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of packets received by the broker. |
NetworkTxDropped | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of transmitted packets dropped. |
NetworkTxErrors | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of network transmit errors for the broker. |
NetworkTxPackets | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of packets transmitted by the broker. |
OfflinePartitionsCount | After the cluster enters the ACTIVE state. | Cluster Name | The total number of partitions in the cluster that are offline. |
PartitionCount | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The total number of topic partitions per broker, including replicas. |
ProduceTotalTimeMsMean | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The average produce time (in milliseconds). |
RequestBytesMean | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The average number of request bytes for the broker. |
RequestTime | After applying request throttling. | Cluster Name, Broker ID | The average time (in milliseconds) spent by the broker network and I/O threads processing requests. |
RootDiskUsed | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The percentage of root disk used by the broker. |
SumOffsetLag | After a consumer group consumes a topic. | Consumer Group, Topic | The aggregate offset lag across all partitions in the topic. |
SwapFree | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The size of swap memory (in bytes) available to the broker. |
SwapUsed | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The size of swap memory (in bytes) used by the broker. |
TrafficShaping | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | An advanced Metric that indicates the number of packets shaped (dropped or queued) due to exceeding network allocations. PER_BROKER Metrics provide more detailed information. |
UnderMinIsrPartitionCount | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of partitions for the broker that are below the minimum in-sync replica count (under minIsr). |
UnderReplicatedPartitions | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The number of partitions that are not fully replicated by the broker. |
ZooKeeperRequestLatencyMsMean | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The average latency (in milliseconds) of Apache ZooKeeper requests from the broker. |
ZooKeeperSessionState | After the cluster enters the ACTIVE state. | Cluster Name, Broker ID | The connection state of the broker's ZooKeeper session, which may be one of the following: NOT_CONNECTED: '0.0', ASSOCIATING: '0.1', CONNECTING: '0.5', CONNECTEDREADONLY: '0.8', CONNECTED: '1.0', CLOSED: '5.0', AUTH_FAILED: '10.0'. |
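ZooKeeperSessionState is reported as a number, so dashboards and alert rules usually need to turn it back into a state name. The sketch below shows one way to do that with the values listed in the table above. The function name `decode_zookeeper_session_state` and the tolerance are assumptions for illustration, and the 0.1 state uses the name ASSOCIATING from ZooKeeper's client state enum.

```python
# Numeric codes as listed in the ZooKeeperSessionState description.
# The 0.1 state is named ASSOCIATING in ZooKeeper's client state enum.
ZOOKEEPER_SESSION_STATES = {
    0.0: "NOT_CONNECTED",
    0.1: "ASSOCIATING",
    0.5: "CONNECTING",
    0.8: "CONNECTEDREADONLY",
    1.0: "CONNECTED",
    5.0: "CLOSED",
    10.0: "AUTH_FAILED",
}


def decode_zookeeper_session_state(value, tolerance=1e-6):
    """Map a reported metric value to a state name (hypothetical helper)."""
    number = float(value)
    for code, name in ZOOKEEPER_SESSION_STATES.items():
        if abs(number - code) <= tolerance:
            return name
    # Values aggregated over a period (for example an average of 1.0 and 0.5)
    # do not correspond to a single state.
    return "UNKNOWN"


if __name__ == "__main__":
    for sample in (1.0, 0.5, 0.75):
        print(sample, decode_zookeeper_session_state(sample))
```

Because an averaged datapoint can fall between two codes, a statistic such as Maximum or Minimum is probably a safer choice than Average when decoding this Metric.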
PER_BROKER Level Monitoring¶
When you set the monitoring level to PER_BROKER, in addition to all DEFAULT level Metrics, you also get the Metrics described in the table below. You are charged for the Metrics in the table below, while the DEFAULT level Metrics remain free. The Metrics in this table have the following dimensions: Cluster Name, Broker ID.
Additional Metrics Available at PER_BROKER Monitoring Level

Name | When Visible | Description |
---|---|---|
BwInAllowanceExceeded | After the cluster enters the ACTIVE state. | The number of packets shaped due to inbound aggregate bandwidth exceeding the broker's maximum bandwidth. |
BwOutAllowanceExceeded | After the cluster enters the ACTIVE state. | The number of packets shaped due to outbound aggregate bandwidth exceeding the broker's maximum bandwidth. |
ConnTrackAllowanceExceeded | After the cluster enters the ACTIVE state. | The number of packets shaped due to connection tracking exceeding the broker's maximum value. Connection tracking is related to security groups, which track each established connection to ensure that return packets are delivered as expected. |
ConnectionCloseRate | After the cluster enters the ACTIVE state. | The number of connections closed per second per listener. This number is aggregated per listener and then filtered for client listeners. |
ConnectionCreationRate | After the cluster enters the ACTIVE state. | The number of new connections established per second per listener. This number is aggregated per listener and then filtered for client listeners. |
CpuCreditUsage | After the cluster enters the ACTIVE state. | This Metric helps you monitor the CPU credit usage on the instance. If your CPU usage consistently exceeds the baseline level of 20%, you may deplete your CPU credit balance, which can negatively impact cluster performance. You can monitor this Metric and set alerts to take corrective actions. |
FetchConsumerLocalTimeMsMean | After there is a producer/consumer. | The average time (in milliseconds) spent processing consumer requests at the leader. |
FetchConsumerRequestQueueTimeMsMean | After there is a producer/consumer. | The average time (in milliseconds) consumer requests waited in the request queue. |
FetchConsumerResponseQueueTimeMsMean | After there is a producer/consumer. | The average time (in milliseconds) consumer requests waited in the response queue. |
FetchConsumerResponseSendTimeMsMean | After there is a producer/consumer. | The average time (in milliseconds) spent sending consumer responses. |
FetchConsumerTotalTimeMsMean | After there is a producer/consumer. | The total average time (in milliseconds) spent by consumers fetching data from the broker. |
FetchFollowerLocalTimeMsMean | After there is a producer/consumer. | The average time (in milliseconds) spent processing follower requests at the leader. |
FetchFollowerRequestQueueTimeMsMean | After there is a producer/consumer. | The average time (in milliseconds) follower requests waited in the request queue. |
FetchFollowerResponseQueueTimeMsMean | After there is a producer/consumer. | The average time (in milliseconds) follower requests waited in the response queue. |
FetchFollowerResponseSendTimeMsMean | After there is a producer/consumer. | The average time (in milliseconds) spent sending follower responses. |
FetchFollowerTotalTimeMsMean | After there is a producer/consumer. | The total average time (in milliseconds) spent by followers fetching data from the broker. |
FetchMessageConversionsPerSec | After creating a topic. | The number of message conversions per second for the broker. |
FetchThrottleByteRate | After applying bandwidth throttling. | The number of throttled bytes per second. |
FetchThrottleQueueSize | After applying bandwidth throttling. | The number of messages in the throttle queue. |
FetchThrottleTime | After applying bandwidth throttling. | The average fetch throttle time (in milliseconds). |
NetworkProcessorAvgIdlePercent | After the cluster enters the ACTIVE state. | The average percentage of time the network processor was idle. |
PpsAllowanceExceeded | After the cluster enters the ACTIVE state. | The number of packets shaped due to bidirectional PPS exceeding the broker's maximum value. |
ProduceLocalTimeMsMean | After the cluster enters the ACTIVE state. | The average time (in milliseconds) spent processing requests at the leader. |
ProduceMessageConversionsPerSec | After creating a topic. | The number of message conversions per second for the broker. |
ProduceMessageConversionsTimeMsMean | After the cluster enters the ACTIVE state. | The average time (in milliseconds) spent converting message formats. |
ProduceRequestQueueTimeMsMean | After the cluster enters the ACTIVE state. | The average time (in milliseconds) request messages waited in the queue. |
ProduceResponseQueueTimeMsMean | After the cluster enters the ACTIVE state. | The average time (in milliseconds) response messages waited in the queue. |
ProduceResponseSendTimeMsMean | After the cluster enters the ACTIVE state. | The average time (in milliseconds) spent sending response messages. |
ProduceThrottleByteRate | After applying bandwidth throttling. | The number of throttled bytes per second. |
ProduceThrottleQueueSize | After applying bandwidth throttling. | The number of messages in the throttle queue. |
ProduceThrottleTime | After applying bandwidth throttling. | The average produce throttle time (in milliseconds). |
ProduceTotalTimeMsMean | After the cluster enters the ACTIVE state. | The average produce time (in milliseconds). |
RemoteBytesInPerSec | After there is a producer/consumer. | The total number of bytes transferred from tiered storage in response to consumer fetches. This Metric includes all topic partitions that affect downstream data transfer traffic. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
RemoteBytesOutPerSec | After there is a producer/consumer. | The total number of bytes transferred to tiered storage, including data from log segments, indexes, and other auxiliary files. This Metric includes all topic partitions that affect upstream data transfer traffic. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
RemoteLogManagerTasksAvgIdlePercent | After the cluster enters the ACTIVE state. | The average percentage of time the remote log manager was idle. The remote log manager transfers data from the broker to tiered storage. Category: Internal Activities. This is a KIP-405 Metric. |
RemoteLogReaderAvgIdlePercent | After the cluster enters the ACTIVE state. | The average percentage of time the remote log reader was idle. The remote log reader transfers data from remote storage to the broker in response to consumer fetches. Category: Internal Activities. This is a KIP-405 Metric. |
RemoteLogReaderTaskQueueSize | After the cluster enters the ACTIVE state. | The number of tasks responsible for reading from tiered storage and waiting to be scheduled. Category: Internal Activities. This is a KIP-405 Metric. |
RemoteReadErrorPerSec | After the cluster enters the ACTIVE state. | The total error rate of read requests sent by the specified broker to tiered storage to retrieve data in response to consumer fetches. This Metric includes all topic partitions that affect downstream data transfer traffic for the specified broker. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
RemoteReadRequestsPerSec | After the cluster enters the ACTIVE state. | The total number of read requests sent by the specified broker to tiered storage to retrieve data in response to consumer fetches. This Metric includes all topic partitions that affect downstream data transfer traffic for the specified broker. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
RemoteWriteErrorPerSec | After the cluster enters the ACTIVE state. | The total error rate of write requests sent by the specified broker to tiered storage to transfer data upstream. This Metric includes all topic partitions that affect upstream data transfer traffic for the specified broker. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
ReplicationBytesInPerSec | After creating a topic. | The number of bytes received per second from other brokers. |
ReplicationBytesOutPerSec | After creating a topic. | The number of bytes sent per second to other brokers. |
RequestExemptFromThrottleTime | After applying request throttling. | The average time (in milliseconds) spent by the broker network and I/O threads processing requests exempt from throttling. |
RequestHandlerAvgIdlePercent | After the cluster enters the ACTIVE state. | The average percentage of time the request handler threads were idle. |
RequestThrottleQueueSize | After applying request throttling. | The number of messages in the throttle queue. |
RequestThrottleTime | After applying request throttling. | The average request throttle time (in milliseconds). |
TcpConnections | After the cluster enters the ACTIVE state. | Displays the number of incoming and outgoing TCP segments with the SYN flag set. |
TotalTierBytesLag | After creating a topic. | The total number of bytes of data eligible for tiering on the broker but not yet transferred to tiered storage. This Metric shows the efficiency of upstream data transfer. As latency increases, the amount of data not present in tiered storage also increases. Category: Archive Latency. This is not a KIP-405 Metric. |
TrafficBytes | After the cluster enters the ACTIVE state. | Displays the network traffic between clients (producers and consumers) and brokers in total bytes. Traffic between brokers is not reported. |
VolumeQueueLength | After the cluster enters the ACTIVE state. | The number of read and write operation requests waiting to be completed during the specified time period. |
VolumeReadBytes | After the cluster enters the ACTIVE state. | The number of bytes read during the specified time period. |
VolumeReadOps | After the cluster enters the ACTIVE state. | The number of read operations during the specified time period. |
VolumeTotalReadTime | After the cluster enters the ACTIVE state. | The total number of seconds spent completing all read operations during the specified time period. |
VolumeTotalWriteTime | After the cluster enters the ACTIVE state. | The total number of seconds spent completing all write operations during the specified time period. |
VolumeWriteBytes | After the cluster enters the ACTIVE state. | The number of bytes written during the specified time period. |
VolumeWriteOps | After the cluster enters the ACTIVE state. | The number of write operations during the specified time period. |
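The Volume Metrics above are totals over a period, so on their own they do not show per-operation latency. A common derivation, sketched here as an assumption rather than something the integration computes, is to divide the total time by the operation count for the same period. The helper name `average_latency_ms` is hypothetical.

```python
def average_latency_ms(total_time_seconds, operations):
    """Average time per operation in milliseconds for one period.

    total_time_seconds: a VolumeTotalReadTime or VolumeTotalWriteTime value.
    operations: the matching VolumeReadOps or VolumeWriteOps value.
    Returns None when no operations were recorded, to avoid dividing by zero.
    """
    if operations <= 0:
        return None
    return total_time_seconds * 1000.0 / operations


if __name__ == "__main__":
    # Example figures only: 2 seconds spent on 1000 reads in the period.
    print(average_latency_ms(2.0, 1000))
    print(average_latency_ms(0.0, 0))
```

Both inputs need to come from the same broker and the same time period for the result to be meaningful.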
PER_TOPIC_PER_BROKER Level Monitoring¶
When you set the monitoring level to PER_TOPIC_PER_BROKER, in addition to all Metrics from the PER_BROKER and DEFAULT levels, you also get the Metrics described in the table below. Only DEFAULT level Metrics are free. The Metrics in this table have the following dimensions: Cluster Name, Broker ID, Topic.
Important: For Amazon MSK clusters using Apache Kafka 2.4.1 or later, the Metrics in the table below only appear after their value first becomes non-zero. For example, to see BytesInPerSec, one or more producers must first send data to the cluster.
Additional Metrics Available at PER_TOPIC_PER_BROKER Monitoring Level

Name | When Visible | Description |
---|---|---|
FetchMessageConversionsPerSec | After creating a topic. | The number of fetched message conversions per second. |
MessagesInPerSec | After creating a topic. | The number of messages received per second. |
ProduceMessageConversionsPerSec | After creating a topic. | The number of produced message conversions per second. |
RemoteBytesInPerSec | After you create a topic and the topic is producing/consuming. | The number of bytes transferred from tiered storage in response to consumer fetches for the specified topic and broker. This Metric includes all partitions in the topic that affect downstream data transfer traffic for the specified broker. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
RemoteBytesOutPerSec | After you create a topic and the topic is producing/consuming. | The number of bytes transferred to tiered storage for the specified topic and broker. This Metric includes all partitions in the topic that affect upstream data transfer traffic for the specified broker. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
RemoteReadErrorPerSec | After you create a topic and the topic is producing/consuming. | The error rate of read requests sent by the specified broker to tiered storage to retrieve data in response to consumer fetches for the specified topic. This Metric includes all partitions in the topic that affect downstream data transfer traffic for the specified broker. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
RemoteReadRequestsPerSec | After you create a topic and the topic is producing/consuming. | The number of read requests sent by the specified broker to tiered storage to retrieve data in response to consumer fetches for the specified topic. This Metric includes all partitions in the topic that affect downstream data transfer traffic for the specified broker. Category: Traffic and Error Rates. This is a KIP-405 Metric. |
PER_TOPIC_PER_PARTITION Level Monitoring¶
When you set the monitoring level to PER_TOPIC_PER_PARTITION, in addition to all Metrics from the PER_TOPIC_PER_BROKER, PER_BROKER, and DEFAULT levels, you also get the Metrics described in the table below. Only DEFAULT level Metrics are free. The Metrics in this table have the following dimensions: Consumer Group, Topic, Partition.
Additional Metrics Available at PER_TOPIC_PER_PARTITION Monitoring Level

Name | When Visible | Description |
---|---|---|
EstimatedTimeLag | After a consumer group consumes a topic. | The estimated time (in seconds) to deplete the partition offset lag. |
OffsetLag | After a consumer group consumes a topic. | The lag of the partition-level consumer in terms of offset. |
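The four monitoring levels described in this section are cumulative: each level adds its own table to everything from the levels before it, and only the DEFAULT table is free. The short sketch below restates that rule in code, which may help when estimating which Metric tables a given setting brings in. The function names are hypothetical and the level order is taken from this page.

```python
# Monitoring levels in increasing order of detail, as described on this page.
MONITORING_LEVELS = [
    "DEFAULT",
    "PER_BROKER",
    "PER_TOPIC_PER_BROKER",
    "PER_TOPIC_PER_PARTITION",
]


def included_levels(level):
    """Return every level whose Metrics are reported at the given setting."""
    index = MONITORING_LEVELS.index(level)  # raises ValueError if unknown
    return MONITORING_LEVELS[: index + 1]


def charged_levels(level):
    """Return the included levels whose Metrics are billed (all but DEFAULT)."""
    return [name for name in included_levels(level) if name != "DEFAULT"]


if __name__ == "__main__":
    for level in MONITORING_LEVELS:
        print(level, included_levels(level), charged_levels(level))
```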
Objects¶
AWS MSK object data is currently unavailable.