AWS EMR¶

Use the "Cloud Sync" series of script packages in the Script Market to synchronize cloud monitoring and cloud asset data to TrueWatch.

Configuration¶

Install Func¶

We recommend activating TrueWatch Integration - Extension - DataFlux Func (Automata): all prerequisites are installed automatically, so you can proceed directly to the script installation.

If you deploy Func yourself, refer to Self-deploy Func.

Install Script¶

Note: Prepare an AWS AK (access key) with the required permissions in advance. For simplicity, you can directly grant the global read-only permission ReadOnlyAccess.

Activate Script in Automata¶

Log in to the TrueWatch console
Click on the 【Integration】 menu, select 【Cloud Account Management】
Click 【Add Cloud Account】, select 【AWS】, and fill in the required information on the interface. If you have already configured the cloud account information before, ignore this step.
Click 【Test】, and after the test is successful, click 【Save】. If the test fails, please check if the relevant configuration information is correct and retest.
Open the 【Cloud Account Management】 list to see the added cloud account, then click the corresponding cloud account to enter its details page.
Click the 【Integration】 button on the cloud account details page, find AWS EMR in the Not Installed list, and click 【Install】. Complete the installation in the dialog that opens.

Manually Activate Script¶

Log in to the Func console, click on the 【Script Market】, enter the TrueWatch script market, and search for: integration_aws_emr
Click 【Install】, then enter the corresponding parameters: AWS AK ID, AK Secret, and account name.
Click 【Deploy Startup Script】. The system automatically creates the Startup script set and configures the corresponding startup scripts.
After activation, you can see the corresponding automatic trigger configuration in 「Management / Automatic Trigger Configuration」. Click 【Execute】 to immediately execute it once without waiting for the scheduled time. After a while, you can check the execution task records and corresponding logs.

Verification¶

In 「Management / Automatic Trigger Configuration」, confirm that an automatic trigger configuration exists for the task. You can also check the task records and logs for exceptions.
In TrueWatch, check if the asset information exists in 「Infrastructure / Custom」.
In TrueWatch, check if there is corresponding monitoring data in 「Metrics」.

Metrics¶

After Amazon CloudWatch is configured, the default metrics are as follows. You can collect more metrics through configuration. See Amazon CloudWatch Metrics Details.

Metric	Description
`IsIdle`	Indicates that the cluster is no longer performing tasks but is still active and incurring charges. This metric is set to 1 if no tasks or jobs are running; otherwise, it is set to 0. The system checks this value every five minutes, and a value of 1 only indicates that the cluster was idle at the time of the check, not that it was idle for the entire five minutes. To avoid false alarms, raise an alarm only when multiple consecutive 5-minute checks yield a value of 1, for example when the value has been 1 for thirty minutes or longer. Use Case: Monitor cluster performance Unit: Boolean
`ContainerAllocated`	Number of resource containers allocated by ResourceManager. Use Case: Monitor cluster progress Unit: Count
`ContainerReserved`	Number of reserved containers. Use Case: Monitor cluster progress Unit: Count
`ContainerPending`	Number of containers in the queue that have not been allocated. Use Case: Monitor cluster progress Unit: Count
`AppsCompleted`	Number of applications submitted to YARN that have completed. Use Case: Monitor cluster progress Unit: Count
`AppsFailed`	Number of applications submitted to YARN that failed to complete. Use Case: Monitor cluster progress, monitor cluster health Unit: Count
`AppsKilled`	Number of applications submitted to YARN that have been killed. Use Case: Monitor cluster progress, monitor cluster health Unit: Count
`AppsPending`	Number of applications submitted to YARN that are pending. Use Case: Monitor cluster progress Unit: Count
`AppsRunning`	Number of applications submitted to YARN that are running. Use Case: Monitor cluster progress Unit: Count
`AppsSubmitted`	Number of applications submitted to YARN. Use Case: Monitor cluster progress Unit: Count
`CoreNodesRunning`	Number of core nodes in running state. Data points for this metric are only reported if the corresponding instance group exists. Use Case: Monitor cluster health Unit: Count
`LiveDataNodes`	Percentage of data nodes receiving tasks from Hadoop. Use Case: Monitor cluster health Unit: Percentage
`MRActiveNodes`	Number of nodes currently running MapReduce tasks or jobs. Equivalent to YARN metric `mapred.resourcemanager.NoOfActiveNodes`. Use Case: Monitor cluster progress Unit: Count
`MRLostNodes`	Number of nodes assigned to MapReduce that have been marked as LOST. Equivalent to YARN metric `mapred.resourcemanager.NoOfLostNodes`. Use Case: Monitor cluster health, monitor cluster progress Unit: Count
`MRTotalNodes`	Number of nodes currently available for MapReduce jobs. Equivalent to YARN metric `mapred.resourcemanager.TotalNodes`. Use Case: Monitor cluster progress Unit: Count
`MRRebootedNodes`	Number of available nodes that have been rebooted and marked as "rebooted" status for MapReduce. Equivalent to YARN metric `mapred.resourcemanager.NoOfRebootedNodes`. Use Case: Monitor cluster health, monitor cluster progress Unit: Count
`MRUnhealthyNodes`	Number of nodes available for MapReduce jobs marked as "unhealthy" status. Equivalent to YARN metric `mapred.resourcemanager.NoOfUnhealthyNodes`. Use Case: Monitor cluster progress Unit: Count
`MRDecommissionedNodes`	Number of nodes assigned to MapReduce applications that have been marked as decommissioned. Equivalent to YARN metric `mapred.resourcemanager.NoOfDecommissionedNodes`. Use Case: Monitor cluster health, monitor cluster progress Unit: Count
`S3BytesWritten`	Number of bytes written to Amazon S3. This metric only aggregates MapReduce tasks and does not apply to other workloads on Amazon EMR. Use Case: Analyze cluster performance, monitor cluster progress Unit: Count
`S3BytesRead`	Number of bytes read from Amazon S3. This metric only aggregates MapReduce tasks and does not apply to other workloads on Amazon EMR. Use Case: Analyze cluster performance, monitor cluster progress Unit: Count
`HDFSUtilization`	Percentage of `HDFS` storage currently in use. Use Case: Analyze cluster performance Unit: Percentage
`TotalLoad`	Total number of concurrent data transfers. Use Case: Monitor cluster health Unit: Count
`MemoryTotalMB`	Total amount of memory in the cluster. Use Case: Monitor cluster progress Unit: Count
`MemoryReservedMB`	Amount of reserved memory. Use Case: Monitor cluster progress Unit: Count
`HDFSBytesRead`	Number of bytes read from `HDFS`. This metric only aggregates MapReduce tasks and does not apply to other workloads on Amazon EMR. Use Case: Analyze cluster performance, monitor cluster progress Unit: Count
`HDFSBytesWritten`	Number of bytes written to `HDFS`. This metric only aggregates MapReduce tasks and does not apply to other workloads on Amazon EMR. Use Case: Analyze cluster performance, monitor cluster progress Unit: Count
`MissingBlocks`	Number of data blocks in `HDFS` that have no replicas. These blocks may be corrupted. Use Case: Monitor cluster health Unit: Count
`MemoryAvailableMB`	Amount of memory available for allocation. Use Case: Monitor cluster progress Unit: Count
`MemoryAllocatedMB`	Amount of memory allocated to the cluster. Use Case: Monitor cluster progress Unit: Count
`PendingDeletionBlocks`	Number of data blocks marked for deletion. Use Case: Monitor cluster progress, monitor cluster health Unit: Count
`UnderReplicatedBlocks`	Number of data blocks that need to be replicated one or more times. Use Case: Monitor cluster progress, monitor cluster health Unit: Count
`DfsPendingReplicationBlocks`
`CapacityRemainingGB`	Remaining `HDFS` disk capacity. Use Case: Monitor cluster progress, monitor cluster health Unit: Count
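
The `IsIdle` row above recommends alarming only after several consecutive 5-minute checks report 1. A minimal sketch of that rule follows, assuming the samples have already been fetched and are ordered oldest first. The function name `should_alarm_on_idle` and its parameters are hypothetical and are not part of the TrueWatch or AWS APIs.

```python
from typing import Sequence

# IsIdle is checked every five minutes (see the IsIdle row in the table).
CHECK_INTERVAL_MINUTES = 5


def should_alarm_on_idle(samples: Sequence[float], idle_minutes: int = 30) -> bool:
    """Return True when the most recent samples are all 1 for at least
    `idle_minutes`. `samples` is assumed to be ordered oldest first."""
    # Number of consecutive checks needed to cover the window (ceiling division).
    required = -(-idle_minutes // CHECK_INTERVAL_MINUTES)
    if required <= 0 or len(samples) < required:
        return False
    # A single 0 inside the window means the cluster did work, so no alarm.
    return all(value == 1 for value in samples[-required:])
```

With the default 30-minute window, six consecutive idle checks are required, so a single busy check inside the window suppresses the alarm.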

Object¶

The structure of the collected AWS EMR object data is shown below. The objects can be viewed in 「Infrastructure / Custom」.

{
  "measurement": "aws_emr",
  "tags": {
    "Id"                 : "xxxxx",
    "ClusterName"        : "xxxxx",
    "ClusterArn"         : "xxxx",
    "RegionId"           : "cn-north-1",
    "OutpostArn"         : "xxxx"
  },
  "fields": {
    "Status"               : "{Instance status JSON data}",
    "message"              : "{Instance JSON data}"
  }
}

Note: The keys in tags and fields may change in subsequent updates.
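
As a rough illustration of how such an object could be consumed, the sketch below pulls a few tags out of a record and decodes the JSON string stored in `fields.Status`. The helper name `summarize_emr_object` is hypothetical, and the `State` key inside the decoded status is an assumption based on the shape of the EMR cluster status, not something this page confirms. Because the keys may change, every lookup tolerates missing values.

```python
import json
from typing import Any, Dict


def _decode(raw: Any) -> Any:
    """Decode a JSON-encoded string; return the input unchanged if it is
    not a string or is not valid JSON."""
    if not isinstance(raw, str):
        return raw
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def summarize_emr_object(record: Dict[str, Any]) -> Dict[str, Any]:
    """Build a small summary from an aws_emr object record (hypothetical helper)."""
    tags = record.get("tags", {})
    fields = record.get("fields", {})
    status = _decode(fields.get("Status"))
    return {
        "id": tags.get("Id"),
        "name": tags.get("ClusterName"),
        "region": tags.get("RegionId"),
        # "State" is an assumed key; None is returned when it is absent.
        "state": status.get("State") if isinstance(status, dict) else None,
    }
```

A record whose `Status` field is missing or not valid JSON still produces a summary, with `state` set to `None`.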