Hadoop Yarn ResourceManager
Collect Yarn ResourceManager Metrics.
Installation and Deployment
Because ResourceManager is written in Java, its metrics can be collected with jmx-exporter.
1. ResourceManager Configuration
1.1 Download jmx-exporter
Download link: https://github.com/prometheus/jmx_exporter
1.2 Download jmx Script
Download link: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-yarn-resourcemanager.yml
1.3 ResourceManager Startup Parameter Adjustment
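Add the following to the ResourceManager startup parameters, adjusting the jar and configuration file paths to match where the files from steps 1.1 and 1.2 were saved: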
{JAVA_GC_ARGS} -javaagent:/opt/jmx/jmx_exporter-1.0.1.jar=localhost:17109:/opt/jmx/jmx_resource_manager.yml
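The value after `-javaagent:` follows the jmx-exporter pattern `<jar path>=[<host>:]<port>:<config path>`. As a rough illustration of how the parameter above breaks down, the following Python sketch splits it into its parts. The helper name `parse_javaagent_arg` is hypothetical and is not part of jmx-exporter; it only shows which piece is the listen address and which is the rule file.

```python
from typing import NamedTuple, Optional


class JavaAgentArg(NamedTuple):
    jar: str             # path to the jmx-exporter agent jar
    host: Optional[str]  # listen host, or None when only a port is given
    port: int            # port on which /metrics is served
    config: str          # path to the exporter rule file (YAML)


def parse_javaagent_arg(arg: str) -> JavaAgentArg:
    """Split '-javaagent:<jar>=[<host>:]<port>:<config>' into its parts."""
    prefix = "-javaagent:"
    if not arg.startswith(prefix):
        raise ValueError("not a -javaagent argument")
    jar, sep, options = arg[len(prefix):].partition("=")
    if not sep:
        raise ValueError("missing '=<port>:<config>' part")
    first, _, rest = options.partition(":")
    if first.isdigit():
        # No host given: the first field is already the port.
        host, port, config = None, first, rest
    else:
        host = first
        port, _, config = rest.partition(":")
    if not port.isdigit() or not config:
        raise ValueError("expected [<host>:]<port>:<config>")
    return JavaAgentArg(jar, host, int(port), config)


parsed = parse_javaagent_arg(
    "-javaagent:/opt/jmx/jmx_exporter-1.0.1.jar"
    "=localhost:17109:/opt/jmx/jmx_resource_manager.yml"
)
# The metrics URL that the collector will scrape later on.
print(f"http://{parsed.host or 'localhost'}:{parsed.port}/metrics")
# prints http://localhost:17109/metrics
```

The port parsed here is the one that must appear in the collector's urls setting.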
1.4 Restart ResourceManager
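After the restart, it can be useful to confirm that the exporter is serving metrics before configuring DataKit. The following is a minimal sketch, assuming the agent listens on localhost:17109 as configured above. `fetch_metrics` and `parse_metric_names` are hypothetical helper names, and the exact metric names returned depend on the rule file in use.

```python
import urllib.request
from typing import Set


def fetch_metrics(url: str = "http://localhost:17109/metrics",
                  timeout: float = 5.0) -> str:
    """Download the Prometheus text exposition served by the agent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metric_names(text: str) -> Set[str]:
    """Return the distinct metric names found in Prometheus text output."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample looks like 'name{label="v"} value' or 'name value'.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def resourcemanager_metric_names(text: str) -> Set[str]:
    """Keep only the names that mention the ResourceManager."""
    return {n for n in parse_metric_names(text) if "resourcemanager" in n.lower()}
```

Calling `resourcemanager_metric_names(fetch_metrics())` on the ResourceManager host should return a non-empty set once the agent is loaded; a connection error usually means the startup parameter was not picked up or the port differs.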
2. DataKit Collector Configuration
2.1 Install DataKit
2.2 Configure Collector
Since jmx-exporter exposes the metrics URL directly, the metrics can be collected with the prom collector.
Navigate to the conf.d/prom directory under the DataKit installation directory and copy prom.conf.sample to resourcemanager.conf:
cp prom.conf.sample resourcemanager.conf
Adjust the contents of resourcemanager.conf as follows:
[[inputs.prom]]
urls = ["http://localhost:17109/metrics"]
source = "yarn-resourcemanager"
interval = "10s"
[inputs.prom.tags]
component = "yarn-resourcemanager"
Adjust other settings as needed. Parameter descriptions:
- urls: The jmx-exporter metrics URL; fill in the metrics URL exposed by the corresponding component
- source: Collector alias; a distinct value is recommended so that collectors can be told apart
- keep_exist_metric_name: Keep the original metric name
- interval: Collection interval
- inputs.prom.tags: Additional tags to add
3. Restart DataKit
Metrics
Hadoop Measurement
ResourceManager metrics belong to the Hadoop measurement. The table below describes the ResourceManager-related metrics.
Metrics | Description | Unit |
---|---|---|
resourcemanager_activeapplications | Number of active applications | count |
resourcemanager_activeusers | Number of active users | count |
resourcemanager_aggregatecontainersallocated | Total number of containers allocated | count |
resourcemanager_aggregatecontainerspreempted | Total number of containers preempted | count |
resourcemanager_aggregatecontainersreleased | Total number of containers released | count |
resourcemanager_aggregatememorymbsecondspreempted | Aggregate memory-seconds (MB multiplied by seconds) of preempted containers | MB·s |
resourcemanager_aggregatenodelocalcontainersallocated | Total number of node-local containers allocated | count |
resourcemanager_aggregateoffswitchcontainersallocated | Total number of off-switch containers allocated | count |
resourcemanager_aggregateracklocalcontainersallocated | Total number of rack-local containers allocated | count |
resourcemanager_aggregatevcoresecondspreempted | Aggregate vcore-seconds of preempted containers | vcore·s |
resourcemanager_allocatedcontainers | Current number of allocated containers | count |
resourcemanager_allocatedmb | Amount of memory currently allocated | MB |
resourcemanager_allocatedvcores | Number of virtual cores currently allocated | count |
resourcemanager_amlaunchdelayavgtime | Average time between allocating an ApplicationMaster (AM) container and launching it | ms |
resourcemanager_amlaunchdelaynumops | Number of AM container launches measured | count |
resourcemanager_amregisterdelayavgtime | Average time for an AM to register with the ResourceManager after its container is launched | ms |
resourcemanager_amregisterdelaynumops | Number of AM registrations measured | count |
resourcemanager_amresourceusagemb | Amount of memory used by ApplicationMasters | MB |
resourcemanager_amresourceusagevcores | Number of virtual cores used by ApplicationMasters | count |
resourcemanager_appattemptfirstcontainerallocationdelayavgtime | Average time to allocate the first container of an application attempt | ms |
resourcemanager_appattemptfirstcontainerallocationdelaynumops | Number of first-container allocations measured across application attempts | count |
resourcemanager_appscompleted | Number of completed applications | count |
resourcemanager_appsfailed | Number of failed applications | count |
resourcemanager_appskilled | Number of killed applications | count |
resourcemanager_appspending | Number of applications waiting to be executed | count |
resourcemanager_appsrunning | Number of running applications | count |
resourcemanager_appssubmitted | Number of applications submitted | count |
resourcemanager_availablemb | Amount of memory currently available | MB |
resourcemanager_availablevcores | Number of virtual cores currently available | count |
resourcemanager_callqueuelength | Current length of the RPC call queue | count |
resourcemanager_continuousschedulingrunavgtime | Average duration of a continuous scheduling run | ms |
resourcemanager_continuousschedulingrunimaxtime | Maximum duration of a continuous scheduling run in the most recent metrics interval | ms |
resourcemanager_continuousschedulingrunimintime | Minimum duration of a continuous scheduling run in the most recent metrics interval | ms |
resourcemanager_continuousschedulingruninumops | Number of continuous scheduling runs in the most recent metrics interval | count |
resourcemanager_continuousschedulingrunmaxtime | Maximum duration of a continuous scheduling run | ms |
resourcemanager_continuousschedulingrunmintime | Minimum duration of a continuous scheduling run | ms |
resourcemanager_continuousschedulingrunnumops | Total number of continuous scheduling runs | count |
resourcemanager_deferredrpcprocessingtimenumops | Total number of deferred RPC calls | count |
resourcemanager_droppedpuball | Total number of metrics updates dropped by all sinks | count |
resourcemanager_fairsharemb | Fair share of memory | MB |
resourcemanager_fairsharevcores | Fair share of virtual cores | count |
resourcemanager_gccount | Total number of garbage collections | count |
resourcemanager_gccountconcurrentmarksweep | Number of ConcurrentMarkSweep garbage collections | count |
resourcemanager_gccountparnew | Number of ParNew garbage collections | count |
resourcemanager_gcnuminfothresholdexceeded | Number of times the GC info threshold was exceeded | count |
resourcemanager_gcnumwarnthresholdexceeded | Number of times the GC warn threshold was exceeded | count |
resourcemanager_gctimemillis | Total time spent in garbage collection | ms |
resourcemanager_gctimemillisconcurrentmarksweep | Total time spent in ConcurrentMarkSweep garbage collection | ms |
resourcemanager_gctimemillisparnew | Total time spent in ParNew garbage collection | ms |
resourcemanager_gctotalextrasleeptime | Total extra sleep time caused by GC pauses | ms |
resourcemanager_getgroupsavgtime | Average time taken to get user groups | ms |
resourcemanager_logerror | Number of ERROR-level log events | count |
resourcemanager_logfatal | Number of FATAL-level log events | count |
resourcemanager_loginfailureavgtime | Average time of failed Kerberos logins | ms |
resourcemanager_loginfailurenumops | Number of failed Kerberos logins | count |
resourcemanager_loginfo | Number of INFO-level log events | count |
resourcemanager_loginsuccessavgtime | Average time of successful Kerberos logins | ms |
resourcemanager_loginsuccessnumops | Number of successful Kerberos logins | count |
resourcemanager_logwarn | Number of WARN-level log events | count |
resourcemanager_maxamsharemb | Maximum memory share available to ApplicationMasters | MB |
resourcemanager_maxamsharevcores | Maximum virtual core share available to ApplicationMasters | count |
resourcemanager_maxapps | Maximum number of applications | count |
resourcemanager_memheapcommittedm | Heap memory committed | MB |
resourcemanager_memheapmaxm | Maximum heap memory | MB |
resourcemanager_memheapusedm | Heap memory used | MB |
resourcemanager_memmaxm | Maximum memory available to the JVM | MB |
resourcemanager_memnonheapcommittedm | Non-heap memory committed | MB |
resourcemanager_memnonheapmaxm | Maximum non-heap memory | MB |
resourcemanager_memnonheapusedm | Non-heap memory used | MB |
resourcemanager_minsharemb | Minimum share of memory | MB |
resourcemanager_minsharevcores | Minimum share of virtual cores | count |
resourcemanager_nodeheartbeatavgtime | Average time to process a node heartbeat call | ms |
resourcemanager_nodeheartbeatnumops | Number of node heartbeat calls | count |
resourcemanager_nodeupdatecallavgtime | Average time to handle a node update | ms |
resourcemanager_nodeupdatecallimaxtime | Maximum time to handle a node update in the most recent metrics interval | ms |
resourcemanager_nodeupdatecallimintime | Minimum time to handle a node update in the most recent metrics interval | ms |
resourcemanager_nodeupdatecallinumops | Number of node updates handled in the most recent metrics interval | count |
resourcemanager_numactivenms | Current number of active NodeManagers | count |
resourcemanager_numactivesinks | Current number of active metrics sinks | count |
resourcemanager_numactivesources | Current number of active metrics sources | count |
resourcemanager_numallsinks | Total number of metrics sinks | count |
resourcemanager_numallsources | Total number of metrics sources | count |
resourcemanager_numdecommissionednms | Current number of decommissioned NodeManagers | count |
resourcemanager_numdecommissioningnms | Current number of NodeManagers being decommissioned | count |
resourcemanager_numdroppedconnections | Number of dropped RPC connections | count |
resourcemanager_numlostnms | Current number of lost NodeManagers | count |
resourcemanager_numopenconnections | Current number of open RPC connections | count |
resourcemanager_numrebootednms | Current number of rebooted NodeManagers | count |
resourcemanager_numshutdownnms | Current number of shut down NodeManagers | count |
resourcemanager_numunhealthynms | Current number of unhealthy NodeManagers | count |
resourcemanager_pendingcontainers | Number of containers waiting to be allocated | count |
resourcemanager_pendingmb | Amount of memory waiting to be allocated | MB |
resourcemanager_pendingvcores | Number of virtual cores waiting to be allocated | count |
resourcemanager_publishavgtime | Average time to publish metrics to a sink | ms |
resourcemanager_rpcprocessingtimeavgtime | Average RPC processing time | ms |
resourcemanager_rpcprocessingtimenumops | Total number of RPC calls processed | count |
resourcemanager_rpcqueuetimeavgtime | Average time RPC calls spend in the call queue | ms |
resourcemanager_rpcqueuetimenumops | Total number of RPC calls queued | count |
resourcemanager_rpcslowcalls | Number of slow RPC calls | count |
resourcemanager_running_0 | Number of running applications with an elapsed time of less than 60 minutes | count |
resourcemanager_running_1440 | Number of running applications with an elapsed time of more than 1440 minutes | count |
resourcemanager_running_300 | Number of running applications with an elapsed time between 300 and 1440 minutes | count |
resourcemanager_running_60 | Number of running applications with an elapsed time between 60 and 300 minutes | count |
resourcemanager_securityenabled | Whether security is enabled on the ResourceManager | bool |
resourcemanager_sentbytes | Number of bytes sent by the ResourceManager | byte |
resourcemanager_snapshotavgtime | Average time to snapshot statistics from a metrics source | ms |
resourcemanager_snapshotnumops | Number of metrics snapshot operations | count |
resourcemanager_steadyfairsharemb | Steady fair share of memory | MB |
resourcemanager_steadyfairsharevcores | Steady fair share of virtual cores | count |
resourcemanager_threadsblocked | Number of threads in the BLOCKED state | count |
resourcemanager_threadsnew | Number of threads in the NEW state | count |
resourcemanager_threadsrunnable | Number of threads in the RUNNABLE state | count |
resourcemanager_threadsterminated | Number of threads in the TERMINATED state | count |
resourcemanager_threadstimedwaiting | Number of threads in the TIMED_WAITING state | count |
resourcemanager_threadswaiting | Number of threads in the WAITING state | count |
resourcemanager_updatethreadrunavgtime | Average duration of an update thread run | ms |
resourcemanager_updatethreadrunimaxtime | Maximum duration of an update thread run in the most recent metrics interval | ms |
resourcemanager_updatethreadrunimintime | Minimum duration of an update thread run in the most recent metrics interval | ms |
resourcemanager_updatethreadruninumops | Number of update thread runs in the most recent metrics interval | count |
resourcemanager_updatethreadrunmaxtime | Maximum duration of an update thread run | ms |
resourcemanager_updatethreadrunmintime | Minimum duration of an update thread run | ms |
resourcemanager_updatethreadrunnumops | Total number of update thread runs | count |
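
As an example of putting these metrics to use, memory utilization can be approximated from resourcemanager_allocatedmb and resourcemanager_availablemb. The sketch below assumes that both values were read for the same queue (for example the root queue) and that total memory is simply their sum, which ignores reserved memory. The function name `memory_utilization` is hypothetical.

```python
def memory_utilization(allocated_mb: float, available_mb: float) -> float:
    """Return the allocated fraction of memory, between 0.0 and 1.0.

    Assumes total memory = allocated + available (reserved memory ignored).
    """
    total = allocated_mb + available_mb
    if total <= 0:
        # No capacity reported, e.g. no NodeManager has registered yet.
        return 0.0
    return allocated_mb / total


# 6144 MB allocated and 2048 MB still available: 6144 / 8192 = 0.75
print(f"{memory_utilization(6144, 2048):.0%}")  # prints 75%
```

A sustained value close to 100% together with a growing resourcemanager_appspending is a typical sign that the cluster is short of memory.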