
Hadoop Yarn ResourceManager

Collect Hadoop YARN ResourceManager metrics.

Installation and Deployment

Because the ResourceManager is a Java process, its metrics are available through JMX and can be exposed with the Prometheus jmx-exporter Java agent.

1. ResourceManager Configuration

1.1 Download jmx-exporter

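Download link: https://github.com/prometheus/jmx_exporter

The startup parameter in 1.3 assumes the Java agent jar (version 1.0.1) is saved as /opt/jmx/jmx_exporter-1.0.1.jar.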

1.2 Download the jmx-exporter Configuration File

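Download link: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-yarn-resourcemanager.yml

The startup parameter in 1.3 expects this file at /opt/jmx/jmx_resource_manager.yml, so save (or rename) it to that path.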

1.3 ResourceManager Startup Parameter Adjustment

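Add the following Java agent option to the ResourceManager JVM startup parameters (for example, through YARN_RESOURCEMANAGER_OPTS in yarn-env.sh). The agent serves metrics on localhost:17109 using the configuration file from 1.2: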

{JAVA_GC_ARGS} -javaagent:/opt/jmx/jmx_exporter-1.0.1.jar=localhost:17109:/opt/jmx/jmx_resource_manager.yml

1.4 Restart ResourceManager
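After the restart, it can help to confirm that the agent is serving metrics before configuring DataKit. The following Python sketch uses only the standard library: it fetches http://localhost:17109/metrics (the host and port from the -javaagent argument above) and parses the Prometheus text format into a mapping from series to value. The helper names fetch_metrics and parse_prometheus_text are hypothetical, not part of jmx-exporter or DataKit, and the parser is deliberately minimal (it keeps each label set as part of the key and ignores timestamps).

```python
import urllib.request

# Host and port assumed to match the -javaagent argument (localhost:17109).
# fetch_metrics / parse_prometheus_text are hypothetical helpers, not part
# of jmx-exporter or DataKit.
EXPORTER_URL = "http://localhost:17109/metrics"


def parse_prometheus_text(text):
    """Parse Prometheus text exposition format into {series: value}.

    The key is the metric name plus its label set as written, for example
    'some_metric' or 'some_metric{name="x"}'. HELP/TYPE comment lines and
    blank lines are skipped; an optional trailing timestamp is ignored.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rfind("}")
            series, rest = line[: end + 1], line[end + 1 :].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])
    return samples


def fetch_metrics(url=EXPORTER_URL, timeout=5.0):
    """Fetch the exporter endpoint and parse it; raises OSError on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return parse_prometheus_text(body)


if __name__ == "__main__":
    try:
        metrics = fetch_metrics()
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"exporter not reachable at {EXPORTER_URL}: {exc}")
    else:
        print(f"{len(metrics)} series exposed")
        for name in sorted(metrics)[:10]:
            print(name, metrics[name])
```

If the script reports that the exporter is not reachable, check that the ResourceManager actually picked up the -javaagent option and that the agent's port matches the URL.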

2. DataKit Collector Configuration

2.1 Install DataKit

2.2 Configure Collector

Because jmx-exporter exposes the metrics over HTTP in Prometheus text format, DataKit's prom collector can scrape that URL directly.

Go to the conf.d/prom directory under the DataKit installation directory and copy prom.conf.sample to resourcemanager.conf:

cp prom.conf.sample resourcemanager.conf

Adjust the contents of resourcemanager.conf as follows:

[[inputs.prom]]
  urls = ["http://localhost:17109/metrics"]
  source = "yarn-resourcemanager"
  interval = "10s"
  [inputs.prom.tags]
    component = "yarn-resourcemanager"

Adjust other settings as needed. Parameter descriptions:

  • urls: The metrics URL exposed by jmx-exporter for the component being collected
  • source: Collector alias; use a distinct value so that this collector's data can be told apart from other prom collectors
  • keep_exist_metric_name: Whether to keep the original metric names
  • interval: Collection interval
  • inputs.prom.tags: Additional tags attached to every collected metric

3. Restart DataKit

Restart DataKit so that the new collector configuration takes effect.

Metrics

Hadoop Measurement

ResourceManager metrics are reported under the Hadoop measurement. The table below describes the ResourceManager-related fields; fields marked Fair Scheduler are only reported when the ResourceManager uses that scheduler.

Metrics Description Unit
resourcemanager_activeapplications Number of active applications count
resourcemanager_activeusers Number of active users count
resourcemanager_aggregatecontainersallocated Total number of containers allocated count
resourcemanager_aggregatecontainerspreempted Total number of containers preempted count
resourcemanager_aggregatecontainersreleased Total number of containers released count
resourcemanager_aggregatememorymbsecondspreempted Aggregate memory-seconds of preempted containers MB·s
resourcemanager_aggregatenodelocalcontainersallocated Total number of node-local containers allocated count
resourcemanager_aggregateoffswitchcontainersallocated Total number of off-switch containers allocated (neither node-local nor rack-local) count
resourcemanager_aggregateracklocalcontainersallocated Total number of rack-local containers allocated count
resourcemanager_aggregatevcoresecondspreempted Aggregate vcore-seconds of preempted containers vcore·s
resourcemanager_allocatedcontainers Current number of allocated containers count
resourcemanager_allocatedmb Currently allocated memory MB
resourcemanager_allocatedvcores Currently allocated virtual cores count
resourcemanager_amlaunchdelayavgtime Average time the ResourceManager takes to launch an ApplicationMaster (AM) container after it is allocated ms
resourcemanager_amlaunchdelaynumops Total number of AMs launched count
resourcemanager_amregisterdelayavgtime Average time an AM takes to register with the ResourceManager after its container is launched ms
resourcemanager_amregisterdelaynumops Total number of AMs registered count
resourcemanager_amresourceusagemb Memory used by AMs in the queue (Fair Scheduler) MB
resourcemanager_amresourceusagevcores Virtual cores used by AMs in the queue (Fair Scheduler) count
resourcemanager_appattemptfirstcontainerallocationdelayavgtime Average time to allocate the first container of an application attempt ms
resourcemanager_appattemptfirstcontainerallocationdelaynumops Total number of first-container allocations across application attempts count
resourcemanager_appscompleted Total number of completed applications count
resourcemanager_appsfailed Total number of failed applications count
resourcemanager_appskilled Total number of killed applications count
resourcemanager_appspending Number of applications that have not yet been assigned any containers count
resourcemanager_appsrunning Number of running applications count
resourcemanager_appssubmitted Total number of applications submitted count
resourcemanager_availablemb Currently available memory MB
resourcemanager_availablevcores Currently available virtual cores count
resourcemanager_callqueuelength RPC call queue length count
resourcemanager_continuousschedulingrunavgtime Average continuous scheduling run time (Fair Scheduler) ms
resourcemanager_continuousschedulingrunimaxtime Maximum continuous scheduling run time in the current sampling interval ms
resourcemanager_continuousschedulingrunimintime Minimum continuous scheduling run time in the current sampling interval ms
resourcemanager_continuousschedulingruninumops Number of continuous scheduling runs in the current sampling interval count
resourcemanager_continuousschedulingrunmaxtime Maximum continuous scheduling run time since startup ms
resourcemanager_continuousschedulingrunmintime Minimum continuous scheduling run time since startup ms
resourcemanager_continuousschedulingrunnumops Total number of continuous scheduling runs count
resourcemanager_deferredrpcprocessingtimenumops Number of deferred RPC calls processed count
resourcemanager_droppedpuball Total number of dropped metrics-system publishes count
resourcemanager_fairsharemb Fair share of memory for the queue (Fair Scheduler) MB
resourcemanager_fairsharevcores Fair share of virtual cores for the queue (Fair Scheduler) count
resourcemanager_gccount Total number of JVM garbage collections count
resourcemanager_gccountconcurrentmarksweep Number of ConcurrentMarkSweep garbage collections count
resourcemanager_gccountparnew Number of ParNew garbage collections count
resourcemanager_gcnuminfothresholdexceeded Number of times a JVM pause exceeded the info threshold count
resourcemanager_gcnumwarnthresholdexceeded Number of times a JVM pause exceeded the warn threshold count
resourcemanager_gctimemillis Total time spent in garbage collection ms
resourcemanager_gctimemillisconcurrentmarksweep Total time spent in ConcurrentMarkSweep collections ms
resourcemanager_gctimemillisparnew Total time spent in ParNew collections ms
resourcemanager_gctotalextrasleeptime Total extra sleep time detected by the JVM pause monitor ms
resourcemanager_getgroupsavgtime Average time to resolve a user's groups ms
resourcemanager_logerror Number of ERROR-level log events count
resourcemanager_logfatal Number of FATAL-level log events count
resourcemanager_loginfailureavgtime Average time of failed logins ms
resourcemanager_loginfailurenumops Number of failed logins count
resourcemanager_loginfo Number of INFO-level log events count
resourcemanager_loginsuccessavgtime Average time of successful logins ms
resourcemanager_loginsuccessnumops Number of successful logins count
resourcemanager_logwarn Number of WARN-level log events count
resourcemanager_maxamsharemb Maximum memory the queue's AMs may use (Fair Scheduler) MB
resourcemanager_maxamsharevcores Maximum virtual cores the queue's AMs may use (Fair Scheduler) count
resourcemanager_maxapps Maximum number of running applications allowed in the queue (Fair Scheduler) count
resourcemanager_memheapcommittedm JVM heap memory committed MB
resourcemanager_memheapmaxm JVM maximum heap memory MB
resourcemanager_memheapusedm JVM heap memory used MB
resourcemanager_memmaxm Maximum memory the JVM will attempt to use MB
resourcemanager_memnonheapcommittedm JVM non-heap memory committed MB
resourcemanager_memnonheapmaxm JVM maximum non-heap memory MB
resourcemanager_memnonheapusedm JVM non-heap memory used MB
resourcemanager_minsharemb Minimum share of memory for the queue (Fair Scheduler) MB
resourcemanager_minsharevcores Minimum share of virtual cores for the queue (Fair Scheduler) count
resourcemanager_nodeheartbeatavgtime Average processing time of NodeManager heartbeat RPCs ms
resourcemanager_nodeheartbeatnumops Number of NodeManager heartbeat RPCs count
resourcemanager_nodeupdatecallavgtime Average duration of node update calls (Fair Scheduler) ms
resourcemanager_nodeupdatecallimaxtime Maximum node update call duration in the current sampling interval ms
resourcemanager_nodeupdatecallimintime Minimum node update call duration in the current sampling interval ms
resourcemanager_nodeupdatecallinumops Number of node update calls in the current sampling interval count
resourcemanager_numactivenms Number of active NodeManagers count
resourcemanager_numactivesinks Number of active metrics sinks count
resourcemanager_numactivesources Number of active metrics sources count
resourcemanager_numallsinks Total number of metrics sinks count
resourcemanager_numallsources Total number of metrics sources count
resourcemanager_numdecommissionednms Number of decommissioned NodeManagers count
resourcemanager_numdecommissioningnms Number of NodeManagers being decommissioned count
resourcemanager_numdroppedconnections Number of dropped RPC connections count
resourcemanager_numlostnms Number of lost NodeManagers count
resourcemanager_numopenconnections Number of open RPC connections count
resourcemanager_numrebootednms Number of rebooted NodeManagers count
resourcemanager_numshutdownnms Number of NodeManagers that have shut down count
resourcemanager_numunhealthynms Number of unhealthy NodeManagers count
resourcemanager_pendingcontainers Number of containers waiting to be allocated count
resourcemanager_pendingmb Memory requested but not yet allocated MB
resourcemanager_pendingvcores Virtual cores requested but not yet allocated count
resourcemanager_publishavgtime Average metrics-system publish time ms
resourcemanager_rpcprocessingtimeavgtime Average RPC processing time ms
resourcemanager_rpcprocessingtimenumops Number of RPC calls processed count
resourcemanager_rpcqueuetimeavgtime Average time RPC calls wait in the call queue ms
resourcemanager_rpcqueuetimenumops Number of RPC calls queued count
resourcemanager_rpcslowcalls Number of slow RPC calls count
resourcemanager_running_0 Number of running applications with an elapsed time under 60 minutes count
resourcemanager_running_1440 Number of running applications with an elapsed time over 1440 minutes count
resourcemanager_running_300 Number of running applications with an elapsed time between 300 and 1440 minutes count
resourcemanager_running_60 Number of running applications with an elapsed time between 60 and 300 minutes count
resourcemanager_securityenabled Whether Hadoop security (Kerberos authentication) is enabled (1 = enabled, 0 = disabled) bool
resourcemanager_sentbytes Total bytes sent by the RPC server byte
resourcemanager_snapshotavgtime Average metrics-system snapshot time ms
resourcemanager_snapshotnumops Number of metrics-system snapshots count
resourcemanager_steadyfairsharemb Steady fair share of memory for the queue (Fair Scheduler) MB
resourcemanager_steadyfairsharevcores Steady fair share of virtual cores for the queue (Fair Scheduler) count
resourcemanager_threadsblocked Number of threads in the BLOCKED state count
resourcemanager_threadsnew Number of threads in the NEW state count
resourcemanager_threadsrunnable Number of threads in the RUNNABLE state count
resourcemanager_threadsterminated Number of threads in the TERMINATED state count
resourcemanager_threadstimedwaiting Number of threads in the TIMED_WAITING state count
resourcemanager_threadswaiting Number of threads in the WAITING state count
resourcemanager_updatethreadrunavgtime Average update thread run time (Fair Scheduler) ms
resourcemanager_updatethreadrunimaxtime Maximum update thread run time in the current sampling interval ms
resourcemanager_updatethreadrunimintime Minimum update thread run time in the current sampling interval ms
resourcemanager_updatethreadruninumops Number of update thread runs in the current sampling interval count
resourcemanager_updatethreadrunmaxtime Maximum update thread run time since startup ms
resourcemanager_updatethreadrunmintime Minimum update thread run time since startup ms
resourcemanager_updatethreadrunnumops Total number of update thread runs count
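Several of these fields are more informative in combination than alone. The Python sketch below assumes the field values of a single point (for example, the root queue) have already been read into a dict keyed by the field names in this table; how that dict is obtained, the helper name summarize, and the 0.9 threshold are all assumptions for illustration. It estimates utilization as allocated / (allocated + available), which only approximates cluster usage (reserved resources are ignored), and flags unhealthy or lost NodeManagers.

```python
# Hypothetical helper: `fields` is assumed to hold the values of one
# ResourceManager point (for example the root queue), keyed by the field
# names from the metrics table. The 0.9 threshold is an arbitrary example.
def summarize(fields, threshold=0.9):
    """Derive rough utilization and node-health indicators."""

    def get(name):
        # Missing fields are treated as 0.
        return float(fields.get("resourcemanager_" + name, 0))

    def utilization(allocated, available):
        # allocated / (allocated + available); ignores reserved resources.
        total = allocated + available
        return allocated / total if total > 0 else 0.0

    memory = utilization(get("allocatedmb"), get("availablemb"))
    vcores = utilization(get("allocatedvcores"), get("availablevcores"))
    bad_nodes = int(get("numunhealthynms") + get("numlostnms"))

    warnings = []
    if memory >= threshold:
        warnings.append(f"memory utilization {memory:.0%}")
    if vcores >= threshold:
        warnings.append(f"vcore utilization {vcores:.0%}")
    if bad_nodes > 0:
        warnings.append(f"{bad_nodes} unhealthy or lost NodeManager(s)")
    return {"memory": memory, "vcores": vcores, "warnings": warnings}


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {
        "resourcemanager_allocatedmb": 97280,
        "resourcemanager_availablemb": 5120,
        "resourcemanager_allocatedvcores": 40,
        "resourcemanager_availablevcores": 24,
        "resourcemanager_numunhealthynms": 1,
        "resourcemanager_numlostnms": 0,
    }
    print(summarize(sample))
```

The same ratios can be expressed as alert rules in whatever monitoring backend stores these fields; the Python version is only meant to make the arithmetic explicit.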