
Hadoop Yarn NodeManager

Collect Yarn NodeManager Metrics.

Installation and Configuration

NodeManager is a Java application, so its metrics can be collected over JMX with jmx-exporter.

1. NodeManager Configuration

1.1 Download jmx-exporter

Download link: https://github.com/prometheus/jmx_exporter

1.2 Download the jmx-exporter Configuration File

Download link: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-yarn-nodemanager.yml

1.3 Adjust NodeManager Startup Parameters

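Add the following to the NodeManager startup parameters. The jar and configuration file paths must match the locations where you saved the files from the previous two steps; the example assumes the configuration file was saved as /opt/jmx/jmx_node_manager.yml: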

{JAVA_GC_ARGS} -javaagent:/opt/jmx/jmx_exporter-1.0.1.jar=localhost:17108:/opt/jmx/jmx_node_manager.yml
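The agent option follows the jmx-exporter pattern -javaagent:<jar>=<host>:<port>:<config>. As a rough sketch, the Python helper below assembles that string from its parts, which can make it easier to keep the port and file paths consistent across nodes. The function name build_javaagent_arg is hypothetical, and the paths are simply the ones used in the example above.

```python
def build_javaagent_arg(jar_path, port, config_path, host="localhost"):
    """Compose the -javaagent option as <jar>=<host>:<port>:<config>."""
    # Reject ports outside the valid TCP range before building the string.
    if not 1 <= int(port) <= 65535:
        raise ValueError(f"port out of range: {port}")
    return f"-javaagent:{jar_path}={host}:{port}:{config_path}"


# Paths and port are taken from the example above; adjust them to your layout.
print(build_javaagent_arg(
    "/opt/jmx/jmx_exporter-1.0.1.jar",
    17108,
    "/opt/jmx/jmx_node_manager.yml",
))
```

The port chosen here (17108) is the one the collector later reads from, so both places need the same value.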

1.4 Restart NodeManager

2. DataKit Collector Configuration

2.1 Install DataKit

2.2 Configure the Collector

Since jmx-exporter directly exposes the metrics URL, you can collect data using the prom collector.
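Before configuring the collector, it can help to confirm that the exporter endpoint actually returns NodeManager metrics. The sketch below is one possible check in Python: it fetches the Prometheus text output and parses it into a dictionary. The helper names fetch_metrics and parse_metrics are hypothetical, the parser handles only the simple "name value" and "name{labels} value" line shapes, and the exact metric names exposed depend on the rules in your jmx-exporter configuration file.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled series: everything up to the last "}" is the series name.
            head, _, tail = line.rpartition("}")
            name = head + "}"
        else:
            name, _, tail = line.partition(" ")
        fields = tail.split()
        if not fields:
            continue
        try:
            samples[name] = float(fields[0])
        except ValueError:
            # Ignore lines whose value is not numeric.
            continue
    return samples


def fetch_metrics(url="http://localhost:17108/metrics", timeout=5):
    """Download the exporter output and return the parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling fetch_metrics() on the NodeManager host and checking that some returned keys contain "nodemanager" is a quick sanity test; an empty result or a connection error suggests the agent option was not picked up at startup.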

Navigate to the conf.d/prom directory under the DataKit installation directory, and copy prom.conf.sample to nodemanager.conf.

cp prom.conf.sample nodemanager.conf

Adjust the content of nodemanager.conf as follows:

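  urls = ["http://localhost:17108/metrics"]
  source = "yarn-nodemanager"
  # interval must appear before [inputs.prom.tags]; in TOML, keys placed
  # after a table header belong to that table and would be read as tags.
  interval = "10s"
  [inputs.prom.tags]
    component = "yarn-nodemanager"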

Adjust other settings as needed. Parameter descriptions:

  • urls: The metrics URL exposed by jmx-exporter for the corresponding component
  • source: Collector alias; set it so that this collector can be told apart from others
  • keep_exist_metric_name: Keep the original metric name
  • interval: Collection interval
  • inputs.prom.tags: Additional tags to attach to the collected data

2.3 Restart DataKit

Restart DataKit so that the new collector configuration takes effect.

Metrics

Hadoop Measurement

NodeManager metrics belong to the Hadoop Measurement. This section covers only the NodeManager-related metrics.
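As an illustration of how a raw exporter metric name could map to a measurement and a field, the sketch below splits the name at its first underscore. It assumes the prom collector's default naming behaviour when no measurement name is configured, and that the exporter emits names with a hadoop_ prefix; both are assumptions rather than facts stated on this page, and split_metric_name is a hypothetical helper.

```python
def split_metric_name(raw_name):
    """Split at the first underscore: measurement before it, field after it."""
    measurement, sep, field = raw_name.partition("_")
    if not sep or not field:
        # Nothing to split on: treat the whole name as the field (fallback
        # chosen for this sketch only).
        return "", raw_name
    return measurement, field


measurement, field = split_metric_name("hadoop_nodemanager_allocatedgb")
print(measurement, field)
```

Under these assumptions, the first part becomes the measurement and the remainder is the field name used when querying.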

| Metrics | Description | Unit |
| --- | --- | --- |
| nodemanager_allocatedcontainers | Number of containers currently allocated | count |
| nodemanager_allocatedgb | Memory allocated to containers | GB |
| nodemanager_allocatedopportunisticgb | Memory allocated to opportunistic containers | GB |
| nodemanager_allocatedopportunisticvcores | Number of vcores allocated to opportunistic containers | count |
| nodemanager_allocatedvcores | Number of vcores allocated | count |
| nodemanager_availablegb | Memory available for allocation | GB |
| nodemanager_availablevcores | Number of vcores available for allocation | count |
| nodemanager_badlocaldirs | Number of bad local directories | count |
| nodemanager_badlogdirs | Number of bad log directories | count |
| nodemanager_blocktransferratebytes_count | Total bytes of blocks transferred | byte |
| nodemanager_blocktransferratebytes_rate1 | Block transfer rate, 1-minute moving average | B/s |
| nodemanager_blocktransferratebytes_rate15 | Block transfer rate, 15-minute moving average | B/s |
| nodemanager_blocktransferratebytes_rate5 | Block transfer rate, 5-minute moving average | B/s |
| nodemanager_blocktransferratebytes_ratemean | Mean block transfer rate | B/s |
| nodemanager_cachesizebeforeclean | Cache size before cleaning | byte |
| nodemanager_callqueuelength | RPC call queue length | count |
| nodemanager_containerlaunchdurationavgtime | Average container launch time | s |
| nodemanager_containerlaunchdurationnumops | Number of container launch operations | count |
| nodemanager_containerscompleted | Number of containers completed | count |
| nodemanager_containersfailed | Number of containers failed | count |
| nodemanager_containersiniting | Number of containers initializing | count |
| nodemanager_containerskilled | Number of containers killed | count |
| nodemanager_containerslaunched | Number of containers launched | count |
| nodemanager_containersreiniting | Number of containers re-initializing | count |
| nodemanager_containersrolledbackonfailure | Number of containers rolled back on failure | count |
| nodemanager_containersrunning | Number of containers running | count |
| nodemanager_deferredrpcprocessingtimeavgtime | Average deferred RPC processing time | s |
| nodemanager_deferredrpcprocessingtimenumops | Number of deferred RPC operations | count |
| nodemanager_droppedpuball | Number of metrics updates dropped by all sinks | count |
| nodemanager_gccount | Total garbage collection count | count |
| nodemanager_gccountconcurrentmarksweep | Number of ConcurrentMarkSweep garbage collections | count |
| nodemanager_gccountparnew | Number of ParNew garbage collections | count |
| nodemanager_gcnuminfothresholdexceeded | Number of times the GC pause info threshold was exceeded | count |
| nodemanager_gcnumwarnthresholdexceeded | Number of times the GC pause warn threshold was exceeded | count |
| nodemanager_gctimemillis | Total garbage collection time | ms |
| nodemanager_gctimemillisconcurrentmarksweep | Time spent in ConcurrentMarkSweep garbage collection | ms |
| nodemanager_gctimemillisparnew | Time spent in ParNew garbage collection | ms |
| nodemanager_gctotalextrasleeptime | Total extra sleep time caused by garbage collection pauses | s |
| nodemanager_getgroupsavgtime | Average time to get groups | s |
| nodemanager_getgroupsnumops | Number of get groups operations | count |
| nodemanager_goodlocaldirsdiskutilizationperc | Disk utilization of healthy local directories | % |
| nodemanager_logerror | Number of ERROR log events | count |
| nodemanager_logfatal | Number of FATAL log events | count |
| nodemanager_loginfailureavgtime | Average time of failed Kerberos logins | ms |
| nodemanager_loginfailurenumops | Number of failed Kerberos logins | count |
| nodemanager_loginfo | Number of INFO log events | count |
| nodemanager_loginsuccessavgtime | Average time of successful Kerberos logins | ms |
| nodemanager_loginsuccessnumops | Number of successful Kerberos logins | count |
| nodemanager_logwarn | Number of WARN log events | count |
| nodemanager_memheapcommittedm | Committed heap memory | MB |
| nodemanager_memheapmaxm | Maximum heap memory | MB |
| nodemanager_memheapusedm | Used heap memory | MB |
| nodemanager_memmaxm | Maximum memory available to the JVM | MB |
| nodemanager_memnonheapcommittedm | Committed non-heap memory | MB |
| nodemanager_memnonheapmaxm | Maximum non-heap memory | MB |
| nodemanager_memnonheapusedm | Used non-heap memory | MB |
| nodemanager_numactiveconnections | Number of active connections | count |
| nodemanager_numactivesinks | Number of active metrics sinks | count |
| nodemanager_numactivesources | Number of active metrics sources | count |
| nodemanager_numallsinks | Total number of metrics sinks | count |
| nodemanager_numallsources | Total number of metrics sources | count |
| nodemanager_numdroppedconnections | Number of dropped connections | count |
| nodemanager_numopenconnections | Number of open connections | count |
| nodemanager_numregisteredconnections | Number of registered connections | count |
| nodemanager_openblockrequestlatencymillis_count | Number of open block requests measured | count |
| nodemanager_openblockrequestlatencymillis_rate1 | Open block request rate, 1-minute moving average | ops/s |
| nodemanager_openblockrequestlatencymillis_rate15 | Open block request rate, 15-minute moving average | ops/s |
| nodemanager_openblockrequestlatencymillis_rate5 | Open block request rate, 5-minute moving average | ops/s |
| nodemanager_openblockrequestlatencymillis_ratemean | Mean open block request rate | ops/s |
| nodemanager_privatebytesdeleted | Number of private bytes deleted | byte |
| nodemanager_publicbytesdeleted | Number of public bytes deleted | byte |
| nodemanager_publishavgtime | Average publish time | s |
| nodemanager_publishnumops | Number of publish operations | count |
| nodemanager_receivedbytes | Number of bytes received | byte |
| nodemanager_registeredexecutorssize | Number of registered executors | count |
| nodemanager_registerexecutorrequestlatencymillis_count | Number of register executor requests measured | count |
| nodemanager_registerexecutorrequestlatencymillis_rate1 | Register executor request rate, 1-minute moving average | ops/s |
| nodemanager_registerexecutorrequestlatencymillis_rate15 | Register executor request rate, 15-minute moving average | ops/s |
| nodemanager_registerexecutorrequestlatencymillis_rate5 | Register executor request rate, 5-minute moving average | ops/s |
| nodemanager_registerexecutorrequestlatencymillis_ratemean | Mean register executor request rate | ops/s |
| nodemanager_renewalfailures | Number of renewal failures | count |
| nodemanager_renewalfailurestotal | Total number of renewal failures | count |
| nodemanager_rpcauthenticationfailures | Number of RPC authentication failures | count |
| nodemanager_rpcauthorizationsuccesses | Number of RPC authorization successes | count |
| nodemanager_rpcclientbackoff | Number of RPC client backoffs | count |
| nodemanager_rpcprocessingtimeavgtime | Average RPC processing time | s |
| nodemanager_rpcprocessingtimenumops | Number of RPC processing operations | count |
| nodemanager_rpcqueuetimeavgtime | Average RPC queue time | s |
| nodemanager_rpcqueuetimenumops | Number of RPC queue time operations | count |
| nodemanager_rpcslowcalls | Number of slow RPC calls | count |
| nodemanager_runningopportunisticcontainers | Number of running opportunistic containers | count |
| nodemanager_securityenabled | Whether security is enabled | boolean |
| nodemanager_sentbytes | Number of bytes sent | byte |
| nodemanager_shuffleconnections | Number of shuffle connections | count |
| nodemanager_shuffleoutputbytes | Number of shuffle output bytes | byte |
| nodemanager_shuffleoutputsfailed | Number of failed shuffle outputs | count |
| nodemanager_shuffleoutputsok | Number of successful shuffle outputs | count |
| nodemanager_snapshotavgtime | Average snapshot time | s |
| nodemanager_snapshotnumops | Number of snapshot operations | count |
| nodemanager_threadsblocked | Number of threads in the BLOCKED state | count |
| nodemanager_threadsnew | Number of threads in the NEW state | count |
| nodemanager_threadsrunnable | Number of threads in the RUNNABLE state | count |
| nodemanager_threadsterminated | Number of threads in the TERMINATED state | count |
| nodemanager_threadstimedwaiting | Number of threads in the TIMED_WAITING state | count |
| nodemanager_threadswaiting | Number of threads in the WAITING state | count |
| nodemanager_totalbytesdeleted | Total number of bytes deleted | byte |
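Several of these metrics are more useful in combination than on their own. The sketch below shows two derived values that might be worth watching: heap utilization from nodemanager_memheapusedm and nodemanager_memheapmaxm, and a container failure ratio from the completed, failed, and killed counters. The function names and the choice of formulas are illustrative assumptions, not definitions taken from Hadoop or DataKit.

```python
def heap_utilization(used_mb, max_mb):
    """Return heap usage as a percentage, or None if the maximum is unknown."""
    # A non-positive maximum is treated here as "not defined".
    if max_mb is None or max_mb <= 0:
        return None
    return 100.0 * used_mb / max_mb


def container_failure_ratio(completed, failed, killed):
    """Return failed / (completed + failed + killed), or None if no container finished."""
    total = completed + failed + killed
    if total == 0:
        return None
    return failed / total


# Example values are made up for illustration.
print(heap_utilization(512, 2048))
print(container_failure_ratio(90, 10, 0))
```

Because the container counters accumulate from NodeManager startup, computing the ratio over the difference between two samples generally gives a more meaningful recent failure rate than using the raw totals.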