Hadoop YARN NodeManager
Collect YARN NodeManager metrics.
Installation and Configuration
NodeManager runs on the JVM, so its metrics can be exposed by attaching jmx-exporter as a Java agent.
1. NodeManager Configuration
1.1 Download jmx-exporter
Download link: https://github.com/prometheus/jmx_exporter
1.2 Download the jmx-exporter Configuration File
Download link: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-yarn-nodemanager.yml
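1.3 Adjust NodeManager Startup Parameters
Add the following to the NodeManager startup parameters. The jar and configuration file paths must match the locations where the files from steps 1.1 and 1.2 were saved; in this example the downloaded hadoop-yarn-nodemanager.yml is stored as /opt/jmx/jmx_node_manager.yml, and the exporter listens on localhost:17108.
{JAVA_GC_ARGS} -javaagent:/opt/jmx/jmx_exporter-1.0.1.jar=localhost:17108:/opt/jmx/jmx_node_manager.yml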
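The agent option follows the jmx-exporter pattern of jar path, optional host, port, and configuration file path. The sketch below only assembles that string so the pieces are easy to check; the helper name `javaagent_arg` is hypothetical and not part of jmx-exporter or Hadoop.

```python
def javaagent_arg(jar_path: str, port: int, config_path: str, host: str = "") -> str:
    """Build a -javaagent option of the form <jar>=[host:]<port>:<config>.

    Hypothetical helper for illustration; it does not check that the files exist.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # The host part is optional; without it only the port is written.
    listen = f"{host}:{port}" if host else str(port)
    return f"-javaagent:{jar_path}={listen}:{config_path}"


# Values taken from the startup parameter shown in step 1.3.
arg = javaagent_arg(
    "/opt/jmx/jmx_exporter-1.0.1.jar",
    17108,
    "/opt/jmx/jmx_node_manager.yml",
    host="localhost",
)
print(arg)
```

The printed string is the part that follows {JAVA_GC_ARGS} in the startup parameters.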
1.4 Restart NodeManager
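After the restart, the exporter should answer on the host and port given in the startup parameters. A minimal sketch for checking this is shown below. It assumes the endpoint returns the Prometheus text exposition format; `fetch_metrics` and `parse_metrics` are hypothetical helper names, and the metric names and label in the sample text are illustrative assumptions rather than captured output.

```python
import urllib.request


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the raw metrics text from the exporter endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> dict:
    """Parse exposition text into {metric_name: value}.

    Labels are ignored, so when a name appears with several label sets
    only the last sample is kept. This is a simplification for a quick check.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            name, *rest = line.split()
        samples[name] = float(rest[0])
    return samples


# Illustrative sample (assumed names, not real exporter output).
SAMPLE = """\
# TYPE hadoop_nodemanager_containersrunning untyped
hadoop_nodemanager_containersrunning{name="NodeManagerMetrics",} 3.0
hadoop_nodemanager_availablegb 12.0
"""

parsed = parse_metrics(SAMPLE)
print(sorted(parsed))
```

On the NodeManager host, `parse_metrics(fetch_metrics("http://localhost:17108/metrics"))` would run the same check against the live endpoint; an empty result or a connection error suggests the agent option was not picked up.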
2. DataKit Collector Configuration
2.1 Install DataKit
2.2 Configure the Collector
Since jmx-exporter directly exposes a metrics URL, you can collect the data with the prom collector.
Navigate to the conf.d/prom directory under the DataKit installation directory and copy prom.conf.sample to nodemanager.conf:
cp prom.conf.sample nodemanager.conf
Adjust the content of nodemanager.conf as follows:
urls = ["http://localhost:17108/metrics"]
source = "yarn-nodemanager"
interval = "10s"
[inputs.prom.tags]
component = "yarn-nodemanager"
Adjust other settings as needed. Parameter notes:
- urls: The jmx-exporter metrics URL; fill in the metrics URL exposed by the corresponding component
- source: Collector alias; a distinct value is recommended to tell collectors apart
- keep_exist_metric_name: Keep the original metric name
- interval: Collection interval
- inputs.prom.tags: Add additional tags
3. Restart DataKit
Metrics
Hadoop Measurement
NodeManager metrics are stored under the Hadoop measurement. Only the NodeManager-related metrics are described here.
Metric | Description | Unit |
---|---|---|
nodemanager_allocatedcontainers | Current number of allocated containers | count |
nodemanager_allocatedgb | Memory currently allocated to containers | GB |
nodemanager_allocatedopportunisticgb | Memory currently allocated to opportunistic containers | GB |
nodemanager_allocatedopportunisticvcores | Number of vcores currently allocated to opportunistic containers | count |
nodemanager_allocatedvcores | Number of vcores currently allocated | count |
nodemanager_availablegb | Memory currently available for allocation | GB |
nodemanager_availablevcores | Number of vcores currently available for allocation | count |
nodemanager_badlocaldirs | Number of bad local directories | count |
nodemanager_badlogdirs | Number of bad log directories | count |
nodemanager_blocktransferratebytes_count | Total number of block bytes transferred | byte |
nodemanager_blocktransferratebytes_rate1 | 1-minute moving average of the block transfer rate | B/s |
nodemanager_blocktransferratebytes_rate15 | 15-minute moving average of the block transfer rate | B/s |
nodemanager_blocktransferratebytes_rate5 | 5-minute moving average of the block transfer rate | B/s |
nodemanager_blocktransferratebytes_ratemean | Mean block transfer rate | B/s |
nodemanager_cachesizebeforeclean | Local cache size before cleanup | byte |
nodemanager_callqueuelength | Current length of the RPC call queue | count |
nodemanager_containerlaunchdurationavgtime | Average time taken to launch a container | ms |
nodemanager_containerlaunchdurationnumops | Number of container launch operations measured | count |
nodemanager_containerscompleted | Number of successfully completed containers | count |
nodemanager_containersfailed | Number of failed containers | count |
nodemanager_containersiniting | Number of containers currently initializing | count |
nodemanager_containerskilled | Number of killed containers | count |
nodemanager_containerslaunched | Number of launched containers | count |
nodemanager_containersreiniting | Number of containers currently re-initializing | count |
nodemanager_containersrolledbackonfailure | Number of containers rolled back on failure | count |
nodemanager_containersrunning | Number of containers currently running | count |
nodemanager_deferredrpcprocessingtimeavgtime | Average deferred RPC processing time | ms |
nodemanager_deferredrpcprocessingtimenumops | Number of deferred RPC operations | count |
nodemanager_droppedpuball | Number of metric updates dropped by all sinks | count |
nodemanager_gccount | Total garbage collection count | count |
nodemanager_gccountconcurrentmarksweep | Garbage collection count of the ConcurrentMarkSweep collector | count |
nodemanager_gccountparnew | Garbage collection count of the ParNew collector | count |
nodemanager_gcnuminfothresholdexceeded | Number of times the GC info threshold was exceeded | count |
nodemanager_gcnumwarnthresholdexceeded | Number of times the GC warn threshold was exceeded | count |
nodemanager_gctimemillis | Total garbage collection time | ms |
nodemanager_gctimemillisconcurrentmarksweep | Garbage collection time of the ConcurrentMarkSweep collector | ms |
nodemanager_gctimemillisparnew | Garbage collection time of the ParNew collector | ms |
nodemanager_gctotalextrasleeptime | Total extra sleep time caused by GC pauses | ms |
nodemanager_getgroupsavgtime | Average time to get groups | ms |
nodemanager_getgroupsnumops | Number of get groups operations | count |
nodemanager_goodlocaldirsdiskutilizationperc | Disk utilization percentage across healthy local directories | % |
nodemanager_logerror | Number of ERROR log entries | count |
nodemanager_logfatal | Number of FATAL log entries | count |
nodemanager_loginfailureavgtime | Average time of failed logins | ms |
nodemanager_loginfailurenumops | Number of failed logins | count |
nodemanager_loginfo | Number of INFO log entries | count |
nodemanager_loginsuccessavgtime | Average time of successful logins | ms |
nodemanager_loginsuccessnumops | Number of successful logins | count |
nodemanager_logwarn | Number of WARN log entries | count |
nodemanager_memheapcommittedm | Committed heap memory | MB |
nodemanager_memheapmaxm | Maximum heap memory | MB |
nodemanager_memheapusedm | Used heap memory | MB |
nodemanager_memmaxm | Maximum memory | MB |
nodemanager_memnonheapcommittedm | Committed non-heap memory | MB |
nodemanager_memnonheapmaxm | Maximum non-heap memory | MB |
nodemanager_memnonheapusedm | Used non-heap memory | MB |
nodemanager_numactiveconnections | Number of active connections | count |
nodemanager_numactivesinks | Number of active metrics sinks | count |
nodemanager_numactivesources | Number of active metrics sources | count |
nodemanager_numallsinks | Total number of metrics sinks | count |
nodemanager_numallsources | Total number of metrics sources | count |
nodemanager_numdroppedconnections | Number of dropped connections | count |
nodemanager_numopenconnections | Number of open connections | count |
nodemanager_numregisteredconnections | Number of registered connections | count |
nodemanager_openblockrequestlatencymillis_count | Number of open block requests measured | count |
nodemanager_openblockrequestlatencymillis_rate1 | 1-minute moving average rate of open block requests | req/s |
nodemanager_openblockrequestlatencymillis_rate15 | 15-minute moving average rate of open block requests | req/s |
nodemanager_openblockrequestlatencymillis_rate5 | 5-minute moving average rate of open block requests | req/s |
nodemanager_openblockrequestlatencymillis_ratemean | Mean rate of open block requests | req/s |
nodemanager_privatebytesdeleted | Number of bytes deleted from the private local cache | byte |
nodemanager_publicbytesdeleted | Number of bytes deleted from the public local cache | byte |
nodemanager_publishavgtime | Average time to publish metrics to sinks | ms |
nodemanager_publishnumops | Number of publish operations | count |
nodemanager_receivedbytes | Number of bytes received | byte |
nodemanager_registeredexecutorssize | Number of registered executors | count |
nodemanager_registerexecutorrequestlatencymillis_count | Number of register executor requests measured | count |
nodemanager_registerexecutorrequestlatencymillis_rate1 | 1-minute moving average rate of register executor requests | req/s |
nodemanager_registerexecutorrequestlatencymillis_rate15 | 15-minute moving average rate of register executor requests | req/s |
nodemanager_registerexecutorrequestlatencymillis_rate5 | 5-minute moving average rate of register executor requests | req/s |
nodemanager_registerexecutorrequestlatencymillis_ratemean | Mean rate of register executor requests | req/s |
nodemanager_renewalfailures | Number of renewal failures since the last successful login | count |
nodemanager_renewalfailurestotal | Total number of renewal failures | count |
nodemanager_rpcauthenticationfailures | Number of RPC authentication failures | count |
nodemanager_rpcauthorizationsuccesses | Number of RPC authorization successes | count |
nodemanager_rpcclientbackoff | Number of RPC client backoffs | count |
nodemanager_rpcprocessingtimeavgtime | Average RPC processing time | ms |
nodemanager_rpcprocessingtimenumops | Number of RPC calls processed | count |
nodemanager_rpcqueuetimeavgtime | Average RPC queue time | ms |
nodemanager_rpcqueuetimenumops | Number of RPC calls measured for queue time | count |
nodemanager_rpcslowcalls | Number of slow RPC calls | count |
nodemanager_runningopportunisticcontainers | Number of running opportunistic containers | count |
nodemanager_securityenabled | Whether security is enabled (1 if enabled, 0 otherwise) | boolean |
nodemanager_sentbytes | Number of bytes sent | byte |
nodemanager_shuffleconnections | Number of shuffle connections | count |
nodemanager_shuffleoutputbytes | Number of shuffle output bytes | byte |
nodemanager_shuffleoutputsfailed | Number of failed shuffle outputs | count |
nodemanager_shuffleoutputsok | Number of successful shuffle outputs | count |
nodemanager_snapshotavgtime | Average time to snapshot metrics | ms |
nodemanager_snapshotnumops | Number of snapshot operations | count |
nodemanager_threadsblocked | Number of threads in the BLOCKED state | count |
nodemanager_threadsnew | Number of threads in the NEW state | count |
nodemanager_threadsrunnable | Number of threads in the RUNNABLE state | count |
nodemanager_threadsterminated | Number of threads in the TERMINATED state | count |
nodemanager_threadstimedwaiting | Number of threads in the TIMED_WAITING state | count |
nodemanager_threadswaiting | Number of threads in the WAITING state | count |
nodemanager_totalbytesdeleted | Total number of bytes deleted | byte |
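
Several of these gauges are easier to read as ratios. As a minimal sketch, the share of a node's container memory or vcores that is in use can be estimated from the allocated and available values, assuming that allocated plus available equals the node's total capacity. The function name `utilization` and the sample numbers are hypothetical.

```python
def utilization(allocated: float, available: float) -> float:
    """Return allocated / (allocated + available) as a fraction between 0 and 1.

    Assumes allocated + available is the node's total capacity.
    Returns 0.0 when the total is not positive, to avoid dividing by zero.
    """
    total = allocated + available
    if total <= 0:
        return 0.0
    return allocated / total


# Hypothetical readings of nodemanager_allocatedgb / nodemanager_availablegb
mem_util = utilization(allocated=6.0, available=2.0)
# Hypothetical readings of nodemanager_allocatedvcores / nodemanager_availablevcores
cpu_util = utilization(allocated=4.0, available=4.0)

print(f"memory: {mem_util:.0%}, vcores: {cpu_util:.0%}")  # memory: 75%, vcores: 50%
```

The same ratio can be expressed in a dashboard or alert query; the Python form is only meant to make the arithmetic explicit.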