Hadoop HDFS DataNode¶
Collect HDFS datanode metrics.
Installation and Deployment¶
DataNode is written in Java, so its metrics can be collected with jmx-exporter.
1. DataNode Configuration¶
1.1 Download jmx-exporter¶
Download URL: https://github.com/prometheus/jmx_exporter
1.2 Download jmx script¶
Download URL: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-hdfs-datanode.yml
1.3 Adjust DataNode Startup Parameters¶
Add the following to the datanode startup parameters:
{JAVA_GC_ARGS} -javaagent:/opt/jmx/jmx_exporter-1.0.1.jar=localhost:17106:/opt/jmx/hadoop-hdfs-datanode.yml
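The agent option above has the shape `-javaagent:<jar>=[<host>:]<port>:<config>`: the jar path, then the listen address, then the rules file. As a minimal sketch of how the pieces fit together, the helper below (the function name `javaagent_arg` is hypothetical and only for illustration) assembles the option from its parts:

```python
def javaagent_arg(jar_path, port, config_path, host=None):
    """Build a jmx-exporter agent option: -javaagent:<jar>=[<host>:]<port>:<config>."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # The host part is optional; when given, the exporter listens on that address only.
    listen = f"{host}:{port}" if host else str(port)
    return f"-javaagent:{jar_path}={listen}:{config_path}"


# Values taken from the startup parameters shown above.
arg = javaagent_arg(
    "/opt/jmx/jmx_exporter-1.0.1.jar",
    17106,
    "/opt/jmx/hadoop-hdfs-datanode.yml",
    host="localhost",
)
print(arg)
```

The port chosen here (17106) must match the port used later in the collector's `urls` setting.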
1.4 Restart DataNode¶
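After the restart, it can be useful to confirm that the agent is serving metrics before configuring DataKit. The sketch below assumes the exporter listens on `localhost:17106` as configured above and returns the standard Prometheus text format; the function names are illustrative, and the parser is deliberately simple (it keeps only the last sample per metric name and ignores labels).

```python
import urllib.request

METRICS_URL = "http://localhost:17106/metrics"  # address from the -javaagent option


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Download the raw metrics text from the exporter endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse Prometheus text exposition into {metric_name: value}.

    Comment lines (# HELP / # TYPE) are skipped and labels are dropped,
    so the last sample seen for a given name wins.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            if "{" in line:
                name, rest = line.split("{", 1)
                rest = rest.rsplit("}", 1)[1]
            else:
                name, _, rest = line.partition(" ")
            samples[name.strip()] = float(rest.split()[0])
        except (IndexError, ValueError):
            continue  # ignore lines that do not look like samples
    return samples
```

On the DataNode host, `parse_samples(fetch_metrics())` should return a non-empty mapping once the agent is loaded; an empty result or a connection error suggests the startup parameters were not picked up.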
2. DataKit Collector Configuration¶
2.1 Install DataKit¶
2.2 Configure the Collector¶
Because jmx-exporter exposes a metrics URL directly, the metrics can be collected with the prom collector.
Navigate to the conf.d/prom directory under the DataKit installation directory, and copy prom.conf.sample to datanode.conf.
cp prom.conf.sample datanode.conf
Adjust the content of datanode.conf as follows:
urls = ["http://localhost:17106/metrics"]
source = "hdfs-datanode"
interval = "10s"
[inputs.prom.tags]
component = "hdfs-datanode"
Adjust other configurations as needed. Parameter descriptions:
- urls: the jmx-exporter metrics URL; fill in the metrics URL exposed by the corresponding component
- source: collector alias; a distinct value per component is recommended
- keep_exist_metric_name: keep the metric name
- interval: collection interval
- inputs.prom.tags: additional tags
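The configuration file is TOML, where every key that follows a table header such as `[inputs.prom.tags]` belongs to that table. Plain settings like `interval` therefore need to appear before the tags table. The sketch below renders the settings described above in that order; the function name is hypothetical, and the `[[inputs.prom]]` header is an assumption based on the layout of `prom.conf.sample`, so check it against your copy of the sample file.

```python
def render_prom_conf(urls, source, interval="10s", tags=None):
    """Render a prom collector config with plain keys ahead of the tags table."""
    url_list = ", ".join(f'"{u}"' for u in urls)
    lines = [
        "[[inputs.prom]]",  # assumed top-level header from prom.conf.sample
        f"  urls = [{url_list}]",
        f'  source = "{source}"',
        f'  interval = "{interval}"',
    ]
    if tags:
        # Everything after this header is parsed as a tag, so it goes last.
        lines.append("  [inputs.prom.tags]")
        for key, value in tags.items():
            lines.append(f'    {key} = "{value}"')
    return "\n".join(lines) + "\n"


conf = render_prom_conf(
    ["http://localhost:17106/metrics"],
    source="hdfs-datanode",
    tags={"component": "hdfs-datanode"},
)
print(conf)
```

Generating the file this way is optional; editing `datanode.conf` by hand works equally well as long as the same key order is kept.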
3. Restart DataKit¶
Metrics¶
Hadoop Metrics¶
DataNode metrics are located under the Hadoop metrics. This section covers the DataNode-related metrics.
Metrics | Description | Unit |
---|---|---|
datanode_block_verification_failures | Number of block verification failures on the datanode | count |
datanode_blocks_cached | Number of blocks cached on the datanode | count |
datanode_blocks_read | Number of blocks read on the datanode | count |
datanode_blocks_removed | Number of blocks removed on the datanode | count |
datanode_blocks_replicated | Number of blocks replicated on the datanode | count |
datanode_blocks_uncached | Number of blocks uncached on the datanode | count |
datanode_blocks_verified | Number of blocks verified on the datanode | count |
datanode_blocks_written | Number of blocks written on the datanode | count |
datanode_bytes_read | Number of bytes read on the datanode | byte |
datanode_bytes_written | Number of bytes written on the datanode | byte |
datanode_cache_capacity | Cache capacity of the datanode | byte |
datanode_cache_reports_avg_time | Average time for cache reports on the datanode | ms |
datanode_cache_reports_num_ops | Number of cache report operations on the datanode | count |
datanode_cache_used | Amount of cache used on the datanode | byte |
datanode_capacity | Capacity of the datanode | byte |
datanode_data_node_active_xceivers_count | Number of active xceivers on the datanode | count |
datanode_datanode_network_errors | Number of network errors on the datanode | count |
datanode_dfs_used | Amount of DFS space used on the datanode | byte |
datanode_dropped_pub_all | Total number of dropped publish messages on the datanode | count |
datanode_estimated_capacity_lost | Estimated lost capacity on the datanode | byte |
datanode_flush_io_rate_avg_time | Average time for flush I/O operations on the datanode | ms |
datanode_flush_io_rate_num_ops | Number of flush I/O operations on the datanode | count |
datanode_flush_nanos_avg_time | Average time for flush operations on the datanode (nanoseconds) | ns |
datanode_flush_nanos_num_ops | Number of flush operations on the datanode | count |
datanode_fsync_count | Number of fsync operations on the datanode | count |
datanode_heartbeats_avg_time | Average time for heartbeats on the datanode | ms |
datanode_heartbeats_num_ops | Number of heartbeat operations on the datanode | count |
datanode_heartbeats_total_avg_time | Average total time for heartbeats on the datanode | ms |
datanode_heartbeats_total_num_ops | Total number of heartbeat operations on the datanode | count |
datanode_incremental_block_reports_avg_time | Average time for incremental block reports on the datanode | ms |
datanode_incremental_block_reports_num_ops | Number of incremental block report operations on the datanode | count |
datanode_lifelines_avg_time | Average time for lifeline signals on the datanode | ms |
datanode_lifelines_num_ops | Number of lifeline signal operations on the datanode | count |
datanode_metadata_operation_rate_avg_time | Average time for metadata operations on the datanode | ms |
datanode_metadata_operation_rate_num_ops | Number of metadata operations on the datanode | count |
datanode_num_active_sinks | Number of active sinks on the datanode | count |
datanode_num_active_sources | Number of active sources on the datanode | count |
datanode_num_all_sinks | Total number of sinks on the datanode | count |
datanode_num_all_sources | Total number of sources on the datanode | count |
datanode_num_blocks_cached | Number of blocks cached on the datanode | count |
datanode_num_blocks_failed_to_cache | Number of blocks that failed to cache on the datanode | count |
datanode_num_blocks_failed_to_un_cache | Number of blocks that failed to uncache on the datanode | count |
datanode_num_blocks_failed_to_uncache | Number of blocks that failed to uncache on the datanode | count |
datanode_num_failed_volumes | Number of failed volumes on the datanode | count |
datanode_publish_avg_time | Average time for publish operations on the datanode | ms |
datanode_publish_num_ops | Number of publish operations on the datanode | count |
datanode_ram_disk_blocks_deleted_before_lazy_persisted | Number of RAM disk blocks deleted before lazy persistence on the datanode | count |
datanode_ram_disk_blocks_evicted | Number of RAM disk blocks evicted on the datanode | count |
datanode_ram_disk_blocks_read_hits | Number of RAM disk block read hits on the datanode | count |
datanode_ram_disk_blocks_write | Number of RAM disk block writes on the datanode | count |
datanode_ram_disk_bytes_write | Number of bytes written to RAM disk on the datanode | byte |
datanode_read_block_op_avg_time | Average time for read block operations on the datanode | ms |
datanode_read_block_op_num_ops | Number of read block operations on the datanode | count |
datanode_read_io_rate_avg_time | Average time for read I/O operations on the datanode | ms |
datanode_read_io_rate_num_ops | Number of read I/O operations on the datanode | count |
datanode_reads_from_local_client | Number of reads from local clients on the datanode | count |
datanode_reads_from_remote_client | Number of reads from remote clients on the datanode | count |
datanode_remaining | Remaining space on the datanode | byte |
datanode_remote_bytes_read | Number of bytes read remotely on the datanode | byte |
datanode_remote_bytes_written | Number of bytes written remotely on the datanode | byte |
datanode_replace_block_op_avg_time | Average time for replace block operations on the datanode | ms |
datanode_replace_block_op_num_ops | Number of replace block operations on the datanode | count |
datanode_send_data_packet_blocked_on_network_nanos_avg_time | Average time blocked on the network when sending data packets on the datanode (nanoseconds) | ns |
datanode_send_data_packet_blocked_on_network_nanos_num_ops | Number of network blocking operations when sending data packets on the datanode | count |
datanode_send_data_packet_transfer_nanos_avg_time | Average time for data packet transfer on the datanode (nanoseconds) | ns |
datanode_send_data_packet_transfer_nanos_num_ops | Number of data packet transfer operations on the datanode | count |
datanode_snapshot_avg_time | Average time for snapshots on the datanode | ms |
datanode_snapshot_num_ops | Number of snapshot operations on the datanode | count |
datanode_sync_io_rate_avg_time | Average time for sync I/O operations on the datanode | ms |
datanode_sync_io_rate_num_ops | Number of sync I/O operations on the datanode | count |
datanode_total_data_file_ios | Total number of data file I/O operations on the datanode | count |
datanode_total_file_io_errors | Total number of file I/O errors on the datanode | count |
datanode_total_metadata_operations | Total number of metadata operations on the datanode | count |
datanode_total_read_time | Total read time on the datanode | ms |
datanode_total_write_time | Total write time on the datanode | ms |
datanode_volume_failures | Number of volume failures on the datanode | count |
datanode_write_block_op_avg_time | Average time for write block operations on the datanode | ms |
datanode_write_block_op_num_ops | Number of write block operations on the datanode | count |
datanode_write_io_rate_avg_time | Average time for write I/O operations on the datanode | ms |
datanode_write_io_rate_num_ops | Number of write I/O operations on the datanode | count |
datanode_writes_from_local_client | Number of writes from local clients on the datanode | count |
datanode_writes_from_remote_client | Number of writes from remote clients on the datanode | count |
datanode_xceiver_count | Number of xceivers on the datanode | count |
datanode_xmits_in_progress | Number of ongoing transfers on the datanode | count |
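Several of these metrics are more useful in combination than on their own. As one illustration, the sketch below derives disk usage percentages from `datanode_capacity`, `datanode_dfs_used`, and `datanode_remaining`. It assumes all three values are reported in bytes and are already available as a plain mapping; the function name and the sample numbers are hypothetical.

```python
def dfs_usage(samples):
    """Return DFS used/remaining percentages, or None if capacity is unknown.

    `samples` maps metric names from the table above to numeric values.
    """
    capacity = samples.get("datanode_capacity", 0)
    if capacity <= 0:
        return None  # avoid division by zero before the first report
    return {
        "used_pct": 100 * samples["datanode_dfs_used"] / capacity,
        "remaining_pct": 100 * samples["datanode_remaining"] / capacity,
    }


# Made-up values, in bytes, purely for demonstration.
usage = dfs_usage(
    {
        "datanode_capacity": 1000,
        "datanode_dfs_used": 250,
        "datanode_remaining": 600,
    }
)
print(usage)
```

The two percentages usually do not add up to 100, since space used by non-HDFS files on the same volumes counts toward neither value.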