
Hadoop HDFS DataNode

Collect HDFS datanode metrics.

Installation and Deployment

Because the DataNode runs on the JVM, its metrics can be collected with jmx-exporter.

1. DataNode Configuration

1.1 Download jmx-exporter

Download URL: https://github.com/prometheus/jmx_exporter

1.2 Download the jmx-exporter Configuration File

Download URL: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-hdfs-datanode.yml

1.3 Adjust DataNode Startup Parameters

Add the following to the datanode startup parameters:

{JAVA_GC_ARGS} -javaagent:/opt/jmx/jmx_exporter-1.0.1.jar=localhost:17106:/opt/jmx/hadoop-hdfs-datanode.yml
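The agent argument follows the pattern -javaagent:<jar>=<host>:<port>:<config>. As a minimal sketch, the Python snippet below assembles that string from its parts and applies two basic sanity checks. The helper name build_javaagent_arg is hypothetical and is not part of jmx-exporter or Hadoop; the paths and port are the ones used in this guide.

```python
from pathlib import PurePosixPath


def build_javaagent_arg(jar: str, host: str, port: int, config: str) -> str:
    """Assemble a jmx-exporter agent flag: -javaagent:<jar>=<host>:<port>:<config>.

    Hypothetical helper for illustration only.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if PurePosixPath(config).suffix not in (".yml", ".yaml"):
        raise ValueError(f"expected a YAML config file, got: {config}")
    return f"-javaagent:{jar}={host}:{port}:{config}"


# Values taken from the startup parameter shown in this guide.
arg = build_javaagent_arg(
    jar="/opt/jmx/jmx_exporter-1.0.1.jar",
    host="localhost",
    port=17106,
    config="/opt/jmx/hadoop-hdfs-datanode.yml",
)
print(arg)
```

Because the agent binds to localhost here, only processes on the same host (such as a local DataKit) can reach port 17106.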

1.4 Restart DataNode
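After the restart, the agent should serve metrics at http://localhost:17106/metrics in the Prometheus text format. The sketch below is one possible way to check this from Python. The function names and the sample payload are illustrative assumptions, and the parser assumes sample lines carry no trailing timestamp.

```python
import urllib.request


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text served at the given URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> dict[str, float]:
    """Map each sample (name plus optional labels) to its value.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumes no timestamp follows the value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


# Made-up payload for illustration; real output depends on the exporter rules.
SAMPLE = """\
# HELP datanode_bytes_read illustrative help text
# TYPE datanode_bytes_read untyped
datanode_bytes_read 1024.0
datanode_capacity{name="FSDatasetState"} 5.0E9
"""

# Against a live DataNode:
# samples = parse_samples(fetch_metrics("http://localhost:17106/metrics"))
```

If the request is refused, check that the -javaagent argument was actually picked up by the DataNode process and that the port is not already in use.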

2. DataKit Collector Configuration

2.1 Install DataKit

2.2 Configure the Collector

Because jmx-exporter exposes the metrics over an HTTP URL, they can be collected with the DataKit prom collector.

Navigate to the conf.d/prom directory under the DataKit installation directory, and copy prom.conf.sample to datanode.conf.

cp prom.conf.sample datanode.conf

Adjust the content of datanode.conf as follows:

  urls = ["http://localhost:17106/metrics"]
  source = "hdfs-datanode"
  interval = "10s"
  [inputs.prom.tags]
    component = "hdfs-datanode"

Adjust other settings as needed. Parameter descriptions:

  • urls: metrics URL exposed by jmx-exporter; enter the URL of the component to collect
  • source: collector alias; use a distinct value for each component so its data can be told apart
  • keep_exist_metric_name: keep the original metric names
  • interval: collection interval
  • inputs.prom.tags: additional tags attached to the collected data

2.3 Restart DataKit

Restart DataKit so that the new collector configuration takes effect.

Metrics

Hadoop Metrics

DataNode metrics are part of the Hadoop metric set; this section covers the DataNode-related metrics.

| Metric | Description | Unit |
| --- | --- | --- |
| datanode_block_verification_failures | Number of block verification failures on the datanode | count |
| datanode_blocks_cached | Number of blocks cached on the datanode | count |
| datanode_blocks_read | Number of blocks read on the datanode | count |
| datanode_blocks_removed | Number of blocks removed on the datanode | count |
| datanode_blocks_replicated | Number of blocks replicated on the datanode | count |
| datanode_blocks_uncached | Number of blocks uncached on the datanode | count |
| datanode_blocks_verified | Number of blocks verified on the datanode | count |
| datanode_blocks_written | Number of blocks written on the datanode | count |
| datanode_bytes_read | Number of bytes read on the datanode | byte |
| datanode_bytes_written | Number of bytes written on the datanode | byte |
| datanode_cache_capacity | Cache capacity of the datanode | byte |
| datanode_cache_reports_avg_time | Average time for cache reports on the datanode | ms |
| datanode_cache_reports_num_ops | Number of cache report operations on the datanode | count |
| datanode_cache_used | Amount of cache used on the datanode | byte |
| datanode_capacity | Total storage capacity of the datanode | byte |
| datanode_data_node_active_xceivers_count | Number of active xceivers (data transfer threads) on the datanode | count |
| datanode_datanode_network_errors | Number of network errors on the datanode | count |
| datanode_dfs_used | Amount of DFS space used on the datanode | byte |
| datanode_dropped_pub_all | Total number of dropped publish messages on the datanode | count |
| datanode_estimated_capacity_lost | Estimated lost capacity on the datanode | byte |
| datanode_flush_io_rate_avg_time | Average time for flush I/O operations on the datanode | ms |
| datanode_flush_io_rate_num_ops | Number of flush I/O operations on the datanode | count |
| datanode_flush_nanos_avg_time | Average time for flush operations on the datanode | ns |
| datanode_flush_nanos_num_ops | Number of flush operations on the datanode | count |
| datanode_fsync_count | Number of fsync operations on the datanode | count |
| datanode_heartbeats_avg_time | Average time for heartbeats on the datanode | ms |
| datanode_heartbeats_num_ops | Number of heartbeat operations on the datanode | count |
| datanode_heartbeats_total_avg_time | Average total time for heartbeats on the datanode | ms |
| datanode_heartbeats_total_num_ops | Total number of heartbeat operations on the datanode | count |
| datanode_incremental_block_reports_avg_time | Average time for incremental block reports on the datanode | ms |
| datanode_incremental_block_reports_num_ops | Number of incremental block report operations on the datanode | count |
| datanode_lifelines_avg_time | Average time for lifeline signals on the datanode | ms |
| datanode_lifelines_num_ops | Number of lifeline signal operations on the datanode | count |
| datanode_metadata_operation_rate_avg_time | Average time for metadata operations on the datanode | ms |
| datanode_metadata_operation_rate_num_ops | Number of metadata operations on the datanode | count |
| datanode_num_active_sinks | Number of active metrics sinks on the datanode | count |
| datanode_num_active_sources | Number of active metrics sources on the datanode | count |
| datanode_num_all_sinks | Total number of metrics sinks on the datanode | count |
| datanode_num_all_sources | Total number of metrics sources on the datanode | count |
| datanode_num_blocks_cached | Number of blocks cached on the datanode | count |
| datanode_num_blocks_failed_to_cache | Number of blocks that failed to cache on the datanode | count |
| datanode_num_blocks_failed_to_un_cache | Number of blocks that failed to uncache on the datanode | count |
| datanode_num_blocks_failed_to_uncache | Number of blocks that failed to uncache on the datanode | count |
| datanode_num_failed_volumes | Number of failed volumes on the datanode | count |
| datanode_publish_avg_time | Average time for publish operations on the datanode | ms |
| datanode_publish_num_ops | Number of publish operations on the datanode | count |
| datanode_ram_disk_blocks_deleted_before_lazy_persisted | Number of RAM disk blocks deleted before lazy persistence on the datanode | count |
| datanode_ram_disk_blocks_evicted | Number of RAM disk blocks evicted on the datanode | count |
| datanode_ram_disk_blocks_read_hits | Number of RAM disk block read hits on the datanode | count |
| datanode_ram_disk_blocks_write | Number of RAM disk block writes on the datanode | count |
| datanode_ram_disk_bytes_write | Number of bytes written to RAM disk on the datanode | byte |
| datanode_read_block_op_avg_time | Average time for read block operations on the datanode | ms |
| datanode_read_block_op_num_ops | Number of read block operations on the datanode | count |
| datanode_read_io_rate_avg_time | Average time for read I/O operations on the datanode | ms |
| datanode_read_io_rate_num_ops | Number of read I/O operations on the datanode | count |
| datanode_reads_from_local_client | Number of reads from local clients on the datanode | count |
| datanode_reads_from_remote_client | Number of reads from remote clients on the datanode | count |
| datanode_remaining | Remaining space on the datanode | byte |
| datanode_remote_bytes_read | Number of bytes read remotely on the datanode | byte |
| datanode_remote_bytes_written | Number of bytes written remotely on the datanode | byte |
| datanode_replace_block_op_avg_time | Average time for replace block operations on the datanode | ms |
| datanode_replace_block_op_num_ops | Number of replace block operations on the datanode | count |
| datanode_send_data_packet_blocked_on_network_nanos_avg_time | Average time blocked on the network when sending data packets on the datanode | ns |
| datanode_send_data_packet_blocked_on_network_nanos_num_ops | Number of network blocking operations when sending data packets on the datanode | count |
| datanode_send_data_packet_transfer_nanos_avg_time | Average time for data packet transfer on the datanode | ns |
| datanode_send_data_packet_transfer_nanos_num_ops | Number of data packet transfer operations on the datanode | count |
| datanode_snapshot_avg_time | Average time for snapshots on the datanode | ms |
| datanode_snapshot_num_ops | Number of snapshot operations on the datanode | count |
| datanode_sync_io_rate_avg_time | Average time for sync I/O operations on the datanode | ms |
| datanode_sync_io_rate_num_ops | Number of sync I/O operations on the datanode | count |
| datanode_total_data_file_ios | Total number of data file I/O operations on the datanode | count |
| datanode_total_file_io_errors | Total number of file I/O errors on the datanode | count |
| datanode_total_metadata_operations | Total number of metadata operations on the datanode | count |
| datanode_total_read_time | Total read time on the datanode | ms |
| datanode_total_write_time | Total write time on the datanode | ms |
| datanode_volume_failures | Number of volume failures on the datanode | count |
| datanode_write_block_op_avg_time | Average time for write block operations on the datanode | ms |
| datanode_write_block_op_num_ops | Number of write block operations on the datanode | count |
| datanode_write_io_rate_avg_time | Average time for write I/O operations on the datanode | ms |
| datanode_write_io_rate_num_ops | Number of write I/O operations on the datanode | count |
| datanode_writes_from_local_client | Number of writes from local clients on the datanode | count |
| datanode_writes_from_remote_client | Number of writes from remote clients on the datanode | count |
| datanode_xceiver_count | Number of xceivers on the datanode | count |
| datanode_xmits_in_progress | Number of transfers in progress on the datanode | count |
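Several of these metrics are most useful in combination. As a rough sketch, the Python below derives a disk usage percentage from datanode_dfs_used and datanode_capacity, and a per-second rate from two readings of a cumulative counter such as datanode_bytes_read. The function names and sample values are hypothetical, and the rate helper assumes the counter only increases until the DataNode restarts.

```python
def dfs_used_percent(samples: dict[str, float]) -> float:
    """Share of the datanode capacity occupied by DFS data, in percent."""
    capacity = samples["datanode_capacity"]
    if capacity <= 0:
        raise ValueError("datanode_capacity must be positive")
    return 100.0 * samples["datanode_dfs_used"] / capacity


def per_second_rate(prev: float, curr: float, interval_s: float) -> float:
    """Rate of a cumulative counter between two scrapes.

    If the counter went backwards (for example after a restart),
    the current value is treated as the increase since the reset.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:
        delta = curr
    return delta / interval_s


# Hypothetical readings, 10 seconds apart (matching interval = "10s").
usage = dfs_used_percent({"datanode_capacity": 1000.0, "datanode_dfs_used": 250.0})
read_rate = per_second_rate(prev=1000.0, curr=6000.0, interval_s=10.0)
print(usage, read_rate)
```

The same delta approach applies to the other cumulative counters, such as datanode_blocks_written or datanode_volume_failures.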