Skip to content

Hadoop HDFS DataNode

Collect HDFS datanode Metrics information.

Installation and Deployment

Since the DataNode is developed in the LANGUAGE, it is possible to use jmx-exporter to collect Metrics information.

1. DataNode Configuration

1.1 Download jmx-exporter

Download address: https://github.com/prometheus/jmx_exporter

1.2 Download jmx Script

Download address: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-hdfs-datanode.yml

1.3 Adjust DataNode Startup Parameters

Add the following to the startup parameters of the datanode:

{JAVA_GC_ARGS} -javaagent:/opt/guance/jmx/jmx_exporter-1.0.1.jar=localhost:17106:/opt/guance/jmx/hadoop-hdfs-datanode.yml

1.4 Restart DataNode

2. DataKit Collector Configuration

2.1 Install DataKit

2.2 Configure the Collector

jmx-exporter can directly expose metrics url, so the prom collector can be used for collection.

Go to the DataKit installation directory under conf.d/prom, copy prom.conf.sample as datanode.conf.

cp prom.conf.sample datanode.conf

Adjust the content of datanode.conf as follows:

  urls = ["http://localhost:17106/metrics"]
  source = "hdfs-datanode"
  [inputs.prom.tags]
    component = "hdfs-datanode" 
  interval = "10s"

Other configurations should be adjusted as needed

, parameter adjustment instructions:

  • urls: The jmx-exporter Metrics address, fill in the corresponding component's exposed Metrics url here.
  • source: Alias for the collector, recommended to distinguish.
  • keep_exist_metric_name: Keep metric names.
  • interval: Collection interval.
  • inputs.prom.tags: Add extra tags.

3. Restart DataKit

Restart DataKit

Metrics

Hadoop Measurement

DataNode Metrics are located under the Hadoop Measurement set, mainly introducing the explanation of DataNode related Metrics.

Metrics Description Unit
datanode_block_verification_failures Number of block verification failures on data nodes count
datanode_blocks_cached Number of blocks cached on data nodes count
datanode_blocks_read Number of blocks read by data nodes count
datanode_blocks_removed Number of blocks removed by data nodes count
datanode_blocks_replicated Number of blocks replicated by data nodes count
datanode_blocks_uncached Number of blocks not cached on data nodes count
datanode_blocks_verified Number of blocks verified by data nodes count
datanode_blocks_written Number of blocks written by data nodes count
datanode_bytes_read Number of bytes read by data nodes byte
datanode_bytes_written Number of bytes written by data nodes byte
datanode_cache_capacity Cache capacity on data nodes byte
datanode_cache_reports_avg_time Average time for cache reports on data nodes ms
datanode_cache_reports_num_ops Number of operations for cache reports on data nodes count
datanode_cache_used Amount of cache used on data nodes byte
datanode_capacity Capacity of data nodes count
datanode_data_node_active_xceivers_count Number of active receivers on data nodes count
datanode_datanode_network_errors Number of network errors on data nodes count
datanode_dfs_used Amount of DFS space used by data nodes byte
datanode_dropped_pub_all Total number of published messages dropped by data nodes count
datanode_estimated_capacity_lost Estimated capacity lost by data nodes byte
datanode_flush_io_rate_avg_time Average time for flushing I/O rate on data nodes ms
datanode_flush_io_rate_num_ops Number of flush I/O operations on data nodes count
datanode_flush_nanos_avg_time Average time for flush operation (nanoseconds) on data nodes ns
datanode_flush_nanos_num_ops Number of flush operations on data nodes count
datanode_fsync_count Number of fsync operations on data nodes count
datanode_heartbeats_avg_time Average time for heartbeats on data nodes ms
datanode_heartbeats_num_ops Number of heartbeat operations on data nodes count
datanode_heartbeats_total_avg_time Total average time for heartbeats on data nodes ms
datanode_heartbeats_total_num_ops Total number of heartbeat operations on data nodes count
datanode_incremental_block_reports_avg_time Average time for incremental block reports on data nodes ms
datanode_incremental_block_reports_num_ops Number of operations for incremental block reports on data nodes count
datanode_lifelines_avg_time Average time for lifeline signals on data nodes ms
datanode_lifelines_num_ops Number of operations for lifeline signals on data nodes count
datanode_metadata_operation_rate_avg_time Average time for metadata operations on data nodes ms
datanode_metadata_operation_rate_num_ops Number of metadata operations on data nodes count
datanode_num_active_sinks Number of active sinks on data nodes count
datanode_num_active_sources Number of active sources on data nodes count
datanode_num_all_sinks Number of all sinks on data nodes count
datanode_num_all_sources Number of all sources on data nodes count
datanode_num_blocks_cached Number of blocks cached on data nodes count
datanode_num_blocks_failed_to_cache Number of blocks failed to cache on data nodes count
datanode_num_blocks_failed_to_un_cache Number of blocks failed to uncache on data nodes count
datanode_num_blocks_failed_to_uncache Number of blocks failed to uncache on data nodes count
datanode_num_failed_volumes Number of failed volumes on data nodes count
datanode_publish_avg_time Average time for publishing on data nodes ms
datanode_publish_num_ops Number of publish operations on data nodes count
datanode_ram_disk_blocks_deleted_before_lazy_persisted Number of RAM disk blocks deleted before lazy persistence on data nodes count
datanode_ram_disk_blocks_evicted Number of RAM disk blocks evicted on data nodes count
datanode_ram_disk_blocks_read_hits Number of hits when reading RAM disk blocks on data nodes count
datanode_ram_disk_blocks_write Number of writes to RAM disk blocks on data nodes count
datanode_ram_disk_bytes_write Number of bytes written to RAM disk on data nodes byte
datanode_read_block_op_avg_time Average time for reading block operations on data nodes ms
datanode_read_block_op_num_ops Number of read block operations on data nodes count
datanode_read_io_rate_avg_time Average time for read I/O rate on data nodes ms
datanode_read_io_rate_num_ops Number of read I/O operations on data nodes count
datanode_reads_from_local_client Number of reads from local clients on data nodes count
datanode_reads_from_remote_client Number of reads from remote clients on data nodes count
datanode_remaining Remaining space on data nodes byte
datanode_remote_bytes_read Number of bytes read remotely on data nodes byte
datanode_remote_bytes_written Number of bytes written remotely on data nodes byte
datanode_replace_block_op_avg_time Average time for replacing block operations on data nodes ms
datanode_replace_block_op_num_ops Number of replace block operations on data nodes count
datanode_send_data_packet_blocked_on_network_nanos_avg_time Average time blocked on network while sending data packets (nanoseconds) on data nodes ns
datanode_send_data_packet_blocked_on_network_nanos_num_ops Number of network blocking operations while sending data packets on data nodes count
datanode_send_data_packet_transfer_nanos_avg_time Average time transferring data packets (nanoseconds) on data nodes ns
datanode_send_data_packet_transfer_nanos_num_ops Number of transfer operations for data packets on data nodes count
datanode_snapshot_avg_time Average time for snapshots on data nodes ms
datanode_snapshot_num_ops Number of snapshot operations on data nodes count
datanode_sync_io_rate_avg_time Average time for sync I/O rate on data nodes ms
datanode_sync_io_rate_num_ops Number of sync I/O operations on data nodes count
datanode_total_data_file_ios Total number of data file I/O operations on data nodes count
datanode_total_file_io_errors Total number of file I/O errors on data nodes count
datanode_total_metadata_operations Total number of metadata operations on data nodes count
datanode_total_read_time Total read time on data nodes ms
datanode_total_write_time Total write time on data nodes ms
datanode_volume_failures Number of volume failures on data nodes count
datanode_write_block_op_avg_time Average time for writing block operations on data nodes ms
datanode_write_block_op_num_ops Number of write block operations on data nodes count
datanode_write_io_rate_avg_time Average time for write I/O rate on data nodes ms
datanode_write_io_rate_num_ops Number of write I/O operations on data nodes count
datanode_writes_from_local_client Number of writes from local clients on data nodes count
datanode_writes_from_remote_client Number of writes from remote clients on data nodes count
datanode_xceiver_count Number of xceivers on data nodes count
datanode_xmits_in_progress Number of ongoing transmissions on data nodes count