Hadoop HDFS NameNode

Collect metrics from the HDFS NameNode.

Installation and Configuration

Because the NameNode runs on the JVM, its metrics can be collected with jmx-exporter.

1. NameNode Configuration

1.1 Download jmx-exporter

Download URL: https://github.com/prometheus/jmx_exporter

1.2 Download the jmx-exporter configuration file

Download URL: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-hdfs-namenode.yml

1.3 Adjust NameNode Startup Parameters

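Add the following to the NameNode startup parameters. Adjust the jar and YAML paths to wherever you saved the files from steps 1.1 and 1.2; with the values shown, the metrics are exposed on localhost port 17107: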

{JAVA_GC_ARGS} -javaagent:/opt/jmx/jmx_exporter-1.0.1.jar=localhost:17107:/opt/jmx/hadoop-hdfs-namenode.yml
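
The agent option above packs four values into one string: the jar path, the listen host, the listen port, and the YAML configuration path. The following is a minimal sketch that assembles the option and the matching metrics URL from those parts. The class name `JmxAgentOption` and its fields are hypothetical helpers introduced for illustration; only the resulting strings come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JmxAgentOption:
    """Hypothetical helper holding the parts of a jmx-exporter agent option."""

    jar_path: str     # where the downloaded agent jar was saved
    host: str         # address the exporter's HTTP endpoint binds to
    port: int         # port the exporter's HTTP endpoint listens on
    config_path: str  # where the exporter YAML file was saved

    def render(self) -> str:
        # Layout used on this page: -javaagent:<jar>=<host>:<port>:<config>
        return f"-javaagent:{self.jar_path}={self.host}:{self.port}:{self.config_path}"

    def metrics_url(self) -> str:
        # URL that the DataKit prom collector is later pointed at
        return f"http://{self.host}:{self.port}/metrics"


option = JmxAgentOption(
    jar_path="/opt/jmx/jmx_exporter-1.0.1.jar",
    host="localhost",
    port=17107,
    config_path="/opt/jmx/hadoop-hdfs-namenode.yml",
)
print(option.render())
print(option.metrics_url())
```

Keeping the host and port in one place makes it harder for the startup parameter and the collector URL to drift apart when the port is changed.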

1.4 Restart NameNode
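
After the restart, it can be useful to confirm that the exporter is answering before configuring DataKit. The sketch below is one possible check, not part of the official procedure: it parses the Prometheus text format returned by the metrics URL and keeps the NameNode samples. The function names are hypothetical, the sample text and its values are made up, and the `hadoop_` prefix on the raw names is an assumption about how the exporter YAML names its metrics.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {metric name: value}.

    Simplification: labels are dropped, so if several samples share a
    name, the last one wins.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue
        try:
            samples[name] = float(fields[0])
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return samples


def namenode_samples(samples: dict[str, float]) -> dict[str, float]:
    """Keep NameNode samples and strip any prefix before 'namenode_'."""
    result: dict[str, float] = {}
    for name, value in samples.items():
        index = name.find("namenode_")
        if index >= 0:
            result[name[index:]] = value
    return result


def fetch_metrics(url: str = "http://localhost:17107/metrics",
                  timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse the exporter endpoint (requires a running NameNode)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Made-up sample text, used here so the sketch runs without a NameNode.
SAMPLE = """\
# TYPE hadoop_namenode_missing_blocks untyped
hadoop_namenode_missing_blocks{name="FSNamesystem",} 0.0
hadoop_namenode_num_live_data_nodes 3.0
jvm_threads_current 42.0
"""

parsed = namenode_samples(parse_metrics(SAMPLE))
print(parsed)
```

Against a live NameNode, `namenode_samples(fetch_metrics())` would return the real values; an empty result or a connection error suggests the agent option was not picked up.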

2. DataKit Collector Configuration

2.1 Install DataKit

2.2 Configure Collector

Because jmx-exporter exposes the metrics over an HTTP URL, they can be collected directly with the prom collector.

Navigate to the conf.d/prom directory under the DataKit installation directory, and copy prom.conf.sample to namenode.conf.

cp prom.conf.sample namenode.conf

Adjust the contents of namenode.conf as follows:

  urls = ["http://localhost:17107/metrics"]
  source = "hdfs-namenode"
  interval = "10s"
  [inputs.prom.tags]
    component = "hdfs-namenode"

Adjust other settings as needed. Parameter descriptions:

  • urls: The jmx-exporter metrics URL; fill in the URL exposed by the corresponding component
  • source: Collector alias; use a distinct value so this data can be told apart from other prom collectors
  • keep_exist_metric_name: Keep the original metric names
  • interval: Collection interval
  • inputs.prom.tags: Additional tags to attach to the collected metrics

3. Restart DataKit

Restart DataKit so that the new collector configuration takes effect.

Metrics

Hadoop Metrics

NameNode metrics belong to the Hadoop metric set. This section lists the NameNode-related metrics.

| Metric | Description | Unit |
| --- | --- | --- |
| namenode_add_block_ops | Add block operations count | count |
| namenode_allow_snapshot_ops | Allow snapshot operations count | count |
| namenode_block_capacity | Block capacity | byte |
| namenode_block_deletion_start_time | Block deletion start time | count |
| namenode_block_ops_batched | Batch processed block operations count | count |
| namenode_block_ops_queued | Queued block operations count | count |
| namenode_block_pool_used_space | Used block pool space | count |
| namenode_block_received_and_deleted_ops | Received and deleted block operations count | count |
| namenode_blocks | Block count | count |
| namenode_bytes_in_future_ecblock_groups | Bytes in future EC block groups | count |
| namenode_bytes_in_future_replicated_blocks | Bytes in future replicated blocks | count |
| namenode_bytes_with_future_generation_stamps | Bytes with future generation stamps | count |
| namenode_cache_capacity | Cache capacity | byte |
| namenode_cache_report_avg_time | Cache report average time | count |
| namenode_cache_report_num_ops | Cache report operations count | count |
| namenode_cache_used | Used cache | count |
| namenode_capacity | Capacity | count |
| namenode_capacity_remaining | Remaining capacity | byte |
| namenode_capacity_remaining_gb | Remaining capacity (GB) | GB |
| namenode_capacity_total_gb | Total capacity (GB) | GB |
| namenode_capacity_used | Used capacity | byte |
| namenode_capacity_used_gb | Used capacity (GB) | GB |
| namenode_capacity_used_non_dfs | Non-DFS used capacity | GB |
| namenode_corrupt_blocks | Corrupt blocks | count |
| namenode_corrupt_ecblock_groups | Corrupt EC block groups | count |
| namenode_corrupt_replicated_blocks | Corrupt replicated blocks | count |
| namenode_create_file_ops | Create file operations count | count |
| namenode_create_snapshot_ops | Create snapshot operations count | count |
| namenode_create_symlink_ops | Create symlink operations count | count |
| namenode_delete_file_ops | Delete file operations count | count |
| namenode_delete_snapshot_ops | Delete snapshot operations count | count |
| namenode_disallow_snapshot_ops | Disallow snapshot operations count | count |
| namenode_distinct_version_count | Distinct version count | count |
| namenode_distinct_versions | Distinct versions | count |
| namenode_dropped_pub_all | Dropped pub_all | count |
| namenode_elapsed_time | Elapsed time | ms |
| namenode_estimated_capacity_lost | Estimated lost capacity | byte |
| namenode_excess_blocks | Excess blocks | count |
| namenode_expired_heartbeats | Expired heartbeats | count |
| namenode_file_info_ops | File info operations count | count |
| namenode_files | File count | count |
| namenode_files_appended | Appended file count | count |
| namenode_files_deleted | Deleted file count | count |
| namenode_files_in_get_listing_ops | File count in get listing operations | count |
| namenode_files_renamed | Renamed file count | count |
| namenode_files_truncated | Truncated file count | count |
| namenode_free | Free | count |
| namenode_fs_image_load_time | File system image load time | ms |
| namenode_fs_lock_queue_length | File system lock queue length | count |
| namenode_gc_count | Garbage collection count | count |
| namenode_generate_edektime_avg_time | Generate EDEK time average time | ms |
| namenode_generate_edektime_num_ops | Generate EDEK operations count | count |
| namenode_get_additional_datanode_ops | Get additional datanode operations count | count |
| namenode_highest_priority_low_redundancy_ecblocks | Highest priority low redundancy EC blocks | count |
| namenode_highest_priority_low_redundancy_replicated_blocks | Highest priority low redundancy replicated blocks | count |
| namenode_last_checkpoint_time | Last checkpoint time | ms |
| namenode_last_hatransition_time | Last HA transition time | ms |
| namenode_last_written_transaction_id | Last written transaction ID | count |
| namenode_list_snapshottable_dir_ops | List snapshottable directory operations count | count |
| namenode_lock_queue_length | Lock queue length | count |
| namenode_low_redundancy_ecblock_groups | Low redundancy EC block groups | count |
| namenode_low_redundancy_replicated_blocks | Low redundancy replicated blocks | count |
| namenode_max_objects | Max objects count | count |
| namenode_millis_since_last_loaded_edits | Milliseconds since last loaded edits | ms |
| namenode_missing_blocks | Missing blocks | count |
| namenode_missing_ecblock_groups | Missing EC block groups | count |
| namenode_missing_repl_one_blocks | Missing one replica blocks | count |
| namenode_missing_replicated_blocks | Missing replicated blocks | count |
| namenode_missing_replication_one_blocks | Missing one replica replicated blocks | count |
| namenode_nnstarted_time_in_millis | Start time (milliseconds) | ms |
| namenode_non_dfs_used_space | Non-DFS used space | count |
| namenode_num_active_clients | Active clients count | count |
| namenode_num_active_sinks | Active sink datanodes count | count |
| namenode_num_active_sources | Active source datanodes count | count |
| namenode_num_all_sinks | All sink datanodes count | count |
| namenode_num_all_sources | All source datanodes count | count |
| namenode_num_dead_data_nodes | Dead datanodes count | count |
| namenode_num_decom_dead_data_nodes | Decommissioned dead datanodes count | count |
| namenode_num_decom_live_data_nodes | Decommissioned live datanodes count | count |
| namenode_num_decommissioning_data_nodes | Decommissioning datanodes count | count |
| namenode_num_edit_log_loaded_avg_count | Edit log loaded average count | count |
| namenode_num_edit_log_loaded_num_ops | Edit log loaded operations count | count |
| namenode_num_encryption_zones | Encryption zones count | count |
| namenode_num_entering_maintenance_data_nodes | Entering maintenance mode datanodes count | count |
| namenode_num_files_under_construction | Files under construction count | count |
| namenode_num_in_maintenance_dead_data_nodes | Maintenance dead datanodes count | count |
| namenode_num_in_maintenance_live_data_nodes | Maintenance live datanodes count | count |
| namenode_num_live_data_nodes | Live datanodes count | count |
| namenode_num_stale_data_nodes | Stale datanodes count | count |
| namenode_num_stale_storages | Stale storages count | count |
| namenode_num_timed_out_pending_reconstructions | Timed out pending reconstructions count | count |
| namenode_num_times_re_replication_not_scheduled | Re-replication not scheduled count | count |
| namenode_number_of_missing_blocks | Missing blocks count | count |
| namenode_number_of_missing_blocks_with_replication_factor_one | Missing blocks with replication factor one count | count |
| namenode_number_of_snapshottable_dirs | Snapshottable directories count | count |
| namenode_pending_data_node_message_count | Pending datanode message count | count |
| namenode_pending_deletion_blocks | Pending deletion blocks count | count |
| namenode_pending_deletion_ecblocks | Pending deletion EC blocks count | count |
| namenode_pending_deletion_replicated_blocks | Pending deletion replicated blocks count | count |
| namenode_pending_reconstruction_blocks | Pending reconstruction blocks count | count |
| namenode_pending_replication_blocks | Pending replication blocks count | count |
| namenode_percent_block_pool_used | Block pool used percentage | percent |
| namenode_percent_complete | Completion percentage | percent |
| namenode_percent_remaining | Remaining percentage | percent |
| namenode_percent_used | Used percentage | percent |
| namenode_postponed_misreplicated_blocks | Postponed misreplicated blocks count | count |
| namenode_publish_avg_time | Publish average time | ms |
| namenode_publish_num_ops | Publish operations count | count |
| namenode_put_image_avg_time | Put image average time | ms |
| namenode_put_image_num_ops | Put image operations count | count |
| namenode_rename_snapshot_ops | Rename snapshot operations count | count |
| namenode_resource_check_time_avg_time | Resource check average time | ms |
| namenode_resource_check_time_num_ops | Resource check operations count | count |
| namenode_safe_mode | Safe mode | count |
| namenode_safe_mode_count | Safe mode count | count |
| namenode_safe_mode_elapsed_time | Safe mode elapsed time | count |
| namenode_safe_mode_percent_complete | Safe mode completion percentage | percent |
| namenode_safe_mode_time | Safe mode time | ms |
| namenode_saving_checkpoint | Saving checkpoint | count |
| namenode_saving_checkpoint_count | Saving checkpoint count | count |
| namenode_saving_checkpoint_elapsed_time | Saving checkpoint elapsed time | ms |
| namenode_saving_checkpoint_percent_complete | Saving checkpoint completion percentage | count |
| namenode_scheduled_replication_blocks | Scheduled replication blocks count | count |
| namenode_stale_data_nodes | Stale datanodes | count |
| namenode_storage_block_report_avg_time | Storage block report average time | ms |
| namenode_storage_block_report_num_ops | Storage block report operations count | count |
| namenode_successful_re_replications | Successful re-replications count | count |
| namenode_syncs_avg_time | Syncs average time | ms |
| namenode_syncs_num_ops | Syncs operations count | count |
| namenode_tag_total_sync_times | Tag total sync times | count |
| namenode_timeout_re_replications | Timeout re-replications count | count |
| namenode_total_blocks | Total blocks count | count |
| namenode_total_ecblock_groups | Total EC block groups count | count |
| namenode_total_file_ops | Total file operations count | count |
| namenode_total_load | Total load | count |
| namenode_total_replicated_blocks | Total replicated blocks count | count |
| namenode_total_sync_count | Total sync count | count |
| namenode_total_sync_times | Total sync times | count |
| namenode_transactions_avg_time | Transactions average time | ms |
| namenode_transactions_batched_in_sync | Sync batched transactions count | count |
| namenode_transactions_num_ops | Transactions operations count | count |
| namenode_transactions_since_last_checkpoint | Transactions since last checkpoint count | count |
| namenode_transactions_since_last_log_roll | Transactions since last log roll count | count |
| namenode_under_replicated_blocks | Under replicated blocks count | count |
| namenode_used | Used | count |
| namenode_volume_failures | Volume failures count | count |
| namenode_warm_up_edektime_avg_time | Warm up EDEK average time | ms |
| namenode_warm_up_edektime_num_ops | Warm up EDEK operations count | count |
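
As an illustration of how a few of the metrics listed above might be combined, the sketch below flags some common trouble signs from one set of collected values. It is only an example: the function name is hypothetical, the choice of metrics and the 85 percent limit are assumptions rather than recommendations from this page, and `namenode_percent_used` is assumed to be reported on a 0 to 100 scale.

```python
def namenode_health_issues(fields: dict[str, float],
                           max_percent_used: float = 85.0) -> list[str]:
    """Return human-readable findings for one snapshot of NameNode metrics.

    `fields` maps metric names from the table above to their latest values.
    Metrics that are absent from `fields` are simply not checked.
    """
    issues: list[str] = []

    # Any non-zero value for these metrics usually deserves a look.
    should_be_zero = (
        "namenode_missing_blocks",
        "namenode_corrupt_blocks",
        "namenode_num_dead_data_nodes",
        "namenode_volume_failures",
    )
    for name in should_be_zero:
        value = fields.get(name, 0.0)
        if value > 0:
            issues.append(f"{name} = {value:g}")

    # Assumed limit on overall usage; tune it for your cluster.
    percent_used = fields.get("namenode_percent_used")
    if percent_used is not None and percent_used > max_percent_used:
        issues.append(
            f"namenode_percent_used = {percent_used:g} (limit {max_percent_used:g})"
        )
    return issues


# Made-up snapshot for demonstration purposes.
snapshot = {
    "namenode_missing_blocks": 2.0,
    "namenode_corrupt_blocks": 0.0,
    "namenode_num_dead_data_nodes": 1.0,
    "namenode_percent_used": 91.5,
}
for issue in namenode_health_issues(snapshot):
    print(issue)
```

In practice, checks like these would normally be expressed as monitors on the collected data rather than as a script; the code only shows which metrics such a rule could read.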