Hadoop HDFS NameNode
Collect metrics from the HDFS NameNode.
Installation and Configuration
Since NameNode is developed in Java, metrics can be collected using the jmx-exporter approach.
1. NameNode Configuration
1.1 Download jmx-exporter
Download URL: https://github.com/prometheus/jmx_exporter
1.2 Download the jmx-exporter configuration file
Download URL: https://github.com/lrwh/jmx-exporter/blob/main/hadoop-hdfs-namenode.yml
1.3 Adjust NameNode Startup Parameters
Add the following to the namenode startup parameters:
{JAVA_GC_ARGS} -javaagent:/opt/jmx/jmx_exporter-1.0.1.jar=localhost:17107:/opt/jmx/hadoop-hdfs-namenode.yml
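The agent option follows the jmx-exporter Java agent form `-javaagent:<jar>=[<host>:]<port>:<config>`. As a minimal sketch, the helper below assembles that string from its parts so the port and paths can be changed in one place. `build_javaagent_arg` is a hypothetical name used only for illustration; it is not part of jmx-exporter or Hadoop, and the values are copied from the parameter above.

```python
def build_javaagent_arg(jar_path, port, config_path, host=None):
    """Assemble a jmx-exporter -javaagent option (hypothetical helper).

    Format: -javaagent:<jar>=[<host>:]<port>:<config>
    """
    if not 0 < int(port) < 65536:
        raise ValueError(f"port out of range: {port}")
    # The host part is optional; without it only the port is written.
    listen = f"{host}:{port}" if host else str(port)
    return f"-javaagent:{jar_path}={listen}:{config_path}"


# Values copied from the startup parameter shown above.
NAMENODE_AGENT_ARG = build_javaagent_arg(
    jar_path="/opt/jmx/jmx_exporter-1.0.1.jar",
    port=17107,
    config_path="/opt/jmx/hadoop-hdfs-namenode.yml",
    host="localhost",
)
```

With the host set to `localhost`, the metrics endpoint is reachable only from the NameNode machine itself, which suits a DataKit installed on the same host.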
1.4 Restart NameNode
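Once the NameNode is back up, it can help to confirm that the agent is serving metrics before configuring DataKit. The following is a minimal sketch, assuming the endpoint `http://localhost:17107/metrics` implied by the startup parameter above. `fetch_metrics` and `parse_samples` are hypothetical helper names, and the parser assumes plain `series value` lines without trailing timestamps.

```python
from urllib.request import urlopen


def fetch_metrics(url="http://localhost:17107/metrics", timeout=5):
    """Return the raw Prometheus text exposed by the jmx-exporter agent."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse simple exposition lines into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. The series
    key keeps any label set verbatim, e.g. 'metric{label="x"}'.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space: everything before it is the series name.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # not a plain "series value" line; ignore it
    return samples
```

Calling `parse_samples(fetch_metrics())` on the NameNode host should return a non-empty dictionary when the agent is running. A connection error usually means the NameNode was not restarted with the agent option, or that the port differs from the one configured.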
2. DataKit Collector Configuration
2.1 Install DataKit
2.2 Configure Collector
Because jmx-exporter exposes a metrics URL directly, the metrics can be collected with the prom collector.
Navigate to the conf.d/prom directory under the DataKit installation directory, and copy prom.conf.sample to namenode.conf.
cp prom.conf.sample namenode.conf
Adjust the contents of namenode.conf as follows:
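urls = ["http://localhost:17107/metrics"]
source = "hdfs-namenode"
# interval must come before the tags table: in TOML, keys after a table header belong to that table
interval = "10s"
[inputs.prom.tags]
  component = "hdfs-namenode"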
Adjust other settings as needed. Parameter notes:
- urls: The jmx-exporter metrics URL; fill in the metrics URL exposed by the corresponding component
- source: Collector alias; a distinct value per component is recommended
- keep_exist_metric_name: Keep the original metric names
- interval: Collection interval
- inputs.prom.tags: Additional tags to attach
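To make the parameters concrete, the sketch below renders a collector file from the values used in this section. It assumes the file begins with the `[[inputs.prom]]` table header, as in prom.conf.sample, and `render_prom_conf` is a hypothetical helper, not part of DataKit. It writes `interval` before `[inputs.prom.tags]` because in TOML any key that follows a table header belongs to that table.

```python
def render_prom_conf(url, source, interval="10s", tags=None):
    """Render a DataKit prom collector config as TOML text (illustrative)."""
    lines = [
        "[[inputs.prom]]",  # assumed header, as in prom.conf.sample
        f'  urls = ["{url}"]',
        f'  source = "{source}"',
        f'  interval = "{interval}"',
    ]
    if tags:
        # Table header goes last: keys written after it would become tags.
        lines.append("  [inputs.prom.tags]")
        for key, value in sorted(tags.items()):
            lines.append(f'    {key} = "{value}"')
    return "\n".join(lines) + "\n"


NAMENODE_CONF = render_prom_conf(
    url="http://localhost:17107/metrics",
    source="hdfs-namenode",
    tags={"component": "hdfs-namenode"},
)
```

Printing `NAMENODE_CONF` gives text that can be compared against namenode.conf by eye before DataKit is restarted.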
3. Restart DataKit
Metrics
Hadoop Metrics
NameNode metrics are part of the Hadoop metrics. This section covers the NameNode-related metrics.
Metric | Description | Unit |
---|---|---|
namenode_add_block_ops | Add block operations count | count |
namenode_allow_snapshot_ops | Allow snapshot operations count | count |
namenode_block_capacity | Block capacity | byte |
namenode_block_deletion_start_time | Block deletion start time | ms |
namenode_block_ops_batched | Batched block operations count | count |
namenode_block_ops_queued | Queued block operations count | count |
namenode_block_pool_used_space | Used block pool space | byte |
namenode_block_received_and_deleted_ops | Received and deleted block operations count | count |
namenode_blocks | Block count | count |
namenode_bytes_in_future_ecblock_groups | Bytes in future EC block groups | byte |
namenode_bytes_in_future_replicated_blocks | Bytes in future replicated blocks | byte |
namenode_bytes_with_future_generation_stamps | Bytes with future generation stamps | byte |
namenode_cache_capacity | Cache capacity | byte |
namenode_cache_report_avg_time | Cache report average time | ms |
namenode_cache_report_num_ops | Cache report operations count | count |
namenode_cache_used | Used cache | byte |
namenode_capacity | Capacity | count |
namenode_capacity_remaining | Remaining capacity | byte |
namenode_capacity_remaining_gb | Remaining capacity (GB) | GB |
namenode_capacity_total_gb | Total capacity (GB) | GB |
namenode_capacity_used | Used capacity | byte |
namenode_capacity_used_gb | Used capacity (GB) | GB |
namenode_capacity_used_non_dfs | Non-DFS used capacity | byte |
namenode_corrupt_blocks | Corrupt blocks | count |
namenode_corrupt_ecblock_groups | Corrupt EC block groups | count |
namenode_corrupt_replicated_blocks | Corrupt replicated blocks | count |
namenode_create_file_ops | Create file operations count | count |
namenode_create_snapshot_ops | Create snapshot operations count | count |
namenode_create_symlink_ops | Create symlink operations count | count |
namenode_delete_file_ops | Delete file operations count | count |
namenode_delete_snapshot_ops | Delete snapshot operations count | count |
namenode_disallow_snapshot_ops | Disallow snapshot operations count | count |
namenode_distinct_version_count | Distinct version count | count |
namenode_distinct_versions | Distinct versions | count |
namenode_dropped_pub_all | Dropped publishes count | count |
namenode_elapsed_time | Elapsed time | ms |
namenode_estimated_capacity_lost | Estimated lost capacity | byte |
namenode_excess_blocks | Excess blocks | count |
namenode_expired_heartbeats | Expired heartbeats | count |
namenode_file_info_ops | File info operations count | count |
namenode_files | File count | count |
namenode_files_appended | Appended file count | count |
namenode_files_deleted | Deleted file count | count |
namenode_files_in_get_listing_ops | File count in get listing operations | count |
namenode_files_renamed | Renamed file count | count |
namenode_files_truncated | Truncated file count | count |
namenode_free | Free | count |
namenode_fs_image_load_time | File system image load time | ms |
namenode_fs_lock_queue_length | File system lock queue length | count |
namenode_gc_count | Garbage collection count | count |
namenode_generate_edektime_avg_time | Generate EDEK average time | ms |
namenode_generate_edektime_num_ops | Generate EDEK operations count | count |
namenode_get_additional_datanode_ops | Get additional datanode operations count | count |
namenode_highest_priority_low_redundancy_ecblocks | Highest priority low redundancy EC blocks | count |
namenode_highest_priority_low_redundancy_replicated_blocks | Highest priority low redundancy replicated blocks | count |
namenode_last_checkpoint_time | Last checkpoint time | ms |
namenode_last_hatransition_time | Last HA transition time | ms |
namenode_last_written_transaction_id | Last written transaction ID | count |
namenode_list_snapshottable_dir_ops | List snapshottable directory operations count | count |
namenode_lock_queue_length | Lock queue length | count |
namenode_low_redundancy_ecblock_groups | Low redundancy EC block groups | count |
namenode_low_redundancy_replicated_blocks | Low redundancy replicated blocks | count |
namenode_max_objects | Max objects count | count |
namenode_millis_since_last_loaded_edits | Milliseconds since last loaded edits | ms |
namenode_missing_blocks | Missing blocks | count |
namenode_missing_ecblock_groups | Missing EC block groups | count |
namenode_missing_repl_one_blocks | Missing blocks with replication factor one | count |
namenode_missing_replicated_blocks | Missing replicated blocks | count |
namenode_missing_replication_one_blocks | Missing replicated blocks with replication factor one | count |
namenode_nnstarted_time_in_millis | Start time (milliseconds) | ms |
namenode_non_dfs_used_space | Non-DFS used space | byte |
namenode_num_active_clients | Active clients count | count |
namenode_num_active_sinks | Active metrics sinks count | count |
namenode_num_active_sources | Active metrics sources count | count |
namenode_num_all_sinks | All metrics sinks count | count |
namenode_num_all_sources | All metrics sources count | count |
namenode_num_dead_data_nodes | Dead datanodes count | count |
namenode_num_decom_dead_data_nodes | Decommissioned dead datanodes count | count |
namenode_num_decom_live_data_nodes | Decommissioned live datanodes count | count |
namenode_num_decommissioning_data_nodes | Decommissioning datanodes count | count |
namenode_num_edit_log_loaded_avg_count | Edit log loaded average count | count |
namenode_num_edit_log_loaded_num_ops | Edit log loaded operations count | count |
namenode_num_encryption_zones | Encryption zones count | count |
namenode_num_entering_maintenance_data_nodes | Datanodes entering maintenance count | count |
namenode_num_files_under_construction | Files under construction count | count |
namenode_num_in_maintenance_dead_data_nodes | Dead datanodes in maintenance count | count |
namenode_num_in_maintenance_live_data_nodes | Live datanodes in maintenance count | count |
namenode_num_live_data_nodes | Live datanodes count | count |
namenode_num_stale_data_nodes | Stale datanodes count | count |
namenode_num_stale_storages | Stale storages count | count |
namenode_num_timed_out_pending_reconstructions | Timed out pending reconstructions count | count |
namenode_num_times_re_replication_not_scheduled | Re-replication not scheduled count | count |
namenode_number_of_missing_blocks | Missing blocks count | count |
namenode_number_of_missing_blocks_with_replication_factor_one | Missing blocks with replication factor one count | count |
namenode_number_of_snapshottable_dirs | Snapshottable directories count | count |
namenode_pending_data_node_message_count | Pending datanode message count | count |
namenode_pending_deletion_blocks | Pending deletion blocks count | count |
namenode_pending_deletion_ecblocks | Pending deletion EC blocks count | count |
namenode_pending_deletion_replicated_blocks | Pending deletion replicated blocks count | count |
namenode_pending_reconstruction_blocks | Pending reconstruction blocks count | count |
namenode_pending_replication_blocks | Pending replication blocks count | count |
namenode_percent_block_pool_used | Block pool used percentage | percent |
namenode_percent_complete | Completion percentage | percent |
namenode_percent_remaining | Remaining percentage | percent |
namenode_percent_used | Used percentage | percent |
namenode_postponed_misreplicated_blocks | Postponed misreplicated blocks count | count |
namenode_publish_avg_time | Publish average time | ms |
namenode_publish_num_ops | Publish operations count | count |
namenode_put_image_avg_time | Put image average time | ms |
namenode_put_image_num_ops | Put image operations count | count |
namenode_rename_snapshot_ops | Rename snapshot operations count | count |
namenode_resource_check_time_avg_time | Resource check average time | ms |
namenode_resource_check_time_num_ops | Resource check operations count | count |
namenode_safe_mode | Safe mode | count |
namenode_safe_mode_count | Safe mode count | count |
namenode_safe_mode_elapsed_time | Safe mode elapsed time | ms |
namenode_safe_mode_percent_complete | Safe mode completion percentage | percent |
namenode_safe_mode_time | Safe mode time | ms |
namenode_saving_checkpoint | Saving checkpoint | count |
namenode_saving_checkpoint_count | Saving checkpoint count | count |
namenode_saving_checkpoint_elapsed_time | Saving checkpoint elapsed time | ms |
namenode_saving_checkpoint_percent_complete | Saving checkpoint completion percentage | percent |
namenode_scheduled_replication_blocks | Scheduled replication blocks count | count |
namenode_stale_data_nodes | Stale datanodes | count |
namenode_storage_block_report_avg_time | Storage block report average time | ms |
namenode_storage_block_report_num_ops | Storage block report operations count | count |
namenode_successful_re_replications | Successful re-replications count | count |
namenode_syncs_avg_time | Syncs average time | ms |
namenode_syncs_num_ops | Syncs operations count | count |
namenode_tag_total_sync_times | Tag total sync times | count |
namenode_timeout_re_replications | Timeout re-replications count | count |
namenode_total_blocks | Total blocks count | count |
namenode_total_ecblock_groups | Total EC block groups count | count |
namenode_total_file_ops | Total file operations count | count |
namenode_total_load | Total load | count |
namenode_total_replicated_blocks | Total replicated blocks count | count |
namenode_total_sync_count | Total sync count | count |
namenode_total_sync_times | Total sync times | count |
namenode_transactions_avg_time | Transactions average time | ms |
namenode_transactions_batched_in_sync | Transactions batched in sync count | count |
namenode_transactions_num_ops | Transactions operations count | count |
namenode_transactions_since_last_checkpoint | Transactions since last checkpoint count | count |
namenode_transactions_since_last_log_roll | Transactions since last log roll count | count |
namenode_under_replicated_blocks | Under replicated blocks count | count |
namenode_used | Used | count |
namenode_volume_failures | Volume failures count | count |
namenode_warm_up_edektime_avg_time | Warm up EDEK average time | ms |
namenode_warm_up_edektime_num_ops | Warm up EDEK operations count | count |
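
A few of the metrics above are often watched together as a basic health signal. The sketch below is one hypothetical way to evaluate them from a dictionary of field values. It assumes the field names match the table exactly and that `namenode_percent_remaining` is on a 0 to 100 scale. The rules (any missing or corrupt block, any dead datanode or volume failure, less than 10 percent remaining) are illustrative thresholds, not recommendations from Hadoop or DataKit.

```python
def check_namenode_health(fields, min_percent_remaining=10.0):
    """Return a list of human-readable warnings for a metrics snapshot.

    `fields` maps metric names from the table to numeric values; metrics
    absent from the snapshot are simply skipped.
    """
    warnings = []
    # Metrics that are expected to stay at zero on a healthy cluster.
    for name in (
        "namenode_missing_blocks",
        "namenode_corrupt_blocks",
        "namenode_num_dead_data_nodes",
        "namenode_volume_failures",
    ):
        value = fields.get(name)
        if value is not None and value > 0:
            warnings.append(f"{name} = {value:g} (expected 0)")

    # Remaining capacity below the (illustrative) threshold.
    remaining = fields.get("namenode_percent_remaining")
    if remaining is not None and remaining < min_percent_remaining:
        warnings.append(
            f"namenode_percent_remaining = {remaining:g} "
            f"(below {min_percent_remaining:g})"
        )
    return warnings
```

An empty list means none of the illustrative rules fired for that snapshot; it does not by itself prove that the cluster is healthy.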