HDFS block load balancer: balancer

yida&yueda 2022-02-13 07:51:48

1. Background

Data in HDFS is not always distributed evenly across DataNodes. A common cause is adding new DataNodes to an existing cluster. HDFS provides a Balancer program that analyzes block placement information and moves data between DataNodes until the cluster is considered balanced.

A cluster is considered balanced when, for each DataNode, the difference between the node's utilization (the ratio of used space on the node to the node's total capacity) and the cluster's utilization (the ratio of used space on the cluster to the cluster's total capacity) does not exceed a given threshold percentage. The Balancer cannot balance data between the volumes of a single DataNode.
[Figure: comparison before and after balancing]
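
The balance condition described above can be sketched as a small shell check. This only illustrates the definition and is not the Balancer's own code; the function name `is_balanced` and the sample numbers are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: report whether one DataNode is within the threshold.
# Arguments: node_used node_capacity cluster_used cluster_capacity threshold_percent
is_balanced() {
  awk -v nu="$1" -v nc="$2" -v cu="$3" -v cc="$4" -v t="$5" 'BEGIN {
    node = 100 * nu / nc          # node utilization in percent
    cluster = 100 * cu / cc       # cluster utilization in percent
    diff = node - cluster
    if (diff < 0) diff = -diff    # absolute difference
    if (diff <= t) print "balanced"
    else print "unbalanced"
  }'
}

# Made-up sample: node at 30%, cluster at 40%, threshold 10
is_balanced 300 1000 4000 10000 10
```

The used-space and capacity values can be in any unit, as long as the same unit is used for each pair.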

2. Command-line configuration and running

-threshold 10 // Balance condition for the cluster: the allowed difference between each DataNode's disk usage and the cluster's overall usage, as a percentage in the range 0~100
-policy datanode // Balancing policy. The default is datanode: the cluster is balanced if every DataNode is balanced.
-exclude -f /tmp/ip1.txt // Empty by default. The listed IPs do not take part in balancing. -f: read the list from a file
-include -f /tmp/ip2.txt // Empty by default. Only the listed IPs take part in balancing. -f: read the list from a file
-idleiterations 5 // Maximum number of idle iterations before the balancer exits, here 5
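
As a rough illustration of how the options above fit together on one command line, the sketch below only prints the command instead of running it. The helper name `build_balancer_cmd` is hypothetical, and the path is the placeholder used in the option list; the hosts file is assumed to contain one host per line.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a balancer command line, does not run it.
# $1 = threshold percentage, $2 = file listing hosts to exclude
build_balancer_cmd() {
  echo "hdfs balancer -threshold $1 -policy datanode -exclude -f $2 -idleiterations 5"
}

# Placeholder path taken from the option list above
build_balancer_cmd 10 /tmp/ip1.txt
```

Printing the command first makes it easy to review the options before starting a long-running balancing job.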

2.1 Set the bandwidth for balancing data transfers

Command: hdfs dfsadmin -setBalancerBandwidth newbandwidth

Here newbandwidth is the maximum network bandwidth, in bytes per second, that each DataNode may use during a balancing operation.

For example: hdfs dfsadmin -setBalancerBandwidth 104857600
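
Because the value is given in bytes per second, it can help to derive it from a more readable unit. The sketch below is one way to do that; the helper name `mib_to_bytes` is an assumption, and the script only prints the resulting command.

```shell
#!/bin/sh
# Hypothetical helper: convert MiB per second to bytes per second.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# 100 MiB/s corresponds to the value 104857600 used in the example
BW=$(mib_to_bytes 100)
echo "hdfs dfsadmin -setBalancerBandwidth $BW"
```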

2.2 Run the balancer with default settings

Command: hdfs balancer

This runs the block balancing operation with the default parameters.

2.3 Run the balancer with a modified threshold

Command: hdfs balancer -threshold 5


The Balancer then runs with a threshold of 5% (the default is 10%). This means the program ensures that the disk usage of each DataNode differs from the overall usage of the cluster by no more than 5%. For example, if the overall utilization of all DataNodes in the cluster is 40% of the cluster's total disk storage capacity, the program ensures that the disk utilization of each DataNode is between 35% and 45% of that DataNode's disk storage capacity.
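
The arithmetic in this example can be written out as a short sketch. The helper name `balanced_range` is hypothetical, and it works on whole percentages only for simplicity.

```shell
#!/bin/sh
# Hypothetical helper: print the utilization range accepted as balanced.
# $1 = cluster utilization (%), $2 = threshold (%)
balanced_range() {
  low=$(( $1 - $2 ))
  high=$(( $1 + $2 ))
  # Utilization cannot go below 0% or above 100%.
  [ "$low" -lt 0 ] && low=0
  [ "$high" -gt 100 ] && high=100
  echo "${low}% to ${high}%"
}

# Cluster at 40% with a threshold of 5
balanced_range 40 5
```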

Copyright: author [yida&yueda]. Please include the original link when reprinting, thank you. https://en.javamana.com/2022/02/202202130751466690.html