HDFS high availability (HA): single point of failure, active/standby clusters, split-brain, data synchronization, and the QJM-based HDFS HA solution

yida&yueda 2022-02-13 07:51:45


HDFS High Availability (HA)

1.1 High Availability: Background

1.1.1 Single Point of Failure and High Availability

A single point of failure (SPOF) is a point in the system whose failure stops the entire system from working. In other words, a single point of failure turns a local fault into a total outage.

High availability (HA) is an IT term describing a system's ability to perform its functions without interruption; it represents the availability of the system and is one of the criteria used in system design. A highly available system keeps its services running for longer, usually by improving the system's fault tolerance.

A highly available or highly reliable system must not let a single point of failure cause an overall failure. The usual approach is redundancy: add multiple components that perform the same function, so that as long as they do not all fail at the same time, the system (or at least part of it) keeps working, which improves reliability.

1.1.2 How to Achieve High Availability

1.1.2.1 Active/Standby Cluster

The key to solving single points of failure and achieving highly available services is not to make failures never happen, but to minimize their impact on the business, because hardware and software failures are unavoidable.

The mature practice in industry is to add a backup for each single point of failure, forming an active/standby architecture. Put simply: when the active node goes down, the standby takes over, and service resumes after a short interruption.

The most common setup is one active and one standby, though one active with multiple standbys is also possible. More standbys means more fault tolerance, but also more redundancy and more wasted resources.

1.1.2.2 Active and Standby Roles

Active: the primary role, i.e. the node currently serving external requests. At any moment there is one and only one active node serving requests.

Standby: the backup role. It must keep its data and state synchronized with the active node and be ready at any time to take over as active (when the active node dies or fails) and serve requests, keeping the service available.

1.1.3 Availability Metric: X Nines

High availability is commonly measured with the "X nines" standard, where X is typically a number from 3 to 5. "X nines" refers to the ratio of the time the system is available to the total time over one year of continuous operation.

• 3 nines: (1 − 99.9%) × 365 × 24 = 8.76 hours. Over one year of continuous operation, the business may be interrupted for at most 8.76 hours.

• 4 nines: (1 − 99.99%) × 365 × 24 = 0.876 hours ≈ 52.6 minutes. Over one year of continuous operation, the business may be interrupted for at most 52.6 minutes.

• 5 nines: (1 − 99.999%) × 365 × 24 × 60 = 5.26 minutes. Over one year of continuous operation, the business may be interrupted for at most 5.26 minutes.

The more nines, the more reliable the system and the less business disruption it tolerates, but also the higher the cost.
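As a quick sanity check on these figures, here is a minimal Java sketch (class and method names are illustrative, not part of any library) that computes the maximum yearly downtime allowed by an "N nines" target:

```java
// Minimal sketch: maximum yearly downtime allowed by an "N nines" availability target.
public class NinesCalculator {

    // Allowed downtime per year, in minutes, for the given number of nines.
    static double allowedDowntimeMinutesPerYear(int nines) {
        double availability = 1.0 - Math.pow(10, -nines); // e.g. 3 nines -> 0.999
        double minutesPerYear = 365 * 24 * 60;            // 525,600 minutes
        return (1.0 - availability) * minutesPerYear;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            double minutes = allowedDowntimeMinutesPerYear(n);
            System.out.printf("%d nines -> %.2f minutes (%.2f hours) of downtime per year%n",
                    n, minutes, minutes / 60);
        }
    }
}
```

Running it reproduces the figures above: roughly 8.76 hours, 52.6 minutes, and 5.26 minutes of allowed downtime per year for 3, 4, and 5 nines respectively.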

1.1.4 Core Problems in HA System Design

1.1.4.1 The Split-Brain Problem

Split-brain (split-brain) is originally a medical term. In an HA cluster, split-brain occurs when the "heartbeat" connection between the active and standby nodes is lost: an HA system that used to act as a single coordinated whole splits into two independent nodes. Having lost contact with each other, the active and standby nodes behave like a "split brain", throwing the whole cluster into disorder. The serious consequences of split-brain are:

1) No active node in the cluster: each node believes the other is healthy and considers itself the standby, so nobody provides service;

2) Multiple active nodes in the cluster: each node believes the other has failed and considers itself the active node. They compete for shared resources, causing chaos and data corruption, and clients no longer know which node to contact.

The key to avoiding split-brain is to ensure that at any given time the system has exactly one active node providing service.

1.1.4.2 The Data Synchronization Problem

A prerequisite for continuous service availability is that the state and data of the active and standby nodes stay consistent, ideally identical. If the standby node's data lags too far behind the active node's, completing a failover is of little use.

A common approach to data synchronization is to replay operation records from a log: while the active node serves requests normally, it records every transactional operation in a log, and the standby node reads the log and replays the operations.
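To make the log-replay idea concrete, here is a minimal Java sketch. The EditLogEntry and NamespaceState types are hypothetical illustrations, not HDFS classes:

```java
import java.util.List;

// Sketch of the log-replay idea: the active node appends every transactional
// operation to a log; the standby reads the log and re-applies the operations
// to its own state. EditLogEntry and NamespaceState are hypothetical types.
public class LogReplaySketch {

    record EditLogEntry(long txId, String op, String path) {}

    interface NamespaceState {
        long lastAppliedTxId();               // highest transaction already applied
        void apply(EditLogEntry entry);       // e.g. create / delete / rename a path
    }

    // Standby side: replay every entry newer than what it has already applied.
    static void catchUp(NamespaceState standby, List<EditLogEntry> sharedLog) {
        for (EditLogEntry entry : sharedLog) {
            if (entry.txId() > standby.lastAppliedTxId()) {
                standby.apply(entry);
            }
        }
    }
}
```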

The HDFS NameNode Single Point of Failure

Before Hadoop 2.0.0, the NameNode was the single point of failure (SPOF) in an HDFS cluster. Each cluster had only one NameNode, and if that machine or process became unavailable, the whole cluster was unavailable until the NameNode was restarted or brought up on another machine.

The NameNode single point of failure affects total HDFS cluster availability in two ways:

• Unplanned events, such as a machine crash: the cluster is unavailable until the NameNode is restarted.

• Planned maintenance events, such as software or hardware upgrades on the NameNode machine, which result in windows of cluster downtime.

The HDFS high availability solution: run two (or, from Hadoop 3.0.0 onward, more than two) redundant NameNodes in the same cluster. This allows a fast failover to another NameNode if a machine crashes, or a graceful administrator-initiated failover for planned maintenance.


1.2 HDFS HA Solution: QJM

QJM stands for Quorum Journal Manager. Proposed by Cloudera, it is one of the HDFS HA solutions officially recommended by Hadoop.

In the QJM scheme, the ZooKeeper-based ZKFC handles active/standby switching, while a JournalNode (JN) cluster shares the edits log to keep the NameNodes' data synchronized.

1.2.1 QJM: Active/Standby Switching and Split-Brain Avoidance

1.2.1.1 ZKFailoverController (ZKFC)

Apache ZooKeeper is a highly available distributed coordination service used to maintain small amounts of coordination data. The following ZooKeeper features are used in the HDFS HA solution (a sketch after this list illustrates them):

• Ephemeral znodes

  If a znode is ephemeral, its lifecycle is bound to the session of the client that created it: when the client disconnects and the session ends, the znode is deleted automatically.

• Path uniqueness

  ZooKeeper maintains a data structure similar to a directory tree, in which every node is called a znode. Each znode path is unique; no two znodes can have the same path, which provides exclusivity.

• Watch mechanism

  A client can set a watch on a znode for events that occur on it. When such an event fires, the ZooKeeper service notifies the client that set the watch.
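The sketch below shows these three features through the standard ZooKeeper Java client. The connect string zk1:2181 and the znode path /hdfs-ha/active-lock are illustrative placeholders, and the parent path /hdfs-ha is assumed to already exist:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch: the three ZooKeeper features used by the HDFS HA design, via the
// standard ZooKeeper Java client. The connect string and znode path are illustrative.
public class ZkPrimitivesSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181", 5000, event -> {});

        // 1) Ephemeral znode: removed automatically when this client's session ends.
        // 2) Path uniqueness: a second create on the same path fails with NodeExistsException.
        try {
            zk.create("/hdfs-ha/active-lock", "nn1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("created the ephemeral lock znode");
        } catch (KeeperException.NodeExistsException e) {
            System.out.println("another client already holds the lock znode");
        }

        // 3) Watch: be notified when the lock znode changes or is deleted.
        zk.exists("/hdfs-ha/active-lock",
                event -> System.out.println("lock znode event: " + event.getType()));

        zk.close();   // closing the session deletes the ephemeral znode created above
    }
}
```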

ZKFailoverController (ZKFC) is a new component that acts as a ZooKeeper client. Every machine that runs a NameNode also runs a ZKFC, whose main responsibilities are (see the election sketch after this list):

• Monitoring the health of the local NameNode

    ZKFC periodically pings the local NameNode it is responsible for monitoring.

• Maintaining a session with the ZooKeeper cluster

    If the local NameNode is healthy and ZKFC sees that no other node currently holds the lock znode, it tries to acquire the lock itself. If it succeeds, it has "won the election" and is responsible for running a failover to bring its local NameNode into the Active state. If another node already holds the lock, this ZKFC has lost the election; it registers a watch on the lock znode and waits for the next election.
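A minimal Java sketch of this election loop, using the ZooKeeper primitives from the previous sketch. This is not the real ZKFC code; the znode path and helper methods are illustrative stubs:

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the election loop described above (not the real ZKFC code):
// a healthy node tries to grab the lock znode; if someone else holds it,
// it watches the lock and retries once the lock disappears.
public class ElectionSketch {
    private static final String LOCK = "/hdfs-ha/active-lock";   // illustrative path

    static void runElection(ZooKeeper zk, String myNodeId) throws Exception {
        while (true) {
            if (!localNameNodeIsHealthy()) {      // periodic health check (stubbed)
                Thread.sleep(1000);
                continue;
            }
            try {
                zk.create(LOCK, myNodeId.getBytes(),
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                becomeActive();                   // won the election
                return;
            } catch (KeeperException.NodeExistsException e) {
                // Lost the election: watch the lock and wait until it changes
                // (e.g. the holder's session dies), then loop and try again.
                CountDownLatch released = new CountDownLatch(1);
                if (zk.exists(LOCK, event -> released.countDown()) != null) {
                    released.await();
                }
            }
        }
    }

    static boolean localNameNodeIsHealthy() { return true; }     // stub
    static void becomeActive() { System.out.println("transition local NameNode to Active"); }
}
```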

1.2.1.2 The Fencing (Isolation) Mechanism

Failover is the process of switching the active and standby roles, and the biggest risk during the switch is split-brain. A fencing mechanism avoids it: the previous Active node is isolated first, and only then is the local NameNode transitioned to the Active state.

Hadoop's common library ships two fencing implementations, sshfence and shellfence (the default). sshfence logs in to the target node over SSH and kills the NameNode process with the fuser command (locating the process PID via its TCP port, which is more precise than the jps command); shellfence runs a user-defined shell command or script to complete the isolation.

1.2.2 QJM: Active/Standby Data Synchronization

The JournalNode (JN) cluster is a lightweight distributed system mainly used for fast reading, writing, and storage of data. Typically 2N+1 JournalNodes are used to store the shared edits log.

Every modification is executed on the Active NN, and the operation is also written as an edits log record to at least a majority of the JNs. The Standby NN watches the JNs; when it detects that the shared log has changed, it reads the new edits from the JNs and replays the operations to bring its own namespace (directory image tree) up to date.

When a failure occurs and the Active NN goes down, the Standby NN, before becoming the new Active NN, reads all the remaining edits from the JNs. This reliably guarantees that its directory image tree is consistent with that of the failed NN, so it can seamlessly take over its duties and keep serving client requests, achieving high availability.
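The following Java sketch illustrates the majority-acknowledgement idea behind writing edits to the JN cluster. It is not the actual QJM implementation; the JournalClient interface and the timeout are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the majority-write idea behind the JournalNode cluster (not the real
// QJM code): an edit counts as durable once a majority of the 2N+1 journal nodes
// acknowledge it. The JournalClient interface is hypothetical.
public class QuorumWriteSketch {

    interface JournalClient {
        boolean append(long txId, byte[] editOp);   // true if this JN persisted the edit
    }

    static boolean writeEdit(List<JournalClient> journals, long txId, byte[] editOp)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(journals.size());
        try {
            List<Future<Boolean>> acks = new ArrayList<>();
            for (JournalClient jn : journals) {
                acks.add(pool.submit(() -> jn.append(txId, editOp)));
            }
            int ok = 0;
            for (Future<Boolean> ack : acks) {
                try {
                    if (ack.get(2, TimeUnit.SECONDS)) ok++;   // tolerate slow or failed JNs
                } catch (ExecutionException | TimeoutException ignored) { }
            }
            // Majority of 2N+1 nodes, i.e. at least N+1 acknowledgements.
            return ok >= journals.size() / 2 + 1;
        } finally {
            pool.shutdownNow();
        }
    }
}
```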

Copyright: yida&yueda. Please include the original link when reprinting: https://en.javamana.com/2022/02/202202130751430418.html