Code sheep 2022-02-13 07:07:51
Focus: no development work is required; it is enough to know how to use it and to understand its architecture design
Definition: a distributed coordination service component (a component that helps other components solve their problems)
Function 1:
Assisting elections (HA, high availability: in a master-slave cluster, the single point of failure is removed by setting up a backup node)
Two nodes go to ZK at the same time and race to create an ephemeral node (its lifetime is tied to the session); the node that fails watches the node created by the active one
In HDFS: there are 2 NameNodes, one active (chosen by election) and one standby
Hadoop's high availability mechanism needs ZooKeeper to coordinate which master node (NameNode) does the work and which one is standby
In YARN: there are 2 ResourceManagers, one active (chosen by election) and one standby
In Spark: Standalone mode also has a master node (Master) and slave nodes (Worker), so there are 2 Masters and an election is needed to pick the active one
HBase, Kafka and others work the same way
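The race for the ephemeral node can be sketched with a small in-memory simulation. This is only an illustration under stated assumptions: ToyCoordinator, Candidate and the path /election/active are hypothetical names made up for this sketch, not the ZooKeeper client API, and no real ZooKeeper server is involved.

```python
class ToyCoordinator:
    """In-memory stand-in for ZooKeeper (hypothetical, not a real client API)."""

    def __init__(self):
        self.ephemeral = {}   # path -> id of the session that owns the node
        self.watchers = {}    # path -> callbacks fired when the node is deleted

    def create_ephemeral(self, path, session):
        # Only the first creator succeeds; later attempts fail because the node exists.
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session
        return True

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def close_session(self, session):
        # An ephemeral node disappears together with the session that created it.
        for path in [p for p, s in self.ephemeral.items() if s == session]:
            del self.ephemeral[path]
            for callback in self.watchers.pop(path, []):
                callback()


class Candidate:
    LOCK = "/election/active"   # hypothetical path used only in this sketch

    def __init__(self, name, zk):
        self.name, self.zk, self.state = name, zk, "init"

    def run_for_active(self):
        if self.zk.create_ephemeral(self.LOCK, self.name):
            self.state = "active"
        else:
            self.state = "standby"
            # The loser watches the active node and retries once it goes away.
            self.zk.watch(self.LOCK, self.run_for_active)


zk = ToyCoordinator()
nn1, nn2 = Candidate("nn1", zk), Candidate("nn2", zk)
nn1.run_for_active()
nn2.run_for_active()
print(nn1.state, nn2.state)   # active standby

zk.close_session("nn1")       # nn1 crashes, so its session ends
print(nn2.state)              # active
```

When the session of nn1 ends, its ephemeral node is removed, the watch registered by nn2 fires, and nn2 wins the next attempt to create the node.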
Function 2:
Storing metadata
HDFS stores data across the DataNodes in the form of blocks (blk, 128 MB each); the data that describes this data (how large it is, its type, how many blocks it was split into, where the blocks are stored, etc.) is called "metadata"
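As a small worked example of what such metadata might record, the sketch below splits a file into 128 MB blocks. The file path, the 300 MB size and the DataNode names are hypothetical values chosen for illustration, and the dictionary layout is an assumption, not the format HDFS actually uses.

```python
BLOCK_SIZE_MB = 128  # block size quoted in the text


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be cut into."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


blocks = split_into_blocks(300)

# Hypothetical metadata record: size, number of blocks, where each block lives.
metadata = {
    "path": "/data/example.log",
    "size_mb": 300,
    "block_count": len(blocks),
    "block_sizes_mb": blocks,
    "locations": {f"blk_{i}": f"datanode{i % 3 + 1}" for i in range(len(blocks))},
}

print(metadata["block_sizes_mb"])   # [128, 128, 44]
```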
"Metadata" is kept in memory, but memory is volatile (a fault or a shutdown wipes it), so it can only serve as temporary storage (holding the latest and most complete copy) and cannot be the long-term or the only form of storage. Metadata is therefore also kept in other forms, stored in different places:
In the form of files: HDFS
fsimage (the in-memory image as last persisted to disk) + edits (the log of operations applied in memory) = the latest and most complete metadata in memory (both are kept on the NameNode; the SecondaryNameNode periodically merges them and the result is persisted to disk). At startup, these files are loaded into memory to rebuild the latest and most complete metadata
Question: can we do without the SNN (SecondaryNameNode)?
Answer: Yes. The NN also merges the image and the edit log at startup, but without the SNN's help (updating the image) the image is never refreshed, edit logs pile up, and this slows down NN startup
The NN is responsible for 3 things: managing all slave nodes, managing metadata, and receiving client requests
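The relationship fsimage + edits = latest metadata, and the effect of a checkpoint, can be sketched as follows. This is a simplified model under stated assumptions: the dictionary image, the tuple-based edit records and the function names are invented for illustration and do not reflect the real on-disk formats or the Hadoop API.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged operation to the namespace (toy record format)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = {"blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(args[0])
    elif op == "delete":
        namespace.pop(path, None)


def load_metadata(fsimage, edits):
    """fsimage + edits = latest metadata: replay the log on top of the image."""
    namespace = copy.deepcopy(fsimage)
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace


def checkpoint(fsimage, edits):
    """Merge image and log into a fresh image and start with an empty log."""
    return load_metadata(fsimage, edits), []


fsimage = {"/a.txt": {"blocks": ["blk_1"]}}
edits = [
    ("create", "/b.txt"),
    ("add_block", "/b.txt", "blk_2"),
    ("delete", "/a.txt"),
]

in_memory = load_metadata(fsimage, edits)
print(in_memory)   # {'/b.txt': {'blocks': ['blk_2']}}

new_image, new_edits = checkpoint(fsimage, edits)
print(len(new_edits))   # 0
```

After the checkpoint, loading the new image with its empty log gives the same metadata, but there is nothing left to replay. Without regular checkpoints the edits list keeps growing, and replaying it is what makes startup slower.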
RDBMS (MySQL): Sqoop, Hive (MySQL), Oozie and Hue keep their metadata in a relational database
Software that stores its metadata in an RDBMS is generally not distributed, and its request volume is relatively small
ZooKeeper: HBase, Kafka
Software that stores its metadata in ZK is generally distributed and sees relatively high request concurrency, because only a distributed system can withstand high concurrency
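How does ZK make sure it does not fail itself? (software faults or machine faults)
Question: can the number of ZK nodes be even?
Yes, but an odd number is better. The election itself does not depend on whether the number of machines is odd or even: it ends as soon as one candidate has more than half of the votes. An even-sized ensemble simply tolerates no more failures than the odd-sized one with one machine fewer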
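The "more than half" rule can be checked with a few lines of arithmetic. The function names below are made up for this sketch; the only assumption is the majority rule stated above.

```python
def quorum(n):
    """Smallest number of votes that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many machines can fail while a majority is still reachable."""
    return n - quorum(n)


for n in range(3, 7):
    print(n, quorum(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
```

Going from 3 to 4 machines, or from 5 to 6, raises the quorum without raising the number of failures the ensemble survives, which is why an odd size is the better choice.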
If you want to know more about ZooKeeper, leave a comment ~
copyright:author[Code sheep],Please bring the original link to reprint, thank you. https://en.javamana.com/2022/02/202202130707486085.html