Code sheep 2022-02-13 07:07:51



  • Key points: you don't need to develop it yourself; it is enough to know how to use it and to understand its architecture

  • Definition: a distributed coordination service component, i.e. 「a component that helps other components solve their problems」

  • Function 1:

    • Assisting elections (HA, high availability: in a master-slave cluster, the single point of failure is removed by setting up a backup node)

      Two nodes race to create the same ephemeral node in ZK (an ephemeral node lives only as long as the session that created it). The node that succeeds becomes active; the node that fails watches the active node's ephemeral node and takes over when it disappears

      • HDFS: there are 2 NameNodes, one active (chosen by election) and one standby

        Hadoop's high-availability mechanism relies on ZooKeeper to coordinate which master node (NameNode) is active and which is standby

      • YARN: there are two ResourceManagers, one active (chosen by election) and one standby

      • Spark: Standalone mode also has a master node (Master) and slave nodes (Workers), so with 2 Masters an election is needed to pick the active one

      • HBase, Kafka, … work the same way
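The race for the ephemeral node can be sketched with a small in-memory model. This is only a toy, not the ZooKeeper client API: `ToyCoordinator`, `Candidate`, the `/active` path and the names `nn1`/`nn2` are all made up for illustration, and a "session" is just a string.

```python
class ToyCoordinator:
    """In-memory stand-in for ZK: holds ephemeral nodes and one-shot delete watches."""

    def __init__(self):
        self.nodes = {}     # path -> id of the session that owns the node
        self.watches = {}   # path -> callbacks to fire when the node is deleted

    def create_ephemeral(self, path, session):
        """Return True if this session created the node, False if it already exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True

    def watch_delete(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def close_session(self, session):
        """Ephemeral nodes die with their session; pending watches then fire."""
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]
            for callback in self.watches.pop(path, []):
                callback()


class Candidate:
    def __init__(self, name, zk, path="/active"):
        self.name, self.zk, self.path = name, zk, path
        self.state = "standby"

    def run_for_active(self):
        if self.zk.create_ephemeral(self.path, self.name):
            self.state = "active"
        else:
            # Lost the race: stay standby and retry when the active node vanishes.
            self.state = "standby"
            self.zk.watch_delete(self.path, self.run_for_active)


zk = ToyCoordinator()
nn1, nn2 = Candidate("nn1", zk), Candidate("nn2", zk)
nn1.run_for_active()
nn2.run_for_active()
states_before = (nn1.state, nn2.state)   # nn1 created the node first

zk.close_session("nn1")                  # the active node's session ends
nn1.state = "down"
states_after = (nn1.state, nn2.state)    # nn2's watch fired and it took over
print(states_before, states_after)
```

The point of the ephemeral node is that nobody has to delete it by hand: when the active node's session ends, the node disappears on its own and the standby is notified.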

  • Function 2:

    • Storing metadata

      HDFS stores data as blocks (blk, 128 MB by default) distributed across the DataNodes. The data that describes this data (its size, type, how many blocks it is split into, where they are stored, and so on) is called 「metadata」

      「Metadata」 is kept in memory, where the latest and most complete copy lives. But memory is volatile (it is lost on a fault or shutdown), so it can only serve as temporary storage, not as long-term or sole storage. Metadata is therefore also persisted in different forms, in different places:

      • In file form: used by HDFS
        fsimage (the memory image most recently persisted to disk) + edits (the log of in-memory operations) = the latest and most complete metadata in memory. Both files are kept on the NameNode; the SecondaryNameNode periodically merges them and the merged image is persisted to disk. At startup these files are loaded back into memory to rebuild the latest and most complete metadata

        Reflection: can we do without the SNN (SecondaryNameNode)?
        Answer: yes. The NN (NameNode) also merges the image and the log at startup. But without the SNN's help in updating the image, the image goes stale and the edit log keeps accumulating, which slows down NN startup
        The NN is responsible for 3 things: managing all slave nodes, managing metadata, and receiving client requests
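The relation "fsimage + edits = latest metadata" can be illustrated with a minimal sketch. The dictionary snapshot and the `(op, path, value)` tuples below are assumptions made up for this example; they are not the real on-disk formats of fsimage or the edit log.

```python
def apply_edits(fsimage, edits):
    """Replay logged operations on top of a snapshot and return the result."""
    metadata = dict(fsimage)          # copy, so the old image stays untouched
    for op, path, value in edits:
        if op == "put":
            metadata[path] = value
        elif op == "delete":
            metadata.pop(path, None)
    return metadata


def checkpoint(fsimage, edits):
    """Roughly what the SNN does periodically: fold the log into a new image, empty the log."""
    return apply_edits(fsimage, edits), []


fsimage = {"/a.txt": 2}               # toy metadata: path -> number of blocks
edits = [("put", "/b.txt", 1), ("delete", "/a.txt", None), ("put", "/c.txt", 3)]

in_memory = apply_edits(fsimage, edits)            # what the NN rebuilds at startup
new_image, new_edits = checkpoint(fsimage, edits)  # same state, but with an empty log
print(in_memory)
```

Without checkpoints the `edits` list only grows, so the replay in `apply_edits` takes longer at every restart, which is the startup cost described in the answer above.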


    • RDBMS (MySQL): the metadata of Sqoop, Hive (MySQL), Oozie and Hue lives in a relational database

      Software that stores its metadata in an RDBMS is generally not distributed and handles a relatively small number of requests

    • ZooKeeper: HBase, Kafka

      Software that stores its metadata in ZK is generally distributed and handles relatively high request concurrency, because only a distributed system can withstand high concurrency

  • How does ZK itself avoid failures? (software problems or machine problems)

    • The data stored on every ZK node is identical
      • How is this achieved? ZK is a special master-slave architecture, a 「fair distributed architecture」
        The master node (leader) is responsible for writes and broadcasts all data to the slave nodes (synchronization), so that the data on all nodes stays consistent
      • What if the master node fails? Every node is eligible to be elected leader, so no additional standby master is needed; every ZK node can also accept read and write requests from clients (writes are handed to the leader)
      • Because every node holds a full copy of the data, ZK can only be used as a small data storage system
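A rough sketch of the write path described above: any node accepts the request, the leader applies it and broadcasts it, and every node ends up with the same copy. This is a simplified toy under stated assumptions: the `Node` class, the key `/hbase/master` and the value `host-a` are hypothetical, and the broadcast here is a plain synchronous loop, whereas the real protocol is more involved.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.leader = None          # set once the cluster is wired up
        self.followers = []         # only used by the leader

    def write(self, key, value):
        """Any node accepts a write, but only the leader applies and broadcasts it."""
        if self.leader is not self:
            return self.leader.write(key, value)   # hand the write to the leader
        self.data[key] = value
        for follower in self.followers:
            follower.data[key] = value             # broadcast (sync) to every follower

    def read(self, key):
        """Reads are served from the node's local copy."""
        return self.data.get(key)


nodes = [Node("zk1"), Node("zk2"), Node("zk3")]
leader = nodes[0]
for node in nodes:
    node.leader = leader
leader.followers = nodes[1:]

nodes[2].write("/hbase/master", "host-a")   # the client happens to talk to a follower
values = [node.read("/hbase/master") for node in nodes]
print(values)
```

Since each node stores everything, adding nodes adds read capacity and fault tolerance but not storage capacity, which is why ZK suits small data such as metadata.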
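    Reflection: can a ZK cluster have an even number of nodes?
    Answer: yes, but an odd number is better. The election itself does not depend on whether the number of machines is odd or even: it ends as soon as one candidate has more than half of the votes. An even count simply adds no fault tolerance: 4 nodes survive only 1 failure, the same as 3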
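The "more than half" rule is simple arithmetic, sketched below. The function names `majority` and `tolerated_failures` are just labels chosen for this example.

```python
def majority(n):
    """Smallest number of votes that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes can fail while a majority is still reachable."""
    return n - majority(n)


table = {n: (majority(n), tolerated_failures(n)) for n in range(3, 8)}
for n, (quorum, failures) in table.items():
    print(f"{n} nodes: majority {quorum}, tolerates {failures} failure(s)")
```

Going from 3 to 4 nodes (or from 5 to 6) raises the required majority without raising the number of failures the cluster survives, so the extra machine buys nothing for availability.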

If you would like to know more about ZooKeeper, feel free to leave a comment ~

copyright:author[Code sheep],Please bring the original link to reprint, thank you. https://en.javamana.com/2022/02/202202130707486085.html