HDFS architecture principles: master-slave architecture analysis, the relationship and differences between NameNode, SecondaryNameNode and DataNode, and the HDFS block mechanism, replica mechanism and metadata management

yida&yueda 2022-02-13 07:51:43


HDFS Architecture Principles

1. HDFS Architecture Analysis

1.1 HDFS Overview

HDFS is short for Hadoop Distributed File System. It is one of Hadoop's core components and serves as the underlying distributed storage service in the big data ecosystem. The problem HDFS solves is how to store big data: it is a file storage system that spans multiple computers and is highly fault tolerant.

An HDFS cluster follows a master-slave architecture. Each cluster has one master node and multiple slave nodes. Internally, a file is divided into one or more blocks, and each block is stored on different slave nodes according to the replication factor. The master node stores and manages the file system namespace, that is, information about files and their blocks, such as block locations and permissions. The slave nodes store the data blocks of the files. Master and slaves each perform their own duties and cooperate to provide the distributed file storage service. These internal details are transparent to users.


1.2 Roles

1.2.1 Overview

HDFS follows a master-slave architecture. Each cluster has one master node and multiple slave nodes:

The NameNode is the master node. It stores and manages the file system metadata, including the namespace directory structure and file block location information. The DataNode is the slave node. It stores the actual data blocks of files.

The two roles each do their own job and together provide the distributed file storage service.

The SecondaryNameNode is an auxiliary role for the master. It helps the NameNode merge metadata.

1.2.2 NameNode

The NameNode is the core of the Hadoop distributed file system and the master role in the architecture. It maintains and manages the file system metadata, including the namespace directory tree, the location information of files and blocks, and access permissions. Because of this, the NameNode is the single entry point for accessing HDFS.

The NameNode manages metadata internally using both memory and disk. The metadata files on disk are the Fsimage, an image of the in-memory metadata, and the edits log (journal), which records edits.

Before Hadoop 2, the NameNode was a single point of failure. Hadoop 2 introduced high availability: the cluster architecture allows two or more servers in a cluster to run NameNodes in a hot-standby configuration.


1.2.3 DataNode

The DataNode is the slave role in Hadoop HDFS and is responsible for storing the actual data blocks. The number of DataNodes determines the overall storage capacity of the HDFS cluster. DataNodes cooperate with the NameNode to maintain the data blocks.

1.2.4 SecondaryNameNode

Besides the DataNode and NameNode there is another daemon, called the Secondary NameNode. It acts as a helper to the NameNode, but it cannot replace the NameNode.

When the NameNode starts, it merges the Fsimage and edits log files to restore the current file system namespace. A very large edits log makes this loading slow, so the Secondary NameNode assists the NameNode: it downloads the Fsimage file and the edits log file from the NameNode and merges them.
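The merge described above can be pictured as replaying a log of changes onto a snapshot. The Python sketch below is a toy model only: the dict-based image, the `(op, path, value)` edit tuples and the `merge_checkpoint` function are hypothetical illustrations, not the real Fsimage or edits log formats.

```python
def merge_checkpoint(fsimage, edits):
    """Replay edit records onto a namespace snapshot and return a new snapshot.

    fsimage: dict mapping path -> metadata (toy stand-in for the Fsimage)
    edits:   list of (op, path, value) tuples (toy stand-in for the edits log)
    """
    merged = dict(fsimage)  # copy so the old snapshot is left untouched
    for op, path, value in edits:
        if op == "create":
            merged[path] = value
        elif op == "delete":
            merged.pop(path, None)
        elif op == "rename":
            # For a rename, value holds the new path.
            if path in merged:
                merged[value] = merged.pop(path)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return merged


fsimage = {"/data/a.txt": {"replication": 3}}
edits = [
    ("create", "/data/b.txt", {"replication": 2}),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt", None),
]
new_image = merge_checkpoint(fsimage, edits)
print(new_image)
```

After such a merge the new snapshot already contains every logged change, so a restart would have far fewer edits to replay.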

1.3 Important HDFS Features

1.3.1 Master-slave architecture

HDFS uses a master/slave architecture. An HDFS cluster generally consists of one NameNode and a number of DataNodes. The NameNode is the HDFS master node and the DataNodes are the HDFS slave nodes. The two roles each do their own job and together provide the distributed file storage service.


1.3.2 Block mechanism

Files in HDFS are physically split into blocks. The block size can be set with the configuration parameter dfs.blocksize, which is defined in hdfs-default.xml. The default size is 128 MB (134217728 bytes).
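As a worked example of the arithmetic, the following Python sketch computes how a file of a given size breaks down into blocks. It only models the sizes; `split_into_blocks` is a hypothetical helper, not an HDFS API, and it assumes the last block holds only the leftover bytes.

```python
DEFAULT_BLOCK_SIZE = 134217728  # 128 MB, the dfs.blocksize default quoted above


def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return the list of block lengths (in bytes) for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)  # the last block holds only the leftover bytes
    return lengths


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB)  # a 300 MB file
print(len(blocks), [length // MB for length in blocks])
```

A 300 MB file therefore occupies three blocks: two full 128 MB blocks and one 44 MB block.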


1.3.3 Replica mechanism

For fault tolerance, every block of a file is replicated. The block size (dfs.blocksize) and the replication factor (dfs.replication) are configurable per file. An application can specify the number of replicas of a file. The replication factor can be set when the file is created and can also be changed later by command.

The default value of dfs.replication is 3. That is, each block is copied 2 more times, giving 3 copies in total including the original.
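To make the replication factor concrete, here is a minimal Python sketch that assigns each block to several distinct DataNodes. The round-robin rule and the `place_replicas` function are assumptions chosen for simplicity; this article does not describe the placement policy HDFS actually uses, and the node names are made up.

```python
def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        # Consecutive offsets modulo the node count never repeat a node
        # as long as replication <= len(datanodes).
        placement[block_id] = [
            datanodes[(block_id + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
placement = place_replicas(num_blocks=2, datanodes=nodes)
print(placement)
```

With the default factor of 3, losing any single node still leaves two copies of every block in this model.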


1.3.4 Namespace

HDFS supports a traditional hierarchical file organization. Users can create directories and store files in them. The namespace hierarchy is similar to that of most existing file systems: users can create, delete, move or rename files.

The NameNode maintains the file system namespace. Any change to the namespace or its properties is recorded by the NameNode.

HDFS presents clients with a unified abstract directory tree. Clients access files by path, in the form hdfs://namenode:port/dir-a/dir-b/dir-c/file.data.
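The parts of such a path can be pulled apart with Python's standard `urllib.parse`. In this small sketch the port 8020 is only a placeholder for the `port` in the form above; substitute the address of your own NameNode.

```python
from urllib.parse import urlparse

# Hypothetical host and port standing in for "namenode:port".
uri = "hdfs://namenode:8020/dir-a/dir-b/dir-c/file.data"
parts = urlparse(uri)

print(parts.scheme)    # the file system scheme
print(parts.hostname)  # the NameNode host
print(parts.port)      # the NameNode port
print(parts.path)      # the path inside the unified directory tree
```

The host and port identify the NameNode, while the remaining path is resolved against the namespace that the NameNode maintains.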

1.3.5 Metadata management

In HDFS, the NameNode manages two types of metadata:

  • Attributes of the file itself

    File name, permissions, modification time, file size, replication factor and block size.

  • File block location mapping

    The mapping between file blocks and DataNodes, that is, which block is stored on which node.
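The two kinds of metadata can be sketched as two small data structures. This is an illustrative Python model, not the NameNode's internal representation: the `FileAttributes` class, its field names, the block IDs and the `nodes_for_file` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileAttributes:
    """Type 1: attributes of the file itself."""
    name: str
    permissions: str
    modification_time: float
    size: int
    replication: int
    block_size: int
    block_ids: list = field(default_factory=list)  # blocks that make up the file


# Type 2: block location mapping, block ID -> DataNodes holding a replica.
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
}


def nodes_for_file(attrs, locations):
    """Return, block by block, the DataNodes that store the file's data."""
    return [(block_id, locations[block_id]) for block_id in attrs.block_ids]


attrs = FileAttributes(
    name="/dir-a/file.data",
    permissions="rw-r--r--",
    modification_time=0.0,
    size=200 * 1024 * 1024,
    replication=3,
    block_size=134217728,
    block_ids=["blk_1", "blk_2"],
)
print(nodes_for_file(attrs, block_locations))
```

Answering "where is this file?" needs both types: the file attributes name the blocks, and the location mapping names the nodes.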

1.3.6 Block storage

The actual storage and management of each block of a file is handled by the DataNodes. Each block can be stored on multiple DataNodes.
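One consequence of storing a block on several DataNodes is that a read can fall back to another replica when a node is unavailable. The toy Python model below illustrates that idea only; `read_block`, the dict-based stores and the node names are hypothetical and do not reflect the real HDFS client protocol.

```python
def read_block(block_id, locations, datanodes):
    """Return the bytes of block_id from the first reachable replica."""
    for node in locations[block_id]:
        store = datanodes.get(node)  # None models an unreachable DataNode
        if store is not None and block_id in store:
            return store[block_id]
    raise IOError(f"no live replica for {block_id}")


datanodes = {
    "dn1": None,  # dn1 is down in this scenario
    "dn2": {"blk_1": b"hello "},
    "dn3": {"blk_1": b"hello ", "blk_2": b"hdfs"},
}
locations = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn3"]}

data = b"".join(read_block(b, locations, datanodes) for b in ["blk_1", "blk_2"])
print(data)
```

Here the first replica of `blk_1` is unreachable, so the read is served by the second one and the file is still reassembled in full.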


Copyright: author [yida&yueda]. Please include the original link when reprinting, thank you. https://en.javamana.com/2022/02/202202130751409777.html