yida＆yueda 2022-02-13 07:51:43
HDFS is short for Hadoop Distributed File System. It is one of Hadoop's core components and serves as the underlying distributed storage service in the big data ecosystem. HDFS addresses the problem of how to store big data: it is a file storage system that spans multiple computers and is highly fault tolerant.
An HDFS cluster follows a master/slave architecture. Each cluster includes one master node and multiple slave nodes. Internally, a file is split into one or more blocks, and each block is stored on different slave nodes according to the replication factor. The master node stores and manages the file system namespace, that is, information about files and their blocks, such as block locations and permissions. The slave nodes store the files' data blocks. The master and the slaves each perform their own duties and cooperate to provide the distributed file storage service. These internal details are transparent to users.
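To make the splitting and replication described above concrete, here is a minimal sketch in Python. It is a toy model, not HDFS code: the function names, the node names (`dn1` to `dn4`), and the round-robin placement are assumptions made for illustration. Real HDFS uses its own placement policy.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication):
    """Toy round-robin placement: each block lands on `replication` distinct nodes.

    This only illustrates "one block, several slave nodes"; it is not
    the placement policy HDFS actually uses.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of slave nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + i) % len(nodes)] for i in range(replication)
        ]
    return placement


# Toy units: a file of size 300 with a block size of 128.
blocks = split_into_blocks(file_size=300, block_size=128)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
print(blocks)
print(placement)
```

In this model the master only needs to remember `placement` (which block is on which node), while the slave nodes hold the block contents.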
HDFS follows a master/slave architecture, with one master node and multiple slave nodes per cluster. The roles are:
NameNode is the master node, responsible for storing and managing file system metadata, including the namespace directory structure and file block location information. DataNode is the slave node, responsible for storing the actual data blocks of files.
The two roles each perform their own duties and coordinate to provide the distributed file storage service.
SecondaryNameNode is an auxiliary role to the NameNode that helps it merge metadata.
The NameNode is the core of the Hadoop distributed file system and the master role in the architecture. It maintains and manages file system metadata, including the namespace directory tree, the location information of files and blocks, and access permissions. For this reason, the NameNode is the single entry point for accessing HDFS.
Metadata is managed internally in memory and on disk. The on-disk metadata files are the Fsimage (an image of the in-memory metadata) and the edits log (journal).
Before Hadoop 2, the NameNode was a single point of failure. Hadoop 2 introduced high availability: the cluster architecture allows two or more servers (more than two since Hadoop 3) to run NameNodes in a hot-standby configuration within one cluster.
The DataNode is the slave role in Hadoop HDFS, responsible for storing the actual data blocks. The number of DataNodes determines the overall storage capacity of the HDFS cluster. DataNodes cooperate with the NameNode to maintain the data blocks.
Besides the DataNode and the NameNode, there is another daemon called the Secondary NameNode. It acts as a helper to the NameNode, but it cannot replace the NameNode.
When the NameNode starts, it merges the Fsimage and edits log files to restore the current file system namespace. If the edits log grows too large, loading it becomes slow. To help with this, the Secondary NameNode downloads the Fsimage and edits log files from the NameNode and merges them.
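The merge can be pictured as replaying logged operations on top of a snapshot. The sketch below is a toy model under stated assumptions: the dictionary layout, the operation names (`create`, `delete`, `rename`), and the `checkpoint` function are hypothetical and do not reflect the real Fsimage or edits log formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace (dict of path -> metadata)."""
    op = edit["op"]
    if op == "create":
        namespace[edit["path"]] = {"replication": edit["replication"]}
    elif op == "delete":
        namespace.pop(edit["path"], None)
    elif op == "rename":
        namespace[edit["dst"]] = namespace.pop(edit["src"])
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage, edits):
    """Replay the edits on a copy of the image and return the merged image."""
    merged = {path: dict(meta) for path, meta in fsimage.items()}
    for edit in edits:
        apply_edit(merged, edit)
    return merged


# Hypothetical snapshot and log, for illustration only.
fsimage = {"/a.txt": {"replication": 3}}
edits = [
    {"op": "create", "path": "/b.txt", "replication": 2},
    {"op": "rename", "src": "/a.txt", "dst": "/c.txt"},
    {"op": "delete", "path": "/b.txt"},
]
new_fsimage = checkpoint(fsimage, edits)
print(new_fsimage)
```

Once a merged image exists, the replayed log entries are no longer needed for recovery, which is why periodic merging keeps startup time bounded.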
HDFS uses a master/slave architecture. An HDFS cluster generally consists of one NameNode and a certain number of DataNodes. The NameNode is the HDFS master node and the DataNodes are the HDFS slave nodes. The two roles each perform their own duties and coordinate to provide the distributed file storage service.
Files in HDFS are physically stored in blocks. The block size can be set through the configuration parameter dfs.blocksize, which is listed in hdfs-default.xml. The default size is 128 MB (134217728 bytes).
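As a quick check of the arithmetic, the following sketch computes how many blocks a file needs at the default block size. The helper names are made up for this example. The last block only occupies as many bytes as the file has left over.

```python
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 134217728 bytes, the dfs.blocksize default


def block_count(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Number of blocks needed for a file (ceiling division); 0 for an empty file."""
    return -(-file_size_bytes // block_size)


def last_block_bytes(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Size in bytes of the final block, which may be shorter than block_size."""
    if file_size_bytes == 0:
        return 0
    remainder = file_size_bytes % block_size
    return remainder if remainder else block_size


MIB = 1024 * 1024
print(block_count(1024 * MIB))      # a 1 GiB file
print(block_count(300 * MIB))       # a 300 MiB file
print(last_block_bytes(300 * MIB))  # its final, partial block
```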
For fault tolerance, every block of a file is replicated. Both the block size (dfs.blocksize) and the replication factor (dfs.replication) are configurable per file. An application can specify the number of replicas of a file. The replication factor can be set when the file is created and can also be changed later by command.
**The default value of dfs.replication is 3**: each block is copied two more times, giving three replicas in total including the original.
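The storage cost of replication is simple multiplication. The sketch below assumes plain replication with no compression and no other overhead. The function names and the cluster figures (4 DataNodes with 10 TiB each) are made up for illustration.

```python
def raw_storage_bytes(file_size_bytes, replication=3):
    """Bytes consumed across the cluster when every block has `replication` copies."""
    return file_size_bytes * replication


def usable_capacity_bytes(raw_capacity_bytes, replication=3):
    """Rough amount of file data a cluster can hold at a given replication factor."""
    return raw_capacity_bytes // replication


GIB = 1024 ** 3
TIB = 1024 ** 4

one_gib_cost = raw_storage_bytes(1 * GIB)            # 1 GiB file, 3 replicas
cluster_usable = usable_capacity_bytes(4 * 10 * TIB)  # hypothetical 4 x 10 TiB cluster
print(one_gib_cost / GIB, "GiB raw")
print(round(cluster_usable / TIB, 2), "TiB usable")
```

This is why the number of DataNodes and the replication factor together determine how much data a cluster can actually hold.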
HDFS supports a traditional hierarchical file organization. Users can create directories and store files in them. The file system namespace hierarchy is similar to that of most existing file systems: users can create, delete, move, or rename files.
The NameNode is responsible for maintaining the file system namespace. Any change to the file system namespace or its properties is recorded by the NameNode.
HDFS presents clients with a unified abstract directory tree. Clients access files by path, in the form: hdfs://namenode:port/dir-a/dir-b/dir-c/file.data.
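The parts of such a path can be pulled apart with Python's standard library. In the sketch below, the helper name `parse_hdfs_uri` is made up, and port 8020 is an assumed value standing in for the `port` placeholder above. Use whatever port your NameNode is configured with.

```python
from urllib.parse import urlsplit


def parse_hdfs_uri(uri):
    """Split an hdfs:// URI into (NameNode host, port, path inside the namespace)."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"not an hdfs URI: {uri}")
    return parts.hostname, parts.port, parts.path


# 8020 is an assumed port for illustration.
host, port, path = parse_hdfs_uri("hdfs://namenode:8020/dir-a/dir-b/dir-c/file.data")
components = [p for p in path.split("/") if p]
print(host, port, path)
print(components)
```

The host and port identify the NameNode, while the remaining path is resolved against the directory tree that the NameNode maintains.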
In HDFS, the NameNode manages two types of metadata:
- Attributes of the file itself: file name, permissions, modification time, file size, replication factor, and block size.
- File block location mapping: the mapping between file blocks and DataNodes, that is, which block resides on which node.
The actual storage of each block of a file is handled by the DataNodes. Each block can be stored on multiple DataNodes.
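The two kinds of metadata can be modeled as two lookup tables. The following is a minimal sketch, not the NameNode's actual data structures: the class names, field names, and the `report_block` method are assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass, field


@dataclass
class FileAttributes:
    """Attributes of the file itself."""
    name: str
    permission: str
    modification_time: int
    size: int
    replication: int
    block_size: int


@dataclass
class NameNodeMetadata:
    """Toy model holding both kinds of metadata."""
    files: dict = field(default_factory=dict)            # path -> (attrs, [block ids])
    block_locations: dict = field(default_factory=dict)  # block id -> set of DataNodes

    def add_file(self, path, attrs, block_ids):
        self.files[path] = (attrs, list(block_ids))

    def report_block(self, block_id, datanode):
        """Record that a DataNode holds a replica of a block."""
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Return [(block id, sorted DataNode names)] for a file."""
        _, block_ids = self.files[path]
        return [(b, sorted(self.block_locations.get(b, ()))) for b in block_ids]


meta = NameNodeMetadata()
attrs = FileAttributes("file.data", "rw-r--r--", 1644738703, 200, 2, 128)
meta.add_file("/dir-a/file.data", attrs, ["blk_1", "blk_2"])
for block_id, node in [("blk_1", "dn1"), ("blk_1", "dn2"), ("blk_2", "dn2"), ("blk_2", "dn3")]:
    meta.report_block(block_id, node)
locations = meta.locate("/dir-a/file.data")
print(locations)
```

A client reading the file would first ask for something like `locations`, then fetch each block from one of the listed DataNodes.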
Copyright: author [yida＆yueda]. Please include the original link when reprinting, thank you. https://en.javamana.com/2022/02/202202130751409777.html