yida＆yueda 2022-02-13 07:51:41
a、NameNode is the heart of HDFS and the lead role of the cluster, known as the Master.
b、NameNode stores and manages only HDFS metadata: the file system namespace (the directory tree) and the location information of files and blocks.
c、NameNode does not store the actual data or datasets. The data itself is stored on the DataNodes.
d、NameNode knows the list of blocks of any given file in HDFS and their locations. Using this information, the NameNode knows how to reconstruct a file from its blocks.
e、NameNode does not persist the location of each block (the DataNode location information); this information is rebuilt at system startup from the reports sent by the DataNodes.
f、NameNode is critical to HDFS: when the NameNode is down, the HDFS / Hadoop cluster cannot be accessed.
g、NameNode is a single point of failure in a Hadoop cluster.
h、The NameNode machine is usually configured with a large amount of memory (RAM).
a、DataNode is responsible for storing the actual data in HDFS. It is the slave role of the cluster, known as the Slave.
b、At startup, a DataNode registers itself with the NameNode and reports the list of blocks it is responsible for holding.
c、Following instructions from the NameNode, it performs block creation, replication, and deletion.
d、A DataNode periodically (configured by dfs.heartbeat.interval, default 3 seconds) sends a heartbeat to the NameNode. If the NameNode has not received a heartbeat from a DataNode for a long time, it considers that DataNode dead.
e、A DataNode periodically reports the blocks it holds to the NameNode. The report interval is set by the parameter dfs.blockreport.intervalMsec; if not configured, it defaults to 6 hours.
f、The DataNode machine is usually configured with a large amount of disk space, because the actual data is stored on the DataNodes.
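The heartbeat logic above can be sketched as a small simulation. This is a hypothetical illustration, not Hadoop source code; the derived timeout formula (2 × the NameNode's recheck interval + 10 × the heartbeat interval, about 10.5 minutes with default settings) reflects Hadoop's common defaults but should be verified against your version's hdfs-default.xml.

```python
# Minimal sketch (hypothetical, not Hadoop source) of how a NameNode-like
# service could decide that a DataNode is dead based on missed heartbeats.

HEARTBEAT_INTERVAL = 3        # dfs.heartbeat.interval, seconds (default 3)
RECHECK_INTERVAL = 5 * 60     # dfs.namenode.heartbeat.recheck-interval, seconds

# Hadoop derives the dead-node timeout from both settings:
DEAD_TIMEOUT = 2 * RECHECK_INTERVAL + 10 * HEARTBEAT_INTERVAL  # 630 s here

def is_dead(last_heartbeat_ts: float, now: float) -> bool:
    """A DataNode is considered dead once no heartbeat has arrived
    for longer than the derived timeout."""
    return now - last_heartbeat_ts > DEAD_TIMEOUT

# Example: last heartbeat 700 s ago -> dead; 30 s ago -> still alive.
print(is_dead(0.0, 700.0))  # True
print(is_dead(0.0, 30.0))   # False
```

Note that the timeout is deliberately much longer than one heartbeat interval, so a few dropped heartbeats do not trigger costly re-replication of the node's blocks.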
Metadata, also known as intermediary data, is "data about data": data that describes the attributes (properties) of other data. It supports functions such as indicating storage location, recording history, looking up resources, and keeping file records.
In HDFS, metadata mainly refers to file-related metadata, managed and maintained by the NameNode. In a broad sense, because the NameNode also manages many DataNodes, the location and health information of the DataNodes counts as metadata too.
In HDFS, there are two types of file-related metadata:
Ø Attribute information of the file itself
File name, permissions, modification time, file size, replication factor, block size.
Ø File block location mapping information
Records the mapping between file blocks and DataNodes, i.e., which block is on which node.
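The two kinds of metadata above can be sketched with simple data structures. This is a hypothetical illustration (the paths, block ids, and node names are invented), not Hadoop's actual implementation: file attributes are the part that gets persisted, while the block-to-DataNode map lives only in memory and is rebuilt from block reports.

```python
# Minimal sketch (hypothetical, not Hadoop source) of the two kinds of
# file metadata: file attributes, and the block-to-DataNode location map
# rebuilt from DataNode block reports.
from collections import defaultdict

# Attribute metadata: file path -> properties (this part is persisted).
files = {
    "/data/a.log": {"replication": 3, "block_size": 128 * 1024 * 1024,
                    "blocks": ["blk_1", "blk_2"]},
}

# Block location map: block id -> set of DataNodes (kept only in memory).
block_locations = defaultdict(set)

def process_block_report(datanode: str, blocks: list) -> None:
    """Rebuild the in-memory block map from one DataNode's block report."""
    for blk in blocks:
        block_locations[blk].add(datanode)

process_block_report("dn1", ["blk_1", "blk_2"])
process_block_report("dn2", ["blk_1"])

# The NameNode can now answer: which nodes hold the blocks of /data/a.log?
print({blk: sorted(block_locations[blk])
       for blk in files["/data/a.log"]["blocks"]})
# {'blk_1': ['dn1', 'dn2'], 'blk_2': ['dn1']}
```

This also shows why the location map need not be persisted: replaying the block reports at startup reconstructs it completely.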
By storage form, metadata can be divided into in-memory metadata and metadata files, stored in memory and on disk respectively.
To make user interaction with metadata efficient and low-latency, the NameNode keeps all metadata in memory; we call this in-memory metadata. The metadata in memory is the most complete, including both the attribute information of the files and the file block location mapping information.
But the fatal problem with memory is that data is lost on power failure: it is not persistent. Therefore, the NameNode also relies on metadata files to ensure the safety and integrity of the metadata.
The fsimage file is a persistent checkpoint of the in-memory metadata. However, fsimage contains only the metadata related to file attributes in the Hadoop file system; it does not contain file block location information. Block location information is kept only in memory: when DataNodes start up and join the cluster, they report their blocks to the NameNode, and block reports are then repeated at a specified interval.
Persisting is an IO process that transfers data from memory to disk. It has some impact on normal NameNode service, so it cannot be done frequently.
To avoid losing the changes made between two persistence operations, the edits log (edit log) file was designed. This file records every change to HDFS (file creation, deletion, or modification); changes made by file system clients are first written to the edits file.
Both fsimage and edits files are serialized. When the NameNode starts, it loads the contents of the fsimage file into memory and then replays the operations in the edits file, so that the metadata in memory is synchronized with the actual state. The in-memory metadata serves client read operations and is also the most complete metadata.
When a client adds or modifies files in HDFS, the operation is first recorded in the edits log file; once the client operation succeeds, the corresponding metadata is updated into the in-memory metadata. Because fsimage files are usually large (GB-level is very common), appending every update operation directly to the fsimage file would make the system run very slowly.
This HDFS design achieves two things: first, data is updated and queried in memory, greatly reducing operation response time; second, because in-memory metadata is at high risk of loss (power failure, etc.), the metadata image file (fsimage) plus edit log file (edits) backup mechanism is used to ensure the safety of the metadata.
The NameNode maintains the metadata of the entire file system. Therefore, how accurately the metadata is managed affects HDFS's ability to provide file storage services.
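The fsimage + edits recovery scheme described above can be sketched as a checkpoint plus an append-only log that is replayed at startup. This is a hypothetical illustration (the paths, operations, and dictionary layout are invented), not Hadoop's actual file formats.

```python
# Minimal sketch (hypothetical, not Hadoop source) of fsimage + edits:
# a periodic checkpoint of the namespace, plus an append-only edit log
# that is replayed on startup to rebuild the latest in-memory metadata.

fsimage = {"/a.txt": {"replication": 3}}       # last persisted checkpoint
edits = [                                      # changes logged since then
    ("create", "/b.txt", {"replication": 2}),
    ("delete", "/a.txt", None),
]

def load_namespace(checkpoint: dict, edit_log: list) -> dict:
    """Load the checkpoint, then replay each logged change, as the
    NameNode does with fsimage + edits at startup."""
    namespace = dict(checkpoint)
    for op, path, attrs in edit_log:
        if op == "create":
            namespace[path] = attrs
        elif op == "delete":
            namespace.pop(path, None)
    return namespace

print(load_namespace(fsimage, edits))
# {'/b.txt': {'replication': 2}}
```

Appending one record per change to the log is cheap, while rewriting the whole checkpoint on every change would not be; that is exactly the trade-off the text describes.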
copyright: author [yida＆yueda]. Please include the original link when reprinting, thank you. https://en.javamana.com/2022/02/202202130751392396.html