HDFS roles and responsibilities in detail: NameNode, DataNode, and metadata management (fsimage image file, edits log, metadata loading order)

yida&yueda 2022-02-13 07:51:41



1、NameNode responsibilities

a、 The NameNode is the core of HDFS and the master role of the cluster; it is referred to as the Master.

b、 The NameNode stores and manages only HDFS metadata: the file system namespace (the directory tree it maintains) and the location information of files and blocks.

c、 The NameNode does not store actual data or datasets. The data itself is stored on the DataNodes.

d、 The NameNode knows the block list of any given file in HDFS and where those blocks are located. With this information it knows how to reconstruct a file from its blocks.

e、 The NameNode does not persist the DataNode locations of the blocks of each file. This information is rebuilt from the DataNodes' block reports when the system starts.

f、 The NameNode is critical to HDFS: when the NameNode is down, the HDFS / Hadoop cluster cannot be accessed.

g、 Without a high-availability setup, the NameNode is a single point of failure in a Hadoop cluster.

h、 The NameNode machine is usually configured with a large amount of memory (RAM).
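
Points d and e above can be pictured with a toy model. This is only a sketch, not Hadoop's real data structures: the class name ToyNameNode, its methods, and the sample paths, block IDs, and node names are all made up for illustration. It keeps a namespace (path to block list) separately from a block-location map that is filled only from DataNode reports.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model of NameNode metadata; not Hadoop's real data structures."""

    def __init__(self):
        # Namespace: file path -> ordered list of block IDs.
        self.namespace = {}
        # Block ID -> set of DataNode names; kept in memory only and
        # filled in from the block reports that DataNodes send.
        self.block_locations = defaultdict(set)

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def process_block_report(self, datanode, block_ids):
        """Record that `datanode` holds every block it reported."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path):
        """Return [(block_id, sorted DataNode names)] in file order."""
        return [(b, sorted(self.block_locations[b])) for b in self.namespace[path]]


nn = ToyNameNode()
nn.add_file("/logs/app.log", ["blk_1", "blk_2"])
nn.process_block_report("dn1", ["blk_1", "blk_2"])
nn.process_block_report("dn2", ["blk_2"])
print(nn.locate("/logs/app.log"))
# [('blk_1', ['dn1']), ('blk_2', ['dn1', 'dn2'])]
```

Before any block report arrives, locate() still returns the block list of the file, but every location list is empty, which mirrors why the location information has to be rebuilt at startup.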

2、DataNode responsibilities

a、 The DataNode is responsible for storing the actual data in HDFS. It is the slave role of the cluster and is referred to as the Slave.

b、 When a DataNode starts, it registers itself with the NameNode and reports the list of blocks it holds.

c、 It creates, replicates, and deletes blocks as instructed by the NameNode.

d、 The DataNode periodically sends a heartbeat to the NameNode (the interval is set by dfs.heartbeat.interval and defaults to 3 seconds). If the NameNode receives no heartbeat from a DataNode for a long time, it considers that DataNode dead.

e、 The DataNode periodically reports the blocks it holds to the NameNode. The reporting interval is set by dfs.blockreport.intervalMsec; if the parameter is not configured, the default is 6 hours.

f、 The DataNode machine is usually configured with a large amount of disk space, because the actual data is stored on the DataNodes.
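
How long "a long time" is in point d can be worked out with a little arithmetic. The sketch below assumes the commonly documented rule that the timeout is 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval, and assumes a recheck interval of 300000 ms; neither the rule nor that value comes from this article, so check them against your Hadoop version. The function names are hypothetical.

```python
def heartbeat_expire_ms(heartbeat_interval_s=3, recheck_interval_ms=300_000):
    """Timeout after which a silent DataNode is treated as dead.

    Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


def is_dead(last_heartbeat_ms, now_ms, expire_ms):
    """True if no heartbeat has arrived within the expiry window."""
    return now_ms - last_heartbeat_ms > expire_ms


expire = heartbeat_expire_ms()
print(expire / 60_000)  # timeout in minutes: 10.5

# The 6-hour block report default from point e, expressed in milliseconds.
block_report_interval_ms = 6 * 60 * 60 * 1000
print(block_report_interval_ms)  # 21600000
```

Under these assumptions a DataNode that stops sending 3-second heartbeats is declared dead after roughly ten and a half minutes, not after a single missed heartbeat.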

3、NameNode metadata management

3.1 What is metadata

Metadata, also called intermediary data, is data that describes data (data about data). It mainly describes the properties of data and supports functions such as indicating storage location, recording history, resource lookup, and file records.

In HDFS, metadata mainly refers to file-related metadata, which is managed and maintained by the NameNode. In a broader sense, because the NameNode also has to manage many DataNodes, the location and health status of the DataNodes count as metadata too.

3.2 Overview of Metadata Management

In HDFS, there are two types of file-related metadata:

  • Attribute information of the file itself

    File name, permissions, modification time, file size, replication factor, block size.

  • File block location mapping information

    The mapping between file blocks and DataNodes, that is, which block is on which node.

By storage form, metadata can be divided into in-memory metadata and metadata files, stored in memory and on disk respectively.

3.2.1 In-memory metadata

To keep metadata operations efficient and low-latency, the NameNode keeps all metadata in memory; this is called in-memory metadata. The metadata in memory is the most complete: it includes both the attribute information of the files themselves and the file block location mapping information.

The fatal problem with memory is that its contents are lost on power failure; the data is not persistent. The NameNode therefore also relies on metadata files to keep the metadata safe and complete.

3.2.2 On-disk metadata files

3.2.2.1 fsimage: in-memory image file

fsimage is a persistent checkpoint of the in-memory metadata. However, fsimage contains only the metadata about file attributes in the Hadoop file system; it does not contain file block location information. Block location information is kept only in memory: it is built from the block reports that DataNodes send to the NameNode when they start and join the cluster, and block reports are then repeated at a configured interval.

Persisting is a disk I/O process that moves data from memory to disk. It has some impact on the NameNode's normal service, so it cannot be done frequently.
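
The split between what a checkpoint persists and what stays in memory can be sketched as follows. This is a toy illustration only: JSON stands in for the real fsimage format, and the function names, paths, attribute values, and node names are invented for the example.

```python
import json

# Toy in-memory metadata: file attributes plus each file's block list.
memory_metadata = {
    "/data/a.txt": {
        "permission": "rw-r--r--",
        "replication": 3,
        "block_size": 134217728,  # illustrative value (128 MB)
        "blocks": ["blk_1", "blk_2"],
    }
}
# Block -> DataNode mapping; lives in memory only.
block_locations = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn1", "dn3", "dn4"]}


def save_checkpoint(metadata):
    """Serialize the namespace only; block locations are deliberately left out."""
    return json.dumps(metadata, sort_keys=True)


def load_checkpoint(image):
    """Restore the namespace; locations start empty until DataNodes report."""
    return json.loads(image), {}


image = save_checkpoint(memory_metadata)
restored, restored_locations = load_checkpoint(image)
print(restored == memory_metadata, restored_locations)  # True {}
```

After a restart the attributes come back from the image, while the location map is empty until the DataNodes send their block reports.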

3.2.2.2 Edits log: edit log

To avoid losing data between two checkpoints, HDFS also has the edits log file. It records every change to HDFS (file creation, deletion, or modification); changes made by file system clients are first recorded in the edits file.

3.2.3 Metadata loading order

Both the fsimage and edits files are serialized. When the NameNode starts, it loads the contents of the fsimage file into memory and then replays the operations in the edits file, so that the metadata in memory is in sync with the actual state. The in-memory metadata serves client read operations and is also the most complete metadata.
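
The startup sequence described above (load the image, then replay the log) can be sketched as below. The operation names, the tuple layout of a log record, and the sample paths are assumptions made for illustration; the real edits file uses its own binary format.

```python
import copy


def replay(namespace, edits):
    """Apply logged operations, in order, on top of the loaded image."""
    for op, path, attrs in edits:
        if op == "create":
            namespace[path] = dict(attrs)
        elif op == "set_replication":
            namespace[path]["replication"] = attrs["replication"]
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return namespace


# State captured by the last checkpoint (hypothetical contents).
fsimage = {"/a": {"replication": 3}}
# Changes logged after that checkpoint.
edits = [
    ("create", "/b", {"replication": 3}),
    ("set_replication", "/a", {"replication": 2}),
    ("delete", "/b", {}),
    ("create", "/c", {"replication": 1}),
]

# Step 1: load the image into memory. Step 2: replay the edits on top of it.
namespace = replay(copy.deepcopy(fsimage), edits)
print(namespace)  # {'/a': {'replication': 2}, '/c': {'replication': 1}}
```

Order matters here: replaying the same records in a different order, or skipping the image, would produce a different namespace.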

When a client adds or modifies a file in HDFS, the operation is first recorded in the edits log file; once the client operation succeeds, the corresponding change is applied to the in-memory metadata. Because the fsimage file is usually large (GB scale is common), writing every update directly into the fsimage file would make the system very slow.
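
The write path in this paragraph follows a log-first pattern: append a small record, then change memory. A minimal sketch, with hypothetical class and method names and an in-memory list standing in for the on-disk edits file:

```python
class ToyEditLog:
    """Append-only list standing in for the on-disk edits file."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


class ToyNamespace:
    """In-memory metadata that logs every change before applying it."""

    def __init__(self, log):
        self.log = log
        self.files = {}

    def create(self, path, replication=3):
        # 1. Record the change in the edit log first.
        self.log.append(("create", path, replication))
        # 2. Only then update the in-memory metadata.
        self.files[path] = {"replication": replication}

    def delete(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        self.log.append(("delete", path, None))
        del self.files[path]


ns = ToyNamespace(ToyEditLog())
ns.create("/x")
ns.create("/y", replication=2)
ns.delete("/x")
print(len(ns.log.records), ns.files)  # 3 {'/y': {'replication': 2}}
```

Each update costs one short appended record, which is the reason given above for not rewriting a multi-gigabyte image on every change.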

This design has two motivations. First, updating and querying data in memory is fast, which greatly reduces operation response time. Second, metadata held only in memory is at high risk of being lost (for example on power failure), so the metadata image file (fsimage) plus the edit log file (edits) serve as a backup mechanism that keeps the metadata safe.

The NameNode maintains the metadata of the entire file system, so accurate metadata management directly affects the ability of HDFS to provide file storage services.

copyright:author[yida&yueda],Please bring the original link to reprint, thank you. https://en.javamana.com/2022/02/202202130751392396.html