HDFS Recycle Bin (Trash Mechanism, Trash Checkpoints) and Snapshots: Data Recovery, Data Backup, and How HDFS Snapshots Are Implemented

yida&yueda 2022-02-13 07:51:39 · 583 views


HDFS Recycle Bin: the Trash Mechanism, Trash Checkpoints, and Using the Snapshot Feature

1. The Recycle Bin

A recycle bin gives us a dose of "regret medicine". It holds deleted files, folders, pictures, shortcuts, and so on, and these items stay there until you empty the recycle bin, so many accidentally deleted files can be found in it.

HDFS is also a file system, so it too must deal with deleting file data. By default, HDFS has no recycle-bin concept: deleted data is removed directly, with no regret medicine.

1.1 Feature Overview

The Trash mechanism is HDFS's recycle bin (trash can). Like the Recycle Bin in the Windows operating system, its purpose is to prevent you from losing something you delete inadvertently. It is not enabled by default.

With the Trash feature enabled, files or directories deleted from HDFS are not cleared immediately; they are moved to the Current directory of the recycle bin (/user/${username}/.Trash/Current).

Files in .Trash are permanently deleted after a user-configurable time delay. Until then, you can recover a file or directory simply by moving it out of the .Trash directory to a location outside it.

1.1.1 Trash Checkpoint

A checkpoint is just a directory under the user's recycle bin, used to store all files and directories deleted before the checkpoint was created. If you want to inspect the recycle bin, you can see the checkpoints at /user/${username}/.Trash/{timestamp_of_checkpoint_creation}.

Recently deleted files are moved to the Current directory of the recycle bin. At a configurable interval, HDFS turns the files in Current into a checkpoint directory /user/${username}/.Trash/<date>, and deletes old checkpoints once they expire.
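As a concrete illustration, the layout of one user's trash directory can be sketched as follows. The username root and the checkpoint timestamp are hypothetical examples, not values taken from a real cluster:

```shell
# Sketch of a user's trash layout; "root" and the checkpoint
# timestamp below are hypothetical examples.
USERNAME=root
CURRENT_DIR="/user/${USERNAME}/.Trash/Current"           # newly deleted files land here
CHECKPOINT_DIR="/user/${USERNAME}/.Trash/220213070000"   # an older checkpoint, named by its creation timestamp
echo "$CURRENT_DIR"
echo "$CHECKPOINT_DIR"
```

Everything under Current is still recoverable by the user; checkpoint directories age out and are deleted once they pass the configured retention time.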

1.2 Enabling the Feature

1.2.1 Stop the HDFS Cluster

On the node, execute the one-click command to stop the HDFS cluster: stop-dfs.sh.

1.2.2 Modify the core-site.xml File

Modify core-site.xml on the node and add the following two properties:

vim /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
<property>
  <name>fs.trash.checkpoint.interval</name>
  <value>0</value>
</property>

fs.trash.interval: the number of minutes after which a checkpoint is deleted. If zero, the Trash recycle-bin feature is disabled.

fs.trash.checkpoint.interval: the interval between checkpoint creations, in minutes. Its value should be less than or equal to fs.trash.interval. If zero, the value is set to that of fs.trash.interval. Each time the checkpointer runs, it creates a new checkpoint from Current and deletes checkpoints that are older than fs.trash.interval minutes.
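The fallback rule between the two settings can be sketched in shell arithmetic. This mimics the documented behavior for the values configured above; it is not HDFS source code:

```shell
# Documented rule: if fs.trash.checkpoint.interval is 0, the
# checkpointer effectively runs every fs.trash.interval minutes.
fs_trash_interval=1440            # from core-site.xml above
fs_trash_checkpoint_interval=0    # from core-site.xml above
if [ "$fs_trash_checkpoint_interval" -eq 0 ]; then
  effective_interval=$fs_trash_interval
else
  effective_interval=$fs_trash_checkpoint_interval
fi
echo "Checkpoints are created every ${effective_interval} minutes"
```

With the configuration above, a checkpoint is therefore created once a day, and each checkpoint is kept for one day before being deleted.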

1.2.3 Start the HDFS Cluster

On the node, execute the one-click command to start the HDFS cluster: start-dfs.sh.

1.3 Using the Feature

1.3.1 Deleting a File to Trash

With the Trash feature turned on, a normal delete does not actually remove the file; it is moved into the garbage collection bin instead.
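For example, a delete with Trash enabled might look like this. The path /smallfile1/3.txt is reused from the later examples, and the commands require a running HDFS cluster:

```shell
# Requires a running HDFS cluster with Trash enabled.
hadoop fs -rm /smallfile1/3.txt
# HDFS reports that the file was moved to the current user's trash
# rather than deleted; it can then be listed under .Trash/Current:
hadoop fs -ls /user/root/.Trash/Current/smallfile1
```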

You can then verify that the file now sits under the .Trash recycle-bin directory.

1.3.2 Deleting a File, Skipping Trash

Sometimes we want to delete a file directly, without going through the Trash recycle bin. Add the -skipTrash parameter when performing the delete operation:

hadoop fs -rm -skipTrash /smallfile1/3.txt

1.3.3 Recovering Files from Trash

Files in the recycle bin can be recovered by command before they expire and are automatically deleted. Simply use mv or cp to move or copy the data files out of the .Trash directory:

hadoop fs -mv /user/root/.Trash/Current/smallfile1/* /smallfile1/

1.3.4 Emptying Trash

Besides the automatic deletion on expiry controlled by the fs.trash.interval parameter, users can also empty the recycle bin manually by command to free HDFS disk storage space.

The first idea that comes to mind is to delete the entire recycle-bin directory, which does empty it and is one option. In addition, HDFS provides a command-line tool for this purpose:

hadoop fs -expunge

This command immediately deletes checkpoints in the trash that have expired.

2. Snapshots

2.1 Snapshot Introduction and Uses

An HDFS snapshot is an image of the entire HDFS file system, or of one of its directories, at a point in time. The image is not updated dynamically as the source directory changes. A snapshot can be understood like a photograph: a projection of the moment it was taken; whatever happens after that moment belongs to a new projection.

The core uses of HDFS snapshots include: data recovery, data backup, and testing against data.

2.1.1 Data recovery

You can take snapshots of important directories on a rolling basis, so that multiple snapshot versions of a directory exist in the system. When a user deletes a file by mistake, the latest snapshot can be used to perform the recovery.

2.1.2 Data Backup

Snapshots can be used to back up the entire cluster, or particular directories and files. The administrator takes a snapshot at some point in time as the starting baseline of the backup, and then performs incremental backups by comparing the differences between snapshots.
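The incremental-backup idea can be sketched as follows. The directory /backup_dir and the snapshot names s0 and s1 are hypothetical examples, and the commands require a running HDFS cluster:

```shell
# Hypothetical incremental-backup flow; /backup_dir, s0 and s1
# are example names. Requires a running HDFS cluster.
hdfs dfsadmin -allowSnapshot /backup_dir
hdfs dfs -createSnapshot /backup_dir s0    # baseline for the full backup
# ... time passes, data changes ...
hdfs dfs -createSnapshot /backup_dir s1
# Report only what changed between s0 and s1; copying just those
# paths to the backup destination gives an incremental backup.
hdfs snapshotDiff /backup_dir s0 s1
```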

2.1.3 Testing with Data

Running tests or experiments directly on important data may destroy the original data. Instead, you can temporarily create a snapshot for the data you want to operate on, and let users carry out their experiments and tests against the corresponding snapshot, thus avoiding damage to the original data.

2.2 How HDFS Snapshots Are Implemented

To understand how the HDFS snapshot feature is implemented, first remember one fundamental principle: a snapshot is not a simple copy of the data; it only records differences. The same principle applies to the snapshot concept in many other systems, such as disk snapshots, which likewise do not store the real data. Because no actual data is saved, creating a snapshot is usually very fast.

In HDFS, if you create a snapshot under a directory such as /A, the snapshot presents a subdirectory and file structure, with corresponding attribute information, that is completely consistent with the /A directory, and you can view the concrete file contents in the snapshot with ordinary commands. However, this does not mean the snapshot has made a full copy of the data. The principle is this: for the majority of the data, which does not change, what you see is simply what the current physical path points to; only the inodes of changed data are additionally copied by the snapshot. This is the so-called differential copy.

An inode (index node) stores the basic information of a file or directory, including its timestamps, name, owner, and group.

HDFS snapshots do not copy the blocks on the DataNodes; only the block list and the file size are recorded.

HDFS snapshots do not adversely affect regular HDFS operations: modifications are recorded in reverse chronological order, so the current data can still be accessed directly. The snapshot data is computed by subtracting the modifications from the current data.

2.3 Snapshot Commands

2.3.1 Enabling and Disabling Snapshots on a Directory

HDFS can create a snapshot of the entire file system or of a directory in the file system, but the prerequisite is that the snapshot feature has been enabled on the corresponding directory.

Creating a snapshot of a directory on which the snapshot feature has not been enabled reports an error.

Enable the snapshot feature:

hdfs dfsadmin -allowSnapshot /allenwoon

Disable the snapshot feature:

hdfs dfsadmin -disallowSnapshot /allenwoon

2.3.2 Snapshot Operation Commands

$ hdfs dfs
Usage: hadoop fs [generic options]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
$ hdfs lsSnapshottableDir
$ hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>

The snapshot-related operation commands are: createSnapshot (create a snapshot), deleteSnapshot (delete a snapshot), renameSnapshot (rename a snapshot), lsSnapshottableDir (list snapshottable directories), and snapshotDiff (get a snapshot difference report).
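Putting these commands together, a typical mistaken-delete recovery might look like this. The directory /allenwoon is reused from the examples above; the file name important.txt and the snapshot name s1 are hypothetical, and a running HDFS cluster is required:

```shell
# Hypothetical recovery flow on a running cluster. Snapshots are
# exposed read-only under the hidden .snapshot path of the directory.
hdfs dfsadmin -allowSnapshot /allenwoon
hdfs dfs -createSnapshot /allenwoon s1
hdfs dfs -rm /allenwoon/important.txt                            # accidental delete
hdfs dfs -cp /allenwoon/.snapshot/s1/important.txt /allenwoon/   # restore from the snapshot
hdfs dfs -deleteSnapshot /allenwoon s1                           # clean up when done
```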

Copyright: author [yida&yueda]. Please include the original link when reprinting, thank you. https://en.javamana.com/2022/02/202202130751371615.html