Isolation and Limitation of Linux Containers

captain 2022-02-13 09:03:48


Introduction to Linux processes

Suppose you want to write a small program that computes an addition. The program reads its input from one file and writes the result to another file.

Because computers only understand 0 and 1, whatever language you write this code in, it must eventually be translated into a binary file in some way before it can run on the computer's operating system.

For the code to work properly, we often have to supply it with data as well, such as the input file our addition program needs. This data, together with the binary file of the code itself, stored on disk, is what we usually call a program, also known as the executable image of the code.

Then we can run this program on the computer.

First, the operating system learns from the program that the input is stored in a file, so it loads that data into memory. The operating system then reads the instruction to compute the addition, at which point it needs to direct the CPU to perform the operation. The CPU works with memory to carry out the addition, using registers to hold values and the memory stack to hold the executed commands and variables. Meanwhile, files are open on the computer, and all kinds of I/O devices keep changing state as they are called.

In this way, once the program is executed, it changes from a binary file on disk into a collection of data in memory, values in registers, instructions on the stack, open files, and the state of various devices. The sum of this execution environment after a program starts running is our protagonist: the process.

So, for a process, its static form is the program, which normally sits quietly on disk. Once it is running, it becomes the sum of data and state in the computer, which is its dynamic form.
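As a rough illustration of the addition program described in this section, here is a minimal Python sketch. The function name add_from_file and the file names are our own assumptions, not anything from a real tool. The file on disk is the static program; while it runs, it is a process with a PID assigned by the operating system.

```python
import os
import tempfile


def add_from_file(input_path, output_path):
    """Read whitespace-separated integers, write their sum, and return it."""
    with open(input_path) as src:           # the program's input data
        numbers = [int(token) for token in src.read().split()]
    total = sum(numbers)
    with open(output_path, "w") as dst:     # the result goes to another file
        dst.write(f"{total}\n")
    return total


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()            # hypothetical scratch directory
    in_path = os.path.join(workdir, "input.txt")
    out_path = os.path.join(workdir, "output.txt")
    with open(in_path, "w") as f:
        f.write("1 2\n")
    # While this script runs it is a process; the OS gives it a PID.
    print("running as PID", os.getpid())
    print("sum:", add_from_file(in_path, out_path))
```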

Isolation of Linux containers

A Docker container is essentially a process of the Linux operating system; Docker simply uses namespaces to isolate resources between processes. Put that way it may feel abstract, so let's explore it hands-on.

First, let's create a container :

# docker run -it busybox /bin/sh
/ # 

Run the ps command in the container:

/ # ps
1 root 0:00 /bin/sh
6 root 0:00 ps

As you can see, the /bin/sh we started first in Docker is process number 1 (PID=1) inside this container, and only two processes are running in the container. This means that the /bin/sh we started earlier and the ps we just ran have been isolated by Docker in a world separate from the host.

How does this work?

Normally, every time we run a /bin/sh program on the host, the operating system assigns it a process number, say PID=100. This number uniquely identifies the process, much like an employee badge. So PID=100 can be roughly understood as: this /bin/sh is employee number 100 in our company, while employee number 1 is someone like Bill Gates, who is in charge of everything.

Now suppose we run this /bin/sh program in a container through Docker. Docker then applies a "smoke screen" to employee number 100 so that he can no longer see the 99 employees ahead of him, let alone Bill Gates. As a result, he mistakenly believes he is employee number 1. This mechanism isolates the process space of the application so that its processes see only a recalculated process number, such as PID=1. In the host operating system, however, it is still process number 100.

This technique is the Namespace mechanism in Linux. The way it is used is also interesting: it is just an optional parameter passed when Linux creates a new process. In Linux, a system call used to create a new process or thread is clone(), for example:

int pid = clone(main_function, stack_top, SIGCHLD, NULL); /* stack_top: pointer to the top of the child's stack */

This system call creates a new process for us and returns its process number, pid.

When we create a new process with the clone() system call, we can include CLONE_NEWPID among the flags, for example:

int pid = clone(main_function, stack_top, CLONE_NEWPID | SIGCHLD, NULL); /* stack_top: pointer to the top of the child's stack */

The newly created process will then "see" a brand-new process space in which its PID is 1. We say "see" because this is only a smoke screen: in the host's real process space, the PID of this process is still its real value, such as 100.

We can also make the clone() call above several times, which creates multiple PID Namespaces. The application process in each Namespace believes it is process number 1 in its own container; it can see neither the real process space of the host nor the contents of any other PID Namespace.
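To make the "smoke screen" idea concrete, here is a toy Python model. It is only a sketch of the bookkeeping, not how the kernel implements it: the class PidNamespace and its methods are hypothetical names invented for this illustration. Each namespace simply renumbers the host PIDs it contains, starting from 1.

```python
import itertools


class PidNamespace:
    """Toy model of a PID namespace: maps host PIDs to namespace-local PIDs."""

    def __init__(self):
        self._next_local = itertools.count(1)   # numbering restarts at 1
        self._local_by_host = {}

    def add(self, host_pid):
        """Register a host process and return the PID it sees for itself."""
        local_pid = next(self._next_local)
        self._local_by_host[host_pid] = local_pid
        return local_pid

    def visible_pids(self):
        """PIDs that a process inside this namespace is able to see."""
        return sorted(self._local_by_host.values())


# Two independent namespaces, each holding one process from the host.
ns_a = PidNamespace()
ns_b = PidNamespace()
pid_in_a = ns_a.add(100)   # host PID 100, as in the example above
pid_in_b = ns_b.add(200)   # host PID 200 is an assumed value
print(pid_in_a, pid_in_b)  # each process believes it is number 1
```

Neither namespace knows anything about the other's table, which mirrors the point that a process cannot see into another PID Namespace.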

Besides the PID Namespace we just used, Linux also provides the Mount, UTS, IPC, Network, and User Namespaces, which apply the same "smoke screen" to other parts of a process's context. For example, the Mount Namespace lets an isolated process see only the mount point information of its current Namespace, and the Network Namespace lets an isolated process see only the network devices and configuration of its current Namespace.
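On a Linux machine, the Namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns. The following Python sketch lists them for the current process. The helper name list_namespaces is our own, and the function returns an empty result on systems where /proc is not available.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result            # not Linux, no such process, or /proc not mounted
    for name in sorted(names):
        try:
            # Each entry is a symlink whose target looks like "pid:[<inode>]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue             # entry not readable with current privileges
    return result


for name, ident in list_namespaces().items():
    print(f"{name:18s} {ident}")
```

Two processes that show the same identifier for an entry share that Namespace; a process inside a container would typically show different identifiers from a process on the host.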

This is the most basic principle behind Linux containers.

So although the concept of a Docker container sounds mysterious, what actually happens is that, when the container process is created, Docker specifies the set of Namespace parameters the process should be started with. The container can then "see" only the resources, files, devices, state, and configuration limited by its current Namespaces. The host and other unrelated programs are completely invisible to it.
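The "set of Namespace parameters" is just a bitmask of CLONE_NEW* flags combined with bitwise OR. The Python sketch below uses the flag values from the Linux header <linux/sched.h>. The helper names combine and describe, and the particular selection of flags, are assumptions made for illustration, not what Docker necessarily passes.

```python
# Flag values as defined in the Linux header <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # Mount namespace
CLONE_NEWUTS = 0x04000000   # UTS namespace (hostname)
CLONE_NEWIPC = 0x08000000   # IPC namespace
CLONE_NEWUSER = 0x10000000  # User namespace
CLONE_NEWPID = 0x20000000   # PID namespace
CLONE_NEWNET = 0x40000000   # Network namespace

NAMES = {
    CLONE_NEWNS: "mount",
    CLONE_NEWUTS: "uts",
    CLONE_NEWIPC: "ipc",
    CLONE_NEWUSER: "user",
    CLONE_NEWPID: "pid",
    CLONE_NEWNET: "net",
}


def combine(*flags):
    """OR the given flags together, as done in the flags argument of clone()."""
    value = 0
    for flag in flags:
        value |= flag
    return value


def describe(value):
    """List the namespace names enabled in a combined flag value."""
    return [name for flag, name in NAMES.items() if value & flag]


# Hypothetical selection: a new PID, Mount, and Network namespace.
container_flags = combine(CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWNET)
print(hex(container_flags), describe(container_flags))
```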

In other words, a container is really just a special kind of process.

Limitation of Linux containers

Why do we need to limit a container?

Although process number 1 in the container can see only what is inside the container because of the smoke screen, on the host it is still process number 100 and competes with all other processes as an equal. This means that, although process 100 appears to be isolated, the resources it can use (such as CPU and memory) can be taken at any time by other processes (or other containers) on the host. Process 100 may of course also eat up all the resources itself. Neither is reasonable behavior for a sandbox.

What is Linux Cgroups?

cgroups, short for control groups, is the Linux mechanism for limiting the resources of a process or a group of processes. It gives fine-grained control over resources such as CPU and memory. Docker, for example, relies on the resource limits provided by cgroups on Linux to control container resources. Developers can also use cgroups directly to control process resources: on an 8-core machine running a web service and a computing service, the web service can be restricted to 6 cores, leaving the remaining two to the computing service. The cgroups CPU controls can limit not only how many cores and which cores may be used, but also the CPU share ratio. (Note that the share ratio applies only when every group is fully busy: if one cgroup is idle and another is busy, the busy cgroup may take over the whole CPU.)
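The note about share ratios can be illustrated with a small proportional-share model in Python. This is a simplified sketch of the idea, not the kernel scheduler: the function cpu_split, the group names, and the weights are all assumptions chosen for the example.

```python
def cpu_split(shares, busy):
    """Return each group's fraction of the CPU under a proportional-share model.

    shares: {group_name: weight}
    busy:   set of group names that currently want CPU time
    """
    active_total = sum(weight for name, weight in shares.items() if name in busy)
    split = {}
    for name, weight in shares.items():
        if name in busy and active_total > 0:
            split[name] = weight / active_total   # share among busy groups only
        else:
            split[name] = 0.0                     # idle groups consume nothing
    return split


shares = {"web": 3072, "compute": 1024}            # hypothetical 3:1 weights
print(cpu_split(shares, {"web", "compute"}))       # both busy: ratio applies
print(cpu_split(shares, {"compute"}))              # web idle: compute takes all
```

When both groups are busy the 3:1 ratio holds; when the web group is idle, the computing group is free to use everything, which is exactly the caveat in the note.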

In Linux, the interface Cgroups exposes to users is a file system: it is organized as files and directories under the /sys/fs/cgroup path. On a CentOS machine, we can display them with the mount command:

/ # mount -t cgroup
cgroup on /sys/fs/cgroup/systemd type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (ro,seclabel,nosuid,nodev,noexec,relatime,pids)

As you can see, /sys/fs/cgroup contains many subdirectories such as cpuset, cpu, and memory, also called subsystems. These are the kinds of resources that Cgroups can limit on this machine. Under each subsystem's directory, you can see the specific ways in which that kind of resource can be limited.
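If you want to collect the subsystems programmatically, the mount output above can be parsed with a few lines of Python. This is a sketch that assumes the line layout shown above and the convention, visible in that output, that the last path component names the subsystems; the function name parse_cgroup_mounts is our own.

```python
def parse_cgroup_mounts(mount_output):
    """Map each cgroup mount point to the subsystems named in its path."""
    mounts = {}
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: cgroup on <path> type cgroup (<options>)
        if len(parts) < 6 or parts[0] != "cgroup" or parts[4] != "cgroup":
            continue
        path = parts[2]
        # Co-mounted subsystems appear comma-separated, e.g. "cpu,cpuacct".
        # Named hierarchies such as "systemd" show up here too.
        mounts[path] = path.rsplit("/", 1)[-1].split(",")
    return mounts


sample = (
    "cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup "
    "(ro,seclabel,nosuid,nodev,noexec,relatime,cpuacct,cpu)\n"
    "cgroup on /sys/fs/cgroup/memory type cgroup "
    "(ro,seclabel,nosuid,nodev,noexec,relatime,memory)\n"
)
for path, subsystems in parse_cgroup_mounts(sample).items():
    print(path, subsystems)
```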

For the CPU subsystem, for example, we can see the following configuration files:

/ # ls -l /sys/fs/cgroup/cpu/
total 0
-rw-r--r-- 1 root root 0 Aug 12 10:55 cgroup.clone_children
--w--w--w- 1 root root 0 Aug 12 10:55 cgroup.event_control
-rw-r--r-- 1 root root 0 Aug 12 10:55 cgroup.procs
-rw-r--r-- 1 root root 0 Aug 12 10:55 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Aug 12 10:55 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Aug 12 10:55 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Aug 12 10:55 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Aug 12 10:55 cpu.shares
-r--r--r-- 1 root root 0 Aug 12 10:55 cpu.stat
-r--r--r-- 1 root root 0 Aug 12 10:55 cpuacct.stat
-rw-r--r-- 1 root root 0 Aug 12 10:55 cpuacct.usage
-r--r--r-- 1 root root 0 Aug 12 10:55 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Aug 12 10:55 notify_on_release
-rw-r--r-- 1 root root 0 Aug 12 10:55 tasks

If you are familiar with Linux CPU management, you will have noticed keywords such as cfs_period and cfs_quota. These two parameters are used together: they limit a process to a total of cfs_quota of CPU time within each period of length cfs_period.
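The relationship between the two values is a simple ratio. The Python sketch below computes it, assuming that both values are in microseconds (as the _us suffix of the file names suggests) and that a quota of -1 means no limit is set. The function name cpu_limit is hypothetical.

```python
def cpu_limit(cfs_quota_us, cfs_period_us):
    """Return the CPU limit as a number of CPUs, or None when unlimited."""
    if cfs_period_us <= 0:
        raise ValueError("period must be positive")
    if cfs_quota_us < 0:          # -1 means no quota is set
        return None
    return cfs_quota_us / cfs_period_us


print(cpu_limit(20000, 100000))   # 20 ms out of every 100 ms
print(cpu_limit(-1, 100000))      # no limit
```

A result of 0.2 means the group may use 20% of one CPU; a quota larger than the period allows more than one full CPU.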

Next, let's put this configuration to use.

First, we need to create a directory under the corresponding subsystem :

# cd /sys/fs/cgroup/cpu
# mkdir container
# cd container/
# ll
total 0
-rw-r--r--. 1 root root 0 Aug 12 19:38 cgroup.clone_children
--w--w--w-. 1 root root 0 Aug 12 19:38 cgroup.event_control
-rw-r--r--. 1 root root 0 Aug 12 19:38 cgroup.procs
-r--r--r--. 1 root root 0 Aug 12 19:38 cpuacct.stat
-rw-r--r--. 1 root root 0 Aug 12 19:38 cpuacct.usage
-r--r--r--. 1 root root 0 Aug 12 19:38 cpuacct.usage_percpu
-rw-r--r--. 1 root root 0 Aug 12 19:38 cpu.cfs_period_us
-rw-r--r--. 1 root root 0 Aug 12 19:38 cpu.cfs_quota_us
-rw-r--r--. 1 root root 0 Aug 12 19:38 cpu.rt_period_us
-rw-r--r--. 1 root root 0 Aug 12 19:38 cpu.rt_runtime_us
-rw-r--r--. 1 root root 0 Aug 12 19:38 cpu.shares
-r--r--r--. 1 root root 0 Aug 12 19:38 cpu.stat
-rw-r--r--. 1 root root 0 Aug 12 19:38 notify_on_release
-rw-r--r--. 1 root root 0 Aug 12 19:38 tasks

This directory is called a control group. You will find that the operating system automatically generates the subsystem's resource limit files in the newly created container directory.

Now let's run an infinite-loop script that drives the CPU to 100%:

# while : ; do : ; done &
# top
7996 root 20 0 1320 256 212 R 100 0.0 1:12.75 sh 

The top command shows that CPU utilization has reached 100%.

Now, looking at the files in the container directory, we can see that the CPU quota of the container control group has no limit yet (the value is -1):

# cat /sys/fs/cgroup/cpu/container/cpu.cfs_quota_us

Next, we set the limit by modifying these files :

Write 20 ms (20000 us) to the cfs_quota file of the container group:

# echo 20000 > /sys/fs/cgroup/cpu/container/cpu.cfs_quota_us

Because cpu.cfs_period_us defaults to 100000 us (100 ms), this means that within each 100 ms period a process restricted by this control group can use only 20 ms of CPU time. In other words, the process can use only 20% of the CPU bandwidth.

Next, we write the PID of the process to be limited into the tasks file of the container group, and the setting above takes effect for that process:

# echo 7996 > /sys/fs/cgroup/cpu/container/tasks 

Then check with top again:

7996 root 20 0 119484 6140 1652 R 20.3 0.2 3:45.10 sh 

As you can see, the CPU usage immediately dropped to 20%.

Isn't that amazing?

Besides the CPU subsystem, each Cgroups subsystem has its own resource-limiting capability, for example:

  • blkio: sets I/O limits for block devices, typically disks and similar devices
  • cpuset: assigns dedicated CPU cores and the corresponding memory nodes to a process
  • memory: sets memory usage limits for a process

The design of Linux Cgroups is easy to use. Put simply, it is a combination of a subsystem directory and a set of resource limit files. For Linux container projects such as Docker, all they need to do is create a control group (that is, a directory) for each container under each subsystem and then, after starting the container process, write the PID of that process into the tasks file of the corresponding control group.
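The steps just described (create a directory, write the limit files, write the PID into tasks) can be sketched in Python. This is an illustration only and not Docker's actual code: the function limit_process is a hypothetical name, and the example runs against a temporary directory standing in for /sys/fs/cgroup so that it needs no root privileges and limits nothing for real.

```python
import os
import tempfile


def limit_process(cgroup_root, group, pid, quota_us, period_us=100000):
    """Create a control group directory, then write the CPU limits and the PID."""
    group_dir = os.path.join(cgroup_root, "cpu", group)
    os.makedirs(group_dir, exist_ok=True)        # same role as "mkdir container"
    for filename, value in (
        ("cpu.cfs_period_us", period_us),
        ("cpu.cfs_quota_us", quota_us),
        ("tasks", pid),                           # attach the process last
    ):
        with open(os.path.join(group_dir, filename), "w") as f:
            f.write(f"{value}\n")
    return group_dir


# Stand-in for /sys/fs/cgroup; on a real system the kernel creates the files
# inside the new directory and root privileges are required to write them.
fake_root = tempfile.mkdtemp()
group_dir = limit_process(fake_root, "container", 7996, 20000)
print(sorted(os.listdir(group_dir)))
```

Writing the PID last means the limits are already in place when the process joins the group.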

As for the values written to the resource files of these control groups, the user specifies them with parameters when running docker run, as in the following command:

# docker run -it --cpu-period=10000 --cpu-quota=20000 ubuntu /bin/bash

After the container starts, we can confirm the settings by inspecting the resource limit files of the docker control group under the CPU subsystem in the Cgroups file system.

Copyright: captain. Please include the original link when reprinting.