Song Baohua, 2022-01-26
Original author: dog250
Original article: https://blog.csdn.net/dog250/article/details/48487103
As the first article in this series, let me describe the slab system. I have been discussing this topic with colleagues and friends in recent days, and I find it quite typical, so it comes first. By operating-system theory, process management should arguably matter more; by my own interests, IO management and the TCP/IP protocol stack would carry more weight. I will get to all of these one after another.
The Linux kernel slab comes from a very simple idea: prepare in advance the data structures that are frequently allocated and released. But the standard slab implementation is too complex and its maintenance cost is huge, so it was superseded by the smaller slub. This article therefore discusses slub, and every mention of slab below actually refers to slub. Also, since this article is about kernel optimization rather than an introduction to the basics, readers who want the details and the code of slab should search for them or read the source.
Pure slab on a single CPU
The following figure shows the sequence of allocating and releasing objects with slab on a single CPU:
As you can see, it is very simple, and it fully achieves slab's original design goal.
Extending to multi-core CPUs
Now let us naively extend the above model to multi-core CPUs; the analogous allocation sequence is shown in the figure below:
We can see that when there is only one slab, multiple CPUs allocating objects at the same time inevitably conflict, and almost the only way to resolve the conflict is to lock the queue. That, however, greatly increases latency: the whole request for a single object starts at T0 and does not finish until T4, far too long.
The direct idea for lock-free parallel operation on multiple CPUs: replicate the same set of data structures onto every CPU.
The only way forward is to introduce "per-CPU variables". For slab, the model can be extended as follows:
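To make the per-CPU idea concrete, here is a minimal C sketch; it is my own illustration, not the kernel's actual code, and the names my_slab_cpu and my_alloc_fast are hypothetical. Each CPU owns a private free list, so the fast path touches no lock:

```c
/* Minimal sketch of the per-CPU fast path; illustrative only, not the
 * kernel's real code. Each CPU owns a private free list, so popping an
 * object needs no lock (assuming preemption is disabled meanwhile). */
#include <stddef.h>

#define NR_CPUS 8

struct my_slab_cpu {
    void *freelist;   /* free objects; each object's first word links to the next */
};

static struct my_slab_cpu slab_cpu[NR_CPUS];

static void *my_alloc_fast(int cpu)
{
    struct my_slab_cpu *c = &slab_cpu[cpu];
    void *obj = c->freelist;

    if (obj)
        c->freelist = *(void **)obj;   /* pop: follow the embedded next pointer */
    return obj;                        /* NULL means: enter the slow path */
}
```

Freeing on the same CPU is the mirror image: push the object back onto the local list, again with no lock.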
If it really were that simple and that were the end of it, this article would be pointless.
Problems
First, consider a simple question: what if the slab cache of one CPU has no objects left to allocate, while the slab caches of the other CPUs still hold plenty of free objects, as shown in the figure below?
This is entirely possible, because the demand for a particular slab is closely tied to the processes/threads running on that CPU. For example, if CPU0 only handles networking, it will have heavy demand for skb and related data structures. As for the question in the figure above: if we choose to allocate a new page from the buddy system (or pages, depending on the object size and the order of the slab cache), then over time slabs end up unevenly distributed among the CPUs and tend to eat far more physical memory, which is exactly what we do not want to see.
Before proceeding, one thing must be made clear: we need to balance slabs between CPUs, and the balancing must be done by slab's own internal mechanisms. This is completely different from load balancing processes between CPUs. Processes have a central scheduling mechanism, for example one based on time slices or the stride of a virtual clock; for slab, however, everything is up to the user. As long as an object is still in use, you cannot revoke the user's right to keep using it unless the user releases it. Slab load balancing must therefore be designed to be cooperative, not preemptive.
Good. Now we know that allocating new page(s) from the buddy system is not a good first move; it should be the last resort. Before falling back to it, we should first try another route.
Now we come to the second question, as shown in the figure below:
No one can guarantee that the CPU that allocates a slab object and the CPU that releases it are the same CPU; no one can guarantee that a CPU allocates no new page(s) during the life cycle of a slab object; and no one has prescribed the complex operations that may happen in between. How are these problems solved? In fact, once you understand how they are solved, you understand the whole slab framework.
Solving the problems: a layered slab cache
A continuously variable transition is always desirable.
If the slab cache of one CPU is exhausted, grabbing from the slab cache of a sibling CPU at the same level is considered reckless and immoral. So why not set up another level of slab cache? Getting objects from it is not as simple and direct as getting them from a CPU's own slab cache, but it is not much harder either, only a little more expensive. Isn't that good? In fact, a CPU's L1, L2 and L3 caches were designed along exactly this scheme, which has effectively become the canonical way to design caches. The same design idea works for slab too, and it is what the Linux kernel's slub implements.
Now the concepts can be stated and explained.
Linux kernel slab cache: an object cache model divided into three levels.
Level 1 slab cache: a linked list of free objects, one exclusive cache per CPU; allocating and releasing objects requires no lock.
Level 2 slab cache: a linked list of free objects, one shared page(s) cache per CPU; allocating and releasing an object only requires locking that page(s). It is mutually exclusive with the Level 1 slab cache: the two never hold the same objects.
Level 3 slab cache: a linked list of page(s), a cache shared by all CPUs of a NUMA node, managed in units of page(s). Once acquired, a page(s) is promoted into the corresponding CPU's Level 1 slab cache, and at the same time the page(s) continues to exist as a Level 2 shared page(s).
Shared page(s): a page(s) occupied by one or more CPUs. Each CPU may hold within that page(s) a free-object list that does not overlap with the others'; the page(s) also owns a unique Level 2 slab cache free list, which conflicts with none of the Level 1 free lists above. Multiple CPUs must contend for this Level 2 slab cache, and the winner promotes the list into its own Level 1 slab cache.
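To make the three levels concrete, here is a simplified C sketch loosely modeled on the slub data structures. The type and field names are my own; the real struct kmem_cache_cpu, struct page and struct kmem_cache_node differ in many details:

```c
/* Simplified model of the three cache levels; names are illustrative,
 * not the real slub structures. */

struct level2_page {                 /* one slab page(s) */
    void *shared_freelist;           /* Level 2: objects freed into this
                                        page by non-owning CPUs; taken
                                        under a per-page lock/cmpxchg */
    struct level2_page *next;        /* link on the per-node partial list */
};

struct level1_cpu {                  /* Level 1: exclusive to one CPU */
    void *freelist;                  /* private free objects, lockless */
    struct level2_page *page;        /* the page(s) this CPU currently owns */
};

struct level3_node {                 /* Level 3: shared by a NUMA node */
    struct level2_page *partial;     /* page(s) with free objects, handed
                                        out under the node lock and then
                                        promoted to a CPU's Level 1 */
};
```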
The diagram of this slab cache is as follows:
Its behavior is shown in the figure below:
Two scenarios
For the regular object allocation flow, the following figure shows the details:
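The same flow can also be written as hedged pseudocode, reusing the hypothetical level1_cpu/level2_page/level3_node structures from the sketch above; locking and the real slub cmpxchg fast path are omitted:

```c
#include <stddef.h>

/* Allocation walk across the three levels. Depends on the struct
 * definitions from the earlier sketch; illustrative only. */
static void *pop(void **list)
{
    void *obj = *list;
    if (obj)
        *list = *(void **)obj;
    return obj;
}

void *my_alloc(struct level1_cpu *c, struct level3_node *node)
{
    void *obj;

    /* Level 1: this CPU's private freelist, no lock needed. */
    if ((obj = pop(&c->freelist)))
        return obj;

    /* Level 2: reclaim objects that other CPUs freed back into our
       page(s); the real kernel grabs the whole list atomically. */
    if (c->page && c->page->shared_freelist) {
        c->freelist = c->page->shared_freelist;
        c->page->shared_freelist = NULL;
        return pop(&c->freelist);
    }

    /* Level 3: take a partial page(s) from the NUMA node (under the
       node lock), promote it to be our Level 1 page, then allocate. */
    if (node->partial) {
        c->page = node->partial;
        node->partial = c->page->next;
        c->freelist = c->page->shared_freelist;
        c->page->shared_freelist = NULL;
        if ((obj = pop(&c->freelist)))
            return obj;
    }

    /* Last resort: fall through to the buddy system for new page(s). */
    return NULL;   /* sketch: a real implementation calls the page allocator */
}
```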
In fact, when multiple CPUs share one page(s), there is yet another way to play it, as shown in the figure below:
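One such play is on the release side: a CPU freeing into the page(s) it currently owns can use its lockless Level 1 list, while freeing into a page(s) owned elsewhere goes through that page's contended Level 2 list. A sketch under the same hypothetical structures as above:

```c
/* Release path sketch; depends on the struct definitions from the
 * earlier sketch, illustrative only. */
void my_free(struct level1_cpu *c, struct level2_page *pg, void *obj)
{
    if (pg == c->page) {
        /* Our own page(s): lockless push onto the private Level 1 list. */
        *(void **)obj = c->freelist;
        c->freelist = obj;
    } else {
        /* Someone else's page(s): contend for its Level 2 shared list
           (a lock or cmpxchg loop in a real implementation). */
        *(void **)obj = pg->shared_freelist;
        pg->shared_freelist = obj;
    }
}
```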
Buddy system
We have now had a brief taste of the Linux kernel slab design: not too long, because too long would be hard to digest. But in the end, if even Level 3 cannot supply page(s), the request ultimately falls through to the ultimate buddy system.
The buddy system is designed to prevent fragmentation of memory allocation, so it tries as hard as possible to do two things:
1) Allocate blocks of memory that are as large and contiguous as possible.
2) Merge contiguous small blocks of memory back into large blocks whenever possible.
We can understand the above principle through the following diagram:
Note that this article is about the optimization, not the buddy system itself, so I assume you already understand how the buddy system works.
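Still, one detail is worth recalling because the optimization below depends on it: finding a block's buddy is pure bit arithmetic, which is what makes splitting and merging cheap list operations. A minimal standalone illustration of my own, not kernel code:

```c
/* Buddy arithmetic illustration (not kernel code): the buddy of a block
 * at page frame number `pfn` and order `order` differs in exactly one
 * bit, so split and merge are cheap pointer/list operations. */
#include <stdio.h>

static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

int main(void)
{
    printf("%lu\n", buddy_pfn(8, 0));  /* 9: order-0 buddy of page 8      */
    printf("%lu\n", buddy_pfn(8, 3));  /* 0: order-3 buddy of pages 8..15 */
    return 0;
}
```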
Given that most slab cache objects are small structures of no more than one page (and not only in the slab system: compared with requests of one page or less, memory requests larger than one page are rare), there is a huge volume of allocation requests targeting single pages. By the buddy system's allocation rules, continuously allocating large numbers of single pages forces many pages of order greater than 0 to be split down into single pages. On a single-core CPU that is not a problem, but on multi-core CPUs every CPU allocates, and the buddy system's split and merge operations involve a large number of linked-list manipulations whose locking cost is huge, so it has to be optimized!
The Linux kernel answers the single-page allocation pattern by having the buddy system hand out pages in batches to a "per-CPU single-page cache"!
Every CPU owns a single-page cache pool. When a single page is needed, it can be taken from the current CPU's page pool without locking. When the pool runs short of pages, the system pulls a batch of pages from the buddy system into the pool; conversely, when a single page is released, it is released into the per-CPU single-page cache.
To keep the number of pages in the "per-CPU single-page cache" from growing too large or shrinking too small (too many pages starve the buddy system; too few fail the CPU's demand), the system maintains two watermark values: when the number of cached pages drops below the low watermark, pages are fetched from the buddy system in batch into the pool; when the number of cached pages exceeds the high watermark, some pages are released back to the buddy system.
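Here is a hedged sketch of such a pool, in the spirit of the kernel's per_cpu_pages but with simplified names; my_pcp_pool, page_node and the commented-out refill/drain helpers are hypothetical:

```c
#include <stddef.h>

/* Per-CPU single-page pool with low/high watermarks; illustrative only. */
struct page_node { struct page_node *next; };  /* stands in for struct page */

struct my_pcp_pool {
    int count;               /* pages currently cached on this CPU */
    int low;                 /* below this: refill from buddy in batch */
    int high;                /* above this: drain a batch back to buddy */
    int batch;               /* pages moved per refill/drain */
    struct page_node *list;  /* the cached single pages */
};

struct page_node *pcp_get_page(struct my_pcp_pool *p)
{
    struct page_node *pg;

    if (p->count <= p->low) {
        /* refill_from_buddy(p, p->batch): one trip through the buddy
           lock buys `batch` pages instead of one trip per page. */
    }
    pg = p->list;
    if (!pg)
        return NULL;         /* sketch: fall back to the buddy system */
    p->list = pg->next;
    p->count--;
    return pg;
}

void pcp_put_page(struct my_pcp_pool *p, struct page_node *pg)
{
    pg->next = p->list;
    p->list = pg;
    if (++p->count >= p->high) {
        /* drain_to_buddy(p, p->batch): give a batch back so the pool
           stays between the two watermarks. */
    }
}
```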
Summary
In a multi-CPU operating system kernel, the key cost is the cost of locks. I think this was baked in by the original designs. In the beginning, multi-core CPUs had not yet appeared, and on a single-core CPU almost all protection of shared data could be handled simply by "disabling interrupts" and "disabling preemption". When the multi-core era came, operating systems were simply ported to the new platform, so synchronization was bolted on afterwards on top of the single-core design. Simply put, today's mainstream operating systems were created in the single-core era; they fit single-core environments well, but for multi-core environments their design may have been wrong from the very start.
In any case, the only way to optimize is to forbid, or at least minimize, lock operations. The idea is to create "per-CPU caches" of the shared key data structures. These caches come in two types:
1) Data-path caches.
Data structures such as routing tables can be protected with an RCU lock; of course, creating a local routing-table cache for each CPU is also fine. The question then is when to update them: since all the caches are peers of equal rank, some batch synchronization mechanism is necessary.
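As a reminder of what the RCU route looks like, here is a minimal kernel-style read-side sketch. struct my_table and my_table_find are hypothetical stand-ins, while rcu_read_lock(), rcu_dereference() and rcu_read_unlock() are the real kernel primitives:

```c
#include <linux/rcupdate.h>

/* Hypothetical shared table protected by RCU; read side only. */
struct my_table;                                           /* stand-in type */
int my_table_find(struct my_table *t, unsigned int key);   /* stand-in, defined elsewhere */

static struct my_table __rcu *current_table;

int my_lookup(unsigned int key)
{
    struct my_table *t;
    int ret = -1;

    rcu_read_lock();                      /* readers never block writers */
    t = rcu_dereference(current_table);   /* safe snapshot of the pointer */
    if (t)
        ret = my_table_find(t, key);
    rcu_read_unlock();
    return ret;
}
```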
2) Management-mechanism caches.
For example, the slab object cache. Its life cycle depends entirely on the user, so there is no synchronization problem, but there is a management problem. The idea of a graded cache works well here; it closely resembles a CPU's L1/L2/L3 caches, where the cost of use rises smoothly as the capacity grows, and combined with well-designed replacement/eviction algorithms the effect is very pronounced.
Copyright: Song Baohua. Please include the original link when reprinting: https://en.javamana.com/2022/01/202201262052248170.html