InnoDB storage engine

Yanglinshan little wild boar 2022-05-22 12:36:49


1. InnoDB Architecture


1.1 Background thread

The main job of the background threads is to refresh the data in the memory pool so that the buffer pool always caches the latest data. They also flush modified data to the disk files and ensure that InnoDB can return to a normal state after a database exception.

There are three kinds of background threads:

  1. Master Thread (the core thread; mainly responsible for asynchronously flushing buffer pool data to disk to keep the data consistent)
  2. IO Thread (handles the callbacks of IO requests)
  3. Purge Thread (reclaims undo pages that have been used and allocated)

1.2 Memory

1.2.1 Buffer pool

The InnoDB storage engine is disk-based, so it can be thought of as a disk-based database system.


The buffer pool is a region of memory that uses the speed of memory to compensate for the impact of slow disks on database performance.

1.2.2 LRU List, Free List, Flush List

The buffer pool in InnoDB is managed with an LRU algorithm, but the algorithm is optimized: a midpoint position is added to the LRU list.

The part of the list after the midpoint is called the old list, and the part before it is called the new list. Put simply, the pages in the new list are the most active hot data. A newly read page is placed at the midpoint position.

With a plain LRU algorithm, a newly read page goes directly to the head of the LRU list, so hot pages that are still needed may be evicted from the tail (for example, during a scan that reads many pages only once).

To optimize the LRU list further, InnoDB introduces the innodb_old_blocks_time parameter. It specifies how long a page must wait after being read into the midpoint position before it can be moved to the hot end of the LRU list.

When the database has just started, all pages are in the free list. Once the database is running and a page needs to be allocated from the buffer pool, InnoDB first checks whether the free list has a free page. If it does, the page is removed from the free list and put into the LRU list; if not, a page at the tail of the LRU list is evicted according to the LRU algorithm.

Once a page in the LRU list is modified, it is called a dirty page: the data of the page in the buffer pool is inconsistent with the page on disk. The database then flushes dirty pages to disk through the CHECKPOINT mechanism, and the pages in the Flush list are the dirty pages. Note that a dirty page is in both the LRU list and the Flush list. The LRU list manages the availability of pages in the buffer pool, the Flush list manages flushing pages back to disk, and the two do not affect each other.
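The midpoint behaviour described above can be sketched in a few lines of Python. This is a toy model, not InnoDB's implementation: the class name MidpointLRU, the 3/8 share given to the old list, and the abstract time units are all assumptions made for illustration (old_blocks_time only plays the role of innodb_old_blocks_time here).

```python
class MidpointLRU:
    """Toy LRU list split at a midpoint into a new (hot) and an old sublist."""

    def __init__(self, capacity, old_ratio=3 / 8, old_blocks_time=1.0):
        self.capacity = capacity
        # Assumed split: a fixed share of the list is reserved for the old sublist.
        self.old_capacity = max(1, int(capacity * old_ratio))
        self.new_capacity = capacity - self.old_capacity
        self.old_blocks_time = old_blocks_time
        self.new = []          # index 0 is the hot end of the whole list
        self.old = []          # index 0 is the midpoint, the last item is evicted first
        self.entered_old = {}  # page -> time it was placed in the old sublist

    def _enter_old(self, page, now):
        self.old.insert(0, page)
        self.entered_old[page] = now

    def access(self, page, now):
        if page in self.new:
            # Already hot: move it back to the hot end.
            self.new.remove(page)
            self.new.insert(0, page)
        elif page in self.old:
            # Promote only if the page has waited long enough at the old end.
            if now - self.entered_old[page] >= self.old_blocks_time:
                self.old.remove(page)
                del self.entered_old[page]
                self.new.insert(0, page)
                if len(self.new) > self.new_capacity:
                    # The coldest "new" page slides down past the midpoint.
                    self._enter_old(self.new.pop(), now)
        else:
            # Page miss: evict from the tail if full, then insert at the midpoint.
            if len(self.new) + len(self.old) >= self.capacity:
                victim = self.old.pop()
                del self.entered_old[victim]
            self._enter_old(page, now)


cache = MidpointLRU(capacity=8)
for t in (0, 2):                  # two hot pages, touched again after the wait time
    for page in ("h1", "h2"):
        cache.access(page, now=t)
for i in range(20):               # a one-off scan of 20 pages
    cache.access(f"s{i}", now=3)
print(cache.new)                  # ['h2', 'h1']
```

In this sketch the scan only churns through the old list, so the two hot pages survive. With a plain LRU list of the same size they would have been pushed out by the scan.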

1.2.3 Redo Log Buffer

The contents of the redo log buffer are flushed to the redo log file on disk in the following three cases:

  1. The Master Thread flushes the buffer every second
  2. Each time a transaction commits
  3. When less than 1/2 of the redo log buffer space remains
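The three conditions above can be written as a small decision function. This is only an illustrative sketch: the function name and its parameters are hypothetical, and it models nothing beyond the three cases listed.

```python
def redo_flush_reason(seconds_since_last_flush, committing, free_bytes, buffer_bytes):
    """Return why the redo log buffer should be flushed now, or None."""
    if committing:
        return "transaction commit"
    if free_bytes < buffer_bytes / 2:
        return "less than half of the buffer is free"
    if seconds_since_last_flush >= 1:
        return "Master Thread one-second flush"
    return None


# Hypothetical 8 MB buffer with 3 MB free, no commit in progress.
print(redo_flush_reason(0.2, False, 3 * 2**20, 8 * 2**20))
```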

1.2.4 Extra memory pool

In the InnoDB storage engine, memory is managed through a mechanism called a memory heap.

2. Checkpoint technology

If every change to a page caused the new version of the page to be flushed to disk, the cost would be very high, and if the hot data is concentrated in a few pages, database performance becomes very poor. Moreover, if the system goes down while the new version of a page is being flushed from the buffer pool to disk, the data cannot be recovered. For this reason, transactional database systems generally adopt a write-ahead log strategy: when a transaction commits, the redo log is written first, and the page is modified afterwards.

The buffer pool cannot cache all of the data in the database, and the redo log cannot grow indefinitely. Checkpoint technology is meant to solve the following problems:

  1. Reduce database recovery time
  2. Flush dirty pages to disk when the buffer pool is running out of space
  3. Flush dirty pages when the redo log is unavailable (its space cannot yet be reused)

The InnoDB engine has two kinds of checkpoint:

  1. Sharp Checkpoint (occurs when the database shuts down; all dirty pages are flushed)
  2. Fuzzy Checkpoint (flushes only some of the dirty pages while the database is running)

3. InnoDB Key features

Key features of the InnoDB storage engine include:

  1. Insert buffer
  2. Doublewrite
  3. Adaptive hash index
  4. Asynchronous IO
  5. Flush adjacent pages

3.1 Insert buffer

The InnoDB storage engine pioneered the design of the insert buffer. For insert or update operations on a nonclustered index, the record is not written directly into the index page every time. Instead, InnoDB first checks whether the target nonclustered index page is in the buffer pool. If it is, the record is inserted directly; if not, the record is first placed in an Insert Buffer object.

Using the insert buffer requires both of the following conditions to hold:

  1. The index is a secondary index
  2. The index is not unique
As InnoDB evolved, Change Buffer was introduced starting with version 1.0.x; it can be seen as an upgrade of the Insert Buffer.

Next, let's look at how the insert buffer is implemented internally. Its data structure is a B+ tree made up of leaf nodes and non-leaf nodes. The non-leaf nodes store the search key used for lookups, which consists of the following fields:


space: the tablespace id of the table that the record to be inserted belongs to (4 bytes)
marker: kept for compatibility with older versions (1 byte)
offset: the offset of the page

Records stored in the leaf nodes of the Insert Buffer B+ tree are not the records to be inserted as-is; each one is rebuilt into a dedicated leaf-record format before it is inserted.

3.2 Doublewrite

The insert buffer improves the performance of the InnoDB storage engine, while doublewrite improves the reliability of its data pages.

When dirty pages are flushed from the buffer pool, they are not written directly to disk. Instead, they are first copied into the in-memory doublewrite buffer with the memcpy function and then written in two steps: once to the shared tablespace and once to the data file. If the data file is damaged, the dirty pages that were already written can be recovered from the shared tablespace.
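The two-step write and the recovery path can be illustrated with a toy model. Everything here is an assumption for illustration: the class ToyDoublewrite, the use of Python dicts in place of files, the CRC32 checksum, and the crash_on parameter that simulates a crash in the middle of writing one page.

```python
import zlib


class ToyDoublewrite:
    """Toy model: dicts stand in for the shared tablespace and the data file."""

    def __init__(self):
        self.shared_tablespace = {}  # page_no -> (bytes, checksum)
        self.data_file = {}          # page_no -> (bytes, checksum)

    def flush(self, dirty_pages, crash_on=None):
        # Step 0: copy the dirty pages into the in-memory doublewrite buffer.
        dw_buffer = {no: bytes(data) for no, data in dirty_pages.items()}
        # Step 1: write the copies to the shared tablespace.
        for no, data in dw_buffer.items():
            self.shared_tablespace[no] = (data, zlib.crc32(data))
        # Step 2: write the pages to the data file.
        for no, data in dw_buffer.items():
            if no == crash_on:
                # Simulated crash: only half of the page reaches the data file.
                self.data_file[no] = (data[: len(data) // 2], zlib.crc32(data))
                return False
            self.data_file[no] = (data, zlib.crc32(data))
        return True

    def recover(self):
        """Restore every missing or damaged page from the shared tablespace."""
        repaired = []
        for no, (data, crc) in self.shared_tablespace.items():
            stored = self.data_file.get(no)
            if stored is None or zlib.crc32(stored[0]) != stored[1]:
                self.data_file[no] = (data, crc)
                repaired.append(no)
        return repaired


dw = ToyDoublewrite()
dw.flush({1: b"a" * 16, 2: b"b" * 16, 3: b"c" * 16}, crash_on=2)
print(dw.recover())  # pages that had to be restored from the shared tablespace
```

In this sketch, page 2 is torn and page 3 never reaches the data file, but both can be restored because an intact copy was written to the shared tablespace first.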

3.3 Adaptive hash index

The InnoDB storage engine automatically builds hash indexes for some hot pages based on how frequently and in what pattern they are accessed. The adaptive hash index has one requirement: consecutive accesses to the page must use the same access pattern.
Examples of access patterns:

  1. WHERE a = xxx
  2. WHERE a = xxx and b = xxx

If the two access patterns above are used alternately, InnoDB will not build an adaptive hash index for the page.

3.4 Asynchronous IO

Asynchronous IO improves the performance of disk operations. With asynchronous IO, the user can issue the next IO request immediately after issuing one, without waiting for it to finish. Once all IO requests have been sent, the user waits for all of the IO operations to complete.

Another advantage of asynchronous IO is that it can merge multiple IO requests. If it detects that the pages requested by several IO requests are contiguous, it merges them into a single IO operation instead of issuing several.
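The merge step can be sketched as grouping contiguous page numbers into ranges. The function below is a hypothetical illustration of the idea, not InnoDB's actual merge logic.

```python
def merge_contiguous_requests(page_numbers):
    """Group requested page numbers into (first, last) ranges of contiguous pages."""
    merged = []
    for page in sorted(set(page_numbers)):
        if merged and page == merged[-1][1] + 1:
            # The page directly follows the previous range: extend that range.
            merged[-1] = (merged[-1][0], page)
        else:
            # Not contiguous: start a new IO operation.
            merged.append((page, page))
    return merged


print(merge_contiguous_requests([8, 6, 7, 20, 21, 40]))
# [(6, 8), (20, 21), (40, 40)]
```

Six page requests become three IO operations in this example.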

3.5 Flush adjacent pages

When a dirty page is flushed, the InnoDB storage engine checks all the pages in the extent that contains it and flushes any that are also dirty along with it. The benefit is that asynchronous IO can then merge the multiple IO operations into one.
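A minimal sketch of choosing which pages to flush together is shown below. It assumes 64 pages per extent (the value for 16 KB pages and 1 MB extents), and the function name is hypothetical.

```python
PAGES_PER_EXTENT = 64  # assumption: 1 MB extent with 16 KB pages


def pages_to_flush(target_page, dirty_pages):
    """Return the target page plus every other dirty page in the same extent."""
    start = (target_page // PAGES_PER_EXTENT) * PAGES_PER_EXTENT
    end = start + PAGES_PER_EXTENT
    neighbours = {p for p in dirty_pages if start <= p < end}
    return sorted(neighbours | {target_page})


print(pages_to_flush(70, {3, 64, 70, 100, 127, 128}))
# [64, 70, 100, 127]
```

Pages 3 and 128 stay in the buffer pool because they belong to other extents.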

Copyright: author [Yanglinshan little wild boar]. Please include the original link when reprinting, thank you.