MySQL case: analysis of full-text indexing

[email protected] 2022-06-24 07:33:02 Views: 631


Preface

Full-text indexing is a way to match document content quickly by building an inverted index. Like a B+ tree index, an inverted index is an index structure. It consists of every distinct token (word) found in the documents, each mapped to the documents that contain it. Inverted indexes generally come in two forms: the inverted file index and the full inverted index.

(1) Inverted file index: the stored mapping is {token, (IDs of the documents that contain the token)}.

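+--------+--------+-----------+
| Number | Text   | Documents |
+--------+--------+-----------+
| 1      | how    | (1,3)     |
| 2      | are    | (1,3)     |
| 3      | you    | (1,3)     |
| 4      | fine   | (2,4)     |
| 5      | thanks | (2,4)     |
+--------+--------+-----------+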
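As a rough illustration of the inverted file index, the following Python sketch builds the token-to-document mapping. The sample documents are an assumption inferred from the table (documents 1 and 3 read "how are you", documents 2 and 4 read "fine thanks"), and `build_inverted_file_index` is a hypothetical helper, not part of MySQL.

```python
from collections import defaultdict

# Hypothetical sample documents, inferred from the table:
# documents 1 and 3 contain "how are you", documents 2 and 4 contain "fine thanks".
DOCS = {
    1: "how are you",
    2: "fine thanks",
    3: "how are you",
    4: "fine thanks",
}


def build_inverted_file_index(docs):
    """Map each distinct token to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Split on whitespace; each token remembers only which documents it occurs in.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


index = build_inverted_file_index(DOCS)
print(index["how"])     # document IDs containing "how"
print(index["thanks"])  # document IDs containing "thanks"
```

A lookup is then a single dictionary access, which is why this structure answers "which documents contain this word" much faster than scanning every document.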

(2) Full inverted index: the stored mapping is {token, (ID of the document that contains the token: position of the token within the document)}.

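+--------+--------+-------------+
| Number | Text   | Documents   |
+--------+--------+-------------+
| 1      | how    | (1:1),(3:1) |
| 2      | are    | (1:2),(3:2) |
| 3      | you    | (1:3),(3:3) |
| 4      | fine   | (2:1),(4:1) |
| 5      | thanks | (2:2),(4:2) |
+--------+--------+-------------+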
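The extra position information is what makes phrase and proximity matching possible. The Python sketch below is a minimal illustration under the same assumed sample documents (inferred from the table); `build_full_inverted_index` and `phrase_match` are hypothetical helpers and do not reflect how InnoDB implements this internally.

```python
from collections import defaultdict

# Hypothetical sample documents, inferred from the table.
DOCS = {
    1: "how are you",
    2: "fine thanks",
    3: "how are you",
    4: "fine thanks",
}


def build_full_inverted_index(docs):
    """Map each token to a list of (document ID, 1-based position) pairs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for position, token in enumerate(docs[doc_id].lower().split(), start=1):
            index[token].append((doc_id, position))
    return dict(index)


def phrase_match(index, first, second):
    """Return IDs of documents in which `second` directly follows `first`."""
    second_postings = set(index.get(second, []))
    return sorted({
        doc_id
        for doc_id, position in index.get(first, [])
        if (doc_id, position + 1) in second_postings
    })


full_index = build_full_inverted_index(DOCS)
print(full_index["how"])                       # (document ID, position) pairs for "how"
print(phrase_match(full_index, "are", "you"))  # documents containing the phrase "are you"
```

With an inverted file index alone, the phrase check would require re-reading each candidate document, because word order is not stored.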

Implementation principles

Auxiliary tables

In MySQL InnoDB, creating a full-text index also creates a set of auxiliary tables that store the inverted index information. The example below lists them through INFORMATION_SCHEMA.INNODB_SYS_TABLES (renamed INNODB_TABLES in MySQL 8.0).

mysql> CREATE TABLE opening_lines (
         id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
         opening_line TEXT(500),
         author VARCHAR(200),
         title VARCHAR(200),
         FULLTEXT idx (opening_line)
       ) ENGINE=InnoDB;

mysql> SELECT table_id, name, space FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES
       WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|      333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 |   289 |
|      334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 |   290 |
|      335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 |   291 |
|      336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 |   292 |
|      337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 |   293 |
|      338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 |   294 |
|      330 | test/FTS_0000000000000147_BEING_DELETED            |   286 |
|      331 | test/FTS_0000000000000147_BEING_DELETED_CACHE      |   287 |
|      332 | test/FTS_0000000000000147_CONFIG                   |   288 |
|      328 | test/FTS_0000000000000147_DELETED                  |   284 |
|      329 | test/FTS_0000000000000147_DELETED_CACHE            |   285 |
|      327 | test/opening_lines                                 |   283 |
+----------+----------------------------------------------------+-------+

(1) FTS_0000000000000147_00000000000001c9_INDEX_1 through INDEX_6: these six auxiliary tables store the inverted index, that is, each token together with its document ID and position. In other words, InnoDB uses a full inverted index.

(2) FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE: FTS_0000000000000147_DELETED stores the IDs of documents that have been deleted but whose data has not yet been removed from the full-text index. FTS_0000000000000147_DELETED_CACHE is its cache table.

(3) FTS_0000000000000147_BEING_DELETED / FTS_0000000000000147_BEING_DELETED_CACHE: FTS_0000000000000147_BEING_DELETED stores the IDs of documents that have been deleted and whose data is currently being removed from the full-text index. FTS_0000000000000147_BEING_DELETED_CACHE is its cache table.

(4) FTS_0000000000000147_CONFIG: stores internal state of the full-text index. The most important entry is FTS_SYNCED_DOC_ID, which records the documents that have been parsed and flushed to disk. During crash recovery, FTS_SYNCED_DOC_ID is used to determine which documents had not yet been flushed and must be re-parsed and added back to the full-text index cache.
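The role of FTS_SYNCED_DOC_ID during crash recovery can be pictured with a small Python sketch. It rests on two simplifying assumptions that the text does not spell out: document IDs increase monotonically, and FTS_SYNCED_DOC_ID acts as a high-water mark for flushed documents. `docs_to_reparse` is a hypothetical function, not an InnoDB API.

```python
def docs_to_reparse(all_doc_ids, synced_doc_id):
    """Return the document IDs that were not flushed before the crash.

    Assumption: every document with an ID up to and including synced_doc_id
    is already in the auxiliary tables, so only larger IDs need re-parsing.
    """
    return sorted(doc_id for doc_id in all_doc_ids if doc_id > synced_doc_id)


# Hypothetical state at restart: six documents exist in the table,
# and the last flush covered documents up to ID 4.
pending = docs_to_reparse([1, 2, 3, 4, 5, 6], synced_doc_id=4)
print(pending)  # documents to tokenize again and put back into the cache
```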

Data insertion

If every document insert had to tokenize the text and update the auxiliary tables immediately, the cost could be high. To avoid this, InnoDB introduces a full-text index cache that holds recently inserted data; the data is written to the auxiliary tables in a batch only when the cache is full. Recently inserted data can be inspected through INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE. The innodb_ft_cache_size and innodb_ft_total_cache_size parameters control the full-text index cache size for a single table and for all tables, respectively.

Note that the full-text index cache holds only recently inserted data, not a copy of the auxiliary tables. When a query is answered, the data in the auxiliary tables and the recently inserted data in the cache must be merged before the result is returned.
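The buffering and merge behaviour described above can be modelled with a toy Python class. This is only a sketch: `ToyFullTextIndex` and `cache_limit` are hypothetical names, the limit counts postings rather than bytes (unlike innodb_ft_cache_size), and a plain dictionary stands in for the on-disk auxiliary tables.

```python
from collections import defaultdict


class ToyFullTextIndex:
    """Toy model: recent inserts are buffered in a cache and flushed in batches."""

    def __init__(self, cache_limit=4):
        self.cache_limit = cache_limit      # stand-in for innodb_ft_cache_size (postings, not bytes)
        self.cache = defaultdict(list)      # postings of recently inserted documents only
        self.auxiliary = defaultdict(list)  # stand-in for the on-disk auxiliary tables
        self.cached_postings = 0

    def insert(self, doc_id, text):
        # Tokenize into the cache; the auxiliary tables are not touched here.
        for position, token in enumerate(text.lower().split(), start=1):
            self.cache[token].append((doc_id, position))
            self.cached_postings += 1
        if self.cached_postings >= self.cache_limit:
            self.flush()

    def flush(self):
        # Batch-write everything in the cache to the auxiliary store, then empty the cache.
        for token, postings in self.cache.items():
            self.auxiliary[token].extend(postings)
        self.cache.clear()
        self.cached_postings = 0

    def search(self, token):
        """Merge flushed postings with postings that are still only in the cache."""
        token = token.lower()
        return sorted(self.auxiliary.get(token, []) + self.cache.get(token, []))


idx = ToyFullTextIndex(cache_limit=4)
idx.insert(1, "how are you")   # 3 postings, stays in the cache
idx.insert(2, "fine thanks")   # 5 postings, limit reached, cache is flushed
idx.insert(3, "how are you")   # back in the cache, not yet flushed
print(idx.search("how"))       # combines the flushed posting and the cached one
```

Without the merge step in `search`, document 3 would be invisible to queries until the next flush.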

Data deletion

Updating the auxiliary tables every time a document is deleted would also be costly. To avoid this, InnoDB only records the deleted document in the FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE tables and does not remove its entries from the auxiliary tables. To purge the deleted data completely, run OPTIMIZE TABLE to rebuild the full-text index, with innodb_optimize_fulltext_only enabled so that only the full-text index is processed.

mysql> SET GLOBAL innodb_optimize_fulltext_only=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> OPTIMIZE TABLE opening_lines;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| test.opening_lines | optimize | status   | OK       |
+--------------------+----------+----------+----------+
1 row in set (0.01 sec)
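The deferred-deletion idea can be sketched in Python as follows. `ToyDeletableIndex` is a hypothetical toy model: a set stands in for the DELETED table, a dictionary stands in for the auxiliary tables, and `optimize` only imitates the effect of OPTIMIZE TABLE rather than its actual implementation.

```python
class ToyDeletableIndex:
    """Toy model of deferred deletion in a full-text index."""

    def __init__(self):
        self.postings = {}    # token -> list of document IDs (stand-in for auxiliary tables)
        self.deleted = set()  # stand-in for the DELETED table

    def insert(self, doc_id, text):
        for token in set(text.lower().split()):
            self.postings.setdefault(token, []).append(doc_id)

    def delete(self, doc_id):
        # Cheap: only record the ID; the postings stay where they are.
        self.deleted.add(doc_id)

    def search(self, token):
        # Deleted documents are filtered out at query time.
        hits = self.postings.get(token.lower(), [])
        return sorted(d for d in hits if d not in self.deleted)

    def optimize(self):
        # Expensive: physically drop postings of deleted documents, then clear the list.
        for token in list(self.postings):
            kept = [d for d in self.postings[token] if d not in self.deleted]
            if kept:
                self.postings[token] = kept
            else:
                del self.postings[token]
        self.deleted.clear()


idx = ToyDeletableIndex()
idx.insert(1, "how are you")
idx.insert(3, "how are you")
idx.delete(1)
print(idx.search("how"))    # document 1 is hidden from results
print(idx.postings["how"])  # but its posting is still stored
idx.optimize()
print(idx.postings["how"])  # the stale posting is gone after the rebuild
```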

Data update

For an update, InnoDB first deletes the old data and then inserts the new data; both steps work as described above.

Monitoring

As mentioned earlier, creating a full-text index also creates a set of auxiliary tables that store the index information. These auxiliary tables cannot be queried directly, however. Full-text index state can only be monitored through the following wrapper tables in information_schema (all except INNODB_FT_DEFAULT_STOPWORD require innodb_ft_aux_table to be set to the target table first, for example test/opening_lines):

INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED

Basic syntax

The syntax for full-text indexes differs little from that of ordinary indexes:

(1) Create a full-text index

alter table $table_name add fulltext index $index_name($column_name);
create fulltext index $index_name on $table_name($column_name);

(2) Drop a full-text index

alter table $table_name drop index $index_name;

(3) Query

select xxx from $table_name where match($column_name) against(xxx);

Summary

In certain scenarios full-text indexing is very useful and can speed up queries considerably. However, MySQL's full-text index has significant limitations. For example, it does not let you specify the token delimiter (the default is whitespace). The ngram parser can split text into fixed-length tokens, but its practical value is still limited. For scenarios with demanding full-text search requirements, a dedicated product such as Elasticsearch (ES) is still recommended.
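To make "fixed-length tokens" concrete, the Python sketch below shows the basic idea of n-gram tokenization. `ngram_tokens` is a hypothetical helper; it ignores details of MySQL's real ngram parser such as whitespace and stopword handling, and the default of 2 is only assumed here to mirror a typical bigram setup.

```python
def ngram_tokens(text, n=2):
    """Split text into overlapping tokens of exactly n characters each."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngram_tokens("abcd", 2))  # every run of two adjacent characters
print(ngram_tokens("abcd", 3))  # every run of three adjacent characters
```

Because every token has the same length regardless of word boundaries, this approach suits languages without space-delimited words, but it also produces many tokens that carry little meaning on their own.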

Copyright: author [[email protected]]. Please include the original link when reprinting, thank you. https://en.javamana.com/2022/175/20210630195005941p.html