[email protected] 2022-06-24 07:33:02
Full-text indexing is a way to match document content quickly by building an inverted index. Like a B+ tree index, an inverted index is an index structure; it consists of every distinct token (word) that appears in the documents, together with a mapping from each token to the documents that contain it. Inverted indexes generally come in two forms: the inverted file index and the full inverted index.
(1) Inverted file index: the stored mapping is {token, (IDs of the documents that contain the token)}
Number | Text | Documents |
---|---|---|
1 | how | (1,3) |
2 | are | (1,3) |
3 | you | (1,3) |
4 | fine | (2,4) |
5 | thanks | (2,4) |
(2) Full inverted index: the stored mapping is {token, (ID of the document that contains the token : position of the token within the document)}
Number | Text | Documents |
---|---|---|
1 | how | (1:1),(3:1) |
2 | are | (1:2),(3:2) |
3 | you | (1:3),(3:3) |
4 | fine | (2:1),(4:1) |
5 | thanks | (2:2),(4:2) |
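To make the difference between the two structures concrete, here is a minimal Python sketch that builds both of them. The four sample documents are an assumption chosen so that the output matches the tables above, tokens are split on spaces, and positions are counted in words as in the table (a real engine may record positions differently).

```python
from collections import defaultdict

# Hypothetical sample documents chosen to reproduce the two tables above.
DOCS = {
    1: "how are you",
    2: "fine thanks",
    3: "how are you",
    4: "fine thanks",
}


def build_inverted_file_index(docs):
    """Map each token to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def build_full_inverted_index(docs):
    """Map each token to (document ID, 1-based word position) pairs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for position, token in enumerate(docs[doc_id].split(), start=1):
            index[token].append((doc_id, position))
    return dict(index)


file_index = build_inverted_file_index(DOCS)
full_index = build_full_inverted_index(DOCS)
print(file_index["how"])     # [1, 3]
print(full_index["thanks"])  # [(2, 2), (4, 2)]
```

The full inverted index costs more space, but because it keeps positions it can also answer phrase and proximity queries, which the inverted file index cannot.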
In MySQL InnoDB, creating a full-text index also creates a set of auxiliary tables that store the inverted index data.
mysql> CREATE TABLE opening_lines (
    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
    opening_line TEXT(500),
    author VARCHAR(200),
    title VARCHAR(200),
    FULLTEXT idx (opening_line)
) ENGINE=InnoDB;

mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|      333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 |   289 |
|      334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 |   290 |
|      335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 |   291 |
|      336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 |   292 |
|      337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 |   293 |
|      338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 |   294 |
|      330 | test/FTS_0000000000000147_BEING_DELETED            |   286 |
|      331 | test/FTS_0000000000000147_BEING_DELETED_CACHE      |   287 |
|      332 | test/FTS_0000000000000147_CONFIG                   |   288 |
|      328 | test/FTS_0000000000000147_DELETED                  |   284 |
|      329 | test/FTS_0000000000000147_DELETED_CACHE            |   285 |
|      327 | test/opening_lines                                 |   283 |
+----------+----------------------------------------------------+-------+
(1) FTS_0000000000000147_00000000000001c9_INDEX_1-6: these 6 auxiliary tables store the inverted index itself, that is, the token, the document ID, and the position. In other words, InnoDB uses a full inverted index.
(2) FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE: FTS_0000000000000147_DELETED stores documents that have been deleted but whose data has not yet been removed from the full-text index; FTS_0000000000000147_DELETED_CACHE is its cache table.
(3) FTS_0000000000000147_BEING_DELETED / FTS_0000000000000147_BEING_DELETED_CACHE: FTS_0000000000000147_BEING_DELETED stores documents that have been deleted and whose data is currently being removed from the full-text index; FTS_0000000000000147_BEING_DELETED_CACHE is its cache table.
(4) FTS_0000000000000147_CONFIG: stores internal state of the full-text index. The most important entry is FTS_SYNCED_DOC_ID, which identifies the documents that have already been parsed and flushed to disk. During crash recovery, FTS_SYNCED_DOC_ID is used to determine which documents were not flushed and must be parsed again and added back to the full-text index cache.
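The role of FTS_SYNCED_DOC_ID during crash recovery can be illustrated with a small sketch. This is a simplified model, not InnoDB's actual recovery code: the function name is hypothetical, and it assumes document IDs increase monotonically and that every ID up to and including the synced ID has already reached the auxiliary tables.

```python
def docs_needing_reparse(doc_ids, synced_doc_id):
    """Return the IDs that were not yet flushed when the crash happened.

    Assumption: doc IDs increase monotonically and every ID up to and
    including synced_doc_id is already in the auxiliary tables.
    """
    return sorted(doc_id for doc_id in doc_ids if doc_id > synced_doc_id)


# Hypothetical state: documents 1..6 exist, the last flush covered up to ID 4.
print(docs_needing_reparse([1, 2, 3, 4, 5, 6], synced_doc_id=4))  # [5, 6]
```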
If every insert had to tokenize the document and update the auxiliary tables immediately, the cost could be high. To avoid this, InnoDB introduces a full-text index cache that holds recently inserted data; the data is written to the auxiliary tables in a batch only once the cache is full. Recently inserted data can be inspected through INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE, and the innodb_ft_cache_size / innodb_ft_total_cache_size parameters control the full-text index cache size for a single table / for all tables. Note also that the full-text index cache holds only recently inserted data, not data from the auxiliary tables, so a query has to merge the data in the auxiliary tables with the recently inserted data in the cache before returning its result.
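The buffering and merging behaviour described above can be sketched as a toy model. The class below is an illustration under stated assumptions, not InnoDB's implementation: the class and attribute names are hypothetical, `cache_limit` counts cached entries as a stand-in for innodb_ft_cache_size (which is a size in bytes), and tokens are split on spaces.

```python
from collections import defaultdict


class FullTextIndexSketch:
    """Toy model: a flushed 'auxiliary table' plus an in-memory insert cache."""

    def __init__(self, cache_limit):
        self.cache_limit = cache_limit      # hypothetical stand-in for innodb_ft_cache_size
        self.auxiliary = defaultdict(list)  # token -> [(doc_id, position)], flushed data
        self.cache = defaultdict(list)      # token -> [(doc_id, position)], recent inserts
        self.cached_entries = 0

    def insert(self, doc_id, text):
        # Tokenize and buffer the postings in the cache only.
        for position, token in enumerate(text.split(), start=1):
            self.cache[token].append((doc_id, position))
            self.cached_entries += 1
        # Flush in one batch once the cache is full.
        if self.cached_entries >= self.cache_limit:
            self.flush()

    def flush(self):
        for token, postings in self.cache.items():
            self.auxiliary[token].extend(postings)
        self.cache.clear()
        self.cached_entries = 0

    def search(self, token):
        # A query must merge flushed data with what is still in the cache.
        return sorted(self.auxiliary.get(token, []) + self.cache.get(token, []))


index = FullTextIndexSketch(cache_limit=4)
index.insert(1, "how are you")  # 3 entries: stays in the cache
index.insert(2, "fine thanks")  # 5 entries: cache is full, batch flush
index.insert(3, "how are you")  # buffered in the cache again
print(index.search("how"))      # [(1, 1), (3, 1)]
```

In the final call, document 1 comes from the flushed data and document 3 from the cache, which is why a lookup that consulted only one of the two would return an incomplete result.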
Deleting a document would likewise require updating the auxiliary tables, which can also be costly. To avoid this, InnoDB only records the deleted document in the FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE tables and does not remove its entries from the index auxiliary tables. To purge the deleted data completely, you need to rebuild the full-text index with OPTIMIZE TABLE.
mysql> set GLOBAL innodb_optimize_fulltext_only=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> OPTIMIZE TABLE opening_lines;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| test.opening_lines | optimize | status   | OK       |
+--------------------+----------+----------+----------+
1 row in set (0.01 sec)
For updates, InnoDB first deletes the old data and then inserts the new data; each step works as described above.
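Deletion by marking, updating as delete plus insert, and the later purge can be sketched in the same toy style. All names here are hypothetical, and giving an updated document a new ID is an assumption of this sketch; the point is only to show why deletes are cheap and why a separate optimize step is needed to reclaim space.

```python
class LazyDeleteIndexSketch:
    """Toy model of deletion by marking, with a separate purge step."""

    def __init__(self):
        self.postings = {}    # token -> set of doc IDs (stand-in for the auxiliary tables)
        self.deleted = set()  # doc IDs marked deleted (stand-in for the DELETED table)

    def insert(self, doc_id, text):
        for token in text.split():
            self.postings.setdefault(token, set()).add(doc_id)

    def delete(self, doc_id):
        # Cheap: only record the ID; the postings are left untouched.
        self.deleted.add(doc_id)

    def update(self, old_doc_id, new_doc_id, text):
        # Modelled as a delete followed by an insert under a new ID (assumption).
        self.delete(old_doc_id)
        self.insert(new_doc_id, text)

    def search(self, token):
        # Marked documents are filtered out at query time.
        return sorted(self.postings.get(token, set()) - self.deleted)

    def optimize(self):
        # Expensive: physically remove the postings of deleted documents.
        for token in list(self.postings):
            self.postings[token] -= self.deleted
            if not self.postings[token]:
                del self.postings[token]
        self.deleted.clear()


idx = LazyDeleteIndexSketch()
idx.insert(1, "how are you")
idx.insert(2, "fine thanks")
idx.delete(1)
print(idx.search("how"))         # []
print(1 in idx.postings["how"])  # True: still stored until optimize() runs
idx.update(2, 3, "fine thank you")
idx.optimize()
print(sorted(idx.postings))      # ['fine', 'thank', 'you']
```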
As mentioned earlier, creating a full-text index also creates a set of auxiliary tables that store the index data. These auxiliary tables cannot be queried directly, however; the state of a full-text index can only be monitored through the wrapper tables provided under information_schema, listed below:
INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED
The syntax for full-text indexes differs little from that of ordinary indexes:
(1) Create a full-text index
alter table $table_name add fulltext index $index_name($column_name);
create fulltext index $index_name on $table_name($column_name);
(2) Drop a full-text index
alter table $table_name drop index $index_name;
(3) Query
select xxx from $table_name where match($column_name) against(xxx);
In certain scenarios a full-text index is still very useful and can speed up queries considerably. MySQL's full-text index has significant limitations, however. For example, the delimiter used for tokenization cannot be customized (by default, tokens are split on spaces); the ngram parser can produce fixed-length tokens, but it is still of limited practical use. For scenarios with demanding full-text search requirements, a dedicated product such as Elasticsearch (ES) is still the recommended choice.
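To show what fixed-length tokenization means, here is a minimal sketch of n-gram splitting. It illustrates the general technique rather than MySQL's parser code: the function name is hypothetical, the token length of 2 is an assumed setting, and spaces are treated as boundaries that tokens never cross.

```python
def ngram_tokens(text, n=2):
    """Split text into overlapping n-character tokens, never crossing spaces."""
    tokens = []
    for chunk in text.split():
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


print(ngram_tokens("abcd", 2))    # ['ab', 'bc', 'cd']
print(ngram_tokens("全文索引", 2))  # ['全文', '文索', '索引']
```

Because every overlapping character sequence becomes a token, this approach works for text without word delimiters, at the cost of a larger index and matches that ignore real word boundaries.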
copyright:author[[email protected]],Please bring the original link to reprint, thank you. https://en.javamana.com/2022/175/20210630195005941p.html