what? Is count (*) faster than count (1) in MySQL?

A little rain in the south of the Yangtze River 2022-01-26 18:10:45 阅读数:331

count faster count mysql

Someone told me today MySQL in count(1) Than count(*) fast , It's tolerable ? We have to break up with him .

Statement : The following discussion is based on InnoDB Storage engine ,MyISAM Because of the special situation, I will talk about it separately at the end of the article .

Say first conclusion : There is little difference between the two performances .

1. practice

I have prepared one with 100W Table of data , The table structure is as follows :

`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(255) DEFAULT NULL,
`address` varchar(255) DEFAULT NULL,
`password` varchar(255) DEFAULT NULL,

You can see , There is a primary key index .

Let's count the number of records in the table in two ways , as follows :

You can see , Two article SQL The efficiency of execution is almost the same , All are 0.14s.

Let's look at the other two statistics :

id It's the primary key ,username as well as address Is a normal field .

It can be seen that , use id To statistics , There is also a lost advantage . SongGe here because the test data template is relatively small , So the effect is not obvious , Small partners can increase the amount of test data , Then the difference will be more obvious .

So what is the reason for this difference , Next, let's briefly analyze .

2. explain analysis

We use first explain Take a look at these SQL Different execution plans :

You can see , The implementation plan of the first three statistical methods is the same , The latter two are the same .

Let me compare with you here explain Different items in :

  • type: The first three type The value is index, Represents a full index scan , Just go through the whole index ( Note that the index is not the entire table ); The last two type The value is all, Express Full table scan , That is, the index... Will not be used .
  • key: The said MySQL Decide which index to use to optimize access to the table ,PRIMARY Indicates that the primary key index is used ,NULL It means no index .
  • key_len: The said MySQL Key length used , Because our primary key type is INT It's not empty , So the value is 4.
  • Extra: In this Using index It means that the optimizer only needs to access the index to get the required data ( There is no need to return the form ).

adopt explain In fact, we can roughly see that the implementation efficiency of the first three statistical methods is higher ( Because of the index ), The statistical efficiency of the latter two is relatively lower ( Useless index , Need a full table scan ).

The above analysis alone is not enough , Let's analyze it from the perspective of principle .

3. Principle analysis

3.1 Primary key index and ordinary index

Before starting the principle analysis , I want to show you B+ Trees , This is important for us to understand the following content .

Everybody knows ,InnoDB The storage structures of indexes in are B+ Trees ( As for what is B+ Trees , and B What's the difference between trees , This article will not discuss , These two alone can produce an article ), The storage of primary key index is different from that of ordinary index , The following figure shows the primary key index :

You can see , In the primary key index , The leaf node stores the data of each row .

In the ordinary index , The leaf node stores the primary key value , When we use ordinary indexes to search data , First find the primary key in the leaf node , Then take the primary key to find data in the primary key index , It's equivalent to doing two searches , This is what we usually say Back to the table operation .

3.2 Principle analysis

I don't know if my friends have noticed , We learn MySQL When ,count Functions fall into the category of aggregate functions , Namely avg、sum etc. ,count Functions are grouped with these , It is also an aggregate function .

Since it's an aggregate function , Then you need to judge the returned result set line by line , Here is a question , What is the result returned ? Let's look at :

about select count(1) from user; For this query ,InnoDB The engine will find the smallest index tree to traverse ( Not necessarily a primary key index ), But it won't read data , But read a leaf node , Just go back to 1, Finally, add up the results .

about select count(id) from user; For this query ,InnoDB The engine will traverse the entire primary key index , Then read id And back to , But because id It's the primary key , It's just B+ On the leaf node of the tree , So this process doesn't involve random IO( There is no need to go back to the table and other operations to get data from the data page ), Performance is also OK Of .

about select count(username) from user; For this query ,InnoDB The engine will traverse the whole table and scan the whole table , Read each row username Field and return , If username When defining, it sets not null, So direct Statistics username The number of ; If username It is not set at the time of definition not null, Then judge first username Is it empty , And then count it .

Finally, let's talk about select count(*) from user; , This SQL What makes it special is that it is MySQL Optimized , When MySQL notice count(*) I knew you wanted to count the total records , Will go to find the smallest index tree to traverse , Then count the number of records .

Because the primary key index ( Clustered index ) The leaf node of is data , The leaf node of the common index is the primary key value , So the index tree of ordinary index is smaller . However, in the above case , We only have the primary key index , So the final use is the primary key index .

Now? , If I modify the table above , by username Fields are also indexed , Then we'll see explain select count(*) from user; Implementation plan of :

You can see , The index used at this time is username Index , It is consistent with our previous analysis .

From the description above, we can see that , The first query has the highest performance , The second comes next ( Because you need to read id And back to ), The third worst ( Because you need a full table scan ), The query performance of the fourth is close to that of the first .

4. MyISAM Well ?

Maybe a little partner knows ,MyISAM In the engine select count(*) from user; The operation is very fast , That's because MyISAM Store the number of rows in the table directly on disk , Just read it out when you need it , So very fast .

MyISAM The reason why the engine does this , Mainly because it does not support transactions , So its statistics are actually very easy , Just add one line of record .

And we often use InnoDB But you can't do that ! Why? ? because InnoDB Support transactions ! To support transactions ,InnoDB Introduced MVCC Multi version concurrency control , So there may be dirty reading when reading data 、 Unreal reading and non repeatable reading , For details, please refer to https://www.bilibili.com/video/BV14L4y1B7mB video .

therefore ,InnoDB You need to take out each row of data , Judge whether this row of data is visible to the current session , If visible , Just count the data in this line , Otherwise, it will not be counted .

Of course ,MySQL Medium MVCC It's actually a very grand topic , Brother song will introduce to you in detail when he is free MVCC.

All right. , Now you guys understand ? If you have any questions, please leave a message to discuss .

copyright:author[A little rain in the south of the Yangtze River],Please bring the original link to reprint, thank you. https://en.javamana.com/2022/01/202201261810408243.html