MySQL中limit分页优化的方法与原理
MySQL中的limit结构是limit offset, rows或limit rows,这点不再详述。
以http://ourjnu.com/ 的一个Apache日志数据表为例:
表名jnubbs_logs;
字段有id、ip、date(日期)、time(访问的unix时间戳)等等;
主键id,time是索引;
数据量略多于20000000条。
常规分页方式:
mysql> EXPLAIN SELECT * FROM jnubbs_logs LIMIT 20000000, 20; +----+-------------+-------------+------+---------------+------+---------+------+----------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+---------------+------+---------+------+----------+-------+ | 1 | SIMPLE | jnubbs_logs | ALL | NULL | NULL | NULL | NULL | 20281680 | | +----+-------------+-------------+------+---------------+------+---------+------+----------+-------+ 1 row in set (0.01 sec)
直接执行以下语句:
mysql> SELECT id, date, ip FROM jnubbs_logs ORDER BY time DESC LIMIT 20000000, 20;
结果:
+----------+------------+-----------------+ | id | date | ip | +----------+------------+-----------------+ | 20282759 | 2014-12-04 | 125.211.211.43 | | 20282758 | 2014-12-04 | 66.249.75.142 | | 20282757 | 2014-12-04 | 220.181.108.96 | | 20282756 | 2014-12-04 | 123.125.71.104 | | 20282755 | 2014-12-04 | 123.125.71.58 | | 20282754 | 2014-12-04 | 123.125.71.94 | | 20282753 | 2014-12-04 | 61.135.189.179 | | 20282752 | 2014-12-04 | 54.94.225.255 | | 20282751 | 2014-12-04 | 220.181.108.147 | | 20282750 | 2014-12-04 | 123.125.71.31 | | 20282749 | 2014-12-04 | 220.181.108.111 | | 20282748 | 2014-12-04 | 123.125.71.71 | | 20282747 | 2014-12-04 | 123.125.71.12 | | 20282746 | 2014-12-04 | 157.55.39.144 | | 20282745 | 2014-12-04 | 220.181.108.119 | | 20282744 | 2014-12-04 | 220.181.108.173 | | 20282743 | 2014-12-04 | 123.125.71.84 | | 20282742 | 2014-12-04 | 220.181.108.168 | | 20282741 | 2014-12-04 | 123.125.71.51 | | 20282740 | 2014-12-04 | 123.125.71.94 | +----------+------------+-----------------+ 20 rows in set (2 min 30.87 sec)
非常慢是不是?
子查询分页方式:
mysql> SELECT id, date, ip FROM jnubbs_logs WHERE time >= ( SELECT time FROM jnubbs_logs ORDER BY time DESC LIMIT 20000000, 1 ) ORDER BY time DESC LIMIT 20;
结果:
+----------+------------+-----------------+ | id | date | ip | +----------+------------+-----------------+ | 20282759 | 2014-12-04 | 125.211.211.43 | | 20282758 | 2014-12-04 | 66.249.75.142 | | 20282757 | 2014-12-04 | 220.181.108.96 | | 20282756 | 2014-12-04 | 123.125.71.104 | | 20282755 | 2014-12-04 | 123.125.71.58 | | 20282754 | 2014-12-04 | 123.125.71.94 | | 20282753 | 2014-12-04 | 61.135.189.179 | | 20282752 | 2014-12-04 | 54.94.225.255 | | 20282751 | 2014-12-04 | 220.181.108.147 | | 20282750 | 2014-12-04 | 123.125.71.31 | | 20282749 | 2014-12-04 | 220.181.108.111 | | 20282748 | 2014-12-04 | 123.125.71.71 | | 20282747 | 2014-12-04 | 123.125.71.12 | | 20282746 | 2014-12-04 | 157.55.39.144 | | 20282745 | 2014-12-04 | 220.181.108.119 | | 20282744 | 2014-12-04 | 220.181.108.173 | | 20282743 | 2014-12-04 | 123.125.71.84 | | 20282742 | 2014-12-04 | 220.181.108.168 | | 20282741 | 2014-12-04 | 123.125.71.51 | | 20282740 | 2014-12-04 | 123.125.71.94 | +----------+------------+-----------------+ 20 rows in set (6.84 sec)
非常快了,原因呢?
原理:
limit offset, rows是获取offset+rows条数据,然后扔掉前offset条,并且limit过程中并不使用索引,即使是unique,因此limit offset, rows要尽量少用。
而子查询分页方式,主查询没有使用limit offset, rows,而是使用limit rows,这就会快很多,并且使用到了time这个索引。
而子查询中,limit的row只有1的情况下,获取到了结果就会马上结束子查询,并且SELECT的只有一列(time),虽然还是近乎全表扫描,但是时间消耗则少得多。
直接执行子查询:
mysql> SELECT time FROM jnubbs_logs ORDER BY time DESC LIMIT 20000000, 1; +------------+ | time | +------------+ | 1406337134 | +------------+ 1 row in set (5.02 sec)
而将子查询结果替换子查询,可以看到:
mysql> SELECT id, date, ip FROM jnubbs_logs WHERE time >= 1406337134 ORDER BY time DESC LIMIT 20; +----------+------------+-----------------+ | id | date | ip | +----------+------------+-----------------+ | 20282759 | 2014-12-04 | 125.211.211.43 | | 20282758 | 2014-12-04 | 66.249.75.142 | | 20282757 | 2014-12-04 | 220.181.108.96 | | 20282756 | 2014-12-04 | 123.125.71.104 | | 20282755 | 2014-12-04 | 123.125.71.58 | | 20282754 | 2014-12-04 | 123.125.71.94 | | 20282753 | 2014-12-04 | 61.135.189.179 | | 20282752 | 2014-12-04 | 54.94.225.255 | | 20282751 | 2014-12-04 | 220.181.108.147 | | 20282750 | 2014-12-04 | 123.125.71.31 | | 20282749 | 2014-12-04 | 220.181.108.111 | | 20282748 | 2014-12-04 | 123.125.71.71 | | 20282747 | 2014-12-04 | 123.125.71.12 | | 20282746 | 2014-12-04 | 157.55.39.144 | | 20282745 | 2014-12-04 | 220.181.108.119 | | 20282744 | 2014-12-04 | 220.181.108.173 | | 20282743 | 2014-12-04 | 123.125.71.84 | | 20282742 | 2014-12-04 | 220.181.108.168 | | 20282741 | 2014-12-04 | 123.125.71.51 | | 20282740 | 2014-12-04 | 123.125.71.94 | +----------+------------+-----------------+ 20 rows in set (0.00 sec)
几乎不需要时间,因此limit offset, rows一般情况下最好不要用。
回复#8 @lincanbin :
myisam表 主键就是btree的,昨天的那个表有问题,今天重新测试了下,下面是测试数据
结构
id(primary key) result(char(32) 一个md5 hash值 实际并未用到) 总数1100W
SELECT id, result FROM text2 ORDER BY id LIMIT 8000000, 20
执行时间很久
SELECT id FROM text2 ORDER BY id LIMIT 8000000, 20
平均执行时间约1.42秒
SELECT id FROM text2 WHERE id >= ( SELECT id FROM text2 ORDER BY id DESC LIMIT 8000000, 1 ) ORDER BY id DESC LIMIT 20;
平均执行时间约1.57秒
直接按索引排序并且只提取索引最快,我并不知道你的语句为什么要这么写,因为select id已经是最快了,没有必要反方向desc排序吧?不然你要提取出和(limit 8000000,20)相同的数据还要多一步计算过程?
回复#11 @delgustx :
mysql> explain SELECT -> id, -> date, -> ip -> FROM -> jnubbs_logs -> ORDER BY -> time DESC -> LIMIT 20000000, -> 20; +----+-------------+-------------+------+---------------+------+---------+------+---------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+---------------+------+---------+------+---------+----------------+ | 1 | SIMPLE | jnubbs_logs | ALL | NULL | NULL | NULL | NULL | 4277704 | Using filesort | +----+-------------+-------------+------+---------------+------+---------+------+---------+----------------+ 1 row in set (0.00 sec)
你应该explain一下,看看索引是怎么用到的。
登录后方可回帖