Mysql同时使用分页和排序数据重复丢失问题


记录一个很常见的MySQL问题,同时使用limitorder by可能造成数据查询重复和数据丢失问题。我这里测试的MySQL版本号是5.7.20

先来看看表结构:

id(int) name(varchar) sort(int)
1 测试1 1
2 测试2 2
3 测试3 1

以此类推自行添加测试数据…sort必须要有重复的值,并且sort字段没有索引

问题重现

先看带上sort的查询:

mysql> select * from test order by sort;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
|  1 | 测试1    |    1 |
|  4 | 测试4    |    1 |
|  5 | 测试5    |    1 |
|  7 | 测试7    |    1 |
|  8 | 测试8    |    1 |
|  9 | 测试9    |    1 |
| 10 | 测试10   |    1 |
| 12 | 测试12   |    1 |
| 13 | 测试13   |    1 |
| 14 | 测试14   |    1 |
| 15 | 测试15   |    1 |
| 16 | 测试16   |    1 |
| 17 | 测试17   |    1 |
| 18 | 测试18   |    1 |
| 19 | 测试19   |    1 |
| 20 | 测试20   |    1 |
|  2 | 测试2    |    2 |
|  3 | 测试3    |    2 |
|  6 | 测试6    |    2 |
| 11 | 测试11   |    2 |
+----+----------+------+

再来带上limit执行的数据

mysql> select * from test order by sort limit 0, 2;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
|  1 | 测试1    |    1 |
| 20 | 测试20   |    1 |
+----+----------+------+

mysql> select * from test order by sort limit 2, 2;
+----+---------+------+
| id | name    | sort |
+----+---------+------+
|  7 | 测试7   |    1 |
|  4 | 测试4   |    1 |
+----+---------+------+

mysql> select * from test order by sort limit 4, 2;
+----+---------+------+
| id | name    | sort |
+----+---------+------+
|  5 | 测试5   |    1 |
|  8 | 测试8   |    1 |
+----+---------+------+

mysql> select * from test order by sort limit 6, 2;
+----+---------+------+
| id | name    | sort |
+----+---------+------+
|  7 | 测试7   |    1 |
|  8 | 测试8   |    1 |
+----+---------+------+

mysql> select * from test order by sort limit 8, 2;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
|  9 | 测试9    |    1 |
| 10 | 测试10   |    1 |
+----+----------+------+

mysql> select * from test order by sort limit 10, 2;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
| 14 | 测试14   |    1 |
| 12 | 测试12   |    1 |
+----+----------+------+

mysql> select * from test order by sort limit 12, 2;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
| 13 | 测试13   |    1 |
| 14 | 测试14   |    1 |
+----+----------+------+

可以很明显的看到查询后的数据顺序和理想中的不一样,造成了数据顺序错乱和数据丢失问题。

原因

这个问题在5.6以后的版本都会有,严格的说这并不算一个问题,MySQL官方对order by limit做了个小优化,就是在排序字段无索引并且列值不唯一时,会使用priority queue(优先队列)

这个优先队列的本质和Java中的堆一样的,可以根据limit的条数维护一个堆。

官方文档也对此做了解释:

If you combine LIMIT *row_count* with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.

One manifestation of this behavior is that an ORDER BY query with and without LIMIT may return rows in different order, as described later in this section.

如果将LIMIT row_count与ORDER BY结合使用,MySQL会在找到排序结果的前row_count行后立即停止排序,而不是对整个结果进行排序。 如果通过使用索引进行排序,这将非常快。 如果必须执行文件排序,则在找到第一个row_count之前,将选择所有与查询匹配的,没有LIMIT子句的行,并对其中的大多数或全部进行排序。 找到初始行后,MySQL不会对结果集的其余部分进行排序。此行为的一种体现是,带有和不带有LIMIT的ORDER BY查询可能以不同的顺序返回行,如本节后面所述。

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.

如果多个行在ORDER BY列中具有相同的值,则服务器可以自由以任何顺序返回这些行,并且根据整体执行计划的不同,返回值可以不同。 换句话说,这些行的排序顺序相对于无序列是不确定的。

解决方法

  1. 添加一个唯一的字段,比如id(最优解)
  2. 给排序字段添加索引
mysql> select * from test order by sort,id limit 0, 2;
+----+---------+------+
| id | name    | sort |
+----+---------+------+
|  1 | 测试1   |    1 |
|  4 | 测试4   |    1 |
+----+---------+------+

mysql> select * from test order by sort,id limit 2, 2;
+----+---------+------+
| id | name    | sort |
+----+---------+------+
|  5 | 测试5   |    1 |
|  7 | 测试7   |    1 |
+----+---------+------+

mysql> select * from test order by sort,id limit 4, 2;
+----+---------+------+
| id | name    | sort |
+----+---------+------+
|  8 | 测试8   |    1 |
|  9 | 测试9   |    1 |
+----+---------+------+

mysql> select * from test order by sort,id limit 6, 2;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
| 10 | 测试10   |    1 |
| 12 | 测试12   |    1 |
+----+----------+------+

mysql> select * from test order by sort,id limit 8, 2;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
| 13 | 测试13   |    1 |
| 14 | 测试14   |    1 |
+----+----------+------+

mysql> select * from test order by sort,id limit 10, 2;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
| 15 | 测试15   |    1 |
| 16 | 测试16   |    1 |
+----+----------+------+

mysql> select * from test order by sort,id limit 12, 2;
+----+----------+------+
| id | name     | sort |
+----+----------+------+
| 17 | 测试17   |    1 |
| 18 | 测试18   |    1 |
+----+----------+------+

文章作者: Cody_
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 Cody_ !
评论
 上一篇
Elasticsearch数据类型 Elasticsearch数据类型
随着版本的改动,小部分数据类型也有细微的改动,就比如string类型新版本就不再支持。我本地的版本是7.9.3文档版本是7.X 1 常见类型1.1 binary - 二进制型二进制值编码为Base64字符串, 不以默认的方式存储, 不能被搜
2020-12-26
下一篇 
Docker中安装Elasticsearch Docker中安装Elasticsearch
Docker给开发人员带来了极大的便利性;舍弃那些安装包和配置环境,拾起docker pull一把梭哈。 本文涉及到的所有版本号都是7.9.3 Docker镜像地址 安装Elasticsearch拉取镜像命令 docker pull ela
2020-12-12
  目录