sql-server – Case of using filtered statistics

Published: 2020-12-12 16:51:53. Category: MsSql tutorials. Source: compiled from the web.

I was going through the post on filtered statistics at the link below.

http://blogs.msdn.com/b/psssql/archive/2010/09/28/case-of-using-filtered-statistics.aspx

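The data is heavily skewed: one region (id 0) has only a single row in Sales, and all the remaining rows belong to the other region.
Here is the complete code to reproduce the problem: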

create table Region(id int,name nvarchar(100)) 
go 
create table Sales(id int,detail int) 
go 
create clustered index d1 on Region(id) 
go 
create index ix_Region_name on Region(name) 
go 
create statistics ix_Region_id_name on Region(id,name) 
go 
create clustered index ix_Sales_id_detail on Sales(id,detail) 
go

-- only two values in this table as lookup or dim table 
insert Region values(0,'Dallas') 
insert Region values(1,'New York') 
go

set nocount on 
-- Sales is skewed 
insert Sales values(0,0) 
declare @i int 
set @i = 1 
while @i <= 1000 begin 
insert Sales  values (1,@i) 
set @i = @i + 1 
end 
go

update statistics Region with fullscan 
update statistics Sales with fullscan 
go

set statistics profile on 
go 
-- note that this query will overestimate:
-- it estimates 500.5 rows, but only 1 row is returned
select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile) 
-- this query will underestimate:
-- it also estimates 500.5 rows, but in fact 1000 rows are returned
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile) 
go

set statistics profile off 
go

create statistics Region_stats_id on Region (id) 
where name = 'Dallas' 
go 
create statistics  Region_stats_id2 on Region (id) 
where name = 'New York' 
go

set statistics profile on 
go 
--now the estimate becomes accurate (1 row) because stats Region_stats_id is used to evaluate 
select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)

--the estimate becomes accurate (1000 rows) because stats Region_stats_id2 is used to evaluate 
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile) 
go

set statistics profile off

My question: these are the statistics available on the two tables.

sp_helpstats 'region','all'
sp_helpstats 'sales','all'

Table Region:

statistics_name   statistics_keys
d1                    id
ix_Region_id_name     id,name
ix_Region_name        name

Table Sales:

statistics_name    statistics_keys
ix_Sales_id_detail     id,detail

1. Why are the estimates wrong for the queries below?

select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)

--the estimate becomes accurate (1000 rows) because stats Region_stats_id2 is used to evaluate 
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile)

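2. When I create the filtered statistics as the author does, I can see that the estimates become correct. But why do we need filtered statistics here, and how can I tell that I need them, given that I get the same result even when I create simple statistics?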

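The best resources I have come across so far:
Kimberly Tripp's video on skewed statistics
A technical white paper on statistics

But I still cannot understand why filtered statistics make a difference here.

Thanks in advance.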
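Update: 7/4

Follow-up questions after Martin's and James's answers:

1. Is there any way to avoid data skew?
Besides Kimberly's script, there is also the option of counting the number of rows for each value.

2. Have you run into any problems related to data skew in your own experience? I assume it mostly matters for large tables, but I am looking for a more detailed answer.

3. We have to pay the cost of SQL Server scanning the table, plus some blocking when a query happens to trigger a statistics update. Do you see any overhead beyond that in maintaining statistics?

The reason I ask is that I am considering creating filtered statistics for a few conditions, based on DTA input.

Thanks again.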

Solution

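Here is my assumption about why this happens. You get the same estimate (500.5 rows) for both queries because SQL Server has no statistics that tell it which ID belongs to which region. The statistic ix_Region_id_name covers two columns, but because the histogram exists only for the first column, it does not really help in estimating how many rows will match in the Sales table.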
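The 500.5 figure can be reproduced by hand. As a rough sketch, assuming the optimizer falls back to the average number of Sales rows per distinct id when it cannot tell which id the name predicate selects, the following statements against the repro tables show where the number could come from:

```sql
-- Average number of Sales rows per distinct id:
-- 1001 rows / 2 distinct ids = 500.5, the same figure as both estimates.
select count(*) * 1.0 / count(distinct id) as avg_rows_per_id
from Sales;

-- The density vector of the Sales statistic stores the same information:
-- "All density" for id is 1 / (number of distinct ids) = 0.5,
-- and 1001 rows * 0.5 = 500.5.
dbcc show_statistics('Sales', 'ix_Sales_id_detail') with density_vector;
```

The column alias avg_rows_per_id is only illustrative.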

If you run dbcc show_statistics('Region','ix_Region_id_name'), the result will be:

RANGE_HI_KEY   RANGE_ROWS   EQ_ROWS   DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
0              0            1         0                     1
1              0            1         0                     1

So this says that there is 1 row for each ID, but there is no link to the name.

But once you create the statistic Region_stats_id (for Dallas), dbcc show_statistics('Region','Region_stats_id') will show:

RANGE_HI_KEY   RANGE_ROWS   EQ_ROWS   DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
0              0            1         0                     1

So SQL Server knows that there is only 1 row, and that its ID is 0.

Similarly for Region_stats_id2:

RANGE_HI_KEY   RANGE_ROWS   EQ_ROWS   DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
1              0            1         0                     1
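To confirm which statistics on Region actually carry a filter, the sys.stats catalog view can be queried. This is a minimal sketch, assuming the repro objects above exist in the current database:

```sql
-- List every statistic on Region with its filter, if any.
-- has_filter is 1 and filter_definition holds the predicate
-- for the two filtered statistics; it is 0 / NULL for the others.
select s.name, s.has_filter, s.filter_definition
from sys.stats as s
where s.object_id = object_id('Region');
```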

The row counts stored in ix_Sales_id_detail on Sales then help determine the number of rows for each ID:

RANGE_HI_KEY   RANGE_ROWS   EQ_ROWS   DISTINCT_RANGE_ROWS   AVG_RANGE_ROWS
0              0            1         0                     1
1              0            1000      0                     1
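To compare the estimates with reality, the actual number of joined rows per region can be counted directly. A small sketch against the repro tables (the aliases r and s and the column name actual_rows are just for illustration):

```sql
-- Actual join cardinality per region name.
-- With the repro data, Dallas (id 0) matches 1 Sales row
-- and New York (id 1) matches 1000.
select r.name, count(*) as actual_rows
from Region as r
join Sales as s on r.id = s.id
group by r.name;
```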

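Info: this is now a copy of an answer that was deleted by @MartijnPieters, since this is the question I actually intended to answer, and it seems I cannot do anything with the deleted answer. I accidentally posted this first on TheGameiswar's other statistics question from today, but I have deleted that one myself.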

(Editor: 李大同)
