加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 百科 > 正文

Oracle 统计信息收集

发布时间:2020-12-12 13:27:51 所属栏目:百科 来源:网络整理
导读:官网网址参考: https://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_stats.htm#CIHBIEII https://docs.oracle.com/cd/B12037_01/server.101/b10759/statements_4005.htm#i2150533 https://asktom.oracle.com/pls/asktom/f?p=100:11:::::p11_question

ownname

Schema to analyze (NULL?means current schema)

estimate_percent

Percentage of rows to estimate (NULL?means compute): The valid range is [0.000001,100]. Use the constant?DBMS_STATS.AUTO_SAMPLE_SIZEto have Oracle determine the appropriate sample size for good statistics. This is the default.The default value can be changed using the?SET_PARAM Procedure.

block_sample

Whether or not to use random block sampling instead of random row sampling. Random block sampling is more efficient,but if the data is not randomly distributed on disk,then the sample values may be somewhat correlated. Only pertinent when doing an estimate statistics.

method_opt

Accepts:

  • FOR ALL [INDEXED | HIDDEN] COLUMNS?[size_clause]

  • FOR COLUMNS?[size clause] column|attribute [size_clause] [,column|attribute [size_clause]...]

size_clause is defined as size_clause := SIZE {integer | REPEAT | AUTO | SKEWONLY}


- integer?: Number of histogram buckets. Must be in the range [1,254].
- REPEAT?: Collects histograms only on the columns that already have histograms.
- AUTO?: Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.
- SKEWONLY?: Oracle determines the columns to collect histograms based on the data distribution of the columns.

The default is?FOR ALL COLUMNS SIZE AUTO.The default value can be changed using the?SET_PARAM Procedure.

degree

Degree of parallelism. The default for?degree?is?NULL. The default value can be changed using the?SET_PARAM Procedure.?NULL?means use the table default value specified by the?DEGREE?clause in the?CREATE TABLE?or?ALTER TABLE?statement. Use the constant?DBMS_STATS.DEFAULT_DEGREE?to specify the default value based on the initialization parameters.The?AUTO_DEGREE?value determines the degree of parallelism automatically. This is either 1 (serial execution) or DEFAULT_DEGREE?(the system default value based on number of CPUs and initialization parameters) according to size of the object.

granularity

Granularity of statistics to collect (only pertinent if the table is partitioned).

‘ALL‘?- gathers all (subpartition,partition,and global) statistics

‘AUTO‘- determines the granularity based on the partitioning type. This is the default value.

‘DEFAULT‘?- gathers global and partition-level statistics. This option is obsolete,and while currently supported,it is included in the documentation for legacy reasons only. You should use the ‘GLOBAL AND PARTITION‘ for this functionality. Note that the default value is now ‘AUTO‘.

‘GLOBAL‘?- gathers global statistics

GLOBAL AND PARTITION‘ - gathers the global and partition level statistics. No subpartition level statistics are gathered even if it is a composite partitioned object.

‘PARTITION?‘- gathers partition-level statistics

‘SUBPARTITION‘?- gathers subpartition-level statistics.

cascade

Gather statistics on the indexes as well.Index statistics gathering is not parallelized. Using this option is equivalent to running the?GATHER_INDEX_STATS Procedure?on each of the indexes in the schema in addition to gathering table and column statistics. Use the constant?DBMS_STATS.AUTO_CASCADE?to have Oracle determine whether index statistics to be collected or not. This is the default. The default value can be changed using the?SET_PARAM Procedure.

stattab

User statistics table identifier describing where to save the current statistics

statid

Identifier (optional) to associate with these statistics within?stattab

options

Further specification of which objects to gather statistics for:

GATHER: Gathers statistics on all objects in the schema.

GATHER?AUTO: Gathers all necessary statistics automatically. Oracle implicitly determines which objects need new statistics,and determines how to gather those statistics. When?GATHER AUTO?is specified,the only additional valid parameters are?ownname,?stattab,?statid,?objlistand?statown; all other parameter settings are ignored. Returns a list of processed objects.

GATHER?STALE: Gathers statistics on stale objects as determined by looking at the *_tab_modifications?views. Also,return a list of objects found to be stale.

GATHER?EMPTY: Gathers statistics on objects which currently have no statistics. also,return a list of objects found to have no statistics.

LIST AUTO: Returns a list of objects to be processed with?GATHER AUTO.

LIST?STALE: Returns list of stale objects as determined by looking at the *_tab_modifications?views.

LIST?EMPTY: Returns list of objects which currently have no statistics.

objlist

List of objects found to be stale or empty

statown

Schema containing?stattab?(if different than?ownname)

no_invalidate

Does not invalidate the dependent cursors if set to?TRUE. The procedure invalidates the dependent cursors immediately if set to?FALSE. Use?DBMS_STATS.AUTO_INVALIDATE. to have Oracle decide when to invalidate dependent cursors. This is the default. The default can be changed using the?SET_PARAM Procedure.

force

Gather statistics on objects even if they are locked

一些实际用例:

EXEC DBMS_STATS.GATHER_SCHEMA_STATS(‘SCOTT‘,estimate_percent=>80,method_opt=>‘FOR ALL COLUMNS SIZE AUTO‘,degree=>4,cascade=>TRUE);
EXEC DBMS_STATS.GATHER_TABLE_STATS(‘SCOTT‘,‘EMP‘,cascade=>TRUE);
EXEC DBMS_STATS.GATHER_INDEX_STATS(‘SCOTT‘,‘PK_EMP‘,degree=>4);

一些特别提示:

  • 虽然method_opt的description中并未提及for table这个选项,但其实这个选项也是有效的,同analyze一样,这个参数在不同版本的表现也是不一样的,具体差异也可以轻易的使用本文中提供的SQL观察到。
  • Oracle有auto optimizer stats collection的自动维护任务定期的收集统计信息,这些任务是默认开启的,但当数据库变的很大之后就会引发严重的性能问题,建议只保留周末的一个窗口,其他窗口全部关闭。
  • 直方图统计信息并不是那么的重要,只有在遇到对倾斜列(skew)的查询很频繁时才有用,这种情况并不常见。
  • 不再推荐使用analyze来收集统计信息,除非是做测试或者表很小,dbms_stats的并行度选项能加快收集速度。
  • 对大表采样收集统计信息时一般采样比例不需要很大,通常10%到30%即可,如果业务可以提供维护窗口,那100%也没什么大不了。
  • 如果要详细了解统计信息收集了什么内容,可以参考本文提供的网址链接和视图。

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

官网网址参考:

https://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_stats.htm#CIHBIEII

https://docs.oracle.com/cd/B12037_01/server.101/b10759/statements_4005.htm#i2150533

https://asktom.oracle.com/pls/asktom/f?p=100:11:::::p11_question_id:5792247321358

https://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm#TGSQL366

查询表上一次收集统计信息的时间:

select owner,table_name,last_analyzed from dba_tables where owner=‘SCOTT‘;

统计信息涉及的视图:

Column statistics appear in the data dictionary views?USER_TAB_COLUMNS,?ALL_TAB_COLUMNS, and?DBA_TAB_COLUMNS. Histograms appear in the data dictionary views?USER_TAB_HISTOGRAMS,?DBA_TAB_HISTOGRAMS,and?ALL_TAB_HISTOGRAMS;?USER_PART_HISTOGRAMS,?DBA_PART_HISTOGRAMS, and?ALL_PART_HISTOGRAMS; and?USER_SUBPART_HISTOGRAMS,?DBA_SUBPART_HISTOGRAMS, and?ALL_SUBPART_HISTOGRAMS.

收集统计信息主要有2种方法:

1. analyze

analyze可以用来收集表,索引,列以及系统的统计信息和直方图,以下为一些典型用法:

analyze table scott.emp compute statistics; --收集所有的统计信息和直方图信息,包括表、列、索引。
analyze table scott.emp compute statistics for table; --收集emp表的统计信息,不含列、索引统计信息和直方图。
analyze table scott.emp compute statistics for all columns;  --收集所有列的统计信息和直方图(超大表较耗资源,因为只要列中有非空值,那么就会收集这个列的统计信息和直方图)。
analyze table scott.emp compute statistics for all indexed columns;  --收集所有索引列的统计信息和直方图。
analyze table scott.emp compute statistics for all indexes; --收集所有索引统计信息,不含列的统计信息和直方图。
analyze table scott.emp compute statistics for columns 列1,列2; --收集2个列的统计信息和直方图。
analyze index idx_ename delete statistics; --删除索引idx_ename的统计信息。
ananlyze table scott.emp delete statistics; --删除表t1所有的表,列,索引的统计信息和列直方图。
analyze table scott.emp estimate statistics sample 15 percent for table; --收集emp表的统计信息,以估算模式采样比例为15%进行收集,不含列、索引统计信息和直方图。

从语法可以看出,只有指定列统计信息收集时,才会收集相关列的直方图,此外收集直方图时for子句还可以加size子句,size的取值范围是1-254,默认值是75,表示直方图的buckets的最大数目。而dbms_stats包的size选择则有:数字|auto|repeat|skewonly选项,但analyze的size只能是数字。

?

关于直方图:

A histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column. A histogram sorts values into "buckets," as you might sort coins into buckets.

从官网解释(参考第四个网址)来看,直方图就是一种特殊的列统计信息,这也与我们上边的推断相符,只有列才有直方图。

这里贴一个Tom Kyte用于查看analyze后统计信息的SQL:(已稍作改进,仅示例,这种格式的SQL不推荐,原SQL较简单参考第三个网址)

select t.num_rows as num_rows_in_table,i.index_name,i.num_rows as num_rows_in_index,c.cnt
from (select num_rows from user_tables where table_name =‘EMP‘) t,(select index_name,num_rows from user_indexes where table_name = ‘EMP‘) i,(select count(distinct column_name) cnt from user_tab_histograms where table_name = ‘EMP‘ ) c;

需要注意的一点是for table选项在某些版本中并不只收集表统计信息,而是连列和索引的统计信息一块收集了,至于具体哪些版本的表现不同这里不做深究,使用上述SQL可以轻易的测试出你的analyze和dbms_stats语句到底收集了什么统计信息和直方图。

?

2. 调用dbms_stats包

dbms_stats于analyze的区别是:

analyze收集系统内部对象会报错,而dbms_stats不会

analyze不能正确的收集分区表的统计信息而dbms_stats可以通过指定粒度来实现(granularity)。

analyze不能并行的收集统计信息,而dbms_stats可以(可以加上degree=>4来实现并行度为4的收集)。

Oracle推荐使用dbms_stats来收集统计信息,analyze将会被逐渐抛弃。

dbms_stats中负责收集统计信息的是以下几个存储过程:

GATHER_DATABASE_STATS
    --This procedure gathers statistics for all objects in the database.
GATHER_DICTIONARY_STATS 
    --This procedure gathers statistics for dictionary schemas ‘SYS‘,‘SYSTEM‘ and schemas of RDBMS components.
GATHER_FIXED_OBJECTS_STATS
    --This procedure gathers statistics for all fixed objects (dynamic performance tables).
GATHER_INDEX_STATS 
    --This procedure gathers index statistics. It attempts to parallelize as much of the work as possible. Restrictions are described in the individual parameters. This operation will not parallelize with certain types of indexes,including cluster indexes,domain indexes,and bitmap join indexes. The granularity and no_invalidate arguments are not relevant to these types of indexes.
GATHER_SCHEMA_STATS 
    --This procedure gathers statistics for all objects in a schema.
GATHER_SYSTEM_STATS
    --This procedure gathers system statistics.
GATHER_TABLE_STATS
    --This procedure gathers table and column (and index) statistics. It attempts to parallelize as much of the work as possible,but there are some restrictions as described in the individual parameters.
 

三个常用Procedure用法详解:GATHER_SCHEMA_STATS(两种用法)、GATHER_TABLE_STATS、GATHER_INDEX_STATS

PROCEDURE GATHER_SCHEMA_STATS
 Argument Name          Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 OWNNAME            VARCHAR2        IN
 ESTIMATE_PERCENT   NUMBER          IN     DEFAULT
 BLOCK_SAMPLE       BOOLEAN         IN     DEFAULT
 METHOD_OPT         VARCHAR2        IN     DEFAULT
 DEGREE             NUMBER          IN     DEFAULT
 GRANULARITY        VARCHAR2        IN     DEFAULT
 CASCADE            BOOLEAN         IN     DEFAULT
 STATTAB            VARCHAR2        IN     DEFAULT
 STATID             VARCHAR2        IN     DEFAULT
 OPTIONS            VARCHAR2        IN     DEFAULT
 OBJLIST            OBJECTTAB       OUT
 STATOWN            VARCHAR2        IN     DEFAULT
 NO_INVALIDATE      BOOLEAN         IN     DEFAULT
 GATHER_TEMP        BOOLEAN         IN     DEFAULT
 GATHER_FIXED       BOOLEAN         IN     DEFAULT
 STATTYPE           VARCHAR2        IN     DEFAULT
 FORCE              BOOLEAN         IN     DEFAULT
 OBJ_FILTER_LIST    OBJECTTAB       IN     DEFAULT

PROCEDURE GATHER_SCHEMA_STATS
 Argument Name          Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 OWNNAME            VARCHAR2        IN
 ESTIMATE_PERCENT   NUMBER          IN     DEFAULT
 BLOCK_SAMPLE       BOOLEAN         IN     DEFAULT
 METHOD_OPT         VARCHAR2        IN     DEFAULT
 DEGREE             NUMBER          IN     DEFAULT
 GRANULARITY        VARCHAR2        IN     DEFAULT
 CASCADE            BOOLEAN         IN     DEFAULT
 STATTAB            VARCHAR2        IN     DEFAULT
 STATID             VARCHAR2        IN     DEFAULT
 OPTIONS            VARCHAR2        IN     DEFAULT
 STATOWN            VARCHAR2        IN     DEFAULT
 NO_INVALIDATE      BOOLEAN         IN     DEFAULT
 GATHER_TEMP        BOOLEAN         IN     DEFAULT
 GATHER_FIXED       BOOLEAN         IN     DEFAULT
 STATTYPE           VARCHAR2        IN     DEFAULT
 FORCE              BOOLEAN         IN     DEFAULT
 OBJ_FILTER_LIST    OBJECTTAB       IN     DEFAULT
 
PROCEDURE GATHER_TABLE_STATS
 Argument Name          Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 OWNNAME            VARCHAR2        IN
 TABNAME            VARCHAR2        IN
 PARTNAME           VARCHAR2        IN     DEFAULT
 ESTIMATE_PERCENT   NUMBER          IN     DEFAULT
 BLOCK_SAMPLE       BOOLEAN         IN     DEFAULT
 METHOD_OPT         VARCHAR2        IN     DEFAULT
 DEGREE             NUMBER          IN     DEFAULT
 GRANULARITY        VARCHAR2        IN     DEFAULT
 CASCADE            BOOLEAN         IN     DEFAULT
 STATTAB            VARCHAR2        IN     DEFAULT
 STATID             VARCHAR2        IN     DEFAULT
 STATOWN            VARCHAR2        IN     DEFAULT
 NO_INVALIDATE      BOOLEAN         IN     DEFAULT
 STATTYPE           VARCHAR2        IN     DEFAULT
 FORCE              BOOLEAN         IN     DEFAULT
 
PROCEDURE GATHER_INDEX_STATS
 Argument Name          Type            In/Out Default?
 ------------------------------ ----------------------- ------ --------
 OWNNAME            VARCHAR2        IN
 INDNAME            VARCHAR2        IN
 PARTNAME           VARCHAR2        IN     DEFAULT
 ESTIMATE_PERCENT   NUMBER          IN     DEFAULT
 STATTAB            VARCHAR2        IN     DEFAULT
 STATID             VARCHAR2        IN     DEFAULT
 STATOWN            VARCHAR2        IN     DEFAULT
 DEGREE             NUMBER          IN     DEFAULT
 GRANULARITY        VARCHAR2        IN     DEFAULT
 NO_INVALIDATE      BOOLEAN         IN     DEFAULT
 STATTYPE           VARCHAR2        IN     DEFAULT
 FORCE              BOOLEAN         IN     DEFAULT

GATHER_SCHEMA_STATS参数详解:(其他存储过程的参数解释参见官方页面,很多参数description都是通用的)

Parameter Description
    推荐文章
      热点阅读