Several interesting new features related to statistics collection are available in Oracle 11g:
- New statistics preferences
- PUBLISH
- INCREMENTAL
- STALE_PERCENT
- Hash based sampling for column statistics
- Multicolumn statistics
- Expression statistics
Setup for examples
create table test_tbl
( id1 number,
  id2 varchar2(10),
  id3 varchar2(10) )
partition by range(id1)
( partition p1 values less than(100),
  partition p2 values less than(200),
  partition p3 values less than(300),
  partition p4 values less than(400) );

begin
  for i in 0..399 loop
    insert into test_tbl values(i, 'a'||i, 'b'||i);
  end loop;
  commit;
end;
/

select count(*) from test_tbl;

  COUNT(*)
----------
       400
New statistics preferences
In previous versions there were already many default preferences that could be set for statistics collection, such as:
- CASCADE
- DEGREE
- ESTIMATE_PERCENT
- METHOD_OPT
- NO_INVALIDATE
- GRANULARITY
Oracle 11g introduces new preferences for statistics collection:
- PUBLISH – decides whether new statistics are published to the dictionary immediately or first stored in a pending area.
- INCREMENTAL – used to gather global statistics on partitioned tables in an incremental way
- STALE_PERCENT – determines the threshold level (percentage of rows modified since the last statistics gathering) at which an object's statistics are considered stale
The preferences can be set using the DBMS_STATS package:
begin
  dbms_stats.set_table_prefs (
    OWNNAME => 'TOMASZ',
    TABNAME => 'TEST_TBL',
    PNAME   => 'INCREMENTAL',
    PVALUE  => 'TRUE');
end;
/
To see the statistics preferences, check the following views:
- USER | ALL | DBA_TAB_STAT_PREFS
select * from user_tab_stat_prefs where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
Preferences can be set at the following levels:
- GLOBAL – set_global_prefs – sets default preferences for all tables in the database (see the sketch after this list)
- DATABASE – set_database_prefs – sets default preferences for all tables in the database, excluding tables owned by Oracle
- SCHEMA – set_schema_prefs – sets default preferences for all tables in a schema
- TABLE – set_table_prefs – sets default preferences for a single table
- STATEMENT – gather_*_stats – collects statistics with explicitly passed parameter values, overriding the default preferences
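A minimal sketch of the schema- and global-level calls, using the same TOMASZ schema as in the examples above (the preference values are purely illustrative):

begin
  -- schema-level default: keep new statistics in the pending area for all TOMASZ tables
  dbms_stats.set_schema_prefs(
    ownname => 'TOMASZ',
    pname   => 'PUBLISH',
    pvalue  => 'FALSE');

  -- global default: consider statistics stale after 15% of rows have been modified
  dbms_stats.set_global_prefs(
    pname  => 'STALE_PERCENT',
    pvalue => '15');
end;
/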
Global preferences can be set easily in Enterprise Manager
Server->Manage Optimizer Statistics->Global Statistics Gathering Options
INCREMENTAL preferences
It is a very nice feature that improves global statistics gathering on partitioned tables. For a partitioned table, the system maintains both the statistics on each partition and the overall global statistics for the table.
Global statistics collection depending on the INCREMENTAL setting:
- FALSE – global statistics are recollected by scanning all partitions
- TRUE – new option – global statistics are maintained incrementally, rescanning only the modified partitions
GRANULARITY can be specified in the DBMS_STATS gathering procedures as:
- AUTO – determines the statistics levels based on the partitioning type (global statistics included)
- GLOBAL – gathers global-level statistics
- GLOBAL AND PARTITION – gathers global and partition-level statistics
- ALL – gathers global, partition and subpartition-level statistics
- PARTITION – gathers partition-level statistics
- SUBPARTITION – gathers subpartition-level statistics
- DEFAULT – obsolete, use GLOBAL AND PARTITION instead
If GRANULARITY includes the global level and the table is marked as INCREMENTAL, then the global statistics are gathered using the incremental mechanism. Of course, statistics for modified partitions are gathered as well.
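As a minimal sketch (using the GRANULARITY parameter of DBMS_STATS.GATHER_TABLE_STATS), the granularity can be passed explicitly in the gathering call:

begin
  -- gather global and partition level statistics; with INCREMENTAL=TRUE the
  -- global statistics are gathered using the incremental mechanism
  dbms_stats.gather_table_stats(
    ownname     => 'TOMASZ',
    tabname     => 'TEST_TBL',
    granularity => 'GLOBAL AND PARTITION');
end;
/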
Example:
Verify that the INCREMENTAL preference is set to TRUE:
select * from user_tab_stat_prefs where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
Gather statistics on table TEST_TBL. The stats will be collected using the incremental method.
begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/
Verify the collected statistics at the table, partition and column level:
select table_name, last_analyzed
from user_tables
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED
------------------------------ -------------------
TEST_TBL                       16.04.2014 10:00:31

select partition_name, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TBL'
order by partition_position;

PARTITION_NAME                 LAST_ANALYZED         NUM_ROWS
------------------------------ ------------------- ----------
P1                             16.04.2014 10:00:31        100
P2                             16.04.2014 10:00:31        100
P3                             16.04.2014 10:00:31        100
P4                             16.04.2014 10:00:31        100

select column_name, num_distinct, num_nulls
from user_tab_col_statistics
where table_name='TEST_TBL';

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS
------------------------------ ------------ ----------
ID3                                     400          0
ID2                                     400          0
ID1                                     400          0
As user SYS, verify that the statistics were collected using the incremental method:
select o.name, c.name,
       decode(bitand(h.spare2, 8), 8, 'yes', 'no') incremental
from sys.hist_head$ h, sys.obj$ o, sys.col$ c
where h.obj#    = o.obj#
  and o.obj#    = c.obj#
  and h.intcol# = c.intcol#
  and o.name    = 'TEST_TBL';

NAME           NAME            INCREMENTAL
-------------- --------------- -----------
TEST_TBL       ID1             yes
TEST_TBL       ID2             yes
TEST_TBL       ID3             yes
Now it's time to modify data in one partition, TEST_TBL(P1):
update test_tbl partition(p1) set id2='c'||rownum where rownum < 70;

69 rows updated.

commit;

committed.

insert into test_tbl partition(p1)
select level, 'd'||level, 'e'||level
from dual
connect by level < 100;

99 rows inserted.

commit;

committed.
Gather statistics once again
begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/
Verify the recollection of statistics – note that only the modified partition P1 was re-analyzed:
select table_name, last_analyzed
from user_tables
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED
------------------------------ -------------------
TEST_TBL                       16.04.2014 10:20:51

select partition_name, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TBL'
order by partition_position;

PARTITION_NAME                 LAST_ANALYZED         NUM_ROWS
------------------------------ ------------------- ----------
P1                             16.04.2014 10:20:51        199
P2                             16.04.2014 10:00:31        100
P3                             16.04.2014 10:00:31        100
P4                             16.04.2014 10:00:31        100

select column_name, num_distinct, num_nulls
from user_tab_col_statistics
where table_name='TEST_TBL';

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS
------------------------------ ------------ ----------
ID3                                     499          0
ID2                                     499          0
ID1                                     400          0
NOTE – this new mechanism does not incrementally maintain global histograms and density statistics.
STALE_PERCENT preferences
It defines the threshold at which statistics for an object are considered stale, expressed as a percentage of rows modified since the last statistics gathering.
Example:
Check the global STALE_PERCENT setting:
select dbms_stats.get_prefs('STALE_PERCENT') stale_percent from dual;

STALE_PERCENT
--------------
10
Set a dedicated value for table TEST_TBL:
begin
  dbms_stats.set_table_prefs ('TOMASZ', 'TEST_TBL', 'STALE_PERCENT', 18);
end;
/

select * from user_tab_stat_prefs where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
TEST_TBL    STALE_PERCENT     18
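As a side note, you can check whether Oracle currently considers the table statistics stale through the STALE_STATS column of USER_TAB_STATISTICS. This is a hedged sketch – the monitoring information usually has to be flushed first for the flag to be up to date:

begin
  -- flush DML monitoring information so the staleness flag reflects recent changes
  dbms_stats.flush_database_monitoring_info;
end;
/

select table_name, stale_stats
from user_tab_statistics
where table_name='TEST_TBL' and object_type='TABLE';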
PUBLISH preferences
In the previous release, 10g, collected statistics were automatically published. In 11g you have a new option: the PUBLISH preference decides whether collected statistics are published to the dictionary immediately (still the default in 11g) or first stored in a pending area.
PUBLISH settings in 11g:
- TRUE – statistics are automatically published
- FALSE – statistics are stored in the pending area
Problems with automatic publishing:
- DBAs and developers cannot verify the statistics before they are published
- statistics may be inconsistent because they are not all published at the same time – for example, for a partitioned table the table statistics are published first, then the partition statistics, then the index statistics; or statistics for a "master" table are published before those for related "dependent" tables
- statistics collection can fail part-way through with some information already published – for example, statistics collection for a schema can fail in the middle
If PUBLISH is set to FALSE, new statistics are visible in the pending area through the following views:
- ALL|DBA|USER_TAB_PENDING_STATS
- ALL|DBA|USER_COL_PENDING_STATS
- ALL|DBA|USER_IND_PENDING_STATS
- ALL|DBA|USER_TAB_HISTGRM_PENDING_STATS
Pending statistics can be tested using two methods before publishing:
- export the pending statistics and import them into a separate system (a hedged sketch follows this list) using:
- DBMS_STATS.EXPORT_PENDING_STATS
- DBMS_STATS.IMPORT_TABLE_STATS
- set the new session parameter OPTIMIZER_USE_PENDING_STATISTICS to TRUE to make pending statistics visible to the optimizer (by default the parameter is set to FALSE)
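A minimal sketch of the export/import route, assuming a user statistics table named MY_STAT_TAB (the table name and the transport step are illustrative):

begin
  -- create a user statistics table to hold the exported statistics
  dbms_stats.create_stat_table(ownname => 'TOMASZ', stattab => 'MY_STAT_TAB');

  -- export the pending statistics of TEST_TBL into that table
  dbms_stats.export_pending_stats(
    ownname => 'TOMASZ',
    tabname => 'TEST_TBL',
    stattab => 'MY_STAT_TAB');
end;
/

-- on the test system, after transporting MY_STAT_TAB (e.g. with Data Pump),
-- import the statistics as regular table statistics:
begin
  dbms_stats.import_table_stats(
    ownname => 'TOMASZ',
    tabname => 'TEST_TBL',
    stattab => 'MY_STAT_TAB');
end;
/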
Example:
Set PUBLISH to FALSE
begin
  dbms_stats.set_table_prefs ('TOMASZ', 'TEST_TBL', 'PUBLISH', 'FALSE');
end;
/

select * from user_tab_stat_prefs where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
TEST_TBL    STALE_PERCENT     18
TEST_TBL    PUBLISH           FALSE
Double the data in table TEST_TBL:
select count(*) from test_tbl;

  COUNT(*)
----------
       499

insert into test_tbl select * from test_tbl;

commit;

select count(*) from test_tbl;

  COUNT(*)
----------
       998
The explain plan and the table stats still show the old information – 499 rows:
select table_name, num_rows from user_tables where table_name='TEST_TBL';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
TEST_TBL                              499

explain plan for select * from test_tbl;

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   499 |
|   1 |  PARTITION RANGE ALL|          |   499 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   499 |
------------------------------------------------
Gather statistics for TEST_TBL and check them in the pending area:
begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/

select table_name, partition_name, num_rows, last_analyzed
from user_tab_pending_stats
where table_name='TEST_TBL';

TABLE_NAME  PARTITION_NAME    NUM_ROWS LAST_ANALYZED
----------- --------------- ---------- -------------------
TEST_TBL    P1                     398 16.04.2014 13:01:57
TEST_TBL    P2                     200 16.04.2014 13:01:57
TEST_TBL    P3                     200 16.04.2014 13:01:57
TEST_TBL    P4                     200 16.04.2014 13:01:57
TEST_TBL                           998 16.04.2014 13:01:57

select distinct table_name, last_analyzed
from user_col_pending_stats
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED
------------------------------ -------------------
TEST_TBL                       16.04.2014 13:01:57
Use pending statistics at session level. This time the optimizer reads the statistics from the pending area, because it contains the latest version of the statistics for table TEST_TBL:
alter session set optimizer_use_pending_statistics=true;

explain plan for select * from test_tbl;

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   998 |
|   1 |  PARTITION RANGE ALL|          |   998 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   998 |
------------------------------------------------
Publish the pending statistics and verify the result:
begin
  dbms_stats.publish_pending_stats('TOMASZ','TEST_TBL');
end;
/

select table_name, num_rows from user_tables where table_name='TEST_TBL';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
TEST_TBL                              998

alter session set optimizer_use_pending_statistics=false;

explain plan for select * from test_tbl;

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   998 |
|   1 |  PARTITION RANGE ALL|          |   998 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   998 |
------------------------------------------------
Hash based sampling for column statistics
Computing column statistics is the most expensive step in statistics gathering. Oracle 11g introduces a new method to calculate them – approximate counting. It is a faster and more accurate option than the old row-sampling technique.
To use the approximate counting method:
- ESTIMATE_PERCENT must be set to AUTO_SAMPLE_SIZE; otherwise the old row-sampling method is used (a sketch follows below)
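A minimal sketch of both ways to request it – per gathering call or as a stored preference (DBMS_STATS.AUTO_SAMPLE_SIZE is the package constant; the preference variant stores the value as text):

begin
  -- per gathering call
  dbms_stats.gather_table_stats(
    ownname          => 'TOMASZ',
    tabname          => 'TEST_TBL',
    estimate_percent => dbms_stats.auto_sample_size);

  -- or as a table preference
  dbms_stats.set_table_prefs(
    ownname => 'TOMASZ',
    tabname => 'TEST_TBL',
    pname   => 'ESTIMATE_PERCENT',
    pvalue  => 'DBMS_STATS.AUTO_SAMPLE_SIZE');
end;
/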
Multicolumn statistics
In 10g, Oracle computes the selectivity of multiple predicates in the following way:
- If all columns of the conjunctive predicates match all columns of a concatenated index key, the optimizer uses the formula 1/NDK, where NDK is the number of distinct keys in the index.
- With DYNAMIC_SAMPLING set to 4, the optimizer uses dynamic sampling to estimate the selectivity of complex predicates involving several columns from the same table. This increases parse time, and because the sample is very small it can lead to inaccurate estimates.
- In other cases Oracle multiplies the selectivities of the individual predicates. For correlated columns this leads to under-estimation of the combined selectivity.
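For example, in the original TEST_TBL setup data, ID2 and ID3 are perfectly correlated (each has 400 distinct values and they always change together). For a predicate like id2='a2' and id3='b2' the multiplication approach estimates a selectivity of 1/400 * 1/400 = 1/160000, while the real selectivity is 1/400 – a 400x under-estimation.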
To overcome these problems, Oracle 11g introduces the possibility to collect, store and use dedicated statistics on groups of two or more columns:
- Number of distinct values
- Number of nulls
- Frequency histograms
- Density
The function DBMS_STATS.CREATE_EXTENDED_STATS creates a hidden virtual column that will store the extended statistics showing the correlation between columns ID2 and ID3. The function returns the virtual column name.
--create extended stats
select dbms_stats.create_extended_stats (
         ownname   => user,
         tabname   => 'TEST_TBL',
         extension => '(ID2,ID3)' ) stat
from dual;

STAT
-----------------------------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D
Information about extended statistics can be found in the following views:
- USER|ALL|DBA_STAT_EXTENSIONS
select * from USER_STAT_EXTENSIONS;

TABLE_NAME  EXTENSION_NAME                 EXTENSION      CREATOR DROPPABLE
----------- ------------------------------ -------------- ------- ---------
TEST_TBL    SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D ("ID2","ID3")  USER    YES
The function DBMS_STATS.SHOW_EXTENDED_STATS_NAME is another way to get the virtual column name.
--display virtual column name for extended stats
select dbms_stats.show_extended_stats_name (
         ownname   => user,
         tabname   => 'TEST_TBL',
         extension => '(ID2,ID3)' ) stat
from dual;

STAT
-----------------------------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D
Here is an example of how to collect the extended statistics:
--gather extended statistics
begin
  dbms_stats.gather_table_stats (
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (id2,id3) size 3' );
end;
/

--histogram stats
select column_name, endpoint_number
from user_tab_histograms
where table_name='TEST_TBL'
order by endpoint_number;

COLUMN_NAME                     ENDPOINT_NUMBER
------------------------------- ---------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                1
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                2
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                3
The procedure DBMS_STATS.DROP_EXTENDED_STATS is used to drop extended statistics:
-- Drop the extended statistics
begin
  dbms_stats.drop_extended_stats (
    ownname   => user,
    tabname   => 'TEST_TBL',
    extension => '(ID2,ID3)' );
end;
/
Extended statistics can also be created implicitly during statistics collection:
--create extended statistics implicitly
begin
  dbms_stats.gather_table_stats (
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (id1,id3) size 3' );
end;
/

select column_name, endpoint_number
from user_tab_histograms
where table_name='TEST_TBL'
order by endpoint_number;

COLUMN_NAME                     ENDPOINT_NUMBER
------------------------------- ---------------
SYS_STU7RX03D858C$2W$0LDWVEY24                1
SYS_STU7RX03D858C$2W$0LDWVEY24                2
SYS_STU7RX03D858C$2W$0LDWVEY24                3
Expression statistics
Oracle has always had problems determining the correct selectivity for predicates like
function(column) = constant
The optimizer assumes a static selectivity of 1 percent:
begin
  dbms_stats.gather_table_stats (
    ownname => user,
    tabname => 'TEST_TBL' );
end;
/

select count(*) from test_tbl;

  COUNT(*)
----------
       400
As you can see below, the optimizer estimate for upper(id3)='B2' is 4 rows, which is 1% of 400 rows 🙂
explain plan for select * from test_tbl where upper(id3)='B2';

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |     4 |
|   1 |  PARTITION RANGE ALL|          |     4 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |     4 |
------------------------------------------------
To make the optimizer estimate more precise, you can create and collect extended expression statistics:
--implicit creation of extended statistics and statistics collection
begin
  dbms_stats.gather_table_stats (
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (upper(id3)) size 3' );
end;
/

explain plan for select * from test_tbl where upper(id3)='B2';

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |     1 |
|   1 |  PARTITION RANGE ALL|          |     1 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |     1 |
------------------------------------------------
Have fun 🙂
Tomasz
I want a brief explanation of why we need to gather stats.
Please forward it to my mail, it will be helpful.
As brief as possible – without statistics the Oracle optimizer will not work efficiently, because statistics are the main input for generating proper explain plans.
If explain plans are wrong – queries are slow.
Regards
Tomasz
Thank you. It helped me.
Is there any table I can query for the available prefs? Let's say I want to check which prefs I can change – when I query dba_tab_stat_prefs there are no rows selected for the default values.
Thank you.
What is the estimate_percent value by default?
And what should it be – will 10% or 20% work?