Statistics improvements Oracle Database 11G release 2 (11.2)

New interesting features are available in 11G related to statistics collection

  • New statistics preferences
    • PUBLISH
    • INCREMENTAL
    • STALE_PERCENT
  • Hash based sampling for column statistics
  • Multicolumn statistics
  • Expression statistics

Setup for examples

create table test_tbl
( 
  id1 number,
  id2 varchar2(10),
  id3 varchar2(10)
)
partition by range(id1)
(
  partition p1 values less than(100),
  partition p2 values less than(200),
  partition p3 values less than(300),
  partition p4 values less than(400)
);
begin
  for i in 0..399
  loop
    insert into test_tbl values(i, 'a'||i, 'b'||i);
  end loop;

  commit;
end;
/

select count(*)
from test_tbl;

  COUNT(*)
----------
       400

New statistics preferences

There are many default preferences that can be set for collection statistics in previous versions like:

  • CASCADE
  • DEGREE
  • ESTIMATE_PERCENT
  • METHOD_OPT
  • NO_INVALIDATE
  • GRANULARITY

Oracle 11G introduces new preferences that can be set for collection statistics:

  • PUBLISH – decides if new statistics are published to dictionary immediately or stored in pending area before.
  • INCREMENTAL – used to gather global statistics on partitioned tables in incremental way
  • STALE_PERCENT – determines threshold level (percentage of rows modified since last statistics gathering) at which an object is considered to have stale statistics

The parameters can be set using DBMS_STATS

begin
  dbms_stats.set_table_prefs
  ( OWNNAME => 'TOMASZ',
    TABNAME => 'TEST_TBL',
    PNAME   => 'INCREMENTAL',
    PVALUE  => 'TRUE');
end;
/

To see statistics preferences check following views

  • USER | ALL | DBA_TAB_STAT_PREFS
select * from user_tab_stat_prefs
where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE    
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE

Statistics level preferences that can be set

  • GLOBAL – set_global_prefs – sets default preferences for all tables in database
  • DATABASE – set_database_prefs – sets default preferences for all tables in database excluding tables owned by Oracle
  • SCHEMA – set_schema_prefs – sets default preferences for all tables in a schema
  • TABLE – set_table_prefs – sets default preferences for a table
  • STATEMENT – gather_*_stats – collect statistics ignoring default preferences

oracle_11g_stats_1

Global preferences can be set easily in Enterprise Manager

Server->Manage Optimizer Statistics->Global Statistics Gathering Options

oracle_11g_stats_2

INCREMENTAL preferences

It’s very nice feature to improve global statistics gathering on partitioned tables. For a partitioned table, the system maintains both the statistics on each partition and the overall global statistics for the table.

Global statistics collection for INCREMENTAL:

  • FALSE – global statistics are recollected for all partitions
  • TRUE – new option – global statistics are incrementally collected only for modified partitions

GRANULARITY can be specified by package DBMS_STATS as:

  • AUTO – determine level statistics based on partitioning (global included)
  • GLOBAL – gathers global level statistics
  • GLOBAL AND PARTITION – gathers global and partition level statistics
  • ALL – gathers global, partition and subpartition level statistics
  • PARTITION – gathers partition level statistics
  • SUBPARTITION – gathers subpartition level statistics
  • DEFAULT obsolete use GLOBAL AND PARTITION

If GRANULARITY includes global level and the table is marked as INCREMENTAL then the global statistics are gathered using the incremental mechanism. Of course statistics for modified partitions are gathered as well.

Example:

Verify that INCREMENTAL preferences are set to TRUE

select * from user_tab_stat_prefs
where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE    
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE

Gather statistics on table TEST_TBL. The stats will be collected using INCREMENTAL method.

begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/

Verify collected statistics on table, partition and column level

select table_name, last_analyzed
from user_tables
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED     
------------------------------ -------------------
TEST_TBL                       16.04.2014 10:00:31

select partition_name, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TBL'
order by partition_position;

PARTITION_NAME                 LAST_ANALYZED         NUM_ROWS
------------------------------ ------------------- ----------
P1                             16.04.2014 10:00:31        100 
P2                             16.04.2014 10:00:31        100 
P3                             16.04.2014 10:00:31        100 
P4                             16.04.2014 10:00:31        100

select column_name, num_distinct, num_nulls
from user_tab_col_statistics
where table_name='TEST_TBL';

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS
------------------------------ ------------ ----------
ID3                                     400          0 
ID2                                     400          0 
ID1                                     400          0

Verify as user SYS statistics were collected using INCREMENTAL method

select 
  o.name, 
  c.name, 
  decode(bitand(h.spare2, 8), 8, 'yes', 'no') incremental
from sys.hist_head$ h, sys.obj$ o, sys.col$ c
where h.obj#=o.obj#
  and o.obj# = c.obj#
  and h.intcol# = c.intcol#
  and o.name='TEST_TBL';

NAME           NAME            INCREMENTAL
-------------- --------------- -----------
TEST_TBL       ID1             yes         
TEST_TBL       ID2             yes         
TEST_TBL       ID3             yes

Now it’s time to modify data in one partition TEST_TBL(P1)

update test_tbl partition(p1)
set id2='c'||rownum
where rownum < 70;

69 rows updated.

commit;

committed.

insert into test_tbl partition(p1)
select level, 'd'||level, 'e'||level from dual
connect by level < 100;

99 rows inserted.

commit;

committed.

Gather statistics once again

begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/

Verification of recollection of statistics

select table_name, last_analyzed
from user_tables
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED     
------------------------------ -------------------
TEST_TBL                       16.04.2014 10:20:51

select partition_name, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TBL'
order by partition_position;

PARTITION_NAME                 LAST_ANALYZED         NUM_ROWS
------------------------------ ------------------- ----------
P1                             16.04.2014 10:20:51        199 
P2                             16.04.2014 10:00:31        100 
P3                             16.04.2014 10:00:31        100 
P4                             16.04.2014 10:00:31        100

select column_name, num_distinct, num_nulls
from user_tab_col_statistics
where table_name='TEST_TBL'; 

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS
------------------------------ ------------ ----------
ID3                                     499          0 
ID2                                     499          0 
ID1                                     400          0

NOTE – this new mechanism does not incrementally maintain histograms and density global statistics

STALE_PERCENT preferences

It defines threshold at which statistics for an object are considered as STALE. It’s percentage of rows modified since last statistics gathering.

Example:

Check global STALE_PERCENT settings

select dbms_stats.get_prefs('STALE_PERCENT') stale_percent 
from dual;

STALE_PERCENT 
--------------
10

Set dedicated settings for table TEST_TBL

begin
  dbms_stats.set_table_prefs
  ('TOMASZ', 'TEST_TBL', 'STALE_PERCENT', 18);
end;
/

select * from user_tab_stat_prefs
where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE    
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
TEST_TBL    STALE_PERCENT     18

PUBLISH preferences

In previous release 10g collected statistics are automatically published. In 11g you have new option. PUBLISH preference is used to decide whether to publish collected statistics to the dictionary immediately(still default in 11g) or first store them in a pending area.

PUBLISH settings in 11g:

  • TRUE – statistics are automatically published
  • FALSE – statistics are stored in pending area

Problems with automatic publishing:

  • DBA, developers could not verify statistics before publishing
  • statistics could be not consistent  not published in the same time – example: for partitioned table first are published table then partitions then indexes statistics or first are published statistics for “master” table then for related “dependent” tables
  • statistics collection can fail on a step with part published informations – example: stats collection for a schema can fail in the middle

If PUBLISH is set to FALSE then new statistics are visible from pending area in following views:

  • ALL|DBA|USER_TAB_PENDING_STATS
  • ALL|DBA|USER_COL_PENDING_STATS
  • ALL|DBA|USER_IND_PENDING_STATS
  • ALL|DBA|USER_TAB_HISTGRM_PENDING_STATS

Pending statistics can be tested using two methods before publishing:

  • export pending statistics and import them to separate system using:
    • DBMS_STATS.EXPORT_PENDING_STATS
    • DBMS_STATS.IMPORT_TABLE_STATS
  • use new session parameter OPTIMIZER_USE_PENDING_STATISTICS set to TRUE to make pending statistics visible by optimizer. As default the parameter is set to FALSE.

Example:

Set PUBLISH to FALSE

begin
  dbms_stats.set_table_prefs
  ('TOMASZ', 'TEST_TBL', 'PUBLISH', 'FALSE');
end;
/

select * from user_tab_stat_prefs
where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE    
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
TEST_TBL    STALE_PERCENT     18
TEST_TBL    PUBLISH           FALSE

Double data in table TEST_TBL

select count(*) from test_tbl;

  COUNT(*)
----------
       499

insert into test_tbl
select * from test_tbl;

commit;

select count(*) from test_tbl;

  COUNT(*)
----------
       998

Explain plan and table stats still shows old information – number of rows 499

select table_name, num_rows
from user_tables
where table_name='TEST_TBL';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
TEST_TBL                              499

explain plan for
select * from test_tbl;

select * 
from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031

------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   499 |
|   1 |  PARTITION RANGE ALL|          |   499 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   499 |
------------------------------------------------

Gather statistics for TEST_TBL and check them in pending area.

begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/

select table_name, partition_name, num_rows, last_analyzed
from user_tab_pending_stats
where table_name='TEST_TBL';

TABLE_NAME  PARTITION_NAME    NUM_ROWS LAST_ANALYZED     
----------- --------------- ---------- -------------------
TEST_TBL    P1                     398 16.04.2014 13:01:57 
TEST_TBL    P2                     200 16.04.2014 13:01:57 
TEST_TBL    P3                     200 16.04.2014 13:01:57 
TEST_TBL    P4                     200 16.04.2014 13:01:57 
TEST_TBL                           998 16.04.2014 13:01:57 

select distinct table_name, last_analyzed 
from user_col_pending_stats
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED     
------------------------------ -------------------
TEST_TBL                       16.04.2014 13:01:57

Use pending statistics on session level. This time data are read from pending area because it has the latest version of statistics for table TEST_TBL.

alter session set optimizer_use_pending_statistics=true;

select * 
from table(dbms_xplan.display);

Plan hash value: 928296031

------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   998 |
|   1 |  PARTITION RANGE ALL|          |   998 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   998 |
------------------------------------------------

Publish pending statistics and verify it.

begin
  dbms_stats.publish_pending_stats('TOMASZ','TEST_TBL');
end;
/

select table_name, num_rows
from user_tables
where table_name='TEST_TBL';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
TEST_TBL                              998

alter session set optimizer_use_pending_statistics=false;

select * 
from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031

------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   998 |
|   1 |  PARTITION RANGE ALL|          |   998 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   998 |
------------------------------------------------

Hash based sampling for column statistics

Computing column statistics  is the most expensive step in statistic gathering. Oracle has introduced new method to calculate the statistics – approximate counts. It’s much better and accurate option than old row-sampling technique.

To use approximate counts method:

  • ESTIMATE_PERCENT must be set to AUTO_SAMPLE_SIZE otherwise previous method is used row-sampling

Multicolumn statistics

In 10g Oracle is computing selectivity of multiple predicates in following way:

  • If all columns of conjunctive predicates matches all columns of a concatenated index key then optimizer uses formula 1/NDK where NDK is number of distinct keys in the index.
  • For DYNAMIC_SAMPLING set to 4, optimizer uses dynamic sampling to estimate selectivity of complex predicates involving several columns form the same table. It increases parsing, sampling is very small and as result can lead to inaccurate statistics.
  • For other cases Oracle multiplies the selectivity of individual predicates to estimate selectivity. It always lead to under-estimation of the selectivity.

To overcome these problems Oracle 11G has introduced possibility to collect/store and use dedicated statistics between two or more columns:

  • Number of distinct values
  • Number of nulls
  • Frequency histograms
  • Density

Following function DBMS_STATS.CREATE_EXTENDED_STATS creates virtual hidden column that will store extended statistics to shows correlation between column ID2 and ID3. The function returns virtual column name.

--create extended stats
select
  dbms_stats.create_extended_stats
  (
    ownname   => user, 
    tabname   => 'TEST_TBL', 
    extension => '(ID2,ID3)'
  ) stat
from dual;

STAT                               
-----------------------------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D

Information about extended statistics can be found in views:

  • USER|ALL|DBA_STAT_EXTENSIONS
select * from USER_STAT_EXTENSIONS;

TABLE_NAME  EXTENSION_NAME                 EXTENSION      CREATOR DROPPABLE
----------- ------------------------------ -------------- ------- ---------
TEST_TBL    SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D ("ID2","ID3")  USER    YES

Following function DBMS_STATS.SHOW_EXTENDED_STATS_NAME is another way to get virtual name.

--display virtual column name for extended stats
select
  dbms_stats.show_extended_stats_name
  ( 
    ownname   => user, 
    tabname   => 'TEST_TBL', 
    extension => '(ID2,ID3)'
  ) stat
from dual;

STAT                               
-----------------------------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D

Here is following example how to collect extended statistics

--gather extended statistics
begin
  dbms_stats.gather_table_stats
  ( 
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (id2,id3) size 3'
  );
end;
/

--histogram stats
select column_name, endpoint_number 
from user_tab_histograms 
where table_name='TEST_TBL'
order by endpoint_number;

COLUMN_NAME                     ENDPOINT_NUMBER
------------------------------- ---------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                1 
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                2 
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                3

Procedure  DBMS_STATS.DROP_EXTENDED_STATS is used to drop extended statistics

-- Drop the extended statistics
begin
  dbms_stats.drop_extended_stats
  (
    ownname   => user,
    tabname   => 'TEST_TBL',
    extension => '(ID2,ID3)'
  );
end;
/

Extended statistics can be created implicitly during collection of statistics

--create extended statistics implicit
begin
  dbms_stats.gather_table_stats
  ( 
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (id1,id3) size 3'
  );
end;
/

select column_name, endpoint_number 
from user_tab_histograms 
where table_name='TEST_TBL'
order by endpoint_number;

COLUMN_NAME                     ENDPOINT_NUMBER
------------------------------- ---------------
SYS_STU7RX03D858C$2W$0LDWVEY24                1
SYS_STU7RX03D858C$2W$0LDWVEY24                2
SYS_STU7RX03D858C$2W$0LDWVEY24                3

Expression statistics

Oracle has got always problems with determining correct selectivity for predicates like

function(column) = constant

Optimizer assumes static selectivity value of 1 percent

begin
  dbms_stats.gather_table_stats
  ( 
    ownname    => user,
    tabname    => 'TEST_TBL'
  );
end;
/

select count(*) from test_tbl;
  COUNT(*)
----------
       400

As you can see below optimizer estimation for upper(id2)=’b2′ gives 4 and it’s 1% of 400 rows 🙂

explain plan for
select * from test_tbl
where upper(id3)='B2';

select * 
from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031 

------------------------------------------------ 
| Id  | Operation           | Name     | Rows  | 
------------------------------------------------ 
|   0 | SELECT STATEMENT    |          |     4 | 
|   1 |  PARTITION RANGE ALL|          |     4 | 
|   2 |   TABLE ACCESS FULL | TEST_TBL |     4 | 
------------------------------------------------

To make optimizer estimation more precise you can create/collect extended statistics

--implicit creation of extended statistics and collect statistics
begin
  dbms_stats.gather_table_stats
  ( 
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (upper(id2)) size 3'
  );
end;
/

explain plan for
select * from test_tbl
where upper(id3)='B2';

select * 
from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031 

------------------------------------------------ 
| Id  | Operation           | Name     | Rows  | 
------------------------------------------------ 
|   0 | SELECT STATEMENT    |          |     1 | 
|   1 |  PARTITION RANGE ALL|          |     1 | 
|   2 |   TABLE ACCESS FULL | TEST_TBL |     1 | 
------------------------------------------------

Have a fun 🙂

Tomasz

5 thoughts on “Statistics improvements Oracle Database 11G release 2 (11.2)

    • As brief as possible – without statistics Oracle optimizer will not work efficiently because statistics are main information to generate proper explain plans.
      If explain plans are wrong – queries are slow.

      Regards
      Tomasz

  1. is there any table i can query the availabe prefs , let’s say i want to check the availabel prefs in which i can change, when i query dba_tab_stat_prefs there are now rows selected for default values.
    thank you.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.