Several interesting new features related to statistics collection are available in Oracle 11g:
- New statistics preferences
- PUBLISH
- INCREMENTAL
- STALE_PERCENT
- Hash based sampling for column statistics
- Multicolumn statistics
- Expression statistics
Setup for examples
create table test_tbl
( id1 number,
  id2 varchar2(10),
  id3 varchar2(10) )
partition by range(id1)
( partition p1 values less than(100),
  partition p2 values less than(200),
  partition p3 values less than(300),
  partition p4 values less than(400) );

begin
  for i in 0..399 loop
    insert into test_tbl values(i, 'a'||i, 'b'||i);
  end loop;
  commit;
end;
/

select count(*) from test_tbl;

  COUNT(*)
----------
       400
New statistics preferences
In previous versions there were already many default preferences that could be set for statistics collection, such as:
- CASCADE
- DEGREE
- ESTIMATE_PERCENT
- METHOD_OPT
- NO_INVALIDATE
- GRANULARITY
Oracle 11g introduces new preferences for statistics collection:
- PUBLISH – decides whether new statistics are published to the dictionary immediately or first stored in a pending area.
- INCREMENTAL – used to gather global statistics on partitioned tables in an incremental way
- STALE_PERCENT – determines the threshold level (percentage of rows modified since the last statistics gathering) at which an object's statistics are considered stale
The preferences can be set using the DBMS_STATS package:
begin
  dbms_stats.set_table_prefs (
    OWNNAME => 'TOMASZ',
    TABNAME => 'TEST_TBL',
    PNAME   => 'INCREMENTAL',
    PVALUE  => 'TRUE');
end;
/
To see the statistics preferences, check the following views:
- USER | ALL | DBA_TAB_STAT_PREFS
select * from user_tab_stat_prefs where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
Preferences can be set at the following levels:
- GLOBAL – set_global_prefs – sets default preferences for all tables in the database (see the sketch after this list)
- DATABASE – set_database_prefs – sets default preferences for all tables in the database, excluding tables owned by Oracle
- SCHEMA – set_schema_prefs – sets default preferences for all tables in a schema
- TABLE – set_table_prefs – sets default preferences for a single table
- STATEMENT – gather_*_stats – collects statistics with explicitly passed parameter values, overriding the default preferences
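A minimal sketch of the schema- and global-level calls, using the same TOMASZ schema as in the examples above (the preference values are purely illustrative):

begin
  -- schema-level default: keep new statistics in the pending area for all TOMASZ tables
  dbms_stats.set_schema_prefs(
    ownname => 'TOMASZ',
    pname   => 'PUBLISH',
    pvalue  => 'FALSE');

  -- global default: consider statistics stale after 15% of rows have been modified
  dbms_stats.set_global_prefs(
    pname  => 'STALE_PERCENT',
    pvalue => '15');
end;
/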
Global preferences can be set easily in Enterprise Manager
Server->Manage Optimizer Statistics->Global Statistics Gathering Options
INCREMENTAL preferences
It is a very nice feature that improves global statistics gathering on partitioned tables. For a partitioned table, the system maintains both the statistics on each partition and the overall global statistics for the table.
Global statistics collection depending on the INCREMENTAL setting:
- FALSE – global statistics are recollected by scanning all partitions
- TRUE – new option – global statistics are maintained incrementally, rescanning only the modified partitions
GRANULARITY can be specified in the DBMS_STATS gathering procedures as:
- AUTO – determines the statistics levels based on the partitioning type (global statistics included)
- GLOBAL – gathers global-level statistics
- GLOBAL AND PARTITION – gathers global and partition-level statistics
- ALL – gathers global, partition and subpartition-level statistics
- PARTITION – gathers partition-level statistics
- SUBPARTITION – gathers subpartition-level statistics
- DEFAULT – obsolete, use GLOBAL AND PARTITION instead
If GRANULARITY includes the global level and the table is marked as INCREMENTAL, then the global statistics are gathered using the incremental mechanism. Of course, statistics for modified partitions are gathered as well.
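As a minimal sketch (using the GRANULARITY parameter of DBMS_STATS.GATHER_TABLE_STATS), the granularity can be passed explicitly in the gathering call:

begin
  -- gather global and partition level statistics; with INCREMENTAL=TRUE the
  -- global statistics are gathered using the incremental mechanism
  dbms_stats.gather_table_stats(
    ownname     => 'TOMASZ',
    tabname     => 'TEST_TBL',
    granularity => 'GLOBAL AND PARTITION');
end;
/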
Example:
Verify that the INCREMENTAL preference is set to TRUE:
select * from user_tab_stat_prefs where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
Gather statistics on table TEST_TBL. The stats will be collected using the incremental method.
begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/
Verify the collected statistics at the table, partition and column level:
select table_name, last_analyzed
from user_tables
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED
------------------------------ -------------------
TEST_TBL                       16.04.2014 10:00:31

select partition_name, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TBL'
order by partition_position;

PARTITION_NAME                 LAST_ANALYZED         NUM_ROWS
------------------------------ ------------------- ----------
P1                             16.04.2014 10:00:31        100
P2                             16.04.2014 10:00:31        100
P3                             16.04.2014 10:00:31        100
P4                             16.04.2014 10:00:31        100

select column_name, num_distinct, num_nulls
from user_tab_col_statistics
where table_name='TEST_TBL';

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS
------------------------------ ------------ ----------
ID3                                     400          0
ID2                                     400          0
ID1                                     400          0
As user SYS, verify that the statistics were collected using the incremental method:
select o.name, c.name,
       decode(bitand(h.spare2, 8), 8, 'yes', 'no') incremental
from sys.hist_head$ h, sys.obj$ o, sys.col$ c
where h.obj#    = o.obj#
  and o.obj#    = c.obj#
  and h.intcol# = c.intcol#
  and o.name    = 'TEST_TBL';

NAME           NAME            INCREMENTAL
-------------- --------------- -----------
TEST_TBL       ID1             yes
TEST_TBL       ID2             yes
TEST_TBL       ID3             yes
Now it's time to modify data in one partition, TEST_TBL(P1):
update test_tbl partition(p1) set id2='c'||rownum where rownum < 70;

69 rows updated.

commit;

committed.

insert into test_tbl partition(p1)
select level, 'd'||level, 'e'||level
from dual
connect by level < 100;

99 rows inserted.

commit;

committed.
Gather statistics once again
begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/
Verify the recollection of statistics – note that only the modified partition P1 was re-analyzed:
select table_name, last_analyzed
from user_tables
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED
------------------------------ -------------------
TEST_TBL                       16.04.2014 10:20:51

select partition_name, last_analyzed, num_rows
from user_tab_partitions
where table_name='TEST_TBL'
order by partition_position;

PARTITION_NAME                 LAST_ANALYZED         NUM_ROWS
------------------------------ ------------------- ----------
P1                             16.04.2014 10:20:51        199
P2                             16.04.2014 10:00:31        100
P3                             16.04.2014 10:00:31        100
P4                             16.04.2014 10:00:31        100

select column_name, num_distinct, num_nulls
from user_tab_col_statistics
where table_name='TEST_TBL';

COLUMN_NAME                    NUM_DISTINCT  NUM_NULLS
------------------------------ ------------ ----------
ID3                                     499          0
ID2                                     499          0
ID1                                     400          0
NOTE – this new mechanism does not incrementally maintain global histograms and density statistics.
STALE_PERCENT preferences
It defines the threshold at which statistics for an object are considered stale, expressed as a percentage of rows modified since the last statistics gathering.
Example:
Check the global STALE_PERCENT setting:
select dbms_stats.get_prefs('STALE_PERCENT') stale_percent from dual;

STALE_PERCENT
--------------
10
Set a dedicated value for table TEST_TBL:
begin
  dbms_stats.set_table_prefs ('TOMASZ', 'TEST_TBL', 'STALE_PERCENT', 18);
end;
/

select * from user_tab_stat_prefs where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
TEST_TBL    STALE_PERCENT     18
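As a side note, you can check whether Oracle currently considers the table statistics stale through the STALE_STATS column of USER_TAB_STATISTICS. This is a hedged sketch – the monitoring information usually has to be flushed first for the flag to be up to date:

begin
  -- flush DML monitoring information so the staleness flag reflects recent changes
  dbms_stats.flush_database_monitoring_info;
end;
/

select table_name, stale_stats
from user_tab_statistics
where table_name='TEST_TBL' and object_type='TABLE';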
PUBLISH preferences
In the previous release, 10g, collected statistics were automatically published. In 11g you have a new option: the PUBLISH preference decides whether collected statistics are published to the dictionary immediately (still the default in 11g) or first stored in a pending area.
PUBLISH settings in 11g:
- TRUE – statistics are automatically published
- FALSE – statistics are stored in the pending area
Problems with automatic publishing:
- DBAs and developers cannot verify the statistics before they are published
- statistics may be inconsistent because they are not all published at the same time – for example, for a partitioned table the table statistics are published first, then the partition statistics, then the index statistics; or statistics for a "master" table are published before those for related "dependent" tables
- statistics collection can fail part-way through with some information already published – for example, statistics collection for a schema can fail in the middle
If PUBLISH is set to FALSE, new statistics are visible in the pending area through the following views:
- ALL|DBA|USER_TAB_PENDING_STATS
- ALL|DBA|USER_COL_PENDING_STATS
- ALL|DBA|USER_IND_PENDING_STATS
- ALL|DBA|USER_TAB_HISTGRM_PENDING_STATS
Pending statistics can be tested using two methods before publishing:
- export the pending statistics and import them into a separate system (a hedged sketch follows this list) using:
- DBMS_STATS.EXPORT_PENDING_STATS
- DBMS_STATS.IMPORT_TABLE_STATS
- set the new session parameter OPTIMIZER_USE_PENDING_STATISTICS to TRUE to make pending statistics visible to the optimizer (by default the parameter is set to FALSE)
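A minimal sketch of the export/import route, assuming a user statistics table named MY_STAT_TAB (the table name and the transport step are illustrative):

begin
  -- create a user statistics table to hold the exported statistics
  dbms_stats.create_stat_table(ownname => 'TOMASZ', stattab => 'MY_STAT_TAB');

  -- export the pending statistics of TEST_TBL into that table
  dbms_stats.export_pending_stats(
    ownname => 'TOMASZ',
    tabname => 'TEST_TBL',
    stattab => 'MY_STAT_TAB');
end;
/

-- on the test system, after transporting MY_STAT_TAB (e.g. with Data Pump),
-- import the statistics as regular table statistics:
begin
  dbms_stats.import_table_stats(
    ownname => 'TOMASZ',
    tabname => 'TEST_TBL',
    stattab => 'MY_STAT_TAB');
end;
/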
Example:
Set PUBLISH to FALSE
begin
  dbms_stats.set_table_prefs ('TOMASZ', 'TEST_TBL', 'PUBLISH', 'FALSE');
end;
/

select * from user_tab_stat_prefs where table_name='TEST_TBL';

TABLE_NAME  PREFERENCE_NAME   PREFERENCE_VALUE
----------- ----------------- --------------------
TEST_TBL    INCREMENTAL       TRUE
TEST_TBL    STALE_PERCENT     18
TEST_TBL    PUBLISH           FALSE
Double the data in table TEST_TBL:
select count(*) from test_tbl;

  COUNT(*)
----------
       499

insert into test_tbl select * from test_tbl;

commit;

select count(*) from test_tbl;

  COUNT(*)
----------
       998
The explain plan and the table stats still show the old information – 499 rows:
select table_name, num_rows from user_tables where table_name='TEST_TBL';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
TEST_TBL                              499

explain plan for select * from test_tbl;

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   499 |
|   1 |  PARTITION RANGE ALL|          |   499 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   499 |
------------------------------------------------
Gather statistics for TEST_TBL and check them in the pending area:
begin
  dbms_stats.gather_table_stats('TOMASZ', 'TEST_TBL');
end;
/

select table_name, partition_name, num_rows, last_analyzed
from user_tab_pending_stats
where table_name='TEST_TBL';

TABLE_NAME  PARTITION_NAME    NUM_ROWS LAST_ANALYZED
----------- --------------- ---------- -------------------
TEST_TBL    P1                     398 16.04.2014 13:01:57
TEST_TBL    P2                     200 16.04.2014 13:01:57
TEST_TBL    P3                     200 16.04.2014 13:01:57
TEST_TBL    P4                     200 16.04.2014 13:01:57
TEST_TBL                           998 16.04.2014 13:01:57

select distinct table_name, last_analyzed
from user_col_pending_stats
where table_name='TEST_TBL';

TABLE_NAME                     LAST_ANALYZED
------------------------------ -------------------
TEST_TBL                       16.04.2014 13:01:57
Use pending statistics at session level. This time the optimizer reads the statistics from the pending area, because it contains the latest version of the statistics for table TEST_TBL:
alter session set optimizer_use_pending_statistics=true;

explain plan for select * from test_tbl;

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   998 |
|   1 |  PARTITION RANGE ALL|          |   998 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   998 |
------------------------------------------------
Publish the pending statistics and verify the result:
begin
  dbms_stats.publish_pending_stats('TOMASZ','TEST_TBL');
end;
/

select table_name, num_rows from user_tables where table_name='TEST_TBL';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
TEST_TBL                              998

alter session set optimizer_use_pending_statistics=false;

explain plan for select * from test_tbl;

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |   998 |
|   1 |  PARTITION RANGE ALL|          |   998 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |   998 |
------------------------------------------------
Hash based sampling for column statistics
Computing column statistics is the most expensive step in statistics gathering. Oracle 11g introduces a new method to calculate them – approximate counting. It is a faster and more accurate option than the old row-sampling technique.
To use the approximate counting method:
- ESTIMATE_PERCENT must be set to AUTO_SAMPLE_SIZE; otherwise the old row-sampling method is used (a sketch follows below)
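A minimal sketch of both ways to request it – per gathering call or as a stored preference (DBMS_STATS.AUTO_SAMPLE_SIZE is the package constant; the preference variant stores the value as text):

begin
  -- per gathering call
  dbms_stats.gather_table_stats(
    ownname          => 'TOMASZ',
    tabname          => 'TEST_TBL',
    estimate_percent => dbms_stats.auto_sample_size);

  -- or as a table preference
  dbms_stats.set_table_prefs(
    ownname => 'TOMASZ',
    tabname => 'TEST_TBL',
    pname   => 'ESTIMATE_PERCENT',
    pvalue  => 'DBMS_STATS.AUTO_SAMPLE_SIZE');
end;
/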
Multicolumn statistics
In 10g, Oracle computes the selectivity of multiple predicates in the following way:
- If all columns of the conjunctive predicates match all columns of a concatenated index key, the optimizer uses the formula 1/NDK, where NDK is the number of distinct keys in the index.
- With DYNAMIC_SAMPLING set to 4, the optimizer uses dynamic sampling to estimate the selectivity of complex predicates involving several columns from the same table. This increases parse time, and because the sample is very small it can lead to inaccurate estimates.
- In other cases Oracle multiplies the selectivities of the individual predicates. For correlated columns this leads to under-estimation of the combined selectivity.
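For example, in the original TEST_TBL setup data, ID2 and ID3 are perfectly correlated (each has 400 distinct values and they always change together). For a predicate like id2='a2' and id3='b2' the multiplication approach estimates a selectivity of 1/400 * 1/400 = 1/160000, while the real selectivity is 1/400 – a 400x under-estimation.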
To overcome these problems, Oracle 11g introduces the possibility to collect, store and use dedicated statistics on groups of two or more columns:
- Number of distinct values
- Number of nulls
- Frequency histograms
- Density
The function DBMS_STATS.CREATE_EXTENDED_STATS creates a hidden virtual column that will store the extended statistics showing the correlation between columns ID2 and ID3. The function returns the virtual column name.
--create extended stats
select dbms_stats.create_extended_stats (
         ownname   => user,
         tabname   => 'TEST_TBL',
         extension => '(ID2,ID3)' ) stat
from dual;

STAT
-----------------------------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D
Information about extended statistics can be found in the following views:
- USER|ALL|DBA_STAT_EXTENSIONS
select * from USER_STAT_EXTENSIONS;

TABLE_NAME  EXTENSION_NAME                 EXTENSION      CREATOR DROPPABLE
----------- ------------------------------ -------------- ------- ---------
TEST_TBL    SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D ("ID2","ID3")  USER    YES
The function DBMS_STATS.SHOW_EXTENDED_STATS_NAME is another way to get the virtual column name.
--display virtual column name for extended stats
select dbms_stats.show_extended_stats_name (
         ownname   => user,
         tabname   => 'TEST_TBL',
         extension => '(ID2,ID3)' ) stat
from dual;

STAT
-----------------------------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D
Here is an example of how to collect the extended statistics:
--gather extended statistics
begin
  dbms_stats.gather_table_stats (
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (id2,id3) size 3' );
end;
/

--histogram stats
select column_name, endpoint_number
from user_tab_histograms
where table_name='TEST_TBL'
order by endpoint_number;

COLUMN_NAME                     ENDPOINT_NUMBER
------------------------------- ---------------
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                1
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                2
SYS_STUGV1W6GVSZ5Y91NUO0W6_D3D                3
The procedure DBMS_STATS.DROP_EXTENDED_STATS is used to drop extended statistics:
-- Drop the extended statistics
begin
  dbms_stats.drop_extended_stats (
    ownname   => user,
    tabname   => 'TEST_TBL',
    extension => '(ID2,ID3)' );
end;
/
Extended statistics can also be created implicitly during statistics collection:
--create extended statistics implicitly
begin
  dbms_stats.gather_table_stats (
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (id1,id3) size 3' );
end;
/

select column_name, endpoint_number
from user_tab_histograms
where table_name='TEST_TBL'
order by endpoint_number;

COLUMN_NAME                     ENDPOINT_NUMBER
------------------------------- ---------------
SYS_STU7RX03D858C$2W$0LDWVEY24                1
SYS_STU7RX03D858C$2W$0LDWVEY24                2
SYS_STU7RX03D858C$2W$0LDWVEY24                3
Expression statistics
Oracle has always had problems determining the correct selectivity for predicates like
function(column) = constant
The optimizer assumes a static selectivity of 1 percent:
begin
  dbms_stats.gather_table_stats (
    ownname => user,
    tabname => 'TEST_TBL' );
end;
/

select count(*) from test_tbl;

  COUNT(*)
----------
       400
As you can see below, the optimizer estimate for upper(id3)='B2' is 4 rows, which is 1% of 400 rows 🙂
explain plan for select * from test_tbl where upper(id3)='B2';

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |     4 |
|   1 |  PARTITION RANGE ALL|          |     4 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |     4 |
------------------------------------------------
To make the optimizer estimate more precise, you can create and collect extended expression statistics:
--implicit creation of extended statistics and statistics collection
begin
  dbms_stats.gather_table_stats (
    ownname    => user,
    tabname    => 'TEST_TBL',
    method_opt => 'for columns (upper(id3)) size 3' );
end;
/

explain plan for select * from test_tbl where upper(id3)='B2';

select * from table(dbms_xplan.display('PLAN_TABLE', null, 'BASIC ROWS'));

Plan hash value: 928296031
------------------------------------------------
| Id  | Operation           | Name     | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT    |          |     1 |
|   1 |  PARTITION RANGE ALL|          |     1 |
|   2 |   TABLE ACCESS FULL | TEST_TBL |     1 |
------------------------------------------------
Have fun 🙂
Tomasz
I want a brief explanation of why we need to gather stats.
Please forward it to my mail, it will be helpful.
As brief as possible – without statistics the Oracle optimizer will not work efficiently, because statistics are the main input for generating proper explain plans.
If explain plans are wrong – queries are slow.
Regards
Tomasz
Thank you. It helped me.
Is there any table I can query for the available prefs? Let's say I want to check which prefs I can change – when I query dba_tab_stat_prefs there are no rows selected for the default values.
Thank you.
What is the estimate_percent value by default?
And what should it be – will 10% or 20% work?