Online Statistics Gathering for Bulk Loads Oracle Database 12C release 1 (12.1)

In release 12C Oracle automatically gathers statistics for following operations:

  • CREATE TABLE AS SELECT
  • INSERT /*+ APPEND */ INTO … SELECT on en empty table using direct path

It’s very similar behavior to statistics gathering done during a CREATE INDEX or INDEX REBUILD. No init.ora or any manual settings are required to turn on this feature. Now you can save a lot of time by skipping extra step to collect statistics Example Let’s build source table with data. Data is loaded to empty table test_src using INSERT INTO … VALUE so you can see Oracle is not collecting automatically stats on the table.

create table test_src
(
  id1 number,
  id2 number
);

begin
  for i in 1..200
  loop
    insert into test_src values(i, i);
  end loop;

  commit;
end;
/

select count(*) from test_src;

  COUNT(*)
----------
       200 

select table_name, num_rows, last_analyzed from user_tables
where table_name='TEST_SRC';

TABLE_NAME  NUM_ROWS   LAST_ANALYZED       
----------- ---------- ---------------------
TEST_SRC

Now it’s time to verify CREATE TABLE AS SELECT – in short CTS. This time Oracle automatically collected statistics for the table. It’s fantastic feature 🙂 especially for warehouse tables. Auto collecting stats should improve and stabilize your ETL. Consider some complex flow with many steps CTS 🙂

create table test_dst1
as select * from test_src;

select table_name, num_rows, last_analyzed from user_tables
where table_name='TEST_DST1';

TABLE_NAME  NUM_ROWS   LAST_ANALYZED       
----------- ---------- ---------------------
TEST_DST1   200        08/10/2013 14:03:56

it works for direct INSERT on empty table as well

create table test_dst2
(
  id1 number,
  id2 number
);

insert /*+ APPEND */ into test_dst2
select * from test_src;

commit;

select table_name, num_rows, last_analyzed from user_tables
where table_name='TEST_DST2';

TABLE_NAME  NUM_ROWS LAST_ANALYZED       
----------- ---------- ---------------------
TEST_DST2   200 08/10/2013 14:05:56

Subsequent direct bulk loads don’t change statistics

insert /*+ APPEND */ into test_dst1
select * from test_src;

commit;

insert /*+ APPEND */ into test_dst2
select * from test_src;

commit;

select table_name, num_rows, last_analyzed from user_tables
where table_name in ('TEST_DST1', 'TEST_DST2');

TABLE_NAME  NUM_ROWS   LAST_ANALYZED       
----------- ---------- ---------------------
TEST_DST1   200        08/10/2013 14:03:56
TEST_DST2   200        08/10/2013 14:05:56

Explain plan Automatic gathering statistics is visible in explain plan as “OPTIMIZER STATISTICS GATHERING

explain plan for
create table test_tbl1
as
select * from test_src;

select * from table(dbms_xplan.display(null, null, 'BASIC'));

------------------------------------------------------
| Id  | Operation                        | Name      |
------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |           |
|   1 |  LOAD AS SELECT                  | TEST_TBL1 |
|   2 |   OPTIMIZER STATISTICS GATHERING |           |
|   3 |    TABLE ACCESS FULL             | TEST_SRC  |
------------------------------------------------------

you can block this step using new hint NO_GATHER_OPTIMIZER_STATISTICS

explain plan for
create table test_tbl1
as
select /*+ NO_GATHER_OPTIMIZER_STATISTICS */ * from test_src;

select * from table(dbms_xplan.display(null, null, 'BASIC'));

--------------------------------------------
| Id  | Operation              | Name      |
--------------------------------------------
|   0 | CREATE TABLE STATEMENT |           |
|   1 |  LOAD AS SELECT        | TEST_TBL1 |
|   2 |   TABLE ACCESS FULL    | TEST_SRC  |
--------------------------------------------

Statistics To verify how statistics were collected you can check column NOTES in view USER_TAB_COL_STATISTICS. It has STATS_ON_LOAD when the column statistics are gathered during the bulk load.

select table_name, column_name, num_distinct, notes
from user_tab_col_statistics
where table_name in ('TEST_DST1', 'TEST_DST2');

TABLE_NAME  COLUMN_NAME  NUM_DISTINCT NOTES
----------- ------------ ------------ --------------
TEST_DST1   ID1          200          STATS_ON_LOAD 
TEST_DST1   ID2          200          STATS_ON_LOAD 
TEST_DST2   ID1          200          STATS_ON_LOAD 
TEST_DST2   ID2          200          STATS_ON_LOAD

Currently, statistics gathering does not happen for bulk load statements when any of the following conditions apply to the target table:

  • It is in an Oracle-owned schema such as SYS.
  • It is a nested table.
  • It is an index-organized table (IOT).
  • It is an external table.
  • It is a global temporary table defined as ON COMMIT DELETE ROWS.
  • It has virtual columns.
  • It has a PUBLISH preference set to FALSE.
  • It is partitioned, INCREMENTAL is set to TRUE, and extended syntax is not used.

This functionality is controlled by new hidden parameter in 12C which defaults to TRUE, so to block the feature set it to FALSE

_optimizer_gather_stats_on_load

Have a fun 🙂

Tomasz

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.