In release 12C Oracle automatically gathers statistics for following operations:
- CREATE TABLE AS SELECT
- INSERT /*+ APPEND */ INTO … SELECT on en empty table using direct path
It’s very similar behavior to statistics gathering done during a CREATE INDEX or INDEX REBUILD. No init.ora or any manual settings are required to turn on this feature. Now you can save a lot of time by skipping extra step to collect statistics Example Let’s build source table with data. Data is loaded to empty table test_src using INSERT INTO … VALUE so you can see Oracle is not collecting automatically stats on the table.
create table test_src ( id1 number, id2 number ); begin for i in 1..200 loop insert into test_src values(i, i); end loop; commit; end; / select count(*) from test_src; COUNT(*) ---------- 200 select table_name, num_rows, last_analyzed from user_tables where table_name='TEST_SRC'; TABLE_NAME NUM_ROWS LAST_ANALYZED ----------- ---------- --------------------- TEST_SRC
Now it’s time to verify CREATE TABLE AS SELECT – in short CTS. This time Oracle automatically collected statistics for the table. It’s fantastic feature 🙂 especially for warehouse tables. Auto collecting stats should improve and stabilize your ETL. Consider some complex flow with many steps CTS 🙂
create table test_dst1 as select * from test_src; select table_name, num_rows, last_analyzed from user_tables where table_name='TEST_DST1'; TABLE_NAME NUM_ROWS LAST_ANALYZED ----------- ---------- --------------------- TEST_DST1 200 08/10/2013 14:03:56
it works for direct INSERT on empty table as well
create table test_dst2 ( id1 number, id2 number ); insert /*+ APPEND */ into test_dst2 select * from test_src; commit; select table_name, num_rows, last_analyzed from user_tables where table_name='TEST_DST2'; TABLE_NAME NUM_ROWS LAST_ANALYZED ----------- ---------- --------------------- TEST_DST2 200 08/10/2013 14:05:56
Subsequent direct bulk loads don’t change statistics
insert /*+ APPEND */ into test_dst1 select * from test_src; commit; insert /*+ APPEND */ into test_dst2 select * from test_src; commit; select table_name, num_rows, last_analyzed from user_tables where table_name in ('TEST_DST1', 'TEST_DST2'); TABLE_NAME NUM_ROWS LAST_ANALYZED ----------- ---------- --------------------- TEST_DST1 200 08/10/2013 14:03:56 TEST_DST2 200 08/10/2013 14:05:56
Explain plan Automatic gathering statistics is visible in explain plan as “OPTIMIZER STATISTICS GATHERING”
explain plan for create table test_tbl1 as select * from test_src; select * from table(dbms_xplan.display(null, null, 'BASIC')); ------------------------------------------------------ | Id | Operation | Name | ------------------------------------------------------ | 0 | CREATE TABLE STATEMENT | | | 1 | LOAD AS SELECT | TEST_TBL1 | | 2 | OPTIMIZER STATISTICS GATHERING | | | 3 | TABLE ACCESS FULL | TEST_SRC | ------------------------------------------------------
you can block this step using new hint NO_GATHER_OPTIMIZER_STATISTICS
explain plan for create table test_tbl1 as select /*+ NO_GATHER_OPTIMIZER_STATISTICS */ * from test_src; select * from table(dbms_xplan.display(null, null, 'BASIC')); -------------------------------------------- | Id | Operation | Name | -------------------------------------------- | 0 | CREATE TABLE STATEMENT | | | 1 | LOAD AS SELECT | TEST_TBL1 | | 2 | TABLE ACCESS FULL | TEST_SRC | --------------------------------------------
Statistics To verify how statistics were collected you can check column NOTES in view USER_TAB_COL_STATISTICS. It has STATS_ON_LOAD when the column statistics are gathered during the bulk load.
select table_name, column_name, num_distinct, notes from user_tab_col_statistics where table_name in ('TEST_DST1', 'TEST_DST2'); TABLE_NAME COLUMN_NAME NUM_DISTINCT NOTES ----------- ------------ ------------ -------------- TEST_DST1 ID1 200 STATS_ON_LOAD TEST_DST1 ID2 200 STATS_ON_LOAD TEST_DST2 ID1 200 STATS_ON_LOAD TEST_DST2 ID2 200 STATS_ON_LOAD
Currently, statistics gathering does not happen for bulk load statements when any of the following conditions apply to the target table:
- It is in an Oracle-owned schema such as
SYS
. - It is a nested table.
- It is an index-organized table (IOT).
- It is an external table.
- It is a global temporary table defined as ON COMMIT DELETE ROWS.
- It has virtual columns.
- It has a PUBLISH preference set to FALSE.
- It is partitioned, INCREMENTAL is set to TRUE
, and extended syntax is not used.
This functionality is controlled by new hidden parameter in 12C which defaults to TRUE, so to block the feature set it to FALSE
_optimizer_gather_stats_on_load
Have a fun 🙂
Tomasz