APPROX_COUNT_DISTINCT is new function available since version 12.1.0.2. It returns approximate number of rows that contain distinct values of expr.
APPROX_COUNT_DISTINCT(expr)
It’s alternative to the COUNT (DISTINCT expr) function, which returns the exact number of rows that contain distinct values of expr.
For processing large amounts of data it’s significantly faster than COUNT, with negligible deviation from the exact result.
Example
Simple table TEST_TBL is created with 32 million of rows to test function APPROX_COUNT_DISTINCT.
CREATE TABLE test_tbl ( id1 NUMBER, id2 NUMBER, id3 NUMBER ) NOLOGGING; INSERT /*+ APPEND */ INTO test_tbl SELECT mod(LEVEL, 3000), mod(LEVEL, 1000), mod(LEVEL, 100) FROM dual CONNECT BY LEVEL <= 1000000; COMMIT; BEGIN FOR i IN 1..5 loop INSERT /*+ APPEND */ INTO test_tbl SELECT * FROM test_tbl; COMMIT; END loop; END; / SELECT COUNT(*) FROM test_tbl; COUNT(*) ---------- 32000000
Now simple performance test. It’s significant difference in time however results are very precise.
set timing on SELECT COUNT(DISTINCT id1) cd1, COUNT(DISTINCT id2) cd2, COUNT(DISTINCT id3) cd3 FROM test_tbl; CD1 CD2 CD3 ---------- ---------- ---------- 3000 1000 100 Elapsed: 00:00:35.298 SELECT APPROX_COUNT_DISTINCT(id1) acd1, APPROX_COUNT_DISTINCT(id2) acd2, APPROX_COUNT_DISTINCT(id3) acd3 FROM test_tbl; ACD1 ACD2 ACD3 ---------- ---------- ---------- 2962 1012 100 Elapsed: 00:00:04.468
I find this function very useful especially in warehouses 🙂 when number of rows in tables are big.
Have a fun 🙂
Tomasz