Partition outer join is a method for “data-densification”. If you have sparse data it helps to easily duplicate data with new SQL syntax:
SELECT select_expression FROM table_reference PARTITION BY (expr [,expr ]...) RIGHT OUTER JOIN table_reference
or
SELECT select_expression FROM table_reference LEFT OUTER JOIN table_reference PARTITION BY (expr [,expr ]...)
Example
Let’s prepare some test data
CREATE TABLE fct_tbl ( cust_id VARCHAR2(10), mth DATE, vol NUMBER ); INSERT INTO fct_tbl SELECT 'CUST_'||mod(LEVEL,2) cust_id, add_months(trunc(SYSDATE, 'YY'), mod(LEVEL, 4)) mth, level vol FROM dual CONNECT BY LEVEL < 100; COMMIT;
Following query doesn’t report all months in 2014 and each customer is not reported in each month.
SELECT cust_id, to_char(mth, 'DD.MON.YYYY') mth_name, sum(vol) vol FROM fct_tbl f GROUP BY mth, cust_id ORDER BY mth, cust_id; CUST_ID MTH_NAME VOL ---------- -------------------- ---------- CUST_0 01.JAN.2014 1200 CUST_1 01.FEB.2014 1225 CUST_0 01.MAR.2014 1250 CUST_1 01.APR.2014 1275
Following query shows how to generate missing months (6 months) and each customer in each month. It’s old method.
WITH time_dim AS ( SELECT add_months(trunc(SYSDATE, 'YY'), LEVEL-1) mth FROM dual CONNECT BY LEVEL < 7 ), cust_dim AS ( SELECT DISTINCT cust_id FROM fct_tbl ) SELECT c.cust_id, to_char(t.mth, 'DD.MON.YYYY') mth_name, sum(nvl(f.vol,0)) vol FROM time_dim t, cust_dim c, fct_tbl f WHERE t.mth = f.mth(+) AND c.cust_id = f.cust_id(+) GROUP BY t.mth, c.cust_id ORDER BY t.mth, c.cust_id; CUST_ID MTH_NAME VOL ---------- -------------------- ---------- CUST_0 01.JAN.2014 1200 CUST_1 01.JAN.2014 0 CUST_0 01.FEB.2014 0 CUST_1 01.FEB.2014 1225 CUST_0 01.MAR.2014 1250 CUST_1 01.MAR.2014 0 CUST_0 01.APR.2014 0 CUST_1 01.APR.2014 1275 CUST_0 01.MAY.2014 0 CUST_1 01.MAY.2014 0 CUST_0 01.JUN.2014 0 CUST_1 01.JUN.2014 0 12 rows selected
It can be done in much easier way using PARTITION OUTER JOIN
WITH time_dim AS (SELECT add_months(trunc(SYSDATE, 'YY'), LEVEL-1) mth FROM dual CONNECT BY LEVEL < 7) SELECT f.cust_id, to_char(t.mth, 'DD.MON.YYYY') mth_name, sum(nvl(vol,0)) vol FROM time_dim t LEFT OUTER JOIN fct_tbl f PARTITION BY (f.cust_id) ON(t.mth = f.mth) GROUP BY f.cust_id, t.mth order by t.mth, f.cust_id; CUST_ID MTH_NAME VOL ---------- -------------------- ---------- CUST_0 01.JAN.2014 1200 CUST_1 01.JAN.2014 0 CUST_0 01.FEB.2014 0 CUST_1 01.FEB.2014 1225 CUST_0 01.MAR.2014 1250 CUST_1 01.MAR.2014 0 CUST_0 01.APR.2014 0 CUST_1 01.APR.2014 1275 CUST_0 01.MAY.2014 0 CUST_1 01.MAY.2014 0 CUST_0 01.JUN.2014 0 CUST_1 01.JUN.2014 0 12 rows selected
or
WITH time_dim AS (SELECT add_months(trunc(SYSDATE, 'YY'), LEVEL-1) mth FROM dual CONNECT BY LEVEL < 7) SELECT f.cust_id, to_char(t.mth, 'DD.MON.YYYY') mth_name, sum(nvl(vol,0)) vol FROM fct_tbl f PARTITION BY (f.cust_id) RIGHT OUTER JOIN time_dim t ON(t.mth = f.mth) GROUP BY f.cust_id, t.mth ORDER BY t.mth, f.cust_id; CUST_ID MTH_NAME VOL ---------- -------------------- ---------- CUST_0 01.JAN.2014 1200 CUST_1 01.JAN.2014 0 CUST_0 01.FEB.2014 0 CUST_1 01.FEB.2014 1225 CUST_0 01.MAR.2014 1250 CUST_1 01.MAR.2014 0 CUST_0 01.APR.2014 0 CUST_1 01.APR.2014 1275 CUST_0 01.MAY.2014 0 CUST_1 01.MAY.2014 0 CUST_0 01.JUN.2014 0 CUST_1 01.JUN.2014 0 12 rows selected
Have a fun 🙂
Tomasz