If you are a developer, you have no doubt faced situations where you need to run your code in parallel. I'm not talking about running DML or DDL in parallel – read the following article to understand how to do that effectively: Effective parallel execution for DML, DDL in Oracle.
I'm talking about running many independent pieces of your code – SQL and PL/SQL. Everybody knows this can be done using the DBMS_SCHEDULER package, but it requires many steps: defining programs, jobs, chains, and schedules.
To simplify this process, Oracle has introduced a very nice package that helps you do it quickly: DBMS_PARALLEL_EXECUTE.
The following picture describes the general concept of this package.
At the beginning, we have a "cloud" of independent code to execute:
- 4 SQL
- 3 PL/SQL
As the next step, we split the "cloud" of code into 4 chunks. You can treat a chunk as a separate workload.
- Chunk 1
- Chunk 2
- Chunk 3
- Chunk 4
As the last step, we decide on the number of parallel jobs that will run the chunks. Oracle creates the jobs using DBMS_SCHEDULER and randomly distributes the chunks among them.
- Job 1 – serially executes Chunks 1 and 2
- Job 2 – serially executes Chunks 3 and 4
Jobs 1 and 2 are controlled by a master job once the flow is started. In the Oracle documentation this flow is called a "Task".
Test data
First, test data must be prepared to demonstrate the functionality of the package.
Table CLOUD_OF_CODE will store the code to execute:
create table cloud_of_code
( code_id   number,
  code_type varchar2(10),
  code      clob
);

insert into cloud_of_code values (1, 'SQL', 'select count(*) from user_tables');
insert into cloud_of_code values (2, 'SQL', 'select count(*) from all_tables');
insert into cloud_of_code values (3, 'PLSQL', 'begin null; end;');
insert into cloud_of_code values (4, 'PLSQL', 'begin null; null; end;');
commit;

exec dbms_stats.gather_table_stats( user, 'CLOUD_OF_CODE')
Log table LOG_DATA will store logs about the execution of the "cloud" code:
create table log_data
( task_name   varchar2(30),
  start_id    varchar2(30),
  end_id      varchar2(30),
  log_details clob
);
Task
The next step is to create the flow – a Task to execute. Four tasks are created to show the full functionality of the package:
- TASK_CHUNKS_BY_COL
- TASK_CHUNKS_BY_ROWID
- TASK_CHUNKS_BY_SQL
- TASK_TO_DROP
begin
  dbms_parallel_execute.create_task
  ( task_name => 'TASK_CHUNKS_BY_COL',
    comment   => 'Task with chunks generated by column' );

  dbms_parallel_execute.create_task
  ( task_name => 'TASK_CHUNKS_BY_ROWID',
    comment   => 'Task with chunks generated by rowid' );

  dbms_parallel_execute.create_task
  ( task_name => 'TASK_CHUNKS_BY_SQL',
    comment   => 'Task with chunks generated by sql' );

  dbms_parallel_execute.create_task
  ( task_name => 'TASK_TO_DROP',
    comment   => 'Task just to show how to drop it' );
end;
/
You can find information about created tasks in the view:
- DBA|USER_PARALLEL_EXECUTE_TASKS
select task_name, chunk_type, status, table_name, number_column
  from user_parallel_execute_tasks;

TASK_NAME             CHUNK_TYPE  STATUS   TABLE_NAME  NUMBER_COLUMN
--------------------- ----------- -------- ----------- --------------
TASK_CHUNKS_BY_ROWID  UNDECLARED  CREATED
TASK_CHUNKS_BY_COL    UNDECLARED  CREATED
TASK_CHUNKS_BY_SQL    UNDECLARED  CREATED
TASK_TO_DROP          UNDECLARED  CREATED
A unique task name can be generated easily using the function GENERATE_TASK_NAME:
select dbms_parallel_execute.generate_task_name from dual;

GENERATE_TASK_NAME
----------------------
TASK$_22
To drop a task, run the following code:
begin
  dbms_parallel_execute.drop_task ( task_name => 'TASK_TO_DROP' );
end;
/
Chunks
There are three methods to create chunks using DBMS_PARALLEL_EXECUTE:
- CREATE_CHUNKS_BY_NUMBER_COL
This procedure chunks the table (associated with the specified task) by the specified column, which must be a NUMBER column. The procedure takes the MIN and MAX values of the column and divides the range evenly according to CHUNK_SIZE.
begin
  dbms_parallel_execute.create_chunks_by_number_col
  ( task_name    => 'TASK_CHUNKS_BY_COL',
    table_owner  => user,
    table_name   => 'CLOUD_OF_CODE',
    table_column => 'CODE_ID',
    chunk_size   => 2 );
end;
/
- CREATE_CHUNKS_BY_ROWID
This procedure chunks the table (associated with the specified task) by ROWID. The table statistics NUM_ROWS and BLOCKS serve as approximate guidance for the size of each chunk. The table to be chunked must be a physical table with physical ROWIDs, which excludes views and table functions. Index-Organized Tables are not allowed.
Parameter BY_ROW:
- TRUE refers to the NUM_ROWS statistic
- FALSE refers to the BLOCKS statistic
The following example chunks by ROWID based on the NUM_ROWS statistic:
select table_name, num_rows, blocks
  from user_tables
 where table_name = 'CLOUD_OF_CODE';

TABLE_NAME     NUM_ROWS  BLOCKS
-------------- --------- ---------
CLOUD_OF_CODE  4         5

begin
  dbms_parallel_execute.create_chunks_by_rowid
  ( task_name   => 'TASK_CHUNKS_BY_ROWID',
    table_owner => user,
    table_name  => 'CLOUD_OF_CODE',
    by_row      => true,
    chunk_size  => 2 );
end;
/
- CREATE_CHUNKS_BY_SQL
This procedure chunks the table (associated with the specified task) by means of a user-provided SELECT statement. The SELECT statement, which returns the range of each chunk, must have two columns: START_ID and END_ID. If the task chunks by ROWID, the two columns must be of ROWID type; if the task chunks the table by a NUMBER column, the two columns must be of NUMBER type.
The following example chunks by NUMBER using a SQL statement:
--Query used to create chunks
select 1 start_id, 2 end_id from dual
union all
select 3 start_id, 4 end_id from dual;

START_ID  END_ID
--------- ---------
1         2
3         4
begin
  dbms_parallel_execute.create_chunks_by_sql
  ( task_name => 'TASK_CHUNKS_BY_SQL',
    sql_stmt  => 'select 1 start_id, 2 end_id from dual union all select 3 start_id, 4 end_id from dual',
    by_rowid  => false );
end;
/
The task definitions now look as follows:
select task_name, chunk_type, status, table_name, number_column
  from user_parallel_execute_tasks;

TASK_NAME             CHUNK_TYPE    STATUS   TABLE_NAME     NUMBER_COLUMN
--------------------- ------------- -------- -------------- --------------
TASK_CHUNKS_BY_COL    NUMBER_RANGE  CHUNKED  CLOUD_OF_CODE  CODE_ID
TASK_CHUNKS_BY_ROWID  ROWID_RANGE   CHUNKED  CLOUD_OF_CODE
TASK_CHUNKS_BY_SQL    NUMBER_RANGE  CHUNKED
Chunk details:
select chunk_id, task_name, start_id, end_id
  from user_parallel_execute_chunks
 where task_name in ('TASK_CHUNKS_BY_COL', 'TASK_CHUNKS_BY_SQL')
 order by 1, 3;

CHUNK_ID  TASK_NAME            START_ID  END_ID
--------- -------------------- --------- ---------
93        TASK_CHUNKS_BY_COL   1         2
94        TASK_CHUNKS_BY_COL   3         4
95        TASK_CHUNKS_BY_SQL   1         2
96        TASK_CHUNKS_BY_SQL   3         4

select chunk_id, task_name, start_rowid, end_rowid
  from user_parallel_execute_chunks
 where task_name in ('TASK_CHUNKS_BY_ROWID')
 order by 1, 3;

CHUNK_ID  TASK_NAME             START_ROWID         END_ROWID
--------- --------------------- ------------------- -------------------
97        TASK_CHUNKS_BY_ROWID  AAAWnaAAJAAAAFwAAA  AAAWnaAAJAAAAF3CcP
To drop chunks, run the procedure DROP_CHUNKS:
DBMS_PARALLEL_EXECUTE.DROP_CHUNKS ( task_name IN VARCHAR2 );
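For example, to remove the chunks of one of our tasks (a minimal sketch; it assumes the task TASK_CHUNKS_BY_SQL created above still exists):

```sql
begin
  -- removes all chunks of the task; the task itself remains
  -- and can be chunked again with a different method
  dbms_parallel_execute.drop_chunks ( task_name => 'TASK_CHUNKS_BY_SQL' );
end;
/
```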
Wrapper code
It's time for the DBMS_PARALLEL_EXECUTE package to run our chunks. We must provide some SQL code with two placeholders:
- start_id
- end_id
for example a procedure call:
begin some_procedure(:start_id, :end_id); end;
or some DML:

delete from some_table where some_column between :start_id and :end_id;

insert into some_table values (..., ..., :start_id, :end_id);

update some_table
   set some_other_column = ...
 where some_column between :start_id and :end_id;
The placeholders are bound automatically when a given chunk is started; they are of NUMBER type for NUMBER_RANGE chunks and of ROWID type for ROWID_RANGE chunks.
So let's prepare such SQL code. In this case it will be the procedure WRAPPER_CODE. I added an extra parameter p_task_name to the wrapper to recognize whether the chunk is based on ROWID or NUMBER. The wrapper is quite simple: it serially reads the "cloud" code for the start_id–end_id range of a given chunk and logs it into the log table.
create or replace procedure wrapper_code
( p_task_name varchar2,
  p_start_id  varchar2,
  p_end_id    varchar2 )
is
  l_cur       sys_refcursor;
  l_sql       clob;
  l_code_id   cloud_of_code.code_id%type;
  l_code_type cloud_of_code.code_type%type;
  l_code      cloud_of_code.code%type;
begin
  -- chunks generated by rowid need a quoted rowid range;
  -- the other tasks filter on the number column
  if p_task_name = 'TASK_CHUNKS_BY_ROWID' then
    l_sql := 'select code_id, code_type, code from cloud_of_code '||
             'where rowid between chartorowid('''||p_start_id||''') '||
             'and chartorowid('''||p_end_id||''')';
  else
    l_sql := 'select code_id, code_type, code from cloud_of_code '||
             'where code_id between '||p_start_id||' and '||p_end_id;
  end if;

  open l_cur for l_sql;
  loop
    fetch l_cur into l_code_id, l_code_type, l_code;
    exit when l_cur%notfound;

    insert into log_data ( task_name, start_id, end_id, log_details )
    values ( p_task_name, p_start_id, p_end_id,
             l_code_id||chr(10)||l_code_type||chr(10)||l_code );
  end loop;
  close l_cur;
end;
/
Run code
Now it's time to run the task using DBMS_PARALLEL_EXECUTE.RUN_TASK. The parallel level is set to 2, so two DBMS_SCHEDULER jobs will be created.
declare
  l_sql clob;
begin
  l_sql := 'begin wrapper_code(''TASK_CHUNKS_BY_COL'', :start_id, :end_id); end;';

  dbms_parallel_execute.run_task
  ( task_name      => 'TASK_CHUNKS_BY_COL',
    sql_stmt       => l_sql,
    language_flag  => dbms_sql.native,
    parallel_level => 2 );
end;
/
Execution details can be found in the view:
- DBA|USER_PARALLEL_EXECUTE_CHUNKS
Here you can see that two separate jobs were scheduled:
select chunk_id, task_name, status, start_id, end_id, job_name
  from user_parallel_execute_chunks
 where task_name = 'TASK_CHUNKS_BY_COL';

CHUNK_ID  TASK_NAME           STATUS     START_ID  END_ID  JOB_NAME
--------- ------------------- ---------- --------- ------- ------------
93        TASK_CHUNKS_BY_COL  PROCESSED  1         2       TASK$_208_1
94        TASK_CHUNKS_BY_COL  PROCESSED  3         4       TASK$_208_2
Details can also be found in the standard Oracle Scheduler job log view DBA_SCHEDULER_JOB_RUN_DETAILS:
select job_name, status, error#, errors
  from dba_scheduler_job_run_details
 where job_name in ('TASK$_208_1', 'TASK$_208_2');

JOB_NAME     STATUS     ERROR#  ERRORS
------------ ---------- ------- ----------------
TASK$_208_1  SUCCEEDED  0
TASK$_208_2  SUCCEEDED  0
Our procedure WRAPPER_CODE has also written detailed logs to its own table LOG_DATA:
select * from log_data;

TASK_NAME           START_ID  END_ID  LOG_DETAILS
------------------- --------- ------- ---------------------------------------
TASK_CHUNKS_BY_COL  1         2       1 SQL select count(*) from user_tables
TASK_CHUNKS_BY_COL  1         2       2 SQL select count(*) from all_tables
TASK_CHUNKS_BY_COL  3         4       3 PLSQL begin null; end;
TASK_CHUNKS_BY_COL  3         4       4 PLSQL begin null; null; end;
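After the run you can also check the overall task state and, if some chunks failed, retry them. A minimal sketch using the documented TASK_STATUS and RESUME_TASK subprograms (the FINISHED_WITH_ERROR constant is part of the package):

```sql
declare
  l_status number;
begin
  l_status := dbms_parallel_execute.task_status('TASK_CHUNKS_BY_COL');

  -- if some chunks ended in error, the task status is FINISHED_WITH_ERROR;
  -- RESUME_TASK re-runs only the chunks that did not complete
  if l_status = dbms_parallel_execute.finished_with_error then
    dbms_parallel_execute.resume_task('TASK_CHUNKS_BY_COL');
  end if;
end;
/
```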
Have fun 🙂
Tomasz
Comments

Hi. When executing DBMS_PARALLEL_EXECUTE repeatedly within one session, using the CHUNKS_BY_NUMBER_COL approach, an implicit cursor is left unclosed after the task finishes successfully.
Thus, when I have a task of 500 bulk payments with 1000 payments each, the session sequentially opens 500 cursors until it ends. Basically, the 500 bulk payments are processed sequentially, and for each bulk payment a parallel task is executed to process its 1000 payments in 25-entity chunks.
How can I avoid the unclosed cursors?
Do you get any Oracle error? Please specify it. Implicit cursors are always closed automatically by Oracle; if that's not true for your application, it seems you have encountered an Oracle bug. If you are referring by mistake to explicit cursors like
OPEN cur
then you should always close such cursor with
CLOSE cur
Regards
Tomasz
Hi, thanks for your quick reply. After exceeding the value of OPEN_CURSORS, a newly executed parallel task fails with the error ORA-01000: maximum open cursors exceeded.
I simulated the scenario with 456 bulk payments (i.e. starting the same parallel task 456 times sequentially), and it started to fail at the 286th iteration, while OPEN_CURSORS is set to 300 on my database.
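One way to confirm what is holding the cursors (a sketch; it requires SELECT privilege on the V$ views, and the SID below is a hypothetical placeholder for the session running the tasks):

```sql
-- count cursors currently open in a given session, grouped by statement text
select sql_text, count(*)
  from v$open_cursor
 where sid = 123   -- hypothetical SID; find yours in v$session
 group by sql_text
 order by 2 desc;
```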
Implicit vs. explicit cursor – I am pretty sure the cursor is implicit:
1) It is something like SELECT MIN("PKID"), MAX("PKID") FROM "MY_PRL_ENTITIES" and does not occur in the application source code
2) when traced step by step, the cursor was opened when we executed DBMS_PARALLEL_EXECUTE.run_task
Thanks much,
Ivan.
… in my own opinion, Oracle did not expect DBMS_PARALLEL_EXECUTE to be used this way – I mean repeated calls within one session. But sometimes the situation requires it.
Please help!!
My table N_MVT_NEW_PREV has 80,000,000 rows. I want to run an update using tasks.
But it doesn't work:
I think the task hasn't been run; it stays at status 'CHUNKED' (status 4).
DECLARE
  l_task0    VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  SELECT DBMS_PARALLEL_EXECUTE.generate_task_name INTO l_task0 FROM dual;
  dbms_output.put_line(l_task0);

  DBMS_PARALLEL_EXECUTE.create_task(l_task0);
  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(l_task0, 'BO', 'N_MVT_NEW_PREV', TRUE, 1000000);

  l_sql_stmt := 'UPDATE /*+ ROWID(dda) */ bo.n_mvt_new_prev N_MVT_NEW_PREV '||chr(10)||
                ' SET (N_MVT_NEW_PREV.MVT_VER_MT_HT, N_MVT_NEW_PREV.MVT_VER_MT_ADD_HT) = '||chr(10)||
                '(select B.MVT_VER_MT_HT, B.MVT_VER_MT_ADD_HT '||chr(10)||
                'from BO.MVT_VER_MAJ_HT_TMP B '||chr(10)||
                'WHERE N_MVT_NEW_PREV.MVT_CTA_D 4);

  -- If there is an error, RESUME it at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task0);
  dbms_output.put_line('statut '||l_status);
  WHILE (l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED and l_status != DBMS_PARALLEL_EXECUTE.CHUNKED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task0);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task0);
  END LOOP;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task0);
  dbms_output.put_line('statut '||l_status);
  DBMS_PARALLEL_EXECUTE.drop_task(l_task0);
END;
/

PL/SQL procedure successfully completed
statut 4
statut 4
I don't understand why people try to use dbms_parallel_execute for DML operations. It's not recommended.
A much better option is the clear and easy syntax:
alter session enable parallel dml;
then run your DML;
commit;
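A minimal sketch of that approach (the table, column, filter, and degree of parallelism are placeholders, not from the article):

```sql
alter session enable parallel dml;

-- hypothetical bulk update run as a single parallel DML statement
update /*+ parallel(t 8) */ some_big_table t
   set t.some_column = null
 where t.some_filter = 'X';

commit;
```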
It's still not recommended if you have many records to update and your table is compressed, or if you want to avoid row chaining. In such cases it's better to rebuild the table with CREATE TABLE AS SELECT.
DBMS_PARALLEL_EXECUTE was created to avoid using the complex scheduler. That's my private opinion :).
Regards
Tomasz
There are situations where dbms_parallel_execute is the most practical way to run DML operations.
For example, even with parallel execution, a one-shot DML such as merge into … complex-select-from-huge-table-that-takes-hours-to-open-cursor can run into:
– running out of sort space
– snapshot too old
– huge amounts of rollback generated when other sessions are running operations against the tables
When dealing with high-volume complex SQL, chunked SQL via dbms_parallel_execute can open the cursor much more quickly and allows chunked commits; you really don't want to re-execute from scratch an operation that takes hours because your one-shot merge failed after 18 hours with "snapshot too old".
There are indeed cases where it can be useful, that's true. It all depends on the situation.
Nice, clear, very useful website, by the way Tomasz, congratulations!
Hi,
I have 3 lakh (300,000) records in my table. To update it, I am using
chunk_size => 1000 in DBMS_PARALLEL_EXECUTE with parallel_level => 10
AND
chunk_size => 10000 in DBMS_PARALLEL_EXECUTE with parallel_level => 100
The difference in execution time is only about 1 minute; please tell me the reason behind that.
Also, where can we see the maximum chunk size and parallel_level we can use in Oracle 11.2g?
Thanks in advance:)
So many chunks seems very weird to me. Besides, wouldn't it be better to do the update using a simple UPDATE or CTAS?
Regards
Tomasz
Hi, the available views are DBA|USER_PARALLEL_EXECUTE_TASKS and DBA|USER_PARALLEL_EXECUTE_CHUNKS (in Oracle Database 11g and 12c). There is no ALL prefix.
Fixed 🙂
Thanks
I have a situation where I need to update a specific column of 150 million records in one go, based on a WHERE clause. Please suggest whether DBMS_PARALLEL_EXECUTE is a better option, or is there a better way?
There are a lot of things to consider before running such an update:
– how many columns does your table have
– is the table compressed
– is the table partitioned
– what is the current size of the table
– can you run your SQL in parallel, or are you blocked by the resource manager
– must the table stay online during the update
– how many indexes do you have; are they global or local
– do you have extra space in case of CTAS (create table as select)
and so on and so on
There is no universally better, worse, or good option.
Regards
Tomasz
– how many columns has got your table
The table has 17 columns, of which 2 are CLOBs that need to be updated to NULL.
– is it compressed table
Yes
– is it partitioned table
Yes
– what is current size of the table
94401600 MB
– can you run your sql in parallel or you are blocked by resource manager
Blocked by resource manager.
– must the table be online during update
Yes
– how many indexes you have are the global, local
2 LOB-type indexes.
– have you got extra space in case of CTS (create table as select option)
It would be an extra overhead for such a big table, I guess.
and so on and so on
Is this thread still alive?