Monday, November 24, 2014

Summary - Control on Parallel operation and Nologging in INSERT .. SELECT ..


Summary of Parallel operation in INSERT .. SELECT ..

1. PDML Mode and Insert Hint

<PDML statement>
alter session enable parallel dml;
--alter session disable parallel dml;
INSERT /*+ append parallel */ INTO TG_TB SELECT .. FROM SRC_TB;

PDML modehint on Inserteffect on Inserteffect on Select
OFFnoneconventionalserial
OFFAPPENDserial direct-pathserial
OFFPARALLELconventionalparallel
OFFAPPEND PARALLELserial direct-pathparallel
ONnoneconventionalserial
ONAPPENDserial direct-pathserial
ONPARALLELparallel direct-pathparallel
ONAPPEND PARALLELparallel direct-pathparallel

If you add APPEND with PARALLEL hint in PDML session, then APPEND is ignored.
But in the session PDML disabled, the PARALLEL hint not work on the insert operation.

Note. 
Test DBMS : 12c Enterprise Edition Release 12.1.0.1.0


2. NOLOGGING setting

NOLOGGING can be set only with ALTER TABLE statement. If you add 'NOLOGGING' as a option in INSERT statement like below, it is considered as a table alias - nologging not works.

       INSERT /*+APPEND */ INTO TG_TB NOLOGGING
       SELECT ...

<Setting NOLOGGING on a Table>
ALTER TABLE TG_TB NOLOGGING;
ALTER TABLE TG_TB LOGGING;


Insert ModeDB Mode테이블 Logging OptionEffect
Direct-pathNOARCHIVELOGGINGminimum redo
NOARCHIVENOLOGGINGminimum redo
ARCHIVELOGGINGsmall redo
ARCHIVENOLOGGINGminimum redo
Conventionalall caseall casefull redo



Sunday, November 23, 2014

Performance overhead in parallel direct path insert with non-partitioned index

Recently I performed a simple test that in case of partitioned or non-partitioned index exists how much the performance difference will be there when a parallel direct path insert job was performed.
To be precise for the test, of course, there must be a exact calculation on the numbers of statistics of v$sysstat, but I choose Deacon Session monitor to just briefly monitor the operation. Because that will be enough to distinguish the difference.


-------------------------
--Create the Test Tables
-------------------------
--source table
drop table emp_src;
create table emp_src as 
select * from emp
;
insert into emp_src
select * from emp_src
;
--execute several times

commit;

--confirm the segment size
SQL > @lseg emp_src

SEGMENT_NAME                   SEGMENT_TYPE               BLOCKS  [MB/8192]
------------------------------ -------------------- ------------ ----------
EMP_SRC                        TABLE                      97,792        764

--target table
drop table emp_tg1;
drop table emp_tg2;
drop table emp_tg3;

create table emp_tg1 partition by hash(empno) partitions 8 nologging as 
select * from emp_src where rownum <= 1000000
;
create table emp_tg2 partition by hash(empno) partitions 8 nologging as 
select * from emp_src where rownum <= 1000000
;
create table emp_tg3 partition by hash(empno) partitions 8 nologging as 
select * from emp_src where rownum <= 1000000
;

-------------------------------------------
--Create the Test Indexes and execute load
-------------------------------------------
--case 1. partitioned local index
drop index emp_tg1_n1;
create index emp_tg1_n1 on emp_tg1(empno) 
local 
nologging;

alter session enable parallel dml;

insert /*+parallel(8) */ into emp_tg1 
select * from emp_src;
commit;

--case 2. non-partitioned index
drop index emp_tg2_n1;
create index emp_tg2_n1 on emp_tg2(empno) 
nologging;

alter session enable parallel dml;

insert /*+parallel(8) */ into emp_tg2 
select * from emp_src;
commit;

--case 3. partitioned global index
drop index emp_tg3_n1;
create index emp_tg3_n1 on emp_tg3(empno) 
global partition by hash(empno) partitions 8
nologging;

alter session enable parallel dml;

insert /*+parallel(8) */ into emp_tg3 
select * from emp_src;
commit;


--All 3 cases are same in execution plan except the segment names

Execution Plan
-----------------------------------------------------------
   0      INSERT STATEMENT Optimizer=FIRST_ROWS (Cost=3 Card=572 Bytes=49K)
   1    0   PX COORDINATOR
   2    1     PX SEND (QC (RANDOM)) OF 'SYS.:TQ10002' (Cost=3 Card=572 Bytes=49K) (PARALLEL_TO_SERIAL) (QC (RANDOM))
   3    2       INDEX MAINTENANCE OF 'EMP_TG1' (PARALLEL_COMBINED_WITH_PARENT)
   4    3         PX RECEIVE (Cost=3 Card=572 Bytes=49K) (PARALLEL_COMBINED_WITH_PARENT)
   5    4           PX SEND (RANGE) OF 'SYS.:TQ10001' (Cost=3 Card=572 Bytes=49K) (PARALLEL_TO_PARALLEL) (RANGE)
   6    5             LOAD AS SELECT OF 'EMP_TG1' (PARALLEL_COMBINED_WITH_PARENT)
   7    6               PX RECEIVE (Cost=3 Card=572 Bytes=49K) (PARALLEL_COMBINED_WITH_PARENT)
   8    7                 PX SEND (ROUND-ROBIN) OF 'SYS.:TQ10000' (Cost=3 Card=572 Bytes=49K) (PARALLEL_FROM_SERIAL) (ROUND-ROBIN)
   9    8                   TABLE ACCESS (FULL) OF 'EMP_SRC' (TABLE) (Cost=3 Card=572 Bytes=49K)
-----------------------------------------------------------

----------------------
Results of Monitoring
----------------------
Note. Below are the results of Deacon session monitor that was bit edited for better view.
      And individual lines are printed in every 5 seconds.

--case 1. partitioned local index
--------------------------------------------------------------------------
PXs |  Exec   Logrds  Phyrds   Drrds  Phywts   Drwts   PGA[m] Rdo[m] Lfspl
-------------m-------m-------m--------------------------------------------
  0 |     0        0       0       0       0       0       3       0     0
  0 |     0        0       0       0       0       0       3       0     0
  0 |     0        0       0       0       0       0       3       0     0
 16 |     5     1037       0       0     213     213     106       0     0
 16 |     9     5669       0       0    3613    3613      88       0     0
 16 |     9     5903       0       0    3633    3633     102       0     0
 16 |     9     5738       0       0    3713    3713     109       0     0
 16 |     0        6   14541   14540    4394    4394     227       0     0
 16 |     0       12    3510    3510    3592    3592      76       0     0
  0 |    58   201014    6122    6105       0       0       3   166.5  6559
  0 |     0        0       0       0       0       0       3       0     0
  0 |     0        0       0       0       0       0       3       0     0
  0 |     0        0       0       0       0       0       3       0     0

--case 2. non-partitioned index
--------------------------------------------------------------------------
PXs |  Exec   Logrds  Phyrds   Drrds  Phywts   Drwts   PGA[m] Rdo[m] Lfspl
-------------m-------m-------m--------------------------------------------
  0 |     0        0       0       0       0       0       3       0     0
  0 |     0        0       0       0       0       0       3       0     0
 16 |    10     4218       1       0    2401    2401     101       0     0
 16 |     9     5740       0       0    3663    3663      93       0     0
 16 |     7     4644       0       0    2880    2880     104       0     0
 16 |     9     5794       0       0    3662    3662      99       0     0
 16 |     0        2    2255    2255    1530    1530      57       0     0
 16 |     0        4    1200    1200    1189    1189      67       0     0
 16 |     0        4    1275    1275     859     859     136       0     0
 16 |     0        5    1410    1410    1871    1871      85       0     0
 16 |    16    31360   12544   12526   11736   11736      55    28.3  1115
  0 |    20    99631    2782    2716       0       0       3    81.9  3235
  0 |     0        0       0       0       0       0       3       0     0
  0 |     0        0       0       0       0       0       3       0     0

--case 3. partitioned global index
--------------------------------------------------------------------------
PXs |  Exec   Logrds  Phyrds   Drrds  Phywts   Drwts   PGA[m] Rdo[m] Lfspl
-------------m-------m-------m--------------------------------------------
  0 |     0        0       0       0       0       0       3       0     0
  0 |     0        0       0       0       0       0       3       0     0
 16 |     7     2369       2       0    1159    1159      92       0     0
 16 |     8     5534       0       0    3622    3622      97       0     0
 16 |     8     5346       0       0    3351    3351     105       0     0
 16 |     9     5883       0       0    3874    3874      96       0     0
 16 |     4     2131   17886   17886   17163   17163      59       0     0
 16 |     0       16    5266    5266    5323    5323      79       0     0
 16 |    71   107435   14597   14411   11131   11131      57    98.5  3845
  0 |     3    11793     378     332       0       0       3    10.1   398
  0 |     0        0       0       0       0       0       3       0     0


For this is a direct path insert and the target table is Nologging mode, Redo are generated only when index 'leaf node splits' (Lfspl) occur in Index Maintenance phase. And the sort operation must be completed before this write operation. Probably the sort operation started when 'Physical reads direct' (Drrds) begins to be appear in the monitoring.

As expected, case. 2 is the longest. Probably because the number of the index segment is only one, so sort and write operation have to be done with one process, whereas in other two test cases the same jobs could be done with 8 processes.