An UPDATE statement might appear simple, but it can be particularly difficult to tune. This post looks at the evolution of an update from an iterative cursor-based approach to a more optimal MERGE approach, by way of a couple of UPDATE variations.
For this exercise we’ll use the following scenario:
- A table holding account transactions
- A daily summary table storing the transaction and fee amounts for each account
The objective is to update the daily summary table for a single day from the details in the transaction table; the day chosen was the 11th day of the year. The full DDL for the tables is available for download, but the basic column structure of the tables is:
CREATE TABLE transactions
   (trans_id        NUMBER(10)   NOT NULL
   ,account         NUMBER(6)    NOT NULL
   ,trans_datetime  DATE         NOT NULL
   ,trans_type      VARCHAR2(30) NOT NULL
   ,trans_amount    NUMBER(9,2)  NOT NULL)

CREATE TABLE account_daily_summary
   (account       NUMBER(6)   NOT NULL
   ,summary_date  DATE        NOT NULL
   ,trans_amount  NUMBER(9,2)
   ,fee_amount    NUMBER(9,2))
We’ll be summing up the entries in TRANSACTIONS for each account to update the corresponding entry in the ACCOUNT_DAILY_SUMMARY table.
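The execution plans shown later reference an index named TRANSACTIONS_IX1. Its actual definition is in the downloadable DDL, but judging from the access predicates in the plans ("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2), a plausible definition (an assumption on my part, not taken from the script) would be:

```sql
-- Assumed definition of TRANSACTIONS_IX1; the real DDL is in the download.
-- The (account, trans_datetime) column order matches the access predicates
-- seen in the index range scans in the plans below.
CREATE INDEX transactions_ix1
   ON transactions (account, trans_datetime);
```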
Someone new to database systems might turn to a row-by-row approach, e.g.:
DECLARE
   CURSOR acct_dtls_cur
   IS
   SELECT t.account
   ,      TRUNC(t.trans_datetime) AS summary_date
   ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
   ,      SUM(CASE WHEN t.trans_type='FEE' THEN t.trans_amount END) AS fee_amount
   FROM   transactions t
   WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
   AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
   GROUP  BY
          t.account
   ,      TRUNC(t.trans_datetime);
BEGIN
   FOR acct_dtls_rec IN acct_dtls_cur
   LOOP
      UPDATE account_daily_summary
      SET    trans_amount = acct_dtls_rec.trans_amount
      ,      fee_amount   = acct_dtls_rec.fee_amount
      WHERE  account      = acct_dtls_rec.account
      AND    summary_date = acct_dtls_rec.summary_date;
   END LOOP;
END;
Here we open a cursor that gives us the details we need for the destination table, retrieve the rows from that cursor, and update the destination table one row at a time. Row-by-row processing like this is notoriously resource-intensive and slow. Instead we want to do everything in a single SQL UPDATE statement. Here’s one way that we might do that:
UPDATE account_daily_summary ads
SET    ads.trans_amount = (SELECT SUM(t.trans_amount)
                           FROM   transactions t
                           WHERE  t.account = ads.account
                           AND    t.trans_datetime >= ads.summary_date
                           AND    t.trans_datetime <  ads.summary_date + 1
                           AND    t.trans_type = 'TRANSACTION')
,      ads.fee_amount   = (SELECT SUM(t.trans_amount)
                           FROM   transactions t
                           WHERE  t.account = ads.account
                           AND    t.trans_datetime >= ads.summary_date
                           AND    t.trans_datetime <  ads.summary_date + 1
                           AND    t.trans_type = 'FEE')
WHERE  ads.account IN (SELECT t.account
                       FROM   transactions t
                       WHERE  t.trans_datetime >= ads.summary_date
                       AND    t.trans_datetime <  ads.summary_date + 1)
AND    ads.summary_date = TRUNC(SYSDATE,'YY') + 10
Ummm… now that’s looking quite messy. We have three sub-queries that are very similar to each other. Each attribute in the destination table that we’re updating has its own correlated sub-query to obtain the value, and the UPDATE itself has a sub-query to identify the set of accounts that need updating.
Running the above through Autotrace we get the following execution plan and statistics:
--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                       |     1 |    26 |    26   (8)| 00:00:01 |
|   1 |  UPDATE                        | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   NESTED LOOPS SEMI            |                       |     1 |    26 |    10   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL           | ACCOUNT_DAILY_SUMMARY |     5 |    75 |     5   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN            | TRANSACTIONS_IX1      |  3024 | 33264 |     1   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|*  6 |    FILTER                      |                       |       |       |            |          |
|*  7 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     1 |    22 |     7   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
|   9 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|* 10 |    FILTER                      |                       |       |       |            |          |
|* 11 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     1 |    22 |     7   (0)| 00:00:01 |
|* 12 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ADS"."SUMMARY_DATE"=TRUNC(SYSDATE@!,'fmyy')+10 AND
              INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1>TRUNC(SYSDATE@!,'fmyy')+10)
   4 - access("ADS"."ACCOUNT"="T"."ACCOUNT" AND "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10 AND
              "T"."TRANS_DATETIME"<INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1)
       filter("T"."TRANS_DATETIME">="ADS"."SUMMARY_DATE")
   6 - filter(:B1+1>:B2)
   7 - filter("T"."TRANS_TYPE"='TRANSACTION')
   8 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)
  10 - filter(:B1+1>:B2)
  11 - filter("T"."TRANS_TYPE"='FEE')
  12 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)

Statistics
----------------------------------------------------------
          0  recursive calls
        202  db block gets
       6306  consistent gets
          0  physical reads
      24860  redo size
       1138  bytes sent via SQL*Net to client
       2292  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed
We can see that each sub-query is executed separately. The last two in the plan, at operations 7/8 and 11/12, are executed once per account. This is reflected in our statistics: 6,306 consistent gets (consistent-mode logical I/O) and 202 db block gets (current-mode logical I/O).
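As an aside, Autotrace only shows the estimated plan and session statistics. To actually see how many times each sub-query operation ran, one option (a sketch, not something from the original post) is to capture row-source execution statistics and display them with DBMS_XPLAN; the Starts column then shows the per-operation execution counts:

```sql
-- Run the statement with row-source statistics enabled
-- (the statement body is elided here; it is the UPDATE shown above).
UPDATE /*+ GATHER_PLAN_STATISTICS */ account_daily_summary ads
SET    ...

-- Then, in the same session, display the actual execution statistics.
-- NULL, NULL means "the last statement executed by this session";
-- 'ALLSTATS LAST' requests actual rows, Starts and buffer gets.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```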
Can we simplify the query? Well, one way is to update both columns together as a set. I’ve previously discussed set-based operations in relation to queries but didn’t touch on UPDATE statements. This is how we would do such an update:
UPDATE account_daily_summary ads
SET    (ads.trans_amount, ads.fee_amount) =
       (SELECT SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END)
        ,      SUM(CASE WHEN t.trans_type='FEE' THEN t.trans_amount END)
        FROM   transactions t
        WHERE  t.account = ads.account
        AND    t.trans_datetime >= ads.summary_date
        AND    t.trans_datetime <  ads.summary_date + 1)
WHERE  ads.account IN (SELECT t.account
                       FROM   transactions t
                       WHERE  t.trans_datetime >= ads.summary_date
                       AND    t.trans_datetime <  ads.summary_date + 1)
AND    ads.summary_date = TRUNC(SYSDATE,'YY') + 10
Updating both destination columns as a set has certainly simplified the UPDATE statement. We now have just two sub-queries: one to identify the account population and another to obtain the values for the columns we’re updating. So how does it perform? Autotrace once again shows us:
--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                       |     1 |    26 |    26   (8)| 00:00:01 |
|   1 |  UPDATE                        | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   NESTED LOOPS SEMI            |                       |     1 |    26 |    10   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL           | ACCOUNT_DAILY_SUMMARY |     5 |    75 |     5   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN            | TRANSACTIONS_IX1      |  3024 | 33264 |     1   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|*  6 |    FILTER                      |                       |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     2 |    44 |     7   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ADS"."SUMMARY_DATE"=TRUNC(SYSDATE@!,'fmyy')+10 AND
              INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1>TRUNC(SYSDATE@!,'fmyy')+10)
   4 - access("ADS"."ACCOUNT"="T"."ACCOUNT" AND "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10 AND
              "T"."TRANS_DATETIME"<INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1)
       filter("T"."TRANS_DATETIME">="ADS"."SUMMARY_DATE")
   6 - filter(:B1+1>:B2)
   8 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)

Statistics
----------------------------------------------------------
          0  recursive calls
        202  db block gets
       3211  consistent gets
          0  physical reads
      24900  redo size
       1138  bytes sent via SQL*Net to client
       1937  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed
As might be expected, reducing the number of sub-queries has reduced the number of operations in the execution plan. This in turn has reduced the resource consumption as shown by the consistent gets metric, which has dropped from 6,306 to 3,211. db block gets remains constant at 202.
The key cost in the above query is that we’re performing an index lookup into the TRANSACTIONS table for every account to work out the destination values, i.e. the correlated sub-query in the SET part of the UPDATE is being executed multiple times. Would it not be more efficient to identify the accounts and obtain the update values at the same time, in a single operation, and use those values to update the summary table? This type of operation is possible using the MERGE statement.
MERGE is used to perform what is frequently called an “UPSERT”: update the destination row if it already exists, otherwise insert it. What can be overlooked with MERGE is that the update and insert parts do not both have to be specified, i.e. MERGE can be used to perform just an update operation. Rewriting our UPDATE as a MERGE we get:
MERGE INTO account_daily_summary dest
USING (SELECT t.account
       ,      TRUNC(t.trans_datetime) AS summary_date
       ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
       ,      SUM(CASE WHEN t.trans_type='FEE' THEN t.trans_amount END) AS fee_amount
       FROM   transactions t
       WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
       AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
       GROUP  BY
              t.account
       ,      TRUNC(t.trans_datetime)) src
ON    (src.account = dest.account AND src.summary_date = dest.summary_date)
WHEN MATCHED THEN
   UPDATE
   SET dest.trans_amount = src.trans_amount
   ,   dest.fee_amount   = src.fee_amount
The USING clause contains the SELECT statement that defines the accounts that need updating as well as the values required. There are no more correlated sub-queries, so the entire statement is clear and concise. Running the MERGE through Autotrace we get the following execution plan:
---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT         |                       |  2882 | 86460 |   117   (4)| 00:00:02 |
|   1 |  MERGE                  | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   VIEW                  |                       |       |       |            |          |
|*  3 |    HASH JOIN            |                       |  2882 |   168K|   117   (4)| 00:00:02 |
|   4 |     TABLE ACCESS FULL   | ACCOUNT_DAILY_SUMMARY |  3100 | 46500 |     5   (0)| 00:00:01 |
|   5 |     VIEW                |                       |  2882 |   126K|   112   (4)| 00:00:02 |
|   6 |      SORT GROUP BY      |                       |  2882 | 63404 |   112   (4)| 00:00:02 |
|*  7 |       FILTER            |                       |       |       |            |          |
|*  8 |        TABLE ACCESS FULL| TRANSACTIONS          |  2882 | 63404 |   111   (3)| 00:00:02 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("SRC"."ACCOUNT"="DEST"."ACCOUNT" AND
              "DEST"."SUMMARY_DATE"=INTERNAL_FUNCTION("SRC"."SUMMARY_DATE"))
   7 - filter(TRUNC(SYSDATE@!,'fmyy')+11>TRUNC(SYSDATE@!,'fmyy')+10)
   8 - filter("T"."TRANS_DATETIME"<TRUNC(SYSDATE@!,'fmyy')+11 AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10)

Statistics
----------------------------------------------------------
          0  recursive calls
        102  db block gets
        410  consistent gets
          0  physical reads
      24860  redo size
       1137  bytes sent via SQL*Net to client
       1963  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        100  rows processed
The execution plan shows us that Oracle has performed a single join between our tables and merged the results back into the summary table. Here it elected to use a hash join, but it could just as easily have chosen a nested loops operation if that were more efficient. The outcome of this plan is shown in the I/O metrics: 410 consistent gets and 102 db block gets, which is far more efficient than the previous approaches.
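As a final note, if the summary table might be missing rows for some account/day combinations, the same MERGE extends naturally to a full upsert by adding a WHEN NOT MATCHED branch. This is a sketch only; the scenario in this post assumes the summary rows already exist, so the update-only form above is all that’s required:

```sql
MERGE INTO account_daily_summary dest
USING (SELECT t.account
       ,      TRUNC(t.trans_datetime) AS summary_date
       ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
       ,      SUM(CASE WHEN t.trans_type='FEE' THEN t.trans_amount END) AS fee_amount
       FROM   transactions t
       WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
       AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
       GROUP  BY
              t.account
       ,      TRUNC(t.trans_datetime)) src
ON    (src.account = dest.account AND src.summary_date = dest.summary_date)
WHEN MATCHED THEN
   -- Existing summary rows are updated exactly as before
   UPDATE
   SET dest.trans_amount = src.trans_amount
   ,   dest.fee_amount   = src.fee_amount
WHEN NOT MATCHED THEN
   -- Missing summary rows are created from the same source query
   INSERT (account, summary_date, trans_amount, fee_amount)
   VALUES (src.account, src.summary_date, src.trans_amount, src.fee_amount)
```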
Download the set-up script for the tables and queries used in this post.