Evolution of an Update

An UPDATE statement might appear to be quite simple but it can be particularly difficult to tune. This post looks at the evolution of an update from an iterative cursor-based approach to a more optimal MERGE approach, by way of a couple of UPDATE variations.

For this exercise we’ll use the following scenario:

  • A table holding account transactions
  • A daily summary table storing the transaction and fee amounts for each account

The objective is to update the daily summary table for a day from the details in the transaction table. The day chosen was the 11th day in the month. The full DDL for the tables is available for download but the basic column structure of the tables is:

CREATE TABLE transactions
   (trans_id        NUMBER(10)   NOT NULL
   ,account         NUMBER(6)    NOT NULL
   ,trans_datetime  DATE         NOT NULL
   ,trans_type      VARCHAR2(30) NOT NULL
   ,trans_amount    NUMBER(9,2)  NOT NULL)

CREATE TABLE account_daily_summary
   (account      NUMBER(6)    NOT NULL
   ,summary_date DATE         NOT NULL
   ,trans_amount NUMBER(9,2)
   ,fee_amount   NUMBER(9,2))

We’ll be summing up the entries in TRANSACTIONS for each account to update the corresponding entry in the ACCOUNT_DAILY_SUMARY table.

Someone new to database systems would turn to a row by row based approach, e.g:

DECLARE
   CURSOR acct_dtls_cur 
   IS
   SELECT t.account
   ,      TRUNC(t.trans_datetime) AS summary_date
   ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
   ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END) AS fee_amount
   FROM   transactions t
   WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
   AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
   GROUP  BY
          t.account
   ,      TRUNC(t.trans_datetime);
BEGIN
   FOR acct_dtls_rec IN acct_dtls_cur
   LOOP
      UPDATE account_daily_summary
      SET    trans_amount = acct_dtls_rec.trans_amount
      ,      fee_amount   = acct_dtls_rec.fee_amount
      WHERE  account      = acct_dtls_rec.account
      AND    summary_date = acct_dtls_rec.summary_date;
   END LOOP;
END;

Here we’ve opened a cursor that gives us the details in the destination table that we need to update, retrieve the rows from that cursor and update the destination table one row at a time. Row by row type processing like this is notoriously resource consuming and slow. Instead we want to avoid row by row processing and instead do everything in a single SQL UPDATE statement. Here’s one way that we might do that:

UPDATE account_daily_summary ads
SET    ads.trans_amount = (SELECT SUM(t.trans_amount)
                           FROM   transactions t
                           WHERE  t.account = ads.account
                           AND    t.trans_datetime >= ads.summary_date
                           AND    t.trans_datetime <  ads.summary_date + 1
                           AND    t.trans_type = 'TRANSACTION')
,       ads.fee_amount  = (SELECT SUM(t.trans_amount)
                           FROM   transactions t
                           WHERE  t.account = ads.account
                           AND    t.trans_datetime >= ads.summary_date
                           AND    t.trans_datetime <  ads.summary_date + 1
                           AND    t.trans_type = 'FEE')                       
WHERE  ads.account IN (SELECT t.account
                       FROM   transactions t
                       WHERE  t.trans_datetime >= ads.summary_date
                       AND    t.trans_datetime <  ads.summary_date + 1)
AND    ads.summary_date = TRUNC(SYSDATE) + 10

Ummm… now that’s looking quite messy. We have 3 sub-queries that are very similar to each other. Each attribute in the destination table that we’re updating has its own correlated sub-query to obtain the value and the UPDATE itself has a sub-query to identify the set of accounts that need updating.

Running the above through Autotrace we get the following execution plan and statistics:

--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                       |     1 |    26 |    26   (8)| 00:00:01 |
|   1 |  UPDATE                        | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   NESTED LOOPS SEMI            |                       |     1 |    26 |    10   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL           | ACCOUNT_DAILY_SUMMARY |     5 |    75 |     5   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN            | TRANSACTIONS_IX1      |  3024 | 33264 |     1   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|*  6 |    FILTER                      |                       |       |       |            |          |
|*  7 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     1 |    22 |     7   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
|   9 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|* 10 |    FILTER                      |                       |       |       |            |          |
|* 11 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     1 |    22 |     7   (0)| 00:00:01 |
|* 12 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ADS"."SUMMARY_DATE"=TRUNC(SYSDATE@!,'fmyy')+10 AND
              INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1>TRUNC(SYSDATE@!,'fmyy')+10)
   4 - access("ADS"."ACCOUNT"="T"."ACCOUNT" AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10 AND
              "T"."TRANS_DATETIME"<INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1)
       filter("T"."TRANS_DATETIME">="ADS"."SUMMARY_DATE")
   6 - filter(:B1+1>:B2)
   7 - filter("T"."TRANS_TYPE"='TRANSACTION')
   8 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)
  10 - filter(:B1+1>:B2)
  11 - filter("T"."TRANS_TYPE"='FEE')
  12 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)


Statistics
----------------------------------------------------------
          0  recursive calls
        202  db block gets
       6306  consistent gets
          0  physical reads
      24860  redo size
       1138  bytes sent via SQL*Net to client
       2292  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

We can see that each sub-query is being executed separately. The last two on the plan, appearing on lines 7/8 and 11/12 are being executed for each account. This is reflected in our statistics with 6,306 consistent gets (consistent mode logical IO) and 202 db block gets (current mode logical IO).

Can we simplify the query? Well, one way is to update both columns together as a set. I’ve previously discussed set-based operations in relation to queries but didn’t touch on UPDATE statements. This is how we would do such an update:

UPDATE account_daily_summary ads
SET    (ads.trans_amount,ads.fee_amount) 
     = (SELECT SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END)
        ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END)
        FROM   transactions t
        WHERE  t.account = ads.account
        AND    t.trans_datetime >= ads.summary_date
        AND    t.trans_datetime <  ads.summary_date + 1)
WHERE  ads.account IN (SELECT t.account
                       FROM   transactions t
                       WHERE  t.trans_datetime >= ads.summary_date
                       AND    t.trans_datetime <  ads.summary_date + 1)
AND    ads.summary_date = TRUNC(SYSDATE,'YY') + 10

Updating both destination columns has certainly simplified the UPDATE statement. We now have just 2 sub-queries; one to identify the account population and another to find the values of the columns we’re updating. So how does it perform? Autotrace once again shows us:

--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                       |     1 |    26 |    26   (8)| 00:00:01 |
|   1 |  UPDATE                        | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   NESTED LOOPS SEMI            |                       |     1 |    26 |    10   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL           | ACCOUNT_DAILY_SUMMARY |     5 |    75 |     5   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN            | TRANSACTIONS_IX1      |  3024 | 33264 |     1   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|*  6 |    FILTER                      |                       |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     2 |    44 |     7   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ADS"."SUMMARY_DATE"=TRUNC(SYSDATE@!,'fmyy')+10 AND
              INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1>TRUNC(SYSDATE@!,'fmyy')+10)
   4 - access("ADS"."ACCOUNT"="T"."ACCOUNT" AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10 AND
              "T"."TRANS_DATETIME"<INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1)
       filter("T"."TRANS_DATETIME">="ADS"."SUMMARY_DATE")
   6 - filter(:B1+1>:B2)
   8 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)


Statistics
----------------------------------------------------------
          0  recursive calls
        202  db block gets
       3211  consistent gets
          0  physical reads
      24900  redo size
       1138  bytes sent via SQL*Net to client
       1937  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

As might have been expected since we’ve reduced the number of sub-queries we have reduced the number of operations in the execution plan. This in turn has reduced the resource consumption as shown by the consistent gets metric, which has dropped from 6,242 to 3,211. db block gets remains constant at 202.

The key cost in the above query is the way we’re performing an index looking into the TRANSACTION table for every account to work out the destination values, i.e. the correlated sub-query in the SET part of the UPDATE is being executed multiple times. Would it not be more efficient to identify the account and obtain the update values at the same time in a single operation and use those values to go update the summary table? This type of operation is possible by using the MERGE statement.

MERGE is used to perform what is frequently called an “UPSERT”; update the destination if a row already exists, otherwise insert into the destination. What can be overlooked with MERGE is that both the update or insert parts do not have to be specified, i.e. MERGE can be used to perform just an update operation. Rewriting our UPDATE as a MERGE we get:

MERGE INTO account_daily_summary dest
USING (SELECT t.account
       ,      TRUNC(t.trans_datetime) AS summary_date
       ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
       ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END) AS fee_amount
       FROM   transactions t
       WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
       AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
       GROUP  BY
              t.account
       ,      TRUNC(t.trans_datetime)) src
ON (src.account = dest.account AND src.summary_date = dest.summary_date)
WHEN MATCHED THEN UPDATE
SET dest.trans_amount = src.trans_amount
,   dest.fee_amount   = src.fee_amount

The USING clause contains the SELECT statement that defines the accounts that need updating as well as the values required. No more correlated sub-queries so the entire statement is quite clear and concise. Running the MERGE through Autotrace we get the following execution plan:

-------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT         |                       |  2882 | 86460 |   117   (4)| 00:00:02 |
|   1 |  MERGE                  | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   VIEW                  |                       |       |       |            |          |
|*  3 |    HASH JOIN            |                       |  2882 |   168K|   117   (4)| 00:00:02 |
|   4 |     TABLE ACCESS FULL   | ACCOUNT_DAILY_SUMMARY |  3100 | 46500 |     5   (0)| 00:00:01 |
|   5 |     VIEW                |                       |  2882 |   126K|   112   (4)| 00:00:02 |
|   6 |      SORT GROUP BY      |                       |  2882 | 63404 |   112   (4)| 00:00:02 |
|*  7 |       FILTER            |                       |       |       |            |          |
|*  8 |        TABLE ACCESS FULL| TRANSACTIONS          |  2882 | 63404 |   111   (3)| 00:00:02 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("SRC"."ACCOUNT"="DEST"."ACCOUNT" AND
              "DEST"."SUMMARY_DATE"=INTERNAL_FUNCTION("SRC"."SUMMARY_DATE"))
   7 - filter(TRUNC(SYSDATE@!,'fmyy')+11>TRUNC(SYSDATE@!,'fmyy')+10)
   8 - filter("T"."TRANS_DATETIME"<TRUNC(SYSDATE@!,'fmyy')+11 AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10)


Statistics
----------------------------------------------------------
          0  recursive calls
        102  db block gets
        410  consistent gets
          0  physical reads
      24860  redo size
       1137  bytes sent via SQL*Net to client
       1963  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        100  rows processed

The execution plan shows us that Oracle has done a single join between our tables and merged the results back into the summary table. To do this it elected to use a hash join but it could easily have chosen a nest loops operation if that were more efficient. The outcome of this plan is shown in the I/O metrics; 410 consistent gets and 102 db block gets, which is far more efficient than the previous approaches.


Download the set-up script for the tables and queries used in this post.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s