Evolution of an Update

An UPDATE statement might appear to be quite simple but it can be particularly difficult to tune. This post looks at the evolution of an update from an iterative cursor-based approach to a more optimal MERGE approach, by way of a couple of UPDATE variations.

For this exercise we’ll use the following scenario:

  • A table holding account transactions
  • A daily summary table storing the transaction and fee amounts for each account

The objective is to update the daily summary table for a day from the details in the transaction table. The day chosen was the 11th day in the month. The full DDL for the tables is available for download but the basic column structure of the tables is:

CREATE TABLE transactions
   (trans_id        NUMBER(10)   NOT NULL
   ,account         NUMBER(6)    NOT NULL
   ,trans_datetime  DATE         NOT NULL
   ,trans_type      VARCHAR2(30) NOT NULL
   ,trans_amount    NUMBER(9,2)  NOT NULL)

CREATE TABLE account_daily_summary
   (account      NUMBER(6)    NOT NULL
   ,summary_date DATE         NOT NULL
   ,trans_amount NUMBER(9,2)
   ,fee_amount   NUMBER(9,2))

We’ll be summing up the entries in TRANSACTIONS for each account to update the corresponding entry in the ACCOUNT_DAILY_SUMARY table.

Someone new to database systems would turn to a row by row based approach, e.g:

DECLARE
   CURSOR acct_dtls_cur 
   IS
   SELECT t.account
   ,      TRUNC(t.trans_datetime) AS summary_date
   ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
   ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END) AS fee_amount
   FROM   transactions t
   WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
   AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
   GROUP  BY
          t.account
   ,      TRUNC(t.trans_datetime);
BEGIN
   FOR acct_dtls_rec IN acct_dtls_cur
   LOOP
      UPDATE account_daily_summary
      SET    trans_amount = acct_dtls_rec.trans_amount
      ,      fee_amount   = acct_dtls_rec.fee_amount
      WHERE  account      = acct_dtls_rec.account
      AND    summary_date = acct_dtls_rec.summary_date;
   END LOOP;
END;

Here we’ve opened a cursor that gives us the details in the destination table that we need to update, retrieve the rows from that cursor and update the destination table one row at a time. Row by row type processing like this is notoriously resource consuming and slow. Instead we want to avoid row by row processing and instead do everything in a single SQL UPDATE statement. Here’s one way that we might do that:

UPDATE account_daily_summary ads
SET    ads.trans_amount = (SELECT SUM(t.trans_amount)
                           FROM   transactions t
                           WHERE  t.account = ads.account
                           AND    t.trans_datetime >= ads.summary_date
                           AND    t.trans_datetime <  ads.summary_date + 1
                           AND    t.trans_type = 'TRANSACTION')
,       ads.fee_amount  = (SELECT SUM(t.trans_amount)
                           FROM   transactions t
                           WHERE  t.account = ads.account
                           AND    t.trans_datetime >= ads.summary_date
                           AND    t.trans_datetime <  ads.summary_date + 1
                           AND    t.trans_type = 'FEE')                       
WHERE  ads.account IN (SELECT t.account
                       FROM   transactions t
                       WHERE  t.trans_datetime >= ads.summary_date
                       AND    t.trans_datetime <  ads.summary_date + 1)
AND    ads.summary_date = TRUNC(SYSDATE) + 10

Ummm… now that’s looking quite messy. We have 3 sub-queries that are very similar to each other. Each attribute in the destination table that we’re updating has its own correlated sub-query to obtain the value and the UPDATE itself has a sub-query to identify the set of accounts that need updating.

Running the above through Autotrace we get the following execution plan and statistics:

--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                       |     1 |    26 |    26   (8)| 00:00:01 |
|   1 |  UPDATE                        | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   NESTED LOOPS SEMI            |                       |     1 |    26 |    10   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL           | ACCOUNT_DAILY_SUMMARY |     5 |    75 |     5   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN            | TRANSACTIONS_IX1      |  3024 | 33264 |     1   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|*  6 |    FILTER                      |                       |       |       |            |          |
|*  7 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     1 |    22 |     7   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
|   9 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|* 10 |    FILTER                      |                       |       |       |            |          |
|* 11 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     1 |    22 |     7   (0)| 00:00:01 |
|* 12 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ADS"."SUMMARY_DATE"=TRUNC(SYSDATE@!,'fmyy')+10 AND
              INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1>TRUNC(SYSDATE@!,'fmyy')+10)
   4 - access("ADS"."ACCOUNT"="T"."ACCOUNT" AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10 AND
              "T"."TRANS_DATETIME"<INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1)
       filter("T"."TRANS_DATETIME">="ADS"."SUMMARY_DATE")
   6 - filter(:B1+1>:B2)
   7 - filter("T"."TRANS_TYPE"='TRANSACTION')
   8 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)
  10 - filter(:B1+1>:B2)
  11 - filter("T"."TRANS_TYPE"='FEE')
  12 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)


Statistics
----------------------------------------------------------
          0  recursive calls
        202  db block gets
       6306  consistent gets
          0  physical reads
      24860  redo size
       1138  bytes sent via SQL*Net to client
       2292  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

We can see that each sub-query is being executed separately. The last two on the plan, appearing on lines 7/8 and 11/12 are being executed for each account. This is reflected in our statistics with 6,306 consistent gets (consistent mode logical IO) and 202 db block gets (current mode logical IO).

Can we simplify the query? Well, one way is to update both columns together as a set. I’ve previously discussed set-based operations in relation to queries but didn’t touch on UPDATE statements. This is how we would do such an update:

UPDATE account_daily_summary ads
SET    (ads.trans_amount,ads.fee_amount) 
     = (SELECT SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END)
        ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END)
        FROM   transactions t
        WHERE  t.account = ads.account
        AND    t.trans_datetime >= ads.summary_date
        AND    t.trans_datetime <  ads.summary_date + 1)
WHERE  ads.account IN (SELECT t.account
                       FROM   transactions t
                       WHERE  t.trans_datetime >= ads.summary_date
                       AND    t.trans_datetime <  ads.summary_date + 1)
AND    ads.summary_date = TRUNC(SYSDATE,'YY') + 10

Updating both destination columns has certainly simplified the UPDATE statement. We now have just 2 sub-queries; one to identify the account population and another to find the values of the columns we’re updating. So how does it perform? Autotrace once again shows us:

--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                       |     1 |    26 |    26   (8)| 00:00:01 |
|   1 |  UPDATE                        | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   NESTED LOOPS SEMI            |                       |     1 |    26 |    10   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL           | ACCOUNT_DAILY_SUMMARY |     5 |    75 |     5   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN            | TRANSACTIONS_IX1      |  3024 | 33264 |     1   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|*  6 |    FILTER                      |                       |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     2 |    44 |     7   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ADS"."SUMMARY_DATE"=TRUNC(SYSDATE@!,'fmyy')+10 AND
              INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1>TRUNC(SYSDATE@!,'fmyy')+10)
   4 - access("ADS"."ACCOUNT"="T"."ACCOUNT" AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10 AND
              "T"."TRANS_DATETIME"<INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1)
       filter("T"."TRANS_DATETIME">="ADS"."SUMMARY_DATE")
   6 - filter(:B1+1>:B2)
   8 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)


Statistics
----------------------------------------------------------
          0  recursive calls
        202  db block gets
       3211  consistent gets
          0  physical reads
      24900  redo size
       1138  bytes sent via SQL*Net to client
       1937  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

As might have been expected since we’ve reduced the number of sub-queries we have reduced the number of operations in the execution plan. This in turn has reduced the resource consumption as shown by the consistent gets metric, which has dropped from 6,242 to 3,211. db block gets remains constant at 202.

The key cost in the above query is the way we’re performing an index looking into the TRANSACTION table for every account to work out the destination values, i.e. the correlated sub-query in the SET part of the UPDATE is being executed multiple times. Would it not be more efficient to identify the account and obtain the update values at the same time in a single operation and use those values to go update the summary table? This type of operation is possible by using the MERGE statement.

MERGE is used to perform what is frequently called an “UPSERT”; update the destination if a row already exists, otherwise insert into the destination. What can be overlooked with MERGE is that both the update or insert parts do not have to be specified, i.e. MERGE can be used to perform just an update operation. Rewriting our UPDATE as a MERGE we get:

MERGE INTO account_daily_summary dest
USING (SELECT t.account
       ,      TRUNC(t.trans_datetime) AS summary_date
       ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
       ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END) AS fee_amount
       FROM   transactions t
       WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
       AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
       GROUP  BY
              t.account
       ,      TRUNC(t.trans_datetime)) src
ON (src.account = dest.account AND src.summary_date = dest.summary_date)
WHEN MATCHED THEN UPDATE
SET dest.trans_amount = src.trans_amount
,   dest.fee_amount   = src.fee_amount

The USING clause contains the SELECT statement that defines the accounts that need updating as well as the values required. No more correlated sub-queries so the entire statement is quite clear and concise. Running the MERGE through Autotrace we get the following execution plan:

-------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT         |                       |  2882 | 86460 |   117   (4)| 00:00:02 |
|   1 |  MERGE                  | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   VIEW                  |                       |       |       |            |          |
|*  3 |    HASH JOIN            |                       |  2882 |   168K|   117   (4)| 00:00:02 |
|   4 |     TABLE ACCESS FULL   | ACCOUNT_DAILY_SUMMARY |  3100 | 46500 |     5   (0)| 00:00:01 |
|   5 |     VIEW                |                       |  2882 |   126K|   112   (4)| 00:00:02 |
|   6 |      SORT GROUP BY      |                       |  2882 | 63404 |   112   (4)| 00:00:02 |
|*  7 |       FILTER            |                       |       |       |            |          |
|*  8 |        TABLE ACCESS FULL| TRANSACTIONS          |  2882 | 63404 |   111   (3)| 00:00:02 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("SRC"."ACCOUNT"="DEST"."ACCOUNT" AND
              "DEST"."SUMMARY_DATE"=INTERNAL_FUNCTION("SRC"."SUMMARY_DATE"))
   7 - filter(TRUNC(SYSDATE@!,'fmyy')+11>TRUNC(SYSDATE@!,'fmyy')+10)
   8 - filter("T"."TRANS_DATETIME"<TRUNC(SYSDATE@!,'fmyy')+11 AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10)


Statistics
----------------------------------------------------------
          0  recursive calls
        102  db block gets
        410  consistent gets
          0  physical reads
      24860  redo size
       1137  bytes sent via SQL*Net to client
       1963  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        100  rows processed

The execution plan shows us that Oracle has done a single join between our tables and merged the results back into the summary table. To do this it elected to use a hash join but it could easily have chosen a nest loops operation if that were more efficient. The outcome of this plan is shown in the I/O metrics; 410 consistent gets and 102 db block gets, which is far more efficient than the previous approaches.


Download the set-up script for the tables and queries used in this post.

DBMS_LOCK

Concurrent activity in a complex system is, well… complex. Oracle’s standard locking and blocking mechanisms do an admirable job in general but there will be times when more careful control over processes is required. The package DBMS_LOCK exposes Oracle’s internal locking mechanism specifically for this purpose. According to the DBMS_LOCK documentation, user locks can be used for:

  • Providing exclusive access to a device, such as a terminal
  • Providing application-level enforcement of read locks
  • Detecting when a lock is released and cleanup after the application
  • Synchronizing applications and enforcing sequential processing

The last point is where DBMS_LOCK is frequently useful. Systems often have critical background processes that are required to be run serially. To ensure multiple instances of the job are not executed at once it’s quite simple to use DBMS_LOCK to take an exclusive lock and thereby signal to any other processes that the job is in-flight. Essentially this can be coded as:

DECLARE
   -- the number of seconds before the lock request time outs
   c_request_timeout       CONSTANT NUMBER(4) := 5;
   
   -- constants for lock request responses
   -- from dbms_lock documentation
   c_response_success      CONSTANT NUMBER := 0;
   c_response_timeout      CONSTANT NUMBER := 1;
   c_response_deadlock     CONSTANT NUMBER := 2;
   c_response_param_err    CONSTANT NUMBER := 3;
   c_response_own_lock     CONSTANT NUMBER := 4;
   c_response_illegal_lock CONSTANT NUMBER := 5;

   -- the unique name of the lock identifier
   l_job_identifier        VARCHAR2(128) := 'job_101';
   
   l_lock_handle           VARCHAR2(128);
   l_request_response      INTEGER;
   l_release_response      INTEGER;
BEGIN

   -- obtain a lock handle from our job identifier string 
   dbms_lock.allocate_unique (lockname        => l_job_identifier
                             ,lockhandle      => l_lock_handle
                             ,expiration_secs => 864000);

   -- now take out a an exclusive lock using the lock handle
   l_request_response := dbms_lock.request (lockhandle        => l_lock_handle
                                           ,lockmode          => dbms_lock.x_mode
                                           ,timeout           => c_request_timeout
                                           ,release_on_commit => FALSE);

   IF (l_request_response = c_response_timeout) THEN
      RAISE_APPLICATION_ERROR (-20000, 'Unable to obtain job lock due to time-out. Another process currently holds the lock');
   ELSIF (l_request_response = c_response_success) THEN
      RAISE_APPLICATION_ERROR (-20000, 'Job lock request failed with response of ' || TO_CHAR(l_request_response));
   END IF;
   
   -- the job itself goes here
   
   -- job finished so we'll clean up after ourselves
   l_release_response := dbms_lock.release (lockhandle => l_lock_handle);
   
   IF (l_release_response = c_response_success) THEN
      RAISE_APPLICATION_ERROR (-20000, 'Job lock failed with response of ' || TO_CHAR(l_release_response));
   END IF;
END;

There are a few points to note in the above script:

  • The lock can be set to be released on COMMIT, just like a DML operation row level lock. Otherwise it will be retained until explicitly released or the session terminated.
  • The lock name is global across the entire database. Hence it is important the name chosen can in no way conflict with a request for a totally different purpose. Choose a name that can not be mistaken for any other purpose.
  • Sessions waiting to obtain a lock via DBMS_LOCK.REQUEST wait on the event “enq: UL – contention”
  • The ALLOCATE_UNIQUE assigns a lock handle to the lock name specified. This association will remain in force for a minimum of time as specified by the expiration_secs parameter, which defaults to 864,000 seconds (10 days). After this time Oracle may assign a new lock handle to the lock name.

The last point above bears expansion. DBMS_LOCK.ALLOCATE_UNIQUE sets an expiry time on the lock handle, which defaults to 10 days. If you use the generated lock handle after this period of time then expect trouble. The lock handle will still be a valid handle but if another session calls ALLOCATE_UNIQUE with the same lock name it may be assigned a different handle. You will then end up with multiple sessions all believing they have obtained the lock on the lock name… problems are more than likely to occur.

You can see a listing of the locks allocated by DBMS_LOCK.ALLOCATE_UNIQUE by querying SYS.DBMS_LOCK_ALLOCATED:

SELECT *
FROM    sys.dbms_lock_allocated