Increasing Row Length and PCTFREE

In the last post we looked at how adding a new column can cause row migration and the impact this can have on our queries. In that post I noted that row migration may also occur through normal DML, so this post looks at that scenario and what we can do to manage it.

A relatively common scenario with table data is that a row is initially created with only a subset of the columns populated. Over time the rest of the columns are then populated with values as the data becomes available. Tables that follow this pattern are prone to suffer from migrated rows much like tables that get new columns added to them. The solution to this problem is different though as we know in advance the columns that exist in the table and can plan accordingly.

We can illustrate the problem using the same table from the previous post. Instead of adding a new column we will start off with all columns defined in the table. Our process first populates all columns except the MORE_DATA column and then goes back and adds data to that column:

SQL>CREATE TABLE row_growth
  2     (id        NUMBER (10) NOT NULL
  3     ,val_1     NUMBER(3)   NOT NULL
  4     ,col_1     CHAR(50)    NOT NULL
  5     ,more_data VARCHAR2(30))
  6  /

Table created.

SQL>CREATE INDEX row_growth_pk
  2     ON row_growth (id)
  3  /

Index created.

SQL>CREATE INDEX row_growth_ix1
  2     ON row_growth (val_1)
  3  /

Index created.

SQL>INSERT INTO row_growth
  2     (id, val_1, col_1)
  3  SELECT ROWNUM
  4  ,      MOD(ROWNUM,500)
  5  ,      'X'
  6  FROM   dual
  7  CONNECT BY ROWNUM <= 10000
  8  /

10000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000         92          0

Once again we have a table with 92 blocks and no migrated rows. It’s as if the MORE_DATA column does not exist in the table, which is close to the truth as the column contains no data. Querying the table we get:

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1240 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1240 |    21   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         24  consistent gets
          0  physical reads
          0  redo size
       2216  bytes sent via SQL*Net to client
        531  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

Similar to last time, our query consumes 24 logical IOs to return the 20 rows. Let’s now populate the MORE_DATA column:

SQL>UPDATE row_growth
  2     SET more_data = RPAD('X',30,'X')
  3  /

10000 rows updated.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000        142       2849

SQL>SELECT m.value
  2  FROM   v$mystat m
  3  ,      v$statname s
  4  WHERE  s.statistic# = m.statistic#
  5  AND    s.name = 'table fetch continued row'
  6  /

     VALUE
----------
       105

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1720 |    27   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1720 |    27   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
       2884  bytes sent via SQL*Net to client
        531  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

SQL>SELECT m.value
  2  FROM   v$mystat m
  3  ,      v$statname s
  4  WHERE  s.statistic# = m.statistic#
  5  AND    s.name = 'table fetch continued row'
  6  /

     VALUE
----------
       113

Exactly as in the scenario where we added a new column, we now have 2,849 migrated rows across 142 blocks. The impact on our query is the same: 32 logical IOs, an increase of 8 caused by the migrated rows, as confirmed by the change in the “table fetch continued row” statistic.

Previously we needed an ALTER TABLE MOVE command to resolve this problem, but this situation is different. Since all our columns are determined in advance and we know we will be populating the MORE_DATA column after the others, we can tell Oracle to reserve space for it in the data blocks. We do this by setting the PCTFREE value when creating the table.

PCTFREE specifies the amount of space in a block that Oracle will leave free when inserting rows into it (it applies only to inserts of new rows, not updates of existing ones). By default PCTFREE is set to 10, meaning that Oracle will stop inserting rows into a block once a new row would drop the free space in the block to 10% or less. A quick calculation based on the column lengths and our understanding of the data shows that our initial insert is about 60% of the overall row length, so we would need to set PCTFREE to 40 to reserve space for the MORE_DATA column.
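
The calculation itself is only a back-of-the-envelope estimate. The figures below are rough per-column storage sizes (ignoring per-column length bytes and the row header), and the VSIZE query is simply one way of sanity-checking the estimate against real data:

-- Approximate stored length per row:
--   id        NUMBER(10)   ~  4 bytes
--   val_1     NUMBER(3)    ~  3 bytes
--   col_1     CHAR(50)     ~ 50 bytes     initially inserted ~ 57 bytes
--   more_data VARCHAR2(30) ~ 30 bytes     fully populated    ~ 87 bytes
-- The initial insert is therefore roughly 60-65% of the final row length,
-- so PCTFREE 40 reserves enough free space for the later MORE_DATA update.
SELECT ROUND(100 * AVG(VSIZE(id) + VSIZE(val_1) + VSIZE(col_1))
                 / AVG(VSIZE(id) + VSIZE(val_1) + VSIZE(col_1) + 30)) AS initial_pct
FROM   row_growth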

Let’s repeat our example with PCTFREE set to 40:

SQL>CREATE TABLE row_growth
  2     (id        NUMBER (10) NOT NULL
  3     ,val_1     NUMBER(3)   NOT NULL
  4     ,col_1     CHAR(50)    NOT NULL
  5     ,more_data VARCHAR2(30))
  6  PCTFREE 40
  7  /

Table created.

SQL>CREATE INDEX row_growth_pk
  2     ON row_growth (id)
  3  /

Index created.

SQL>CREATE INDEX row_growth_ix1
  2     ON row_growth (val_1)
  3  /

Index created.

SQL>INSERT INTO row_growth
  2     (id, val_1, col_1)
  3  SELECT ROWNUM
  4  ,      MOD(ROWNUM,500)
  5  ,      'X'
  6  FROM   dual
  7  CONNECT BY ROWNUM <= 10000
  8  /

10000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000        137          0

Our table is now consuming 137 blocks after the initial insert of data, instead of 92 previously. This is due to each block only being filled to around 60% of its available space. So, does this impact our query?

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1240 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1240 |    21   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         24  consistent gets
          0  physical reads
          0  redo size
       2216  bytes sent via SQL*Net to client
        531  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

We can see that our query is still using 24 logical IOs. So, while the rows of the table are spread across more blocks, we still only need to access the same number of blocks for this query. The reason our query did not require any extra work is that it uses an index and so directly accesses only the blocks required.

Now let’s see what happens when we update the MORE_DATA column:

SQL>UPDATE row_growth
  2     SET more_data = RPAD('X',30,'X')
  3  /

10000 rows updated.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000        137          0

We can see from the CHAIN_CNT entry that we don’t have any migrated rows. Our estimate of setting PCTFREE to 40 was sufficient to ensure that there was adequate free space to handle the update of the MORE_DATA column. Since none of our rows have been migrated our query will not have been impacted and it will perform the same 24 logical IO operations.

In summary, when designing database tables it is important to consider how the rows will be populated and to set the PCTFREE clause accordingly. Resolving a migrated rows problem after the fact is an annoyance that can easily be avoided with a small amount of forethought and a quick calculation.

New columns and row migration

As a database developer you can get away without knowing a lot about how the Oracle database works. The internals of data blocks, tablespaces and data files tends to sit firmly in the DBA realm leaving developers to simply consider things in terms of tables, columns and rows. That being said, certain development activities will have an impact on the underlying storage and its efficiency. This post looks at the impact of row migration that might result from the common activity of adding a column to a table and back-populating it.

For those not familiar with the concept of row migration: when Oracle goes to update a row and there is insufficient free space in its data block to support the update, Oracle will migrate the row to another data block. Oracle leaves behind a pointer in the original block to the block where the row now resides. This is necessary as all indexes refer to the row by its original location. Row migration can happen at any time and isn’t a particular problem unless it affects a significant number of rows.

Now for an example of adding a column to a table, back-populating it and seeing how many rows get migrated:

SQL>CREATE TABLE row_growth
  2     (id     NUMBER (10) NOT NULL
  3     ,val_1  NUMBER(3)   NOT NULL
  4     ,col_1  CHAR(50)    NOT NULL)
  5  /

Table created.

SQL>CREATE INDEX row_growth_pk
  2     ON row_growth (id)
  3  /

Index created.

SQL>CREATE INDEX row_growth_ix1
  2     ON row_growth (val_1)
  3  /

Index created.

SQL>INSERT INTO row_growth
  2  SELECT ROWNUM
  3  ,      MOD(ROWNUM,500)
  4  ,      'X'
  5  FROM   dual
  6  CONNECT BY ROWNUM <= 10000
  7  /

10000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000         92          0

Here we’ve created a table with three columns and two indexes. After populating the table with 10,000 rows and analysing it we can see that the table is using 92 blocks and has 0 chained/migrated rows. As an aside, we need to use the ANALYZE TABLE command to populate the CHAIN_CNT column rather than the usual DBMS_STATS routine.

Now let’s query the table on the indexed VAL_1 column using AUTOTRACE:

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1180 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1180 |    21   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         24  consistent gets
          0  physical reads
          0  redo size
       2145  bytes sent via SQL*Net to client
        530  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

From the above output we can see that it required 24 logical IO operations to look up the 20 rows where VAL_1 has a value of 10. Now let’s add a new column to the table and back-populate it:

SQL>ALTER TABLE row_growth
  2     ADD more_data VARCHAR2(30)
  3  /

Table altered.

SQL>UPDATE row_growth
  2     SET more_data = RPAD('X',30,'X')
  3  /

10000 rows updated.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000        142       2849

After adding and populating a new 30 character column we analysed the table again. The table statistics show that our table is now using 142 blocks and 2,849 of the rows have been migrated.
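
As an aside, CHAIN_CNT only gives a count. If you want to see exactly which rows were migrated, ANALYZE can list them into a CHAINED_ROWS table; this assumes the CHAINED_ROWS table already exists, typically created by running the utlchain.sql script supplied with the database:

ANALYZE TABLE row_growth LIST CHAINED ROWS INTO chained_rows;

SELECT head_rowid
FROM   chained_rows
WHERE  table_name = 'ROW_GROWTH';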

So what’s the impact of this? Let’s repeat our query on the VAL_1 column and also monitor the session statistic “table fetch continued row”, which is Oracle’s internal count of how many migrated/chained rows it has encountered:

SQL>SELECT m.value
  2  FROM   v$mystat m
  3  ,      v$statname s
  4  WHERE  s.statistic# = m.statistic#
  5  AND    s.name = 'table fetch continued row'
  6  /

     VALUE
----------
       513

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1720 |    27   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1720 |    27   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
       2884  bytes sent via SQL*Net to client
        530  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

SQL>SELECT m.value
  2  FROM   v$mystat m
  3  ,      v$statname s
  4  WHERE  s.statistic# = m.statistic#
  5  AND    s.name = 'table fetch continued row'
  6  /

     VALUE
----------
       521

This is the same query against the same number of rows and following the same execution plan but it’s now done 32 logical IO operations instead of 24, an increase of 8 that corresponds to the increase seen in the “table fetch continued row” statistic (521-513). This is because Oracle is required to follow the migrated row pointers from where they originally resided to their new location.

The solution to the migrated rows problem is to rebuild the table with the ALTER TABLE MOVE command, which in turn invalidates the indexes requiring them to be rebuilt too:

SQL>ALTER TABLE row_growth
  2     MOVE
  3  /

Table altered.

SQL>ALTER INDEX row_growth_pk
  2     REBUILD
  3  /

Index altered.

SQL>ALTER INDEX row_growth_ix1
  2     REBUILD
  3  /

Index altered.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tab_statistics
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000        132          0

We’re now back down to 0 migrated rows. Running our query again:

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1720 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1720 |    21   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         24  consistent gets
          0  physical reads
          0  redo size
       2836  bytes sent via SQL*Net to client
        530  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

After the move and index rebuild our query is back down to the original 24 logical IO operations.

As noted earlier in this post, migrated rows can occur during normal DML activity and are not normally a problem as they impose a trivial overhead. Activities such as adding and back-populating new columns may cause a significant proportion of a table’s rows to be migrated though. This is something to look out for in system deployments; where it occurs, incorporate a table move and index rebuild into the deployment to mitigate any adverse impact.

Evolution of an Update

An UPDATE statement might appear to be quite simple but it can be particularly difficult to tune. This post looks at the evolution of an update from an iterative cursor-based approach to a more optimal MERGE approach, by way of a couple of UPDATE variations.

For this exercise we’ll use the following scenario:

  • A table holding account transactions
  • A daily summary table storing the transaction and fee amounts for each account

The objective is to update the daily summary table for a single day from the details in the transaction table. The day used in the examples is the 11th day of the year, i.e. TRUNC(SYSDATE,'YY') + 10. The full DDL for the tables is available for download but the basic column structure of the tables is:

CREATE TABLE transactions
   (trans_id        NUMBER(10)   NOT NULL
   ,account         NUMBER(6)    NOT NULL
   ,trans_datetime  DATE         NOT NULL
   ,trans_type      VARCHAR2(30) NOT NULL
   ,trans_amount    NUMBER(9,2)  NOT NULL)

CREATE TABLE account_daily_summary
   (account      NUMBER(6)    NOT NULL
   ,summary_date DATE         NOT NULL
   ,trans_amount NUMBER(9,2)
   ,fee_amount   NUMBER(9,2))

We’ll be summing up the entries in TRANSACTIONS for each account to update the corresponding entry in the ACCOUNT_DAILY_SUMMARY table.

Someone new to database systems might turn to a row-by-row approach, e.g.:

DECLARE
   CURSOR acct_dtls_cur 
   IS
   SELECT t.account
   ,      TRUNC(t.trans_datetime) AS summary_date
   ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
   ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END) AS fee_amount
   FROM   transactions t
   WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
   AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
   GROUP  BY
          t.account
   ,      TRUNC(t.trans_datetime);
BEGIN
   FOR acct_dtls_rec IN acct_dtls_cur
   LOOP
      UPDATE account_daily_summary
      SET    trans_amount = acct_dtls_rec.trans_amount
      ,      fee_amount   = acct_dtls_rec.fee_amount
      WHERE  account      = acct_dtls_rec.account
      AND    summary_date = acct_dtls_rec.summary_date;
   END LOOP;
END;

Here we’ve opened a cursor that gives us the new values for each account, fetched the rows from that cursor and updated the destination table one row at a time. Row-by-row processing like this is notoriously resource consuming and slow. We want to avoid it and instead do everything in a single SQL UPDATE statement. Here’s one way we might do that:

UPDATE account_daily_summary ads
SET    ads.trans_amount = (SELECT SUM(t.trans_amount)
                           FROM   transactions t
                           WHERE  t.account = ads.account
                           AND    t.trans_datetime >= ads.summary_date
                           AND    t.trans_datetime <  ads.summary_date + 1
                           AND    t.trans_type = 'TRANSACTION')
,       ads.fee_amount  = (SELECT SUM(t.trans_amount)
                           FROM   transactions t
                           WHERE  t.account = ads.account
                           AND    t.trans_datetime >= ads.summary_date
                           AND    t.trans_datetime <  ads.summary_date + 1
                           AND    t.trans_type = 'FEE')                       
WHERE  ads.account IN (SELECT t.account
                       FROM   transactions t
                       WHERE  t.trans_datetime >= ads.summary_date
                       AND    t.trans_datetime <  ads.summary_date + 1)
AND    ads.summary_date = TRUNC(SYSDATE,'YY') + 10

Ummm… now that’s looking quite messy. We have 3 sub-queries that are very similar to each other. Each attribute in the destination table that we’re updating has its own correlated sub-query to obtain the value and the UPDATE itself has a sub-query to identify the set of accounts that need updating.

Running the above through Autotrace we get the following execution plan and statistics:

--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                       |     1 |    26 |    26   (8)| 00:00:01 |
|   1 |  UPDATE                        | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   NESTED LOOPS SEMI            |                       |     1 |    26 |    10   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL           | ACCOUNT_DAILY_SUMMARY |     5 |    75 |     5   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN            | TRANSACTIONS_IX1      |  3024 | 33264 |     1   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|*  6 |    FILTER                      |                       |       |       |            |          |
|*  7 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     1 |    22 |     7   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
|   9 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|* 10 |    FILTER                      |                       |       |       |            |          |
|* 11 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     1 |    22 |     7   (0)| 00:00:01 |
|* 12 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ADS"."SUMMARY_DATE"=TRUNC(SYSDATE@!,'fmyy')+10 AND
              INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1>TRUNC(SYSDATE@!,'fmyy')+10)
   4 - access("ADS"."ACCOUNT"="T"."ACCOUNT" AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10 AND
              "T"."TRANS_DATETIME"<INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1)
       filter("T"."TRANS_DATETIME">="ADS"."SUMMARY_DATE")
   6 - filter(:B1+1>:B2)
   7 - filter("T"."TRANS_TYPE"='TRANSACTION')
   8 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)
  10 - filter(:B1+1>:B2)
  11 - filter("T"."TRANS_TYPE"='FEE')
  12 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)


Statistics
----------------------------------------------------------
          0  recursive calls
        202  db block gets
       6306  consistent gets
          0  physical reads
      24860  redo size
       1138  bytes sent via SQL*Net to client
       2292  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

We can see that each sub-query is being executed separately. The last two in the plan, appearing at operations 7/8 and 11/12, are executed once for each account. This is reflected in our statistics: 6,306 consistent gets (consistent mode logical IO) and 202 db block gets (current mode logical IO).

Can we simplify the query? Well, one way is to update both columns together as a set. I’ve previously discussed set-based operations in relation to queries but didn’t touch on UPDATE statements. This is how we would do such an update:

UPDATE account_daily_summary ads
SET    (ads.trans_amount,ads.fee_amount) 
     = (SELECT SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END)
        ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END)
        FROM   transactions t
        WHERE  t.account = ads.account
        AND    t.trans_datetime >= ads.summary_date
        AND    t.trans_datetime <  ads.summary_date + 1)
WHERE  ads.account IN (SELECT t.account
                       FROM   transactions t
                       WHERE  t.trans_datetime >= ads.summary_date
                       AND    t.trans_datetime <  ads.summary_date + 1)
AND    ads.summary_date = TRUNC(SYSDATE,'YY') + 10

Updating both destination columns has certainly simplified the UPDATE statement. We now have just 2 sub-queries; one to identify the account population and another to find the values of the columns we’re updating. So how does it perform? Autotrace once again shows us:

--------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT               |                       |     1 |    26 |    26   (8)| 00:00:01 |
|   1 |  UPDATE                        | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   NESTED LOOPS SEMI            |                       |     1 |    26 |    10   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL           | ACCOUNT_DAILY_SUMMARY |     5 |    75 |     5   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN            | TRANSACTIONS_IX1      |  3024 | 33264 |     1   (0)| 00:00:01 |
|   5 |   SORT AGGREGATE               |                       |     1 |    22 |            |          |
|*  6 |    FILTER                      |                       |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID| TRANSACTIONS          |     2 |    44 |     7   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN          | TRANSACTIONS_IX1      |     4 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("ADS"."SUMMARY_DATE"=TRUNC(SYSDATE@!,'fmyy')+10 AND
              INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1>TRUNC(SYSDATE@!,'fmyy')+10)
   4 - access("ADS"."ACCOUNT"="T"."ACCOUNT" AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10 AND
              "T"."TRANS_DATETIME"<INTERNAL_FUNCTION("ADS"."SUMMARY_DATE")+1)
       filter("T"."TRANS_DATETIME">="ADS"."SUMMARY_DATE")
   6 - filter(:B1+1>:B2)
   8 - access("T"."ACCOUNT"=:B1 AND "T"."TRANS_DATETIME">=:B2 AND "T"."TRANS_DATETIME"<:B3+1)


Statistics
----------------------------------------------------------
          0  recursive calls
        202  db block gets
       3211  consistent gets
          0  physical reads
      24900  redo size
       1138  bytes sent via SQL*Net to client
       1937  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        100  rows processed

As might be expected, since we’ve reduced the number of sub-queries we have reduced the number of operations in the execution plan. This in turn has reduced the resource consumption, as shown by the consistent gets metric, which has dropped from 6,306 to 3,211. db block gets remains constant at 202.

The key cost in the above query is the way we’re performing an index lookup into the TRANSACTIONS table for every account to work out the destination values, i.e. the correlated sub-query in the SET part of the UPDATE is executed multiple times. Would it not be more efficient to identify the accounts and obtain the update values at the same time in a single operation, and then use those values to update the summary table? This type of operation is possible using the MERGE statement.

MERGE is used to perform what is frequently called an “UPSERT”: update the destination if a row already exists, otherwise insert into the destination. What can be overlooked with MERGE is that both the update and insert parts do not have to be specified, i.e. MERGE can be used to perform just an update operation. Rewriting our UPDATE as a MERGE we get:

MERGE INTO account_daily_summary dest
USING (SELECT t.account
       ,      TRUNC(t.trans_datetime) AS summary_date
       ,      SUM(CASE WHEN t.trans_type='TRANSACTION' THEN t.trans_amount END) AS trans_amount
       ,      SUM(CASE WHEN t.trans_type='FEE'         THEN t.trans_amount END) AS fee_amount
       FROM   transactions t
       WHERE  t.trans_datetime >= TRUNC(SYSDATE,'YY') + 10
       AND    t.trans_datetime <  TRUNC(SYSDATE,'YY') + 11
       GROUP  BY
              t.account
       ,      TRUNC(t.trans_datetime)) src
ON (src.account = dest.account AND src.summary_date = dest.summary_date)
WHEN MATCHED THEN UPDATE
SET dest.trans_amount = src.trans_amount
,   dest.fee_amount   = src.fee_amount

The USING clause contains the SELECT statement that defines the accounts that need updating as well as the values required. No more correlated sub-queries so the entire statement is quite clear and concise. Running the MERGE through Autotrace we get the following execution plan:

-------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT         |                       |  2882 | 86460 |   117   (4)| 00:00:02 |
|   1 |  MERGE                  | ACCOUNT_DAILY_SUMMARY |       |       |            |          |
|   2 |   VIEW                  |                       |       |       |            |          |
|*  3 |    HASH JOIN            |                       |  2882 |   168K|   117   (4)| 00:00:02 |
|   4 |     TABLE ACCESS FULL   | ACCOUNT_DAILY_SUMMARY |  3100 | 46500 |     5   (0)| 00:00:01 |
|   5 |     VIEW                |                       |  2882 |   126K|   112   (4)| 00:00:02 |
|   6 |      SORT GROUP BY      |                       |  2882 | 63404 |   112   (4)| 00:00:02 |
|*  7 |       FILTER            |                       |       |       |            |          |
|*  8 |        TABLE ACCESS FULL| TRANSACTIONS          |  2882 | 63404 |   111   (3)| 00:00:02 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("SRC"."ACCOUNT"="DEST"."ACCOUNT" AND
              "DEST"."SUMMARY_DATE"=INTERNAL_FUNCTION("SRC"."SUMMARY_DATE"))
   7 - filter(TRUNC(SYSDATE@!,'fmyy')+11>TRUNC(SYSDATE@!,'fmyy')+10)
   8 - filter("T"."TRANS_DATETIME"<TRUNC(SYSDATE@!,'fmyy')+11 AND
              "T"."TRANS_DATETIME">=TRUNC(SYSDATE@!,'fmyy')+10)


Statistics
----------------------------------------------------------
          0  recursive calls
        102  db block gets
        410  consistent gets
          0  physical reads
      24860  redo size
       1137  bytes sent via SQL*Net to client
       1963  bytes received via SQL*Net from client
          4  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
        100  rows processed

The execution plan shows us that Oracle has done a single join between our tables and merged the results back into the summary table. To do this it elected to use a hash join, but it could easily have chosen a nested loops operation if that were more efficient. The outcome of this plan is shown in the I/O metrics: 410 consistent gets and 102 db block gets, which is far more efficient than the previous approaches.


Download the set-up script for the tables and queries used in this post.

DBMS_LOCK

Concurrent activity in a complex system is, well… complex. Oracle’s standard locking and blocking mechanisms do an admirable job in general but there will be times when more careful control over processes is required. The package DBMS_LOCK exposes Oracle’s internal locking mechanism specifically for this purpose. According to the DBMS_LOCK documentation, user locks can be used for:

  • Providing exclusive access to a device, such as a terminal
  • Providing application-level enforcement of read locks
  • Detecting when a lock is released and cleanup after the application
  • Synchronizing applications and enforcing sequential processing

The last point is where DBMS_LOCK is frequently useful. Systems often have critical background processes that are required to be run serially. To ensure multiple instances of the job are not executed at once it’s quite simple to use DBMS_LOCK to take an exclusive lock and thereby signal to any other processes that the job is in-flight. Essentially this can be coded as:

DECLARE
   -- the number of seconds before the lock request times out
   c_request_timeout       CONSTANT NUMBER(4) := 5;
   
   -- constants for lock request responses
   -- from dbms_lock documentation
   c_response_success      CONSTANT NUMBER := 0;
   c_response_timeout      CONSTANT NUMBER := 1;
   c_response_deadlock     CONSTANT NUMBER := 2;
   c_response_param_err    CONSTANT NUMBER := 3;
   c_response_own_lock     CONSTANT NUMBER := 4;
   c_response_illegal_lock CONSTANT NUMBER := 5;

   -- the unique name of the lock identifier
   l_job_identifier        VARCHAR2(128) := 'job_101';
   
   l_lock_handle           VARCHAR2(128);
   l_request_response      INTEGER;
   l_release_response      INTEGER;
BEGIN

   -- obtain a lock handle from our job identifier string 
   dbms_lock.allocate_unique (lockname        => l_job_identifier
                             ,lockhandle      => l_lock_handle
                             ,expiration_secs => 864000);

   -- now take out an exclusive lock using the lock handle
   l_request_response := dbms_lock.request (lockhandle        => l_lock_handle
                                           ,lockmode          => dbms_lock.x_mode
                                           ,timeout           => c_request_timeout
                                           ,release_on_commit => FALSE);

   IF (l_request_response = c_response_timeout) THEN
      RAISE_APPLICATION_ERROR (-20000, 'Unable to obtain job lock due to time-out. Another process currently holds the lock');
   ELSIF (l_request_response != c_response_success) THEN
      RAISE_APPLICATION_ERROR (-20000, 'Job lock request failed with response of ' || TO_CHAR(l_request_response));
   END IF;
   
   -- the job itself goes here
   
   -- job finished so we'll clean up after ourselves
   l_release_response := dbms_lock.release (lockhandle => l_lock_handle);
   
   IF (l_release_response != c_response_success) THEN
      RAISE_APPLICATION_ERROR (-20000, 'Job lock release failed with response of ' || TO_CHAR(l_release_response));
   END IF;
END;

There are a few points to note in the above script:

  • The lock can be set to be released on COMMIT, just like a DML operation row level lock. Otherwise it will be retained until explicitly released or the session terminated.
  • The lock name is global across the entire database, so it is important that the name chosen cannot conflict with, or be mistaken for, a lock requested for a totally different purpose.
  • Sessions waiting to obtain a lock via DBMS_LOCK.REQUEST wait on the event “enq: UL - contention” (a query for spotting these sessions is shown after this list)
  • The ALLOCATE_UNIQUE assigns a lock handle to the lock name specified. This association will remain in force for a minimum of time as specified by the expiration_secs parameter, which defaults to 864,000 seconds (10 days). After this time Oracle may assign a new lock handle to the lock name.
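
If you need to see which sessions are currently stuck waiting on a DBMS_LOCK request, a query along these lines against V$SESSION will show them:

SELECT sid
,      event
,      seconds_in_wait
FROM   v$session
WHERE  event = 'enq: UL - contention'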

The last point above bears expansion. DBMS_LOCK.ALLOCATE_UNIQUE sets an expiry time on the lock handle, which defaults to 10 days. If you use the generated lock handle after this period of time then expect trouble. The lock handle will still be a valid handle but if another session calls ALLOCATE_UNIQUE with the same lock name it may be assigned a different handle. You will then end up with multiple sessions all believing they have obtained the lock on the lock name… problems are more than likely to occur.

You can see a listing of the locks allocated by DBMS_LOCK.ALLOCATE_UNIQUE by querying SYS.DBMS_LOCK_ALLOCATED:

SELECT *
FROM    sys.dbms_lock_allocated

AUTOTRACE Problem

I’ve commented before about how AUTOTRACE is a favoured tool for a quick performance check of a query. So, whenever a developer takes an interest in it I’m glad to help where I can.

Recently a developer told me that AUTOTRACE wasn’t working for them. They sent across their script, which enabled AUTOTRACE with:

SET AUTOTRACE TRACEONLY

and the output they were getting. The output was very strange (this is from an 11.2.0.3 database):

Statistics
----------------------------------------------------------
          0  user rollbacks
          0  global enqueue releases
          0  physical read requests optimized
          0  physical write total multi block requests
          0  hot buffers moved to head of LRU
          0  commit wait performed
          0  global undo segment hints helped
          0  global undo segment hints were stale
          0  IMU commits
          0  IMU Flushes
        244  rows processed

Running their script in my own account revealed nothing wrong; the AUTOTRACE output was present and correct. Looking into the roles and privileges on the developer’s account showed nothing amiss either… but their account did show as having an expired password. A change of their password and the problem was resolved.

Why did resetting a password resolve an AUTOTRACE problem? What might not be understood about AUTOTRACE is that it creates a new session in order to do its profiling. With the password expired I guess that AUTOTRACE couldn’t do its usual work and generated the strange output above. It’s not good that it didn’t report some sort of login problem but it didn’t take long to resolve. I’ve simply made a mental note (and a blog post to remind me) to check for expired passwords next time someone tells me that AUTOTRACE isn’t working for them.
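
For reference, a quick way to check for this situation is to look at the account status in DBA_USERS (assuming you have access to that view; the username below is just a placeholder):

SELECT username
,      account_status
,      expiry_date
FROM   dba_users
WHERE  username = 'SOME_DEVELOPER'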

CURSOR_SHARING for bulk inserts

You won’t have been developing in Oracle for too long before you find the need to populate data in your application tables. Moving across the various environments from development to production means that you will want the ability to create data easily, so manually typing data into a GUI is out of the question. Besides, how many errors do you think you’ll make trying to type in a list of postcodes and suburbs?

For relatively small amounts of data the simple INSERT statement is more than adequate. Creating the scripts for reference tables can be tedious though. Thankfully GUI development tools such as Dell (Quest) TOAD and Oracle SQL Developer have the ability to export data as INSERT statements. Once a table has been loaded in the development environment it can easily be exported as INSERTs for inclusion in the application deployment scripts.

Things begin to go wrong when developers think this is a good strategy for large tables. Those same GUI tools that make it easy to export small reference tables as INSERTs also make it easy to export large tables running into the tens of thousands of rows. While there are better ways to load data, it can be too late to do anything about it by the time the script reaches the DBA for deployment. Not only can the script take a while to run but it will flood the SGA with one-off SQL statements.

Both problems can be mitigated with the CURSOR_SHARING parameter though. Somewhat simplistically, changing the parameter from the default value of EXACT to FORCE will replace the literal values in the insert statements with bind variables. Since CURSOR_SHARING is a session modifiable setting it can be changed with:

ALTER SESSION SET cursor_sharing = FORCE

So how effective is CURSOR_SHARING? Let’s do a test using 10,000 inserts into an empty table with CURSOR_SHARING set to EXACT, i.e. no forced bind variable substitution. The inserts are all along the lines of:

Insert into INS_TEST (ID,PADDING) values (1,'ZLwgVwYBfDopIlMbvjQjtWNfVfFrKP');

The full scripts can be downloaded using the links at the end of this post.

SQL>SET ECHO ON
SQL>VARIABLE start_time NUMBER
SQL>ALTER SYSTEM FLUSH SHARED_POOL
  2  /

System altered.

SQL>TRUNCATE TABLE ins_test
  2  /

Table truncated.

SQL>ALTER SESSION SET cursor_sharing = EXACT
  2  /

Session altered.

SQL>EXEC :start_time := dbms_utility.get_cpu_time

PL/SQL procedure successfully completed.

SQL>SET TERMOUT OFF
-- 10,000 insert statements executed here but not shown
SQL>EXEC dbms_output.put_line ('Insert duration (cursor_sharing=exact): ' || TO_CHAR(dbms_utility.get_cpu_time - :start_time))
Insert duration (cursor_sharing=exact): 1288

PL/SQL procedure successfully completed.

So, our inserts took 12.9 seconds (DBMS_UTILITY.GET_CPU_TIME reports in hundredths of a second), which is quite good for my small notebook, albeit one equipped with a fast processor and an SSD.

Now let’s see what happens if we set CURSOR_SHARING to FORCE, which substitutes literal values with bind values in our insert statements:

SQL>TRUNCATE TABLE ins_test
  2  /

Table truncated.

SQL>ALTER SESSION SET cursor_sharing = FORCE
  2  /

Session altered.

SQL>EXEC :start_time := dbms_utility.get_cpu_time

PL/SQL procedure successfully completed.

SQL>SET TERMOUT OFF
-- 10,000 insert statements executed here but not shown
SQL>EXEC dbms_output.put_line ('Insert duration (cursor_sharing=force): ' || TO_CHAR(dbms_utility.get_cpu_time - :start_time))
Insert duration (cursor_sharing=force): 626

PL/SQL procedure successfully completed.

SQL>
SQL>ALTER SESSION SET cursor_sharing = EXACT
  2  /

Session altered.

Using FORCE for CURSOR_SHARING halved our execution time; from 12.9 to 6.3 seconds. A very nice benefit from a single-line change to the deployment script!

So, what about our SGA? Running the following query after each test:

SELECT AVG(executions)
FROM   v$sql
WHERE  sql_text LIKE 'Insert into INS_TEST%' ESCAPE '\'

gives us a result of 1 when CURSOR_SHARING is EXACT and 10,000 when it is FORCE, showing that CURSOR_SHARING is behaving exactly as we want.
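
If you’re curious about what the shared statement looks like under FORCE, the rewritten text is visible in V$SQL; the literals are replaced by system-generated bind variables named along the lines of :SYS_B_0, :SYS_B_1 and so on:

SELECT sql_text
,      executions
FROM   v$sql
WHERE  sql_text LIKE 'Insert into INS_TEST%'

-- with CURSOR_SHARING = FORCE the single shared statement resembles:
-- Insert into INS_TEST (ID,PADDING) values (:"SYS_B_0",:"SYS_B_1")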

One last point to note: always change CURSOR_SHARING back to the default value immediately at the end of the script! This is just good practice.


Download the scripts for this post.

Dates without time

Oracle’s DATE data type stores both date and time components. If a date is stored without specifying a time component then the time is implicitly set to 0 hours, 0 minutes and 0 seconds (00:00:00), which is exactly midnight at the start of the date.
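
A quick illustration using an arbitrary date: a date literal without a time component and a TRUNCed SYSDATE both carry a time of midnight, while SYSDATE itself carries the current time:

SELECT TO_CHAR(DATE '2013-07-01','DD/MM/YYYY HH24:MI:SS') AS date_only
,      TO_CHAR(TRUNC(SYSDATE),'DD/MM/YYYY HH24:MI:SS')    AS truncated_sysdate
,      TO_CHAR(SYSDATE,'DD/MM/YYYY HH24:MI:SS')           AS date_and_time
FROM   dual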

Nothing unusual with this, but quite regularly a data model will call for dates without any time component. The problem then is how to ensure that time components other than 00:00:00 don’t find their way into the data. If dates with time components do creep in then various problems occur. For example, checks for equality will fail, e.g.:

SELECT ...
FROM   ...
WHERE  date_col = TRUNC(SYSDATE)

Range searches will fail to find data on the last date of the range, e.g.:

SELECT ...
FROM   ...
WHERE  date_col BETWEEN TRUNC(SYSDATE)-7 AND TRUNC(SYSDATE)

GROUP BY expressions will report groups down to the specific time, e.g.:

SELECT ...
FROM   ...
GROUP  BY
      date_col

A solution to the problems noted above could be to TRUNC the date whenever the date column is referenced, e.g.:

SELECT ...
FROM   ...
GROUP  BY
       TRUNC(date_col)

and this is what regularly occurs in applications I have had to deal with. The line of thinking here is clearly “Since I’m not sure if a date with a time component has managed to get into the system, I’ll ensure I only use the date part of the DATE column”. Unfortunately all those TRUNC functions are overhead and can easily lead to suboptimal query execution plans.
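
As a simple illustration of the optimizer impact (a sketch, assuming a plain, non-function-based index exists on date_col): wrapping the column in TRUNC prevents that index from being used directly, whereas an equivalent range predicate on the raw column can use it:

SELECT ...
FROM   ...
WHERE  TRUNC(date_col) = TRUNC(SYSDATE)      -- cannot use a plain index on date_col

SELECT ...
FROM   ...
WHERE  date_col >= TRUNC(SYSDATE)            -- equivalent predicate that can use the index
AND    date_col <  TRUNC(SYSDATE) + 1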

Other systems take the approach of preventing time components from being stored by removing them from the dates at the point the data is stored, e.g.:

INSERT INTO ...
VALUES (...
       ,TRUNC(some_date_parameter)
       ...)

Clearly this is a better approach than storing the time component and trying to deal with it afterwards. However the problem remains that a rogue entry can still make its way into the data. Perhaps an infrequently used update routine is missing the TRUNC, or perhaps a data patch simply used SYSDATE instead of TRUNC(SYSDATE). At the end of the day you’re still not 100% certain that all the entries are “timeless”. Once doubt sets in you’ll find TRUNCs creeping into the SELECTs and it’s all downhill from there…

The proper solution to this problem is, as all good solutions should be, quite simple. Since Oracle does not provide a date-only data type, create one! This is done by adding a CHECK constraint:

ALTER TABLE ...
ADD CONSTRAINT date_col_chk
CHECK ( date_col = TRUNC(date_col) )

With the above constraint in force you can rest in the comfort of knowing that all the dates in the column are guaranteed to have a time component of 00:00:00. No need for messy TRUNC functions littered throughout the code. DML operations trying to sneak time components into the dates will be rejected with an ORA-02290 exception.
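
A minimal sketch of the behaviour using a hypothetical DATE_ONLY_DEMO table:

CREATE TABLE date_only_demo
   (date_col DATE NOT NULL
   ,CONSTRAINT date_col_chk CHECK (date_col = TRUNC(date_col)));

INSERT INTO date_only_demo (date_col) VALUES (SYSDATE);
-- rejected with ORA-02290: check constraint (<schema>.DATE_COL_CHK) violated
-- (unless it happens to be run at exactly midnight)

INSERT INTO date_only_demo (date_col) VALUES (TRUNC(SYSDATE));
-- 1 row created.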