Existence checks using COUNT(*)

One thing guaranteed to draw my ire in a code review is the use of COUNT(*) to test for the existence of data. This can be done in a number of ways. For example, a simple count via a SELECT statement and then checking if the result was 1 or greater:

...
SELECT COUNT(*)
INTO   l_rows_exist
FROM   some_table
WHERE  ... blah,blah,blah;

IF (l_rows_exist >= 1) THEN
   ... blah,blah,blah;
END IF;

or as some sort of logical sub-query construct inside a query:

SELECT ... blah,blah,blah
FROM   some_table
WHERE  (SELECT COUNT(*) FROM some_other_table WHERE ... blah,blah,blah) >= 1

Unless there's some constraint involved that guarantees the query will return at most one row, counting all entries simply to detect that one exists is asking for trouble. Oracle will cheerfully scan a billion row table looking for every row that satisfies the query, even though it found a matching entry in the first block it examined.

Let's see that in action with a 1,000,000 row table, using AUTOTRACE to obtain some statistics:

SQL>CREATE TABLE lots_of_rows
  2     (id      NUMBER(10) NOT NULL
  3     ,cat     NUMBER(2)  NOT NULL
  4     ,padding CHAR(30)   NOT NULL)
  5  /

Table created.

SQL>-- generate 1,000,000 rows
SQL>INSERT INTO lots_of_rows
  2  WITH row_src AS
  3     (SELECT ROWNUM r FROM dual CONNECT BY ROWNUM <= 1000)
  4  SELECT ROWNUM
  5  ,      MOD(ROWNUM,100)
  6  ,      'X'
  7  FROM   row_src r1
  8  ,      row_src r2
  9  /

1000000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','lots_of_rows')

PL/SQL procedure successfully completed.

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  (SELECT COUNT(*)
  4          FROM   lots_of_rows
  5          WHERE  cat = 42) >= 1
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 3376271402

------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |     1 |       |  1652   (1)| 00:00:01 |
|*  1 |  FILTER             |              |       |       |            |          |
|   2 |   FAST DUAL         |              |     1 |       |     2   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |              |     1 |     3 |            |          |
|*  4 |    TABLE ACCESS FULL| LOTS_OF_ROWS | 10000 | 30000 |  1650   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( (SELECT COUNT(*) FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42)>=1)
   4 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6069  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The execution plan above clearly shows a full table scan, and the consistent gets statistic confirms that the entire table was indeed scanned. The solution to this situation is trivial: rewrite the query to use EXISTS. For our example above, this would be:

SELECT ... blah,blah,blah
FROM   some_table
WHERE  EXISTS (SELECT 'X' FROM some_other_table WHERE ... blah,blah,blah)

The impact of this simple change can be seen using AUTOTRACE:

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  EXISTS(SELECT 'X'
  4                FROM   lots_of_rows
  5                WHERE  cat = 42)
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 4135382906

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |       |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   FAST DUAL        |              |     1 |       |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LOTS_OF_ROWS |     2 |     6 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42))
   3 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

While the execution plan still shows a full table scan, the consistent gets figure has dropped to just 3. This is the number of gets Oracle performed before it found the first "42" entry and stopped scanning the table. The query also has the advantage of being better documented, as EXISTS describes perfectly what is being sought.
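As an aside, if the existence check has to remain as a count inside PL/SQL, one common idiom is to cap the count with a ROWNUM predicate so that Oracle stops scanning at the first match. A minimal sketch against our million row table (the variable name is just illustrative):

DECLARE
   l_found  PLS_INTEGER;
BEGIN
   SELECT COUNT(*)
   INTO   l_found
   FROM   lots_of_rows
   WHERE  cat = 42
   AND    ROWNUM = 1; -- caps the count at 0 or 1 so the scan stops at the first match

   IF (l_found = 1) THEN
      dbms_output.put_line ('At least one row exists');
   END IF;
END;
/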

However, perhaps my ire at the use of COUNT(*) is somewhat misplaced. The Oracle optimiser is capable of detecting subqueries in the form of:

(SELECT COUNT(*) FROM ....) > 0

and automatically translating it into the equivalent EXISTS form:

EXISTS (SELECT 'X' FROM ....)

We can see this in action using our million row table:

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  (SELECT COUNT(*)
  4          FROM   lots_of_rows
  5          WHERE  cat = 42) > 0
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 4135382906

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |       |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   FAST DUAL        |              |     1 |       |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LOTS_OF_ROWS |     2 |     6 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42))
   3 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

So, while we wrote the query using SELECT COUNT(*), the predicate information section in the above AUTOTRACE output shows that Oracle actually used EXISTS. Note that this rewrite only occurs if the test is "> 0". The rewrite will not take place with a test of ">= 1", which is what the original example above used.

I'm not sure in which version of Oracle this automatic rewrite first appeared; it certainly works in 11.2 and above. It's a nice feature but I would still argue that it is not good practice to rely on this kind of thing: better to write the query explicitly using EXISTS.

Oracle performance problem case study 2

I had just settled down for a bite to eat for lunch when two developers approached me with rather guilty expressions. Clearly something was up and, given that there had been a significant deployment for the part of the system that they were responsible for the night before, I immediately suspected a production performance problem.

This was confirmed when they told me that one of the production processes was taking far too long; 20 to 30 seconds instead of sub-second. Their suspicion was that they had some sort of query performance problem and they needed help to identify and resolve it.

They gave me the name of the service and the two PL/SQL routines that were taking too long. Each routine invoked multiple SQL statements. Once again I turned to Active Session History, which tends to be my first tool of choice for this type of problem. I ran a query similar to the following to see what long running statements had originated from the service for the past hour:

SELECT session_id
,      session_serial#
,      program
,      machine
,      module
,      action
,      client_id
,      sql_id
,      sql_exec_id
,      COUNT(*) AS duration_secs
,      MIN(sample_time) AS start_exec
FROM   v$active_session_history
WHERE  sample_time >= TO_TIMESTAMP(:start_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sample_time <= TO_TIMESTAMP(:end_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    session_type = 'FOREGROUND'
AND    program = :service_name
GROUP  BY
       program
,      session_id
,      session_serial#
,      machine
,      module
,      action
,      client_id
,      sql_id
,      sql_exec_id
ORDER  BY
       COUNT(*) DESC

The query result showed a lot of 2 and 3 second queries with just two SQL_IDs listed. Taking these two SQL_IDs, I ran a query like:

SELECT *
FROM   v$active_session_history
WHERE  sample_time >= TO_TIMESTAMP(:start_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sample_time <= TO_TIMESTAMP(:end_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sql_id    = :sql_id
ORDER  BY
       sample_time

The event details showed that the queries were spending their time on CPU, so there was no contention or other wait problem that would point to some system-wide issue. More telling were the SQL_PLAN_OPERATION and SQL_PLAN_OPTIONS details, which showed that full table scans were being performed.

With this information it was time to find out what the SQL statements were and where they resided, using the following query:

SELECT s.sql_text
,      (SELECT o.object_name FROM dba_objects o WHERE s.program_id = o.object_id) AS program_name
,      s.program_line#
FROM   v$sql s
WHERE  sql_id = :sql_id

and what their execution plans were:

SELECT *
FROM   TABLE(dbms_xplan.display_cursor(:sql_id,:sql_child_number))

As it turned out both SQL statements were relatively simple SELECTs along the lines of:

SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    (:param1 IS NULL OR t1.p1 = :param1)
AND    (:param2 IS NULL OR t2.p2 = :param2)

The tables had indexes on the ID, P1 and P2 columns. While the queries used two bind variables, the developers explained that only one would ever be specified; the other would be NULL. The execution plan Oracle had chosen, running under version 11.2.0.4, was to table scan both tables and hash join them.

The queries were rather nice examples of a developer trying to do too much in a single query, with performance suffering as a result. Each query was designed to satisfy two distinct scenarios. While those two scenarios result in very similar queries, combining them forced Oracle into a position where it had no choice but to scan the tables involved. The developer had hoped that Oracle would choose the relevant index, on P1 or P2, depending on which parameter was provided, i.e. that Oracle would separate out the two scenarios at run time.

The solution to this problem was to rewrite the queries separating out the different scenarios. One way was to use a UNION query:

SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    t1.p1 = :param1
UNION
SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    t2.p2 = :param2

In the above, whichever parameter is NULL will result in no rows being returned from that part of the query, so the result will come only from the part where the parameter is specified. Best of all, Oracle is able to optimise each part of the query to use the relevant index and so the performance problem is overcome.
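As a side note, since only one branch can ever return rows for a given call, a UNION ALL with explicit guard predicates would arguably be an even better fit as it avoids the duplicate-eliminating sort that UNION performs. A sketch of that variant (Oracle can typically evaluate the IS NOT NULL guards once, via FILTER operations, and skip the irrelevant branch entirely):

SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    t1.p1 = :param1
AND    :param1 IS NOT NULL
UNION ALL
SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    t2.p2 = :param2
AND    :param2 IS NOT NULL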

Once I pointed out the problem to the developers they elected to go with a slightly different solution though. The query was a SELECT ... INTO query contained within a PL/SQL package so they simply split it using conditional logic:

IF (l_param1 IS NOT NULL) THEN
   SELECT t1.col_1
   ,      t1.col_2
   ,      t2.col_1
   ,      t2.col_2
   INTO   l_var_1
   ,      l_var_2
   ,      l_var_3
   ,      l_var_4
   FROM   t1
   ,      t2
   WHERE  t1.id = t2.id
   AND    t1.p1 = l_param1;
ELSE
   SELECT t1.col_1
   ,      t1.col_2
   ,      t2.col_1
   ,      t2.col_2
   INTO   l_var_1
   ,      l_var_2
   ,      l_var_3
   ,      l_var_4
   FROM   t1
   ,      t2
   WHERE  t1.id = t2.id
   AND    t2.p2 = l_param2;
END IF;   

The change was a simple rewrite of the two problematic routines. As the developers had heeded advice about avoiding package state, the change was easily deployed into the production system without requiring an outage. Performance for both routines dropped from 2 to 3 seconds to around 1 millisecond.

I went back to lunch...

Oracle performance problem case study 1

I am regularly asked how to go about diagnosing and troubleshooting Oracle performance problems, which is then followed up with “how did you work that out?” after the problem has been resolved. These are the very same questions I asked when I started out working with Oracle. As a result I thought I’d start making posts outlining particular problems I encounter, the steps taken to diagnose cause of the problem and the solution. “Oracle Voodoo” is the term the developers use where I work as, when first introduced to Oracle, they lacked understanding of how the database worked and so, jokingly, equated performance troubleshooting to magic.

The following problem started with a call from another developer about “a service that’s timing out”. The key bits of information provided were the name of the service, the business function it was trying to perform and a relatively small time period when the problem had occurred. The time-out for the service was 60 seconds. Since the business function was to simply retrieve some data from the database it was assumed that the problem lay inside Oracle as opposed to the services layer or network.

Given the above information I turned to Active Session History to help me out. The one second sampling used by ASH makes it a perfect source of data for this type of problem. The system is well instrumented, with the Client Identifier set for individual calls, which makes tracing calls from a service easy even when connection pooling is used. Of course, in order to use Active Session History you must be licensed for it (it requires the Diagnostics Pack).
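For context, that instrumentation amounts to little more than tagging the session before doing work on behalf of a caller; the tag then surfaces in the CLIENT_ID column of V$ACTIVE_SESSION_HISTORY. A sketch, with a purely illustrative tag format:

BEGIN
   -- identify the caller for this unit of work
   dbms_session.set_identifier ('svc_request_12345');
END;
/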

I ran a query like the following to determine if there were any long running SQL statements around the time of the reported problem:

SELECT session_id
,      session_serial#
,      program
,      machine
,      module
,      action
,      client_id
,      sql_id
,      sql_exec_id
,      COUNT(*) AS duration_secs
,      MIN(sample_time) AS start_exec
FROM   v$active_session_history
WHERE  sample_time >= TO_TIMESTAMP(:start_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sample_time <= TO_TIMESTAMP(:end_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    session_type = 'FOREGROUND'
GROUP  BY
       program
,      session_id
,      session_serial#
,      machine
,      module
,      action
,      client_id
,      sql_id
,      sql_exec_id
ORDER  BY
       COUNT(*) DESC

As the above query groups by SQL_ID and SQL_EXEC_ID it reports the duration of each query invocation. As luck would have it the query did highlight sessions originating from the problem service that exceeded 60 seconds.

Taking one of the sessions I examined the ASH details for the long running query:

SELECT *
FROM   v$active_session_history
WHERE  sample_time >= TO_TIMESTAMP(:start_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sample_time <= TO_TIMESTAMP(:end_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    client_id = :client_id
AND    sql_id    = :sql_id
ORDER  BY
       sample_time

The EVENT details reported by the above query showed no contention, just normal IO operations. So we had a query that was running smoothly but waaaaay too long. My next step was to look at the SQL itself and where it originated from:

SELECT s.sql_text
,      (SELECT o.object_name FROM dba_objects o WHERE s.program_id = o.object_id) AS program_name
,      s.program_line#
FROM   v$sql s
WHERE  sql_id = :sql_id

All SQL statements in the application are contained within packages, so the above query identifies the package and the line number at which the statement resides. Normally in a situation like this I would also have looked at the query execution plan and possibly compared it to the plans stored in AWR to see if there had been a plan change. However, in this situation I was aware that a table referenced by the query had recently been partitioned. A quick check showed that the query had not been modified to correctly reference the partitioning key of the table and was therefore performing an index lookup against all the partitions of a local index. Problem identified.
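As an aside, had a plan comparison been necessary, the plans captured in AWR for a statement can be listed with DBMS_XPLAN.DISPLAY_AWR (again subject to licensing):

SELECT *
FROM   TABLE(dbms_xplan.display_awr(:sql_id))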

However a patch was required to the code and the query resided in a package that was pivotal to the operation of the system. The system was mission critical to the business and could not be brought down. Thankfully attention had been given to avoiding package state so, after fixing the query and running a quick test of the patched code in a test environment, a new version of the package was deployed straight into the live system without causing any trouble.

Problem solved. From taking the call to deploying the new code took just over 30 minutes, and half of that time was filling in the appropriate change control "paperwork".

Qualified naming… again

Back in December 2012 I blogged about a range of problems that can occur if columns referenced in queries are not qualified with the name or alias of the table that they belong to. This post is mostly a rehash of just one of those scenarios as it seems to crop up time and time again…

Suppose you are presented with the following:

DELETE
FROM   emp
WHERE  dept_id IN (SELECT dept_id FROM dept WHERE dept_name = 'Dept_5')

You would look at the above statement and think that it will do something along the lines of "delete the entries in EMP where they relate to a DEPT entry with the name of 'Dept_5'". You might be right... but if the tables were defined as follows you would be wrong:

CREATE TABLE dept
   (id         NUMBER (4) NOT NULL   
   ,dept_name  VARCHAR2 (30) NOT NULL)
/

ALTER TABLE dept
   ADD CONSTRAINT dept_pk
   PRIMARY KEY (id)
/

CREATE TABLE emp
   (id         NUMBER (4)     NOT NULL
   ,first_name VARCHAR2 (30)  NOT NULL
   ,last_name  VARCHAR2 (30)  NOT NULL
   ,dept_id    NUMBER (4)     NOT NULL)
/
   
ALTER TABLE emp
   ADD CONSTRAINT  emp_pk
   PRIMARY KEY (id)
/

ALTER TABLE emp
   ADD CONSTRAINT emp_fk1
   FOREIGN KEY (dept_id)
   REFERENCES dept (id)
/

If we were to populate these tables with 10 DEPT entries and 100 EMP entries and run our delete:

SQL>INSERT INTO dept
  2  SELECT ROWNUM
  3  ,      'Dept_' || TO_CHAR(ROWNUM)
  4  FROM   dual
  5  CONNECT BY ROWNUM <= 10
  6  /

10 rows created.

SQL>INSERT INTO emp
  2  SELECT ROWNUM
  3  ,      'E_' || TO_CHAR(ROWNUM)
  4  ,      'Employee_' || TO_CHAR(ROWNUM)
  5  ,      MOD(ROWNUM,10)+1
  6  FROM   dual
  7  CONNECT BY ROWNUM <= 100
  8  /

100 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>DELETE
  2  FROM   emp
  3  WHERE  dept_id IN (SELECT dept_id
  4                     FROM   dept
  5                     WHERE  dept_name = 'Dept_5')
  6  /

100 rows deleted.

SQL>SELECT *
  2  FROM   emp
  3  /

no rows selected

Ummm... where's my data gone?

Let's rewrite the query to show what really happened:

DELETE
FROM   emp e
WHERE  e.dept_id IN (SELECT e.dept_id 
                     FROM   dept d
                     WHERE  d.dept_name = 'Dept_5')

Now we can see what went wrong. The DEPT table does not contain a column named DEPT_ID, so Oracle has referenced the column in the EMP table. It can do this because identifiers in an outer query are visible inside a sub-query: when Oracle is unable to find an identifier local to the sub-query it expands its search to the outer query.

Our delete query therefore will delete any EMP entry where the DEPT_ID is not null if there exists an entry in DEPT with a DEPT_NAME of 'Dept_5'. Since DEPT_ID is defined as NOT NULL and there does exist a DEPT entry named 'Dept_5' we end up deleting all entries from EMP. A rather nasty outcome.

The best way to prevent this type of a problem is to always qualify column names with the table name or table alias. If we had done that with our DELETE query we would have got the following:

SQL>DELETE
  2  FROM   emp e
  3  WHERE  e.dept_id IN (SELECT d.dept_id
  4                       FROM   dept d
  5                       WHERE  d.dept_name = 'Dept_5')
  6  /
WHERE  e.dept_id IN (SELECT d.dept_id
                            *
ERROR at line 3:
ORA-00904: "D"."DEPT_ID": invalid identifier

Our qualified reference would have resulted in an error. Even better, if the DELETE was included as part of a package then the package would have thrown a compilation error.

Echoing my advice from the previous post: always use qualified references in your queries. The effort is negligible so there's no excuse not to.

Multi-versioning Read Consistency

Data concurrency is difficult. Oracle does such a good job at hiding the complexity that many developers simply don’t consider it. I’m going to focus this post on one of the aspects of Oracle’s multi-versioning read consistency that new developers really should know about.

You’re probably aware that Oracle does not allow the reading of dirty data, i.e. data that has been modified by another session but not yet committed. This allows a session to keep its changes private until it decides to commit or rollback those changes. On commit, all the changes are immediately available for other sessions to see. In light of this, have you ever considered what happens when a long running query accesses data that is being modified by other sessions? For example, session 1 runs a simple SELECT query that takes 1 minute to complete while at the same time session 2 is modifying and committing changes to the data. Unlike some other database systems, readers and writers don’t block each other in Oracle, so what will the query output for session 1 look like?

The answer is surprisingly simple: a query runs using a view of the data that was in effect as at the time the query started. Changes made to the data for the duration of the operation are ignored. In Oracle this is termed multi-versioning read consistency. The simplest and clearest way of illustrating this that I’ve seen is contained in Tom Kyte’s excellent book Expert Oracle Database Architecture:

SQL>CREATE TABLE t1
  2     (col_x   NUMBER(10) NOT NULL)
  3  /

Table created.

SQL>INSERT INTO t1
  2  SELECT ROWNUM
  3  FROM   dual
  4  CONNECT BY ROWNUM <= 5
  5  /

5 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>VARIABLE rc REFCURSOR

SQL>BEGIN
  2     OPEN :rc
  3     FOR
  4     SELECT *
  5     FROM   t1;
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL>DECLARE
  2     PRAGMA AUTONOMOUS_TRANSACTION;
  3  BEGIN
  4     DELETE
  5     FROM   t1;
  6     COMMIT;
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL>SELECT *
  2  FROM   t1
  3  /

no rows selected

SQL>PRINT rc

     COL_X
----------
         1
         2
         3
         4
         5

In the above we create a table and populate it with 5 rows. Next we open a cursor that simply selects the data from the table. In an autonomous transaction, which simulates another session, we delete the contents of the table and commit that change. We can see that the table has no rows in it by selecting from the table. However, when we finally get around to pulling the details from the cursor we opened before the deletion, we see all the data. So, regardless of the changes that occurred to the table, the cursor displays data as at the point in time it was opened.

Multi-versioning read consistency is also the reason why the following doesn't result in an infinite loop (using the same T1 table, repopulated with 5 rows):

SQL>BEGIN
  2     FOR i IN (SELECT col_x FROM t1)
  3     LOOP
  4         dbms_output.put_line ('Inserting value ' || TO_CHAR(i.col_x));
  5        INSERT INTO t1 VALUES (i.col_x);
  6     END LOOP;
  7  END;
  8  /
Inserting value 1
Inserting value 2
Inserting value 3
Inserting value 4
Inserting value 5

PL/SQL procedure successfully completed.

Here we're inserting into the same table that our cursor is reading from. If the data from the cursor was not fixed at the point when it was opened then the rows we're inserting would be read by the cursor and our loop would never end... well, not until some resource on the server gives out.

One thing to note with multi-versioning read consistency is that it applies to a single SQL statement, not to an entire transaction of multiple statements. So, in the scenario of two SELECT statements, one reading parent entries and one reading child entries, the second SELECT may return child entries for which the first SELECT did not return the parent entry. This would occur if another session inserted and committed parent/child entries after the parent table SELECT was executed but before the child table SELECT commenced.
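If a consistent view across multiple statements is required, a read-only transaction extends the read-consistent view to every query in the transaction (the serializable isolation level behaves similarly while also permitting writes). A minimal sketch, with illustrative table names; note that SET TRANSACTION must be the first statement of the transaction:

SET TRANSACTION READ ONLY;

SELECT * FROM parent_table;

SELECT * FROM child_table;  -- sees data as at the start of the transaction

COMMIT;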

It is also worth noting that the read consistency does not extend to SQL statements contained in PL/SQL routines invoked by an SQL statement. To demonstrate this we'll reuse our T1 table from above and also create another table, T2, with a single column and row. On insert into T1 we'll fire an autonomous transaction trigger that will increment the value in T2. We'll also create a function that reads the T2 value and use that function to insert into T1:

SQL>CREATE TABLE t2
  2     (col_y  NUMBER(10))
  3  /

Table created.

SQL>INSERT INTO t2
  2  VALUES (1)
  3  /

1 row created.

SQL>COMMIT
  2  /

Commit complete.

SQL>CREATE OR REPLACE TRIGGER trg_t1
  2     BEFORE INSERT ON t1
  3     FOR EACH ROW
  4  DECLARE
  5     PRAGMA AUTONOMOUS_TRANSACTION;
  6  BEGIN
  7     UPDATE t2
  8     SET col_y = col_y + 1;
  9     COMMIT;
 10  END trg_t1;
 11  /

Trigger created.

SQL>CREATE OR REPLACE FUNCTION fnc_1
  2  RETURN NUMBER
  3  AS
  4     l_return_value NUMBER;
  5  BEGIN
  6     SELECT col_y
  7     INTO   l_return_value
  8     FROM   t2;
  9     RETURN l_return_value;
 10  END fnc_1;
 11  /

Function created.

SQL>INSERT INTO t1
  2  SELECT fnc_1
  3  FROM   dual
  4  CONNECT BY ROWNUM <= 5
  5  /

5 rows created.

SQL>SELECT *
  2  FROM   t1
  3  /

     COL_X
----------
         1
         2
         3
         4
         5

SQL>SELECT *
  2  FROM   t2
  3  /

     COL_Y
----------
         6

Here we can see that the values in T1 reflect the value stored in T2 at the time each row was inserted into T1, i.e. the value returned by the SQL SELECT in function FNC_1 was not fixed to what the data looked like when the overall INSERT was executed. Developers who like to call out to PL/SQL functions inside SQL are likely to be causing themselves concurrency headaches.

DBMS_LOCK revisited

In my earlier post regarding DBMS_LOCK I failed to note an important point about the ALLOCATE_UNIQUE routine; this routine commits the current transaction:

SQL>CREATE TABLE t1 (id NUMBER)
  2  /

Table created.

SQL>INSERT INTO t1 VALUES (1)
  2  /

1 row created.

SQL>DECLARE
  2     l_job_identifier        VARCHAR2(128) := 'job_101';
  3     l_lock_handle           VARCHAR2(128);
  4  BEGIN
  5     dbms_lock.allocate_unique (lockname        => l_job_identifier
  6                               ,lockhandle      => l_lock_handle
  7                               ,expiration_secs => 864000);
  8  END;
  9  /

PL/SQL procedure successfully completed.

SQL>ROLLBACK
  2  /

Rollback complete.

SQL>SELECT *
  2  FROM   t1
  3  /

        ID
----------
         1

The solution to this problem is to simply wrap the call to ALLOCATE_UNIQUE in an autonomous transaction:

SQL>CREATE TABLE t1 (id NUMBER)
  2  /

Table created.

SQL>INSERT INTO t1 VALUES (1)
  2  /

1 row created.

SQL>DECLARE
  2     l_job_identifier        VARCHAR2(128) := 'job_101';
  3     l_lock_handle           VARCHAR2(128);
  4
  5     PROCEDURE alloc_unique (p_lockname        IN  VARCHAR2
  6                            ,p_lockhandle      OUT VARCHAR2
  7                            ,p_expiration_secs IN  NUMBER DEFAULT 864000)
  8     AS
  9        PRAGMA AUTONOMOUS_TRANSACTION;
 10     BEGIN
 11        dbms_lock.allocate_unique (lockname        => p_lockname
 12                                  ,lockhandle      => p_lockhandle
 13                                  ,expiration_secs => p_expiration_secs);
 14        COMMIT;
 15     END alloc_unique;
 16  BEGIN
 17     alloc_unique (p_lockname        => l_job_identifier
 18                  ,p_lockhandle      => l_lock_handle
 19                  ,p_expiration_secs => 864000);
 20  END;
 21  /

PL/SQL procedure successfully completed.

SQL>ROLLBACK
  2  /

Rollback complete.

SQL>SELECT *
  2  FROM   t1
  3  /

no rows selected

Increasing Row Length and PCTFREE

In the last post we looked at how adding a new column can cause row migration and the impact this can have on our queries. In that post I noted that row migration may also occur through normal DML, so this post looks at that scenario and what we can do to manage it.

A relatively common scenario with table data is that a row is initially created with only a subset of the columns populated. Over time the rest of the columns are then populated with values as the data becomes available. Tables that follow this pattern are prone to suffer from migrated rows much like tables that get new columns added to them. The solution to this problem is different though as we know in advance the columns that exist in the table and can plan accordingly.

We can illustrate the problem using the same table from the previous post. Instead of adding a new column we will start off with all columns defined in the table. Our process first populates all columns except the MORE_DATA column and then goes back and adds data to that column:

SQL>CREATE TABLE row_growth
  2     (id        NUMBER (10) NOT NULL
  3     ,val_1     NUMBER(3)   NOT NULL
  4     ,col_1     CHAR(50)    NOT NULL
  5     ,more_data VARCHAR2(30))
  6  /

Table created.

SQL>CREATE INDEX row_growth_pk
  2     ON row_growth (id)
  3  /

Index created.

SQL>CREATE INDEX row_growth_ix1
  2     ON row_growth (val_1)
  3  /

Index created.

SQL>INSERT INTO row_growth
  2     (id, val_1, col_1)
  3  SELECT ROWNUM
  4  ,      MOD(ROWNUM,500)
  5  ,      'X'
  6  FROM   dual
  7  CONNECT BY ROWNUM <= 10000
  8  /

10000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000         92          0

Once again we have a table with 92 blocks and no migrated rows. It's as if the MORE_DATA column does not exist in the table, which is close to the truth as the column has no data in it. Querying the table we get:

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1240 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1240 |    21   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         24  consistent gets
          0  physical reads
          0  redo size
       2216  bytes sent via SQL*Net to client
        531  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

Similar to last time our query is consuming 24 logical IOs to return the 20 rows. Let's now populate the MORE_DATA column:

SQL>UPDATE row_growth
  2     SET more_data = RPAD('X',30,'X')
  3  /

10000 rows updated.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000        142       2849

SQL>SELECT m.value
  2  FROM   v$mystat m
  3  ,      v$statname s
  4  WHERE  s.statistic# = m.statistic#
  5  AND    s.name = 'table fetch continued row'
  6  /

     VALUE
----------
       105

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1720 |    27   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1720 |    27   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         32  consistent gets
          0  physical reads
          0  redo size
       2884  bytes sent via SQL*Net to client
        531  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

SQL>SELECT m.value
  2  FROM   v$mystat m
  3  ,      v$statname s
  4  WHERE  s.statistic# = m.statistic#
  5  AND    s.name = 'table fetch continued row'
  6  /

     VALUE
----------
       113

Exactly as in the scenario where we added a new column, we now have 2,849 migrated rows across 142 blocks. The impact to our query is the same; 32 logical IOs, which is an increase of 8 due to the migrated rows, as confirmed by the change in the "table fetch continued row" statistic.

Previously we needed an ALTER TABLE MOVE command to resolve this problem but this situation is different. Since all our columns are determined in advance and we know that we will be populating the MORE_DATA column after the others, we can tell Oracle to reserve space for it in the data blocks. We do this by setting the PCTFREE value when creating the table.

PCTFREE specifies the amount of space in a block that Oracle will leave free when inserting rows into it (the threshold applies to inserts of new rows, not updates of existing ones; the reserved space is what updates grow into). By default PCTFREE is set to 10, meaning that Oracle will stop inserting rows into a block once a new row would drop the free space in the block to 10% or less. A quick calculation based on the column lengths and our understanding of the data shows that our initial insert is about 60% of the overall row length, so we would need to set PCTFREE to 40 to reserve space for the MORE_DATA column.
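As an aside, PCTFREE can also be changed on an existing table. The new value only governs how blocks are used from that point on; existing blocks are not reorganised, so it won't repair rows that have already migrated:

ALTER TABLE row_growth PCTFREE 40
/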

Let's repeat our example with PCTFREE set to 40:

SQL>CREATE TABLE row_growth
  2     (id        NUMBER (10) NOT NULL
  3     ,val_1     NUMBER(3)   NOT NULL
  4     ,col_1     CHAR(50)    NOT NULL
  5     ,more_data VARCHAR2(30))
  6  PCTFREE 40
  7  /

Table created.

SQL>CREATE INDEX row_growth_pk
  2     ON row_growth (id)
  3  /

Index created.

SQL>CREATE INDEX row_growth_ix1
  2     ON row_growth (val_1)
  3  /

Index created.

SQL>INSERT INTO row_growth
  2     (id, val_1, col_1)
  3  SELECT ROWNUM
  4  ,      MOD(ROWNUM,500)
  5  ,      'X'
  6  FROM   dual
  7  CONNECT BY ROWNUM <= 10000
  8  /

10000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000        137          0

Our table is now consuming 137 blocks after the initial insert of data instead of 92 previously. This is due to each block only being populated up to 60% of available space. So does this impact our query?

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT *
  2  FROM   row_growth
  3  WHERE  val_1 = 10
  4  /

20 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1672052865

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |    20 |  1240 |    21   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ROW_GROWTH     |    20 |  1240 |    21   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ROW_GROWTH_IX1 |    20 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("VAL_1"=10)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         24  consistent gets
          0  physical reads
          0  redo size
       2216  bytes sent via SQL*Net to client
        531  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET AUTOTRACE OFF

We can see that our query is still using 24 logical IOs. So, while the rows of the table are spread across more blocks, we still only need to access the same number of blocks for this query. The reason our query didn't require any extra work is that it uses an index and so directly accesses the blocks required.

Now let's see what happens when we update the MORE_DATA column:

SQL>UPDATE row_growth
  2     SET more_data = RPAD('X',30,'X')
  3  /

10000 rows updated.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','row_growth')

PL/SQL procedure successfully completed.

SQL>ANALYZE TABLE row_growth COMPUTE STATISTICS
  2  /

Table analyzed.

SQL>SELECT num_rows
  2  ,      blocks
  3  ,      chain_cnt
  4  FROM   user_tables
  5  WHERE  table_name = 'ROW_GROWTH'
  6  /

  NUM_ROWS     BLOCKS  CHAIN_CNT
---------- ---------- ----------
     10000        137          0

We can see from the CHAIN_CNT entry that we don't have any migrated rows. Our estimate of setting PCTFREE to 40 was sufficient to ensure that there was adequate free space to handle the update of the MORE_DATA column. Since none of our rows have been migrated our query will not have been impacted and it will perform the same 24 logical IO operations.

In summary, when designing database tables it is important to consider the manner in which the rows will be populated and to set the PCTFREE clause accordingly. Resolving a problem with migrated rows is an annoyance that can easily be avoided with a small amount of forethought and a quick calculation.