Multi-versioning read consistency… again

Following on from some recent questions this post goes back to basics with regard to what Oracle terms “multi-versioning read consistency”. I have posted on this subject before but as it’s so critical to how applications using Oracle will behave it’s worth another post, with emphasis on a very important point at the end…

“Multi-versioning read consistency” might appear to be just a rather grand sounding name (or another bit of techno-jargon) so here’s a brief outline of what it means:

  • The data reurned by a query is based on what the data in the underlying tables contained when the query commenced
  • By extension of the above point, the query is not impacted by changes to the table data over the time taken to execute and retrieve the query data

It’s time for an example to illustrate this. First we’ll set up a simple table, TAB, and insert a single row:

CREATE TABLE tab
   (id  NUMBER(6) NOT NULL)
/

INSERT INTO tab
VALUES (1);

COMMIT;

Next we’ll open a cursor that returns the data in TAB but we won’t actually retrieve the data yet.

VARIABLE rc1 REFCURSOR

BEGIN
   OPEN :rc1
   FOR
   SELECT *
   FROM   tab
   ORDER  BY id;
END;
/

Let’s head back to our table and do some data changes. We’ll do these changes as autonomous transactions so as to simulate an external process coming in and making these changes, removed from the session with the open cursor:

DECLARE
   PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
   UPDATE tab
   SET id = 2;

   INSERT INTO tab
   VALUES (2);

   COMMIT;
END;
/

… and once again open a new cursor onto our table:

VARIABLE rc2 REFCURSOR

BEGIN
   OPEN :rc2
   FOR
   SELECT *
   FROM   tab
   ORDER  BY id;
END;
/

Lastly, just to reinforce things, we’ll do the whole thing yet again followed by a last update:

DECLARE
   PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
   UPDATE tab
   SET id = 3;

   INSERT INTO tab
   VALUES (3);

   COMMIT;
END;
/

VARIABLE rc3 REFCURSOR

BEGIN
   OPEN :rc3
   FOR
   SELECT *
   FROM   tab
   ORDER  BY id;
END;
/

DECLARE
   PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
   UPDATE tab
   SET id = 4;

   INSERT INTO tab
   VALUES (4);

   COMMIT;
END;
/

At the end of all this we have performed 4 inserts and 3 updates to our data. If we display the contents of our table as it exists at the end of the changes we get:

SQL> SELECT *
  2  FROM   tab
  3  ORDER BY id;

        ID
----------
         4
         4
         4
         4

Now let’s retrieve the data from our cursors and see what they contain:

SQL> PRINT rc1

        ID
----------
         1

SQL> PRINT rc2

        ID
----------
         2
         2

SQL> PRINT rc3

        ID
----------
         3
         3
         3

Even though the data in the table had changed and was committed after we opened the cursors Oracle still returned the data as it was at the point in time we opened the cursor, not what the table contained when we read from the cursor. This is Oracle’s multi-versioning read consistency in action.

Now for that very important point that I mentioned at the start…

The above exercise also highlights another aspect of Oracle:

Readers and writers don’t block each other.

If you look closely at the example, whilst we were holding open cursors to the table we were still able to update existing rows as well as insert new ones. No locks on the table data were being held by the cursors to prevent this… yet the data returned by the cursors were still as the table looked when the cursor was opened. This is a fundamental of how Oracle works. If you want to build a system that is performant under high levels of concurrent activity this is exactly the behaviour you will require.

ORA-00001 unique key violated with MERGE

I arrived at work one morning to find a system generated email telling me that a user session had thrown a unique constraint violation exception. Curious given the code had been running in production for 6 months…

The line number for the PL/SQL package that had thrown the exception showed that it came from a MERGE statement. The fact that it was a MERGE that had thrown an ORA-00001 immediately points to a concurrency issue. At first the developer of the code didn’t understand how a MERGE could throw ORA-00001 until I showed them the following example.

Starting with a simple two column table with 10 rows (running under Oracle 12.1.0.2):

CREATE TABLE merge_test
    (id     NUMBER(6) NOT NULL
    ,val    NUMBER(6) NOT NULL
    ,CONSTRAINT merge_test_pk PRIMARY KEY (id))
/

INSERT INTO merge_test
SELECT ROWNUM
,      ROWNUM
FROM   dual
CONNECT BY ROWNUM <= 10
/

COMMIT
/

EXEC dbms_stats.gather_table_stats ('','merge_test')

we run the following MERGE in two sessions, operating against an existing row in the table:

MERGE INTO merge_test dest
USING (SELECT 5    AS id
       ,      20   AS val
       FROM   dual) src
ON (dest.id = src.id)
WHEN MATCHED THEN UPDATE
SET dest.val = src.val
WHEN NOT MATCHED THEN INSERT
   (id, val)
VALUES
   (src.id, src.val)

The first session immediately reports:

1 row merged.

while the second session, attempting to obtain a lock on the same row, is blocked. Once the first session issues a commit or rollback then the second session also reports 1 row merged.

Now let’s repeat this exercise using an ID value for a row that does not exist in the table:

MERGE INTO merge_test dest
USING (SELECT 20   AS id
       ,      20   AS val
       FROM   dual) src
ON (dest.id = src.id)
WHEN MATCHED THEN UPDATE
SET dest.val = src.val
WHEN NOT MATCHED THEN INSERT
   (id, val)
VALUES
   (src.id, src.val)

Session 1 reports the 1 row merged while session 2, after session 1 commits, reports:

MERGE INTO merge_test dest
*
ERROR at line 1:
ORA-00001: unique constraint (DEVELOPER.MERGE_TEST_PK) violated

The reason for this is all related to Oracle’s multi-versioning read consistency model. At the point in time that session 2 executed the MERGE session 1 had not committed the data so session 2 cannot “see” the row that session 1 is about to insert. As a result the MERGE that session 2 executes also attempts to insert a new row. Oracle then blocks session 2 as it detects a uniqueness conflict with the row that session 1 is inserting. Oracle does not immediately report the exception as it needs to wait until session 1 commits or rollbacks its transaction. When session 1 commits the transaction then session 2 throws the ORA-00001 exception. If session 1 had issued a rollback then session 2 would have been permitted to insert the new row.

The solution to this problem is to simply remove the MERGE and replace it with an INSERT/UPDATE combination:

BEGIN
   INSERT INTO merge_test (id, val)
   VALUES (20, 20);
EXCEPTION
   WHEN DUP_VAL_ON_INDEX THEN
      UPDATE merge_test
      SET    val = 20
      WHERE  id = 20;
END;

With this code session 2 will block on its attempt to perform the INSERT. Once session 1 issues a commit then session 2 intercepts the DUP_VAL_ON_INDEX exception that is thrown, which is then caught and the operation is changed to an UPDATE.

Back to the production problem, a scan of the application log files and database entries did indeed show that there were two near simultaneous request that would have resulted in the MERGE being run for the same key, confirming the diagnosis. Another problem solved and, for the developer of the code, another lesson learned.

Package: pkg_delay

Working on a traditional OLTP system means dealing with lots of short transactions. Handling concurrency correctly is a significant concern and application design is tailored to avoid blocking where possible. Having said that, occasionally concurrency problems do arise. Identifying the source of the problem can be tricky but resolution is often simple. The largest problem often lies in the testing of the solution.

In a system where the duration of transactions is measured in milliseconds it can be near impossible to deliberately create the same scenario that gave rise to a concurrency issue. For example, a software tester recently encountered a deadlock whilst running a trivial test one day, unrelated to the functionality that they were testing. After looking into the trace files generated by Oracle it was found that two separate code paths obtained locks on resources in a reverse manner, i.e. one process obtained locks on A and then B, whilst the other did B and then A. This problem had never occurred in production as the probability of it occurring was very, very small… but it had occurred by chance in the test environment. Fixing the code was a trivial task but in order to test that the problem had been resolved we had to slow down the code to widen the window of opportunity for it to occur. This was done by inserting a sleep between where the two locks were obtained. Hence PKG_DELAY was born…

Hard coding a DBMS_LOCK.SLEEP command into the PL/SQL code is a crude but effective way of delaying code execution at some critical point. PKG_DELAY has a couple of advantages over a simple call to DBMS_LOCK.SLEEP:

  • The delay period is controlled via a global context. This permits the delay to be set by a separate session when the test is ready to start, rather than have it always on
  • The delay period can be adjusted at run time, again due to it being controlled via a global context. This has proved useful in tests that have sought to assess the impact of slow processes within the system.
  • Any code calling out to PKG_DELAY will have a dependency on this package. Since PKG_DELAY is never deployed to the production environment, if any test code containing a call out to PKG_DELAY is deployed then it will fail compilation. Not the best scenario perhaps but better than deploying code with a sleep in it…

PKG_DELAY has just two routines:

  • set_delay
    This routine takes a name for a delay and a duration for that delay
  • delay
    Performs a sleep of a duration specified by the set_delay routine for the name specified

So, a delay call might be inserted into code as:

...
pkg_delay.delay ('BEFORE_OPERATION');
...

The code will now run as per usual with virtually no overhead from the call to PKG_DELAY. Once the test is ready to be performed, requiring say a 2 second delay, in a separate session the following is executed:

EXEC pkg_delay.set_delay ('BEFORE_OPERATION',2)

Download PKG_DELAY

Multi-versioning Read Consistency

Data concurrency is difficult. Oracle does such a good job at hiding the complexity that many developers simply don’t consider it. I’m going to focus this post on one of the aspects of Oracle’s multi-versioning read consistency that new developers really should know about.

You’re probably aware that Oracle does not allow the reading of dirty data, i.e. data that has been modified by another session but that has not been committed. This allows a session to keep its changes private until it decides to commit or rollback those changes. On commit then all the changes are immediately available for other sessions to see. In light of this, ever considered what happens when a long running query accesses data that is being modified by other sessions, for example session 1 runs a simple SELECT query that takes 1 minute to complete while at the same time session 2 is modifying and committing changes to the data? Unlike some other database systems, readers and writers don’t block each other in Oracle so what will the query output for session 1 look like?

The answer is surprisingly simple: a query runs using a view of the data that was in effect as at the time the query started. Changes made to the data for the duration of the operation are ignored. In Oracle this is termed multi-versioning read consistency. The simplest and clearest way of illustrating this that I’ve seen is contained in Tom Kyte’s excellent book Expert Oracle Database Architecture:

SQL>CREATE TABLE t1
  2     (col_x   NUMBER(10) NOT NULL)
  3  /

Table created.

SQL>INSERT INTO t1
  2  SELECT ROWNUM
  3  FROM   dual
  4  CONNECT BY ROWNUM <= 5
  5  /

5 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>VARIABLE rc REFCURSOR

SQL>BEGIN
  2     OPEN :rc
  3     FOR
  4     SELECT *
  5     FROM   t1;
  6  END;
  7  /

PL/SQL procedure successfully completed.

SQL>DECLARE
  2     PRAGMA AUTONOMOUS_TRANSACTION;
  3  BEGIN
  4     DELETE
  5     FROM   t1;
  6     COMMIT;
  7  END;
  8  /

PL/SQL procedure successfully completed.

SQL>SELECT *
  2  FROM   t1
  3  /

no rows selected

SQL>PRINT rc

     COL_X
----------
         1
         2
         3
         4
         5

In the above we create a table and populate it with 5 rows. Next we open a cursor that simply selects the data from the table. In an autonomous transaction, which simulates another session, we delete the contents from the table and commit that change. We can see that the table has no rows in it by selecting from the table. However, when we finally get around to pulling the details from the cursor we opened before the deletion we see that all the data. So, regardless of the changes that occurred to the table the cursor displays data as at the point in time that it was opened.

Multi-versioning read consistency is also the reason why the following doesn’t result in an infinite loop (using the same T1 table, repopulated with 5 rows):

SQL>BEGIN
  2     FOR i IN (SELECT col_x FROM t1)
  3     LOOP
  4         dbms_output.put_line ('Inserting value ' || TO_CHAR(i.col_x));
  5        INSERT INTO t1 VALUES (i.col_x);
  6     END LOOP;
  7  END;
  8  /
Inserting value 1
Inserting value 2
Inserting value 3
Inserting value 4
Inserting value 5

PL/SQL procedure successfully completed.

Here we’re inserting into the same table that our cursor is reading from. If the data from the cursor was not fixed at the point when it was opened then the rows we’re inserting would be read by the cursor and our loop would never end… well, not until some resource on the server gives out.

One thing to note with multi-versioning read consistency is that it only applies to a single SQL statement and not an entire transaction of multiple statements. So, in the scenario of two SELECT statements, one reading parent entries and one child entries, then the second SELECT may return data for child entries for which the first SELECT did not return the parent entry. This would occur if another session inserted and committed parent/child entries after the parent table SELECT is executed but before the child table SELECT commenced.

It is also worth noting that the read consistency does not extend to SQL statements contained in PL/SQL routines invoked by an SQL statement. To demonstrate this we’ll reuse our T1 table from above and also create another table, T2, with a single column and row. On insert into T1 we’ll run a autonomous transaction trigger that will increment the value in T2. We’ll also create a function that reads the T2 value and use that to insert into T1:

SQL>CREATE TABLE t2
  2     (col_y  NUMBER(10))
  3  /

Table created.

SQL>INSERT INTO t2
  2  VALUES (1)
  3  /

1 row created.

SQL>COMMIT
  2  /

Commit complete.

SQL>CREATE OR REPLACE TRIGGER trg_t1
  2     BEFORE INSERT ON t1
  3     FOR EACH ROW
  4  DECLARE
  5     PRAGMA AUTONOMOUS_TRANSACTION;
  6  BEGIN
  7     UPDATE t2
  8     SET col_y = col_y + 1;
  9     COMMIT;
 10  END trg_t1;
 11  /

Trigger created.

SQL>CREATE OR REPLACE FUNCTION fnc_1
  2  RETURN NUMBER
  3  AS
  4     l_return_value NUMBER;
  5  BEGIN
  6     SELECT col_y
  7     INTO   l_return_value
  8     FROM   t2;
  9     RETURN l_return_value;
 10  END fnc_1;
 11  /

Function created.

SQL>INSERT INTO t1
  2  SELECT fnc_1
  3  FROM   dual
  4  CONNECT BY ROWNUM <= 5
  5  /

5 rows created.

SQL>SELECT *
  2  FROM   t1
  3  /

     COL_X
----------
         1
         2
         3
         4
         5

SQL>SELECT *
  2  FROM   t2
  3  /

     COL_Y
----------
         6

Here can see that the values in T1 reflect the value stored in T2 for each row inserted into T1, i.e. the value returned by the SQL SELECT in function FNC_1 was not locked to what the data looked like when the overall INSERT was executed. Developers who like to call out to PL/SQL functions inside SQL are likely to be causing themselves concurrency headaches.

SQL blocking with IOT FK parent

I was asked to make a trivial data change to a reference table; update some non-indexed, non-key columns in two rows. While the table is referenced heavily by a core application the table is not subject to change. I scripted up the necessary updates, ran the them, verified the changes and committed them. Given what I know about how Oracle handles locking and blocking I was rather surprised to see a system alert shortly thereafter that a critical process had suffered a delay in its processing.

Diagnosing the cause of the delay was trivial. A quick query against Active Session History showed the critical processes had been waiting on acquiring a row lock and the blocking session was the one that I ran the update within. This puzzled me as the critical process did not modify the table that I had updated. Furthermore, the statement that had been blocked was a MERGE statement on a different table.

As it turned out, the table I had been updating was a indexed organised table (IOT) and the table the critical process was performing the MERGE on had a foreign key to the table. Given that my update had obtained exclusive locks on referenced rows the critical process was unable to obtain the necessary shared locks on those rows and therefore waited for my update to be committed.

We can easily demonstrate this behaviour with a simple test:

CREATE TABLE parent
   (parent_id    NUMBER(10)   NOT NULL
   ,parent_name  VARCHAR2(50) NOT NULL
   ,CONSTRAINT parent_pk PRIMARY KEY (parent_id))
ORGANIZATION INDEX
/

CREATE TABLE child
   (child_id    NUMBER(10)   NOT NULL
   ,parent_id   NUMBER(10)   NOT NULL
   ,child_name  VARCHAR2(50) NOT NULL)
/

ALTER TABLE child
   ADD CONSTRAINT child_pk
   PRIMARY KEY (child_id)
/

ALTER TABLE child
   ADD CONSTRAINT child_fk1
   FOREIGN KEY (parent_id)
   REFERENCES parent (parent_id)
/

CREATE INDEX child_i1
   ON child (parent_id)
/

INSERT INTO parent
SELECT ROWNUM
,      'Parent ' || TO_CHAR(ROWNUM)
FROM   dual
CONNECT BY level <= 10
/

INSERT INTO child
SELECT ROWNUM
,      CASE WHEN MOD(ROWNUM,10)=0 THEN 10 ELSE MOD(ROWNUM,10) END
,      'Child ' || TO_CHAR(ROWNUM)
FROM   dual
CONNECT BY level <= 100
/

COMMIT
/

EXEC dbms_stats.gather_table_stats ('','parent')
EXEC dbms_stats.gather_table_stats ('','child')

We have a PARENT and a CHILD table with a foreign key relationship between them. PARENT has 10 rows and CHILD has 100 rows. In one session we perform an update on a PARENT row:

UPDATE parent
SET    parent_name = 'PARENT ' || 5
WHERE  parent_id = 5
/

and in a separate session we run a series of DML statements:

-- STATEMENT 1
-- insert using unlocked parent row
INSERT INTO child
VALUES (101,1,'CHILD 101')
/

-- STATEMENT 2
-- insert using locked parent row
INSERT INTO child
VALUES (102,5,'CHILD 101')
/

-- STATEMENT 3
-- delete using locked parent
DELETE
FROM   child
WHERE  child_id = 55
/

-- STATEMENT 4
-- update using locked parent
UPDATE child
SET    child_name = 'NEW CHILD 65'
WHERE  child_id = 65
/

-- STATEMENT 5
-- merge using unlocked parent
MERGE INTO child dest
USING  (SELECT 103         AS child_id
        ,      7           AS parent_id
        ,      'CHILD 103' AS child_name
        FROM   dual) src
ON (src.child_id = dest.child_id)
WHEN MATCHED THEN UPDATE
SET dest.child_name = src.child_name
WHEN NOT MATCHED THEN INSERT
   (child_id
   ,parent_id
   ,child_name)
VALUES
   (src.child_id
   ,src.parent_id
   ,src.child_name)
/   

-- STATEMENT 6
-- merge using locked parent
MERGE INTO child dest
USING  (SELECT 104         AS child_id
        ,      5           AS parent_id
        ,      'CHILD 104' AS child_name
        FROM   dual) src
ON (src.child_id = dest.child_id)
WHEN MATCHED THEN UPDATE
SET dest.child_name = src.child_name
WHEN NOT MATCHED THEN INSERT
   (child_id
   ,parent_id
   ,child_name)
VALUES
   (src.child_id
   ,src.parent_id
   ,src.child_name)
/   

Both STATEMENT 2, insert into CHILD referencing the locked PARENT entry, and STATEMENT 6, merge into CHILD using the locked PARENT entry, are blocked by the transaction against the PARENT table. If we were to replace the PARENT IOT with a normal heap table then we would find that no operation against the CHILD table is blocked.

So why does an IOT parent table make such a difference? I would guess that performing INSERT and MERGE Oracle needs to take out a shared lock on the parent to ensure referential integrity. We certainly would not want to perform an insert into a child table only to have the referenced parent table entry deleted by another session whilst our insert transaction is still in progress. With an IOT there is no table structure and all data is stored in the primary key index so when we updates the PARENT “table” we are actually updating the primary key index entry. Our PARENT update would have to take an exclusive lock on that and therefore denies the CHILD operations the ability to obtain a shared lock.

Locking and Blocking Basics

One of the most important aspects of designing a database application is to consider concurrency; multiple users or sessions simultaneously accessing and modifying the data. This is also one area that the various relational database platforms differ so it’s important to understand how it is handled in the one you work with. There are various aspects to concurrency and this post looks at the basics of row locking and blocking within Oracle.

Before we get started I’d like to mention a couple of points with regard to how Oracle works:

  • Locking is done at the row level. Yes, it’s possible to issue a LOCK TABLE statement but if you are simply modifying data then you will only be locking individual rows. There is no locking at the block or extent levels. Furthermore, Oracle does not escalate a series of row level locks to a table lock.
  • Oracle does not support dirty reads of data, which means that changes made by one session will only be visible to other sessions after a COMMIT has been issued. There is simply no facility in Oracle to perform a dirty read even if you wanted to do so.

The following examples all use the table and data defined below and were run under Oracle 11.2.0.3:

CREATE TABLE lock_test
  (id       NUMBER(10)    NOT NULL
  ,cat      NUMBER(2)     NOT NULL
  ,padding  CHAR(30)      NOT NULL)
/

INSERT INTO lock_test
  (id, cat, padding)
SELECT ROWNUM
,      MOD(ROWNUM,5)
,      'X'
FROM   dual
CONNECT BY level <= 10
/

COMMIT
/

ALTER TABLE lock_test
ADD CONSTRAINT lock_test_pk
PRIMARY KEY (id)
/  

Example 1
Let’s start with a simple test of a session updating one row whilst another session queries that same row before the update has been committed:

Session 1

SQL>UPDATE lock_test
  2  SET    cat = 99
  3  WHERE  id = 5
  4  /

1 row updated.

Session 2

SQL>SELECT *
  2  FROM   lock_test
  3  WHERE  id = 5
  4  /

        ID        CAT PADDING
---------- ---------- ------------------------------
         5          5 X

So, since the change in session 1 had not been committed session 2 does not see those changes and the value of the CAT column is 5 and not 99. However, from a locking and blocking perspective, whilst session 1 has modified the data and presumably taken out a lock on the row, session 2 was still able to query it, albeit to obain the pre-committed data. This example illustrates how sessions reading data are not blocked by sessions writing data in Oracle. This is a very important aspect of how locking and blocking works in Oracle; it is very difficult to block readers of data. This situation is what gives rise to the type of problem I described in my very first blog post.

Example 2
So, let’s see what would happen if session 2 tries to modiy the same row that session 1 is modifying:

Session 1

SQL>UPDATE lock_test
  2  SET    cat = 99
  3  WHERE  id = 5
  4  /

1 row updated.

Session 2

SQL>UPDATE lock_test
  2  SET    cat = 98
  3  WHERE  cat = 5
  4  /

Ummm… well, that’s not easy to show in a blog post. Anyway, session 2 appears to have simply stopped. It’s being blocked by the uncommitted transaction in session 1. Once we go into session 1 and issue a COMMIT (or a ROLLBACK for that matter) then session 2 reports:

1 row updated.

If we repeat the test, and while session 2 is blocked, we can see the details of the locks causing session 2 to wait via the data dictionary view DBA_WAITERS.

SELECT *
FROM   dba_waiters
WHERE  holding_session = SYS_CONTEXT('userenv','sid')
/

If we run this query in session 1, the session holding the lock, we get:

WAITING_SESSION HOLDING_SESSION LOCK_TYPE    MODE_HELD  MODE_REQUESTED   LOCK_ID1   LOCK_ID2
--------------- --------------- ------------ ---------- -------------- ---------- ----------
            131             200 Transaction  Exclusive  Exclusive          131095        982

So, session 1, which is SID 200, is holding an exclusive lock that is being requested by session 2, which is SID 131. We can see the details of the lock by querying v$lock:

SELECT *
FROM   v$lock
WHERE  id1 = 131095
AND    id2 = 982
/

which gives us the following rather cryptic details:

ADDR             KADDR              SID TY     ID1     ID2  LMODE  REQUEST  CTIME  BLOCK
---------------- ---------------- ----- -- ------- ------- ------ -------- ------ ------
000007FF3D8DAE98 000007FF3D8DAEF0   131 TX  131095     982      0        6    995      0
000007FF3BCE6938 000007FF3BCE69B0   200 TX  131095     982      6        0   1010      1

I don’t want to delve into the Oracle’s locking internals for this post so for now I’ll just note that the above view contains two rows, one for the holder of the lock (session 1) and one for the requester (session 2). The SIDs of the sessions are shown as is the time the lock has been held and whether the lock is blocking others. Very useful details when troubleshooting locking and blocking problems in a system.

A very important point to understand with how Oracle works is that session 2 will remain blocked until session 1 issues a commit or rollback. There is no automatic timeout. This feature in particular has significant application design implications. For example, you probably need to think very carefully when designing long running batch processes that will modify data which will operate concurrently with short OLTP-type operations. If your batch process obtains locks on rows that an OLTP session requires then you will be facing end-user complaints of bad performance!

Example 3
Back to our original tables. Let’s repeat the insert test again but alter the selection criteria of the row that is updated. Rather than modifying the row where the ID value is 5, we’ll modify the row where the CAT value is 5:

Session 1

SQL>UPDATE lock_test
  2  SET    cat = 99
  3  WHERE  cat = 5
  4  /

1 row updated.

Session 2

SQL>UPDATE lock_test
  2  SET    cat = 98
  3  WHERE  cat = 5
  4  /

Once again session 2 is blocked. If we commit the transaction in session 1 then session 2 reports:

0 rows updated.

Session 2 was blocked by session 1 as it wanted to update the same row based on the pre-existing, committed data. However, when session 1 committed its changes the row no longer satisfied the selection criteria for the update session 2 was performing so session 2 ended up modifing no rows.

What is interesting in this example was that session 2 initially attempted to obtain a lock on the row and was blocked by session 1. When session 1 released that lock, session 2 decided that it didn’t want to update the row since it no longer satisified the update criteria so it actually never needed to obtain any locks. Such are the complexities when dealing with concurrency… get used to it as it will be an important consideration in any multi-user database.

Example 4
So what would happen if session 2 tried to perform an update using a selection criteria that would include uncommitted rows from session 1? Let’s find out:

Session 1

SQL>UPDATE lock_test
  2  SET    cat = 99
  3  WHERE  cat = 5
  4  /

1 row updated.

Session 2

SQL>UPDATE lock_test
  2  SET    cat = 98
  3  WHERE  cat = 99
  4  /

0 rows updated.

Session 2 returned immediately and reported that it didn’t update any rows even though session 1 had an uncommitted change that would have satisified the selection criteria for session 2. Logically what happened was that session 2 read the data in the table, ignoring the uncommitted changes, and based its update on that view of the data… and that view did not have a row with a CAT value of 99. Since session 2 did not attempt to modify any rows that were locked by session 1 it was not blocked by session 1.

Example 5
Enough of updates, let’s take a look at inserts next. This time we’ll start off with 2 sessions inserting a single row into the table:

Session 1

SQL>INSERT INTO lock_test
  2    (id, cat, padding)
  3  VALUES
  4    (11, 99, 'X')
  5  /

1 row created.

Session 2:

SQL>INSERT INTO lock_test
  2    (id, cat, padding)
  3  VALUES
  4    (12, 98, 'X')
  5  /

1 row created.

Nothing unexpected here. Both sessions inserted a row without any problem since locking, and hence blocking, is done at the row level and not at a table, extent or block level.

Example 6
Let’s modify the example such that both attempt to insert the same set of values:

Session 1

SQL>INSERT INTO lock_test
  2    (id, cat, padding)
  3  VALUES
  4    (11, 99, 'X')
  5  /

1 row created.

Session 2:

SQL>INSERT INTO lock_test
  2    (id, cat, padding)
  3  VALUES
  4    (11, 99, 'X')
  5  /

Again, not easily llustrated in a blog post but session 2 is blocked by session 1. The reason? The uniqueness constraint on the table prevents multiple rows having the same ID value. So why has Oracle blocked session 2 and not simply aborted the operation? Session 1 hasn’t actually committed its insert so Oracle doesn’t know for certain if there is a problem. If we were to issue a commit in session 1 then session 2 would report:

INSERT INTO lock_test
*
ERROR at line 1:
ORA-00001: unique constraint (DEVELOPER.LOCK_TEST_PK) violated

However, if session 1 issued a rollback instead of a commit then session 2 would report:

1 row created.

As per our previous blocking examples, session 2 will remain blocked until session 1 performs a commit or rollback.

Example 7
Onto deletes then. The situation where two sessions both try to delete the same row probably causes no surprises:

Session 1

SQL>DELETE
  2  FROM   lock_test
  3  WHERE  id = 5
  4  /

1 row deleted.

Session 2

SQL>DELETE
  2  FROM   lock_test
  3  WHERE  id = 5
  4  /

Session 2 is blocked by session 1 as it’s trying to obtain a lock that session 1 already has. Once session 1 issues a commit then session 2 reports:

0 rows deleted.

Obviously session 1 has already deleted the row that session 2 was trying to delete so session 2 can’t delete it after all.

So, hopefully the above examples illustrates the basics of locking and blocking in Oracle, which are necessary features in any multi-user database system. These details are important to understand if you want to develop high performance systems. Time spent waiting whilst being blocked is essentially wasted time from an end user perspective so it’s important to minimise this.

More Concurrency

This article was inspired by a situation that was recently observed in a vendor supplied system I deal with. In a previous article we looked at concurrency handling where we were inserting or updating a row on a table with a simple primary key. These days people want to maintain the history of changes that have been made to their data and one approach to this requirement is to maintain a start and end date for each row. Commonly the entry that is currently in effect wlll not have an end date.

Given a table that implements a versioning strategy via this date stamping approach, our requirement is to create a mechanism to perform a versioning update. Here we will need to update the end date of the existing effective row and insert a new row, which will become the current effective entry. For simplicity, we will assume that we will only ever be inserting a new entry effective from the current system date.

The table used for our example is:

CREATE TABLE test_table
  (pk_id       NUMBER(10)   NOT NULL
  ,id          NUMBER(10)   NOT NULL
  ,start_date  DATE         NOT NULL
  ,end_date    DATE
  ,padding     VARCHAR2(100) NOT NULL
  )
/

-- create the primary key constraint
ALTER TABLE test_table
ADD CONSTRAINT test_table_pk
PRIMARY KEY (pk_id)
/

CREATE SEQUENCE test_table_pk_id_seq
/

You’ll note that the only integrity constraint declared is a surrogate primary key, which is completely useless for imposing any business rule integrity to our data. More on this later. We’ll use the sequence created to populate this column.

Lets turn our attention to performing the required update. As noted before we need to perform two operations for our chosen entity ID; set the end date on the current entry if it exists (i.e. the row that currently has no end date) and create a new entry with a start date from the current system date. Since we’re using DATE data types for the start and end date range, we’ll ensure that the end date used is 1 second less than the start date.

Our script is as follows. We have a simple procedure to handle the update and insert operations that we require. We make sure to use a variable for the effective date so that there won’t be any problem of a delay between the update of the existing entry and the creation of the new one. We commit after each update and introduce a short delay to ensure we aren’t updating the table multiple times within the one second.

SET ECHO ON
SET SERVEROUTPUT ON

DECLARE

PROCEDURE new_version (p_id IN test_table.id%TYPE)
AS
   c_one_second CONSTANT NUMBER := 1/24/60/60;
   l_pk_id      test_table.pk_id%TYPE;
   l_eff_date   DATE;
BEGIN

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of new_version');
   
   l_eff_date :=  SYSDATE;

   -- end date the current entry to be one second prior 
   -- to the start date of the new entry we will be
   -- creating
   UPDATE test_table
   SET    padding = padding || ', updated by SID ' || SYS_CONTEXT('userenv','sid')
   ,      end_date = l_eff_date - c_one_second
   WHERE  id = p_id
   AND    end_date IS NULL;
   
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After UPDATE');

   INSERT INTO test_table
     (pk_id
     ,id
     ,start_date
     ,end_date
     ,padding)
   VALUES
     (test_table_pk_id_seq.nextval
     ,p_id
     ,l_eff_date
     ,NULL
     ,'Inserted by SID ' || SYS_CONTEXT('userenv','sid'))
   RETURNING pk_id INTO l_pk_id;
   
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After INSERT - created row pk_id ' || TO_CHAR(l_pk_id));
   
   dbms_lock.sleep(10);
   COMMIT;
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of new_version');
   
   
END new_version;

BEGIN
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of test');
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Create version 1');
   new_version (p_id => 1);
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Create version 2');
   new_version (p_id => 1);
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of test');
END;
/

The following query lists the content of the table, for clarity only showing the time component for the date columns:

SET LINESIZE 132
COLUMN PADDING FORMAT A60

SELECT t.pk_id
,      t.id
,      TO_CHAR(t.start_date,'hh24:mi:ss') AS start_date
,      TO_CHAR(t.end_date,'hh24:mi:ss')   AS end_date
,      t.padding
FROM   test_table t
ORDER  BY
       t.start_date
/	   

If we run this query after executing our script we get:

     PK_ID         ID START_DA END_DATE PADDING
---------- ---------- -------- -------- ---------------------------------------
         1          1 06:49:46 06:49:55 Inserted by SID 141, updated by SID 141
         2          1 06:49:56          Inserted by SID 141

All looks as we would want.

Now let’s see how well concurrency is handled. You’ll have noted that the short delay in the test script is done before the COMMIT. This is deliberate so as to keep the transaction active and allow us to illustrate the concurrency problems with the script. If we run the script from 2 sessions simultaneously and query our table we get:

     PK_ID         ID START_DA END_DATE PADDING
---------- ---------- -------- -------- ---------------------------------------
         1          1 07:02:59 07:03:08 Inserted by SID 149, updated by SID 149
         2          1 07:03:00 07:03:09 Inserted by SID 141, updated by SID 141
         3          1 07:03:09 07:03:09 Inserted by SID 149, updated by SID 141
         4          1 07:03:10          Inserted by SID 141

Hmmm… not exactly what we would want…

We can see that both sessions created an initial entry; one at 7:22:30 and one at 7:32:32. The session with SID 149 then updated the row with PK_ID of 1 and then created the row with PK_ID of 3. The session with SID 141 then updated the end date on the rows with PK_ID of 2 and 3 before creating the last row.

If we run the scripts a second time, after the table has been populated do we continue to have trouble? Let’s see:

     PK_ID         ID START_DA END_DATE PADDING
---------- ---------- -------- -------- ---------------------------------------
         1          1 07:02:59 07:03:08 Inserted by SID 149, updated by SID 149
         2          1 07:03:00 07:03:09 Inserted by SID 141, updated by SID 141
         3          1 07:03:09 07:03:09 Inserted by SID 149, updated by SID 141
         4          1 07:03:10 07:04:37 Inserted by SID 141, updated by SID 149
         5          1 07:04:38 07:04:47 Inserted by SID 149, updated by SID 149
         7          1 07:04:40 07:05:07 Inserted by SID 141, updated by SID 141
         6          1 07:04:48 07:04:39 Inserted by SID 149, updated by SID 141
         8          1 07:05:08          Inserted by SID 141

The problem is still there. We have overlapping date ranges, which will definitely cause trouble when trying to work out which entry was effective at any point in time.

So, what can we do to overcome this. As mentioned at the start of this post, there are no constraints declared on the table to enforce the integrity we’re looking for. Let’s see if we can create one.

Ideally we want a constraint that says “for a given entry, don’t permit the start date to overlap with the start and end dates in any other row and don’t let the end date overlap with the start and end dates in any other row”. Unfortunately it’s not possible to declare such a constraint.

One constraint I have seen people use is to declare uniqueness on the ID and START_DATE. Obviously the thinking is that we don’t want two entries effective from the same point in time for the same ID. However, if you look at the data in our table you will see that it meets this constraint and the data is still wrong. Clearly this constraint is inadequate.

Another possible constraint is to ensure that only a single entry has a NULL end date for a give ID value. This won’t actually prevent all types off errors but it may prevent the problems we are observing above. We can’t declare this as a constraint in Oracle but we can create a unique function-based index to enforce this:

CREATE UNIQUE INDEX test_table_i1
ON test_table (CASE WHEN end_date IS NULL THEN id ELSE NULL END)
/

(Purists will point out that the “ELSE NULL” part is not required in the above CASE expression but I prefer to include it for clarity.)

So, rerunning our scripts in 2 sessions, how does this index impact us? One session completed successfully while the other session gave us the following error:

DECLARE
*
ERROR at line 1:
ORA-00001: unique constraint (DEVELOPER.TEST_TABLE_I1) violated
ORA-06512: at line 25
ORA-06512: at line 51

Line 24 is the INSERT statement. What’s happening here is that the first session has performed its actions and created a new entry with a NULL end date. The second session has performed its UPDATE followed by an INSERT. However, when the first session commits its changes then Oracle flags an error in the second session as the row it has tried to insert violates the unique index we have created.

So, on one hand we managed to prevent data corruption but at the expense of failing one of the transactions. Can we do better?

Given that we need to maintain data integrity over a number of rows and not just within a single row we need to look at different strategies for managing the data. Preventing multiple sessions from manipulating the rows for a given ID will allow a single session to perform its changes safe from interference. In general, serialising access like this should not be taken lightly but at times it may be necessary.

How can we prevent multiple sessions from manipulating the same set of data at at the same time? An obvious solution is to perform a SELECT FOR UPDATE on all the rows for a given ID before we perform our changes. The session that manages to lock the rows will be able to make its changes and any other session is blocked until it is able to obtain the lock. Lets give this a try by running the following script in 2 sessions:

SET ECHO ON
SET SERVEROUTPUT ON

DECLARE

PROCEDURE new_version (p_id IN test_table.id%TYPE)
AS
   CURSOR test_cur (p_cur_id test_table.id%TYPE)
   IS
   SELECT *
   FROM   test_table
   WHERE  id = p_cur_id
   FOR UPDATE;

   c_one_second CONSTANT NUMBER := 1/24/60/60;
   l_pk_id      test_table.pk_id%TYPE;
   l_eff_date   DATE;
BEGIN

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of new_version');

   -- obtain a lock on all the rows for the given ID by opening
   -- a cursor that returns all the rows for the ID
   OPEN test_cur (p_id);   
   
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After SELECT FOR UPDATE');
	
   -- we calculate the effective date after we've obtained the locks, not before!	
   l_eff_date := SYSDATE;

   -- end date the current entry to be one second prior 
   -- to the start date of the new entry we will be
   -- creating
   UPDATE test_table
   SET    padding = padding || ', updated by SID ' || SYS_CONTEXT('userenv','sid')
   ,      end_date = l_eff_date - c_one_second
   WHERE  id = p_id
   AND    end_date IS NULL;
   
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After UPDATE');

   INSERT INTO test_table
     (pk_id
     ,id
     ,start_date
     ,end_date
     ,padding)
   VALUES
     (test_table_pk_id_seq.nextval
     ,p_id
     ,l_eff_date
     ,NULL
     ,'Inserted by SID ' || SYS_CONTEXT('userenv','sid'))
   RETURNING pk_id INTO l_pk_id;
   
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After INSERT - created row pk_id ' || TO_CHAR(l_pk_id));
   
   dbms_lock.sleep(10);
   COMMIT;
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of new_version');
   
   
END new_version;

BEGIN
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of test');
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Create version 1');
   new_version (p_id => 2);
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Create version 2');
   new_version (p_id => 2);
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of test');
END;
/

Both scripts executed successfully and the table shows:

     PK_ID         ID START_DA END_DATE PADDING
---------- ---------- -------- -------- ---------------------------------------
         1          2 07:12:33 07:12:42 Inserted by SID 149, updated by SID 149
         2          2 07:12:34 07:12:52 Inserted by SID 141, updated by SID 141
         3          2 07:12:43 07:12:52 Inserted by SID 149, updated by SID 141
         4          2 07:12:53          Inserted by SID 141

Oh, that’s not right though. The cause of the problem is probably clear; both sessions initially created new rows as the SELECT FOR UPDATE had no rows to lock so we didn’t get the serialisation that we wanted. Can we solve this? If we had a table that held the domain of IDs, i.e. one row per ID, then we could potentially lock that row. Whichever session held the lock would be able to make changes to the date versioned entries. Lacking this though we can create our own lock using DBMS_LOCK. With DBMS_LOCK we can “lock” a string value. If the value is constructed to reflect the ID that we’re updating then whichever session is able to lock that string is able to make the changes to the table. The script below illustrates this:

SET ECHO ON
SET SERVEROUTPUT ON

DECLARE

PROCEDURE new_version (p_id IN test_table.id%TYPE)
AS
   c_one_second CONSTANT NUMBER := 1/24/60/60;
   l_pk_id      test_table.pk_id%TYPE;
   l_eff_date   DATE;
   l_lockhandle VARCHAR2(100);
   l_lock       NUMBER;
BEGIN

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of new_version');

   -- obtain a lock on a string that reflects the ID 
   dbms_lock.allocate_unique (lockname    => 'TEST_TABLE_ID_' || TO_CHAR(p_id)
                             ,lockhandle  => l_lockhandle);
   l_lock := dbms_lock.request (lockhandle        => l_lockhandle
--                               ,lockmode          => dbms_lock.x_mode
--                               ,timeout           => dbms_lock.maxwait
                               ,release_on_commit => TRUE);
	
   -- we calculate the ffective date after we've obtained the lock, not before!	
   l_eff_date := SYSDATE;
   
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After DMBS_LOCK');

   -- end date the current entry to be one second prior 
   -- to the start date of the new entry we will be
   -- creating
   UPDATE test_table
   SET    padding = padding || ', updated by SID ' || SYS_CONTEXT('userenv','sid')
   ,      end_date = l_eff_date - c_one_second
   WHERE  id = p_id
   AND    end_date IS NULL;
   
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After UPDATE');

   INSERT INTO test_table
     (pk_id
     ,id
     ,start_date
     ,end_date
     ,padding)
   VALUES
     (test_table_pk_id_seq.nextval
     ,p_id
     ,l_eff_date
     ,NULL
     ,'Inserted by SID ' || SYS_CONTEXT('userenv','sid'))
   RETURNING pk_id INTO l_pk_id;
   
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After INSERT - created row pk_id ' || TO_CHAR(l_pk_id));
   
   dbms_lock.sleep(10);
   COMMIT;
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of new_version');
   
   
END new_version;

BEGIN
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of test');
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Create version 1');
   new_version (p_id => 2);
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Create version 2');
   new_version (p_id => 2);
   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of test');
END;
/

Looking at the table data after we run the script in 2 sessions simultaneously, we see:

     PK_ID         ID START_DA END_DATE PADDING
---------- ---------- -------- -------- ---------------------------------------
         1          2 07:23:18 07:23:27 Inserted by SID 149, updated by SID 141
         2          2 07:23:28 07:23:37 Inserted by SID 141, updated by SID 149
         3          2 07:23:38 07:23:47 Inserted by SID 149, updated by SID 141
         4          2 07:23:48          Inserted by SID 141

Success! We can see that the two sessions took it in turns to create a new entry, end dating the previous entry created by the other session. The lock worked as expected. The serialisation of access isn’t without its downfalls. Most obvious, if the data is subject to a high frequency of update then permitting only one session at a time to make changes may result in unacceptable delays. However, if this were the case then I would probably argue that the data structure is inappropriate.

In this post we’ve seen that the lack of declarative contraints makes it easy for concurrent changes to corrupt the structure that we were expecting within our data. For this example we had to resort to a programmatic locking strategy to serialise modification of the data to avoid the concurrency problems. Not necessarily the best of strategies but one that may need to be applied when no other options are available.

Oracle Concurrency 101

My first post takes a look at concurrency in Oracle data processing. Concurrency is one key aspect of data handling that is necessary to maintain data consistency but an area that is far too regularly overlooked. In my experience, even people who term themselves database developers make fundamental mistakes.

The following illustrates a very simple scenario of a single row change on a table upon which resides a primary key constraint. The requirement is to add a new entry if the key attribute doesn’t exist or update the existing row if it does. We start off defining a simple table that will be used for the examples:

-- create a simple table with an ID column
CREATE TABLE test_table
   (id      NUMBER(10)   NOT NULL
   ,padding VARCHAR2(50) NOT NULL
   )
/

-- create the primary key index and constraint
ALTER TABLE test_table
ADD CONSTRAINT test_table_pk
PRIMARY KEY (id)
/

The following script shows a simple implementation of the requirement. It initially runs a query against the table to see if a row already exists. If no row is found then it will perform an insert operation, otherwise it will perform an update operation. The script has been liberally instrumented to report on the steps it performs.

-- a simple anonymous block that performs an
-- insert if a row doesn't already exist, otherwise
-- performs an update
SET ECHO ON
SET SERVEROUTPUT ON
DECLARE
   l_id         test_table.id%TYPE := 1;
   l_padding    test_table.padding%TYPE := RPAD('A',50,'A');
   l_row_count  NUMBER(10);
BEGIN

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of test');

   SELECT COUNT(*)
   INTO   l_row_count
   FROM   test_table
   WHERE  id = l_id;

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After SELECT');

   IF (l_row_count = 0) THEN
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Before INSERT');
      INSERT INTO test_table
         (id, padding)
      VALUES
         (l_id, l_padding);
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After INSERT');
   ELSE
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Before UPDATE');
      UPDATE test_table
      SET    padding = l_padding
      WHERE  id = l_id;
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After UPDATE');
   END IF;

   COMMIT;

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of test');

END;
/

Running the script for the first time shows that it has inserted the new row as desired:

04:49:21.361 - Start of test
04:49:21.361 - After SELECT
04:49:21.361 - Before INSERT
04:49:21.361 - After INSERT
04:49:21.361 - End of test

PL/SQL procedure successfully completed.

Running the script for a second time shows that it now performs an update as required:

04:49:27.352 - Start of test
04:49:27.352 - After SELECT
04:49:27.352 - Before UPDATE
04:49:27.352 - After UPDATE
04:49:27.352 - End of test

PL/SQL procedure successfully completed.

On the surface it would appear that the code fulfils our requirement. However, things begin to fall apart when concurrency considerations are taken into account. What happens if two people run this script at the same time?

I find it strange that even some people who consider themselves to be database developers fail to see the concurrency flaw in the above strategy. Some will acknowledge that while the script executed by two sessions at precisely the same point in time will cause problems the probability of this occurring is remote. They will point out that the length of time between the test for existence of the row (i.e. the SELECT) and the operation (i.e. the INSERT or UPDATE) is so small that in practical terms this won’t happen (the script output above shows this to be less than 1 millisecond). This reasoning is flawed though.

Let’s see what happens if we simulate subsequent processing after the insert or update by introducing a delay after the row operation but before the commit point. The modified script becomes:

-- a simple anonymous block that performs an
-- insert if a row doesn't already exist, otherwise
-- performs an update. This test includes a delay
-- before the commit of the changes
SET ECHO ON
SET SERVEROUTPUT ON
DECLARE
   l_id         test_table.id%TYPE := 2;
   l_padding    test_table.padding%TYPE := RPAD('A',50,'A');
   l_row_count  NUMBER(10);
BEGIN

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of test');

   SELECT COUNT(*)
   INTO   l_row_count
   FROM   test_table
   WHERE  id = l_id;

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After SELECT');

   IF (l_row_count = 0) THEN
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Before INSERT');
      INSERT INTO test_table
         (id, padding)
      VALUES
         (l_id, l_padding);
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After INSERT');
   ELSE
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Before UPDATE');
      UPDATE test_table
      SET    padding = l_padding
      WHERE  id = l_id;
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After UPDATE');
   END IF;

   dbms_lock.sleep (30);

   COMMIT;

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of test');

END;
/

We run the script in one session and then again in another session about 10 seconds later. The first session completes  with:

04:58:05.335 - Start of test
04:58:05.335 - After SELECT
04:58:05.335 - Before INSERT
04:58:05.335 - After INSERT
04:58:35.365 - End of test

PL/SQL procedure successfully completed.

whilst the second session produces the following:

04:58:16.738 - Start of test
04:58:16.738 - After SELECT
04:58:16.738 - Before INSERT
DECLARE
*
ERROR at line 1:
ORA-00001: unique constraint (DEVELOPER.TEST_TABLE_PK) violated
ORA-06512: at line 18

Looking at the timestamps involved we can see that the second session performed its test for existence (the SELECT) after the first session had performed its operation (the INSERT) and yet it still generated an error. The reason for this is that the insert performed by the first session is not visible to the second session unit the session has issued a commit. This means the opportunity for error is not limited to the duration between the test for existence and the subsequent operation but from the test for existence to the commit point.

In a production system this type of problem will probably manifest itself rarely. When it does occur it will often be written off as a “one-off system glitch” due to the “same process having been run for months without error”. To end users, the system will be labelled “buggy” and “unreliable”.

So, what can be done to avoid this scenario? One possibility is to look at eliminating the SELECT/INSERT/UPDATE structure with a simple MERGE statement. As many developers would know that MERGE performs either an insert or update according to whether a row exists so it appears to match our requirements exactly. Let’s test this:

-- a simple anonymous block that performs an
-- merge to insert or update on a row according
-- to whether it exists or not
SET ECHO ON
SET SERVEROUTPUT ON
DECLARE
   l_id         test_table.id%TYPE := 3;
   l_padding    test_table.padding%TYPE := RPAD('A',50,'A');
   l_row_count  NUMBER(10);
BEGIN

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of test');

   MERGE INTO test_table dest
   USING (SELECT l_id AS id
          ,      l_padding AS padding
          FROM   dual) src
   ON (src.id = dest.id)
   WHEN MATCHED THEN
      UPDATE
      SET dest.padding = src.padding
   WHEN NOT MATCHED THEN
      INSERT
         (id
         ,padding)
      VALUES
         (src.id
         ,src.padding);

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After MERGE');

   dbms_lock.sleep (30);

   COMMIT;

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of test');

END;
/

The output from the two sessions are:

05:02:15.013 - Start of test
05:02:15.013 - After MERGE
05:02:45.028 - End of test

PL/SQL procedure successfully completed.

and

05:02:24.015 - Start of test
DECLARE
*
ERROR at line 1:
ORA-00001: unique constraint (DEVELOPER.TEST_TABLE_PK) violated
ORA-06512: at line 9

Hmmm… that didn’t work either. No doubt the MERGE in the second session still doesn’t “see” the row inserted by the first session so it attempts to perform an insert too, rather than an update. So, how about we try a strategy that doesn’t require one session to “see’ what the other session is doing:

-- a simple anonymous block that performs an
-- update on a row and if the row doesn't already exist
-- it will perform an insert
SET ECHO ON
SET SERVEROUTPUT ON
DECLARE
   l_id         test_table.id%TYPE := 4;
   l_padding    test_table.padding%TYPE := RPAD('A',50,'A');
   l_row_count  NUMBER(10);
BEGIN

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Start of test');

   BEGIN
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Before INSERT');
      INSERT INTO test_table
         (id, padding)
      VALUES
         (l_id, l_padding);
      dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After INSERT');
   EXCEPTION
      WHEN DUP_VAL_ON_INDEX THEN
         dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - Before UPDATE');
         UPDATE test_table
         SET    padding = l_padding
         WHERE  id = l_id;
         dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - After UPDATE');
   END;

   dbms_lock.sleep (30);

   COMMIT;

   dbms_output.put_line (TO_CHAR(SYSTIMESTAMP,'hh:mi:ss.ff3') || ' - End of test');

END;
/

With this script we simply attempt to create a new entry and if that fails with an error that tells us the row already exists we perform an update instead. Running this script in two sessions we get the following output:

05:04:20.484 - Start of test
05:04:20.484 - Before INSERT
05:04:20.484 - After INSERT
05:04:50.499 - End of test

PL/SQL procedure successfully completed.

and:

05:04:28.487 - Start of test
05:04:28.487 - Before INSERT
05:04:50.499 - Before UPDATE
05:04:50.499 - After UPDATE
05:05:20.513 - End of test

PL/SQL procedure successfully completed.

Finally, success!

A close look at the timestamps on the messages tells us what has happened. The first session performed the insert operation as we would expect. The timing on the second session messages shows that it also attempted to perform an insert but had to wait until the first session committed its changes before failing that operation. It then switched to an update operation. Concurrency is properly handled.

Of course, in a real system there are other data processing aspects to consider. The second session overwrote the details of the first session without so much as batting an eyelid. Depending on the situation we may have a lost update problem on our hands. We’ll defer discussion of that type of problem to a later post though.

In summary, processes that may appear to be be robust when examined from the perspective of a single session may not properly handle concurrency correctly. Developing data processes that correctly handle concurrency issues is not necessary any more difficult, it just requires different considerations and an understanding of the types of problems that exist and the strategies available to avoid them.