Working on a traditional OLTP system means dealing with lots of short transactions. Handling concurrency correctly is a significant concern, and application design is tailored to avoid blocking where possible. Even so, concurrency problems do occasionally arise. Identifying the source of the problem can be tricky, but the resolution is often simple. The hardest part is often testing the solution.
In a system where the duration of transactions is measured in milliseconds it can be nearly impossible to deliberately recreate the scenario that gave rise to a concurrency issue. For example, a software tester recently encountered a deadlock whilst running a trivial test, unrelated to the functionality that they were testing. The trace files generated by Oracle showed that two separate code paths obtained locks on the same resources in opposite orders, i.e. one process obtained locks on A and then B, whilst the other locked B and then A. This problem had never occurred in production as the probability of it occurring was very, very small… but it had occurred by chance in the test environment. Fixing the code was a trivial task, but in order to test that the problem had been resolved we had to slow down the code to widen the window of opportunity for the deadlock to occur. This was done by inserting a sleep between the points where the two locks were obtained. Hence PKG_DELAY was born…
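The lock-ordering problem, and the way a sleep widens the window for it, can be sketched as follows. This is only an illustration: the table ACCOUNTS and its rows 'A' and 'B' are hypothetical stand-ins for the real resources, which are not shown in this post, and the 5 second pause is an arbitrary choice. It also assumes the user has EXECUTE privilege on DBMS_LOCK.

```sql
-- Hypothetical table for illustration only:
--   CREATE TABLE accounts (id VARCHAR2(1) PRIMARY KEY, balance NUMBER);
--   with one row for id 'A' and one for id 'B'.

-- Session 1: locks row A, pauses, then requests row B.
DECLARE
   l_id accounts.id%TYPE;
BEGIN
   SELECT id INTO l_id FROM accounts WHERE id = 'A' FOR UPDATE;
   DBMS_LOCK.SLEEP(5);  -- widen the window between the two lock requests
   SELECT id INTO l_id FROM accounts WHERE id = 'B' FOR UPDATE;
   ROLLBACK;            -- release both locks
END;
/

-- Session 2 (started while session 1 is sleeping): locks B, then requests A.
DECLARE
   l_id accounts.id%TYPE;
BEGIN
   SELECT id INTO l_id FROM accounts WHERE id = 'B' FOR UPDATE;
   DBMS_LOCK.SLEEP(5);
   SELECT id INTO l_id FROM accounts WHERE id = 'A' FOR UPDATE;
   ROLLBACK;
END;
/
```

With the sleeps in place, each session should end up holding one row whilst waiting for the other, and Oracle should raise ORA-00060 (deadlock detected) in one of them. Once both code paths lock A before B, the same test should complete without error.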
Hard coding a DBMS_LOCK.SLEEP call into the PL/SQL code is a crude but effective way of delaying code execution at some critical point. PKG_DELAY has a few advantages over a simple call to DBMS_LOCK.SLEEP:
- The delay period is controlled via a global context. This permits the delay to be set by a separate session when the test is ready to start, rather than having it always on.
- The delay period can be adjusted at run time, again due to it being controlled via a global context. This has proved useful in tests that have sought to assess the impact of slow processes within the system.
- Any code calling out to PKG_DELAY will have a dependency on this package. Since PKG_DELAY is never deployed to the production environment, if any test code containing a call out to PKG_DELAY is deployed then it will fail compilation. Not the best scenario perhaps but better than deploying code with a sleep in it…
PKG_DELAY has just two routines:
- set_delay: takes a name for a delay and a duration, in seconds, for that delay.
- delay: performs a sleep of the duration specified via set_delay for the given name.
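For readers who want a feel for how such a package might be put together, here is a minimal sketch. It is not the actual PKG_DELAY source: the context name DELAY_CTX, the parameter names and the handling of an unset delay are all assumptions made for illustration. It relies on the standard CREATE CONTEXT ... ACCESSED GLOBALLY statement, DBMS_SESSION.SET_CONTEXT, SYS_CONTEXT and DBMS_LOCK.SLEEP, and assumes the owner has the CREATE ANY CONTEXT privilege and EXECUTE on DBMS_LOCK.

```sql
-- DELAY_CTX is a hypothetical context name; only PKG_DELAY may set its values.
CREATE OR REPLACE CONTEXT delay_ctx USING pkg_delay ACCESSED GLOBALLY;

CREATE OR REPLACE PACKAGE pkg_delay AS
   -- Store a delay, in seconds, under the given name.
   PROCEDURE set_delay (p_name IN VARCHAR2, p_seconds IN NUMBER);
   -- Sleep for the duration stored under the given name, if any.
   PROCEDURE delay (p_name IN VARCHAR2);
END pkg_delay;
/

CREATE OR REPLACE PACKAGE BODY pkg_delay AS
   c_namespace CONSTANT VARCHAR2(30) := 'DELAY_CTX';

   PROCEDURE set_delay (p_name IN VARCHAR2, p_seconds IN NUMBER) IS
   BEGIN
      -- Context values are strings, so the duration is stored as text.
      DBMS_SESSION.SET_CONTEXT(c_namespace, p_name, TO_CHAR(p_seconds));
   END set_delay;

   PROCEDURE delay (p_name IN VARCHAR2) IS
      l_seconds NUMBER;
   BEGIN
      -- SYS_CONTEXT returns NULL when no delay has been set for this name.
      l_seconds := TO_NUMBER(SYS_CONTEXT(c_namespace, p_name));
      IF l_seconds > 0 THEN
         DBMS_LOCK.SLEEP(l_seconds);
      END IF;
   END delay;
END pkg_delay;
/
```

In this sketch a name with no delay set costs only a context lookup, and because the value lives in a global context rather than a package variable, a value set in one session is read by calls made from other sessions.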
So, a delay call might be inserted into code as:
... pkg_delay.delay ('BEFORE_OPERATION'); ...
The code will now run as usual, with virtually no overhead from the call to PKG_DELAY. Once the test is ready to be performed, requiring, say, a 2 second delay, the following is executed in a separate session:
EXEC pkg_delay.set_delay ('BEFORE_OPERATION',2)