AQ Multiple Subscribers

In the last post we looked at asynchronous processing using a queue and a callback routine. Event-based processing is very powerful and flexible. Oracle's Advanced Queues provide many features that are useful in setting up event-based processing. This post will look at multi-subscriber queues.

The queue created for the previous post was a single subscriber queue, i.e. a message can only be dequeued once, so it's pretty much limited to a single dequeue process. Queues can also be configured to support multiple subscribers, allowing us to initiate multiple business processes from the one event. In this scenario, each subscriber is able to dequeue the same message.

The changes to our previous code to support a multi-subscriber queue are relatively few. When creating the queue table we need to specify that it will support multiple subscribers. In the routine SETUP inside the package EMP_EVENT_MANAGEMENT you will find the following change:

dbms_aqadm.create_queue_table (queue_table        => c_queue_owner || '.' || c_queue_table
                              ,queue_payload_type => c_queue_payload
                              ,multiple_consumers => TRUE);

We also have routines to add and remove queue subscribers exposed in the package spec:

PROCEDURE add_subscriber (p_subscriber         IN VARCHAR2
                         ,p_error_on_duplicate IN BOOLEAN DEFAULT TRUE);

PROCEDURE remove_subscriber (p_subscriber         IN VARCHAR2
                            ,p_error_on_not_exist IN BOOLEAN DEFAULT TRUE);

The package body has an additional routine for removing all subscribers from the queue, which is invoked by the teardown routine to clean things up properly.
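
The ADD_SUBSCRIBER routine is essentially a thin wrapper around DBMS_AQADM.ADD_SUBSCRIBER. A minimal sketch of how it might be implemented is shown below; the constant names are assumed to match those used elsewhere in the package and the downloadable code may differ in the details:

PROCEDURE add_subscriber (p_subscriber         IN VARCHAR2
                         ,p_error_on_duplicate IN BOOLEAN DEFAULT TRUE)
IS
   -- ORA-24034 is raised if the subscriber already exists for the queue
   ex_duplicate_subscriber EXCEPTION;
   PRAGMA EXCEPTION_INIT (ex_duplicate_subscriber, -24034);
BEGIN
   dbms_aqadm.add_subscriber
      (queue_name => c_queue_owner || '.' || c_queue_name
      ,subscriber => sys.aq$_agent (p_subscriber, NULL, NULL));
EXCEPTION
   WHEN ex_duplicate_subscriber THEN
      IF p_error_on_duplicate THEN
         RAISE;
      END IF;
END add_subscriber;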

When creating a process that will consume messages from a queue, the process will need to:

  • add itself as a subscriber of the queue
  • specify its subscriber name when dequeuing messages

When registering a callback routine, the queue subscriber name is specified after the name of the callback routine, as shown below:

PROCEDURE register_event_callback
IS
BEGIN
   dbms_aq.register
      (sys.aq$_reg_info_list
         (sys.aq$_reg_info
             -- the next bit needs to be "queue_name : subscriber"
            (c_queue_owner || '.' || c_queue_name || ':' || c_callback_subscriber
            ,DBMS_AQ.NAMESPACE_AQ
             -- this is the routine that will get called when a message is queued
             -- I'm going to assume it's in the same schema as the queue
            ,'plsql://' || c_queue_owner || '.emp_event_processing.emp_event_callback'
            ,NULL
            )
         ),
      1
      );
   msg('Callback registered');
END register_event_callback;

The unregister routine is similarly modified to include the subscriber.
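
For completeness, here's a sketch of what the modified unregister routine might look like, mirroring the registration call above (the packaged code in the download may differ slightly):

PROCEDURE unregister_event_callback
IS
BEGIN
   dbms_aq.unregister
      (sys.aq$_reg_info_list
         (sys.aq$_reg_info
             -- "queue_name : subscriber", exactly as used at registration time
            (c_queue_owner || '.' || c_queue_name || ':' || c_callback_subscriber
            ,DBMS_AQ.NAMESPACE_AQ
            ,'plsql://' || c_queue_owner || '.emp_event_processing.emp_event_callback'
            ,NULL
            )
         ),
      1
      );
   msg('Callback unregistered');
END unregister_event_callback;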

Looking at the callback routine itself, contained in the EMP_EVENT_PROCESSING package, we need to specify the subscriber when dequeuing the message, although this is conveniently provided in the descriptor passed into the callback routine:

PROCEDURE emp_event_callback (context    RAW
                             ,reginfo    SYS.AQ$_REG_INFO
                             ,descr      SYS.AQ$_DESCRIPTOR
                             ,payload    RAW
                             ,payloadl   NUMBER)
IS
   l_dequeue_options    dbms_aq.dequeue_options_t;
   l_message_properties dbms_aq.message_properties_t;
   l_message_handle     RAW(16);
   l_payload            t_emp_event;
BEGIN

   msg ('Entering emp_event_callback');
   -- dequeue the message specified
   l_dequeue_options.msgid         := descr.msg_id;
   l_dequeue_options.wait          := dbms_aq.no_wait;
   l_dequeue_options.consumer_name := descr.consumer_name;

   dbms_aq.dequeue (queue_name         => descr.queue_name
                   ,dequeue_options    => l_dequeue_options
                   ,message_properties => l_message_properties
                   ,payload            => l_payload
                   ,msgid              => l_message_handle);

   process_message (p_emp_event => l_payload);

   msg ('Processed ' || l_payload.event_operation || ' event for emp ' || TO_CHAR(l_payload.empno));

   -- commit here to ensure we remove the message
   COMMIT;

END emp_event_callback;

All of these changes are required simply to get the previous code working on a multi-subscriber queue. We still don’t have a second subscriber. We could set up a second callback routine or simply test our queue using a test script, like the one following:

EXEC emp_event_management.add_subscriber (p_subscriber => 'TEST_SCRIPT')

EXEC emp_event_processing.update_sal (p_empno=>7654,p_sal=>1600)
EXEC emp_event_processing.update_dept (p_empno=>7654,p_deptno=>20)
COMMIT;

DECLARE
   ex_queue_timeout     EXCEPTION;
   PRAGMA EXCEPTION_INIT (ex_queue_timeout, -25228);
   l_dequeue_options    dbms_aq.dequeue_options_t;
   l_message_properties dbms_aq.message_properties_t;
   l_message_handle     RAW(16);
   l_payload            t_emp_event;
   l_subscriber_name    VARCHAR2(30) := 'TEST_SCRIPT';
BEGIN

   l_dequeue_options.wait          := 5; -- some nominal wait for a message to arrive
   l_dequeue_options.consumer_name := l_subscriber_name;

   -- loop through the messages until we find no more
   LOOP
      BEGIN
         dbms_aq.dequeue (queue_name         => emp_event_management.c_queue_owner || '.' || emp_event_management.c_queue_name
                         ,dequeue_options    => l_dequeue_options
                         ,message_properties => l_message_properties
                         ,payload            => l_payload
                         ,msgid              => l_message_handle);

         -- here's where we would normally do our message processing, which is
         -- just to dump some details of the message
         dbms_output.put_line ('Processed ' || l_payload.event_operation || ' event for emp ' || TO_CHAR(l_payload.empno));
      EXCEPTION
         WHEN ex_queue_timeout THEN
            EXIT;
      END;
   END LOOP;

END;
/

-- commit here to ensure we remove the message
COMMIT;

The output from the main part of the above script is:

Processed UPDATE_SAL event for emp 7654
Processed CHANGE_DEPT event for emp 7654

and if we take a look at the MSG_LOG table where our callback routine logs its processing activity:

SQL> SELECT *
  2  FROM   msg_log
  3  WHERE  msg_timestamp >= SYSTIMESTAMP - INTERVAL '2' MINUTE
  4  ORDER  BY
  5         msg_id DESC
  6  /

MSG_ID MSG_TIMESTAMP   MSG
------ --------------- --------------------------------------------------
   290 06:44:31.481000 Processed CHANGE_DEPT event for emp 7654
   289 06:44:31.481000 Entering emp_event_callback
   288 06:44:31.481000 Processed UPDATE_SAL event for emp 7654
   287 06:44:31.481000 Entering emp_event_callback

we can see that it too has processed the messages.


The full set of scripts for the above is available for download.

Asynchronous processing using AQ Callback

Database transactions are awesome! The ability to do multiple changes to data in various tables and to commit those changes so that they all appear to other sessions at once, or to roll them all back as if nothing had happened is one of the most powerful features of relational databases.

… and here’s the “but”; but sometimes you don’t want to do all processing in one transaction. For example, with the on-line systems of today we might want to do the minimum processing needed to fulfil an order in a single transaction and defer all non-immediate processing to maximise throughput. “Near real time” would be the way to describe our deferred processing. There are various ways to do this but one way is via queues with callback routines.

So, how do we go about doing this? Here’s a high level summary of the steps involved:

  • Create an object type with the details that the processing will require
  • Create a queue whose payload is the object type
  • Create a callback routine that will be run whenever an item is queued
  • Associate the callback routine with the queue so that Oracle knows to call it

Getting into some solid detail, we'll use the EMP table from the SCOTT schema. We'll assume our business processing has two functions that need to be done asynchronously: update an employee's salary and transfer an employee to a different department (okay, probably not great scenarios for asynchronous processing but I wanted something simple using the EMP table…)

Since we'll be working within the SCOTT schema we need to ensure SCOTT can perform Advanced Queuing operations, so we grant SCOTT execute on the packages we will use:

GRANT EXECUTE ON dbms_aqadm TO scott;
GRANT EXECUTE ON dbms_aq    TO scott;

Next we will need to create our object type that the queue will hold. For this example, the type will hold the necessary EMP attributes we need to know about and also an attribute that will store a description of the operation to perform:

CREATE OR REPLACE TYPE t_emp_event
AS OBJECT
  (event_operation VARCHAR2 (20)
  ,empno           NUMBER (4)
  ,sal             NUMBER (7,2)
  ,mgr             NUMBER (4)
  ,deptno          NUMBER (2)
  )
/

Next up, we create our queue on this object type:

BEGIN
   dbms_aqadm.create_queue_table (queue_table        => 'SCOTT.EMP_EVENT_QT'
                                 ,queue_payload_type => 'T_EMP_EVENT'
                                 ,multiple_consumers => FALSE);

   dbms_aqadm.create_queue (queue_name  => 'SCOTT.EMP_EVENT_Q'
                            ,queue_table => 'SCOTT.EMP_EVENT_QT');

   dbms_aqadm.start_queue (queue_name => 'SCOTT.EMP_EVENT_Q');
END;
/

We can now enqueue messages onto our queue with a routine like the following:

PROCEDURE enqueue (p_msg  IN t_emp_event) 
IS
   l_enqueue_options    dbms_aq.enqueue_options_t;
   l_message_properties dbms_aq.message_properties_t;
   l_msgid              RAW(16);
BEGIN
   dbms_aq.enqueue (queue_name         => 'SCOTT.EMP_EVENT_Q'
                   ,enqueue_options    => l_enqueue_options
                   ,message_properties => l_message_properties
                   ,payload            => p_msg
                   ,msgid              => l_msgid);
END enqueue;

Before we can process our messages we need to set up our callback routine, which has a specific call interface. You can read about what this interface needs to look like in the Oracle documentation:

PROCEDURE emp_event_callback (context    RAW
                             ,reginfo    SYS.AQ$_REG_INFO
                             ,descr      SYS.AQ$_DESCRIPTOR
                             ,payload    RAW
                             ,payloadl   NUMBER)
IS
   l_dequeue_options    dbms_aq.dequeue_options_t;
   l_message_properties dbms_aq.message_properties_t;
   l_message_handle     RAW(16);
   l_payload            t_emp_event;
BEGIN
   l_dequeue_options.msgid         := descr.msg_id;
   l_dequeue_options.wait          := dbms_aq.no_wait;

   dbms_aq.dequeue (queue_name         => descr.queue_name
                   ,dequeue_options    => l_dequeue_options
                   ,message_properties => l_message_properties
                   ,payload            => l_payload
                   ,msgid              => l_message_handle);

   -- here's what we'll do with the message... in this case call
   -- out to a separate processing routine (not shown here)
   process_message (p_emp_event => l_payload);

END emp_event_callback;

The routine above assumes the processing of the message will be done by some other routine called "process_message". In a full-blown solution (see later on) this routine would be part of a package that captures all the logic associated with the EMP event processing.
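
PROCESS_MESSAGE itself isn't shown in this post. A rough sketch of what it might do, assuming the UPDATE_SAL and CHANGE_DEPT operation values seen in the log output later on, is:

PROCEDURE process_message (p_emp_event IN t_emp_event)
IS
BEGIN
   CASE p_emp_event.event_operation
      WHEN 'UPDATE_SAL' THEN
         UPDATE emp
         SET    sal   = p_emp_event.sal
         WHERE  empno = p_emp_event.empno;
      WHEN 'CHANGE_DEPT' THEN
         -- mgr is assumed to have been resolved when the event was queued
         UPDATE emp
         SET    deptno = p_emp_event.deptno
         ,      mgr    = p_emp_event.mgr
         WHERE  empno  = p_emp_event.empno;
   END CASE;
END process_message;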

Now that the callback routine has been defined we register it so that Oracle will call it whenever something is placed in our queue. To do this we need to call the DBMS_AQ.REGISTER routine, telling it the queue and the processing routine it needs to call:

BEGIN
   dbms_aq.register
      (sys.aq$_reg_info_list
         (sys.aq$_reg_info
            ('SCOTT.EMP_EVENT_Q' -- the queue 
            ,DBMS_AQ.NAMESPACE_AQ
            ,'plsql://SCOTT.EMP_EVENT_CALLBACK' -- this is the routine that will get called when a message is queued
            ,NULL)
         ),
      1
      );
END;
/

… and that’s pretty much all there is to it. Keep in mind that the messages on the queue are transactional so they will not be processed until the session that enqueues the messages commits, but this is exactly the behaviour that we would normally desire.

Of course it is much simpler to encapsulate all of the above inside packaged routines. For the above EMP event scenario I have constructed two packages, available for download:

  • emp_event_management, which handles the setting up of the queue and registering the callback routine
  • emp_event_processing, which contains the routines to queue up events and to process those same event messages

Once all the complicated DBMS_AQ calls are hidden behind a management API, starting up the event processing is just:

EXEC emp_event_management.setup
EXEC emp_event_management.startup

Shutting things down is:

EXEC emp_event_management.shutdown
EXEC emp_event_management.teardown

The package routines write processing messages to a table named MSG_LOG. I've gone a bit overboard with the messages written out to help understand what's going on. The messages are timestamped so the speed of processing can be monitored.
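
Neither MSG_LOG nor the MSG routine is shown in this post; they are part of the download. A rough sketch of what they might look like, assuming an identity column for the message ID and an autonomous transaction so log entries survive rollbacks, is:

CREATE TABLE msg_log
   (msg_id        NUMBER GENERATED ALWAYS AS IDENTITY
   ,msg_timestamp TIMESTAMP DEFAULT SYSTIMESTAMP
   ,msg           VARCHAR2(200))
/

CREATE OR REPLACE PROCEDURE msg (p_msg IN VARCHAR2)
AS
   PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
   INSERT INTO msg_log (msg) VALUES (p_msg);
   COMMIT;
END msg;
/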

Creating events for processing is a matter of making calls to our processing package, similar to these:

EXEC emp_event_processing.update_sal (p_empno => 7654,p_sal => 1600)
EXEC emp_event_processing.update_dept (p_empno => 7654,p_deptno => 20)
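
Each of these routines simply packages its arguments into a T_EMP_EVENT instance and hands it to the ENQUEUE routine shown earlier. A sketch of what UPDATE_SAL might look like (the packaged version may differ):

PROCEDURE update_sal (p_empno IN emp.empno%TYPE
                     ,p_sal   IN emp.sal%TYPE)
IS
BEGIN
   -- t_emp_event attributes: event_operation, empno, sal, mgr, deptno
   enqueue (p_msg => t_emp_event ('UPDATE_SAL', p_empno, p_sal, NULL, NULL));
END update_sal;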

By way of demonstration, let’s execute the above commands against the standard SCOTT.EMP table. We’ll grab the current time so as to filter our MSG_LOG table later on, view the data we’re about to change, queue the messages, note the current time so we can see how long the callback processing takes, list the MSG_LOG entries and lastly view the data again to confirm the changes:

VARIABLE now  VARCHAR2(30)
EXEC :now := TO_CHAR(SYSTIMESTAMP,'dd/mm/yyyy hh24:mi:ss.ff')

SELECT empno
,      ename
,      sal
,      deptno
,      mgr
FROM   emp
WHERE  empno = 7654
/

EXEC emp_event_processing.update_sal (p_empno => 7654,p_sal => 1600)
EXEC emp_event_processing.update_dept (p_empno => 7654,p_deptno => 20)
SELECT TO_CHAR(SYSTIMESTAMP,'hh24:mi:ss.ff') AS current_time FROM dual;
COMMIT;

SELECT *
FROM   msg_log
WHERE  msg_timestamp >= TO_TIMESTAMP(:now,'dd/mm/yyyy hh24:mi:ss.ff')
ORDER  BY
       msg_id DESC
/

SELECT empno
,      ename
,      sal
,      deptno
,      mgr
FROM   emp
WHERE  empno = 7654
/

The output of which is:

SQL> VARIABLE now  VARCHAR2(30)
SQL> EXEC :now := TO_CHAR(SYSTIMESTAMP,'dd/mm/yyyy hh24:mi:ss.ff')

PL/SQL procedure successfully completed.

SQL> SELECT empno
  2  ,      ename
  3  ,      sal
  4  ,      deptno
  5  ,      mgr
  6  FROM   emp
  7  WHERE  empno = 7654
  8  /

     EMPNO ENAME             SAL     DEPTNO        MGR
---------- ---------- ---------- ---------- ----------
      7654 MARTIN           1250         30       7698

SQL> EXEC emp_event_processing.update_sal (p_empno => 7654,p_sal => 1600)

PL/SQL procedure successfully completed.

SQL> EXEC emp_event_processing.update_dept (p_empno => 7654,p_deptno => 20)

PL/SQL procedure successfully completed.

SQL> SELECT TO_CHAR(SYSTIMESTAMP,'hh24:mi:ss.ff') AS current_time FROM dual;

CURRENT_TIME
------------------
07:09:02.313000

SQL> COMMIT;

Commit complete.

SQL>
SQL> SELECT *
  2  FROM   msg_log
  3  WHERE  msg_timestamp >= TO_TIMESTAMP(:now,'dd/mm/yyyy hh24:mi:ss.ff')
  4  ORDER  BY
  5         msg_id DESC
  6  /

MSG_ID MSG_TIMESTAMP   MSG
------ --------------- --------------------------------------------------
   237 07:09:02.329000 Processed CHANGE_DEPT event for emp 7654
   236 07:09:02.329000 Entering emp_event_callback
   235 07:09:02.329000 Processed UPDATE_SAL event for emp 7654
   234 07:09:02.329000 Entering emp_event_callback

SQL> SELECT empno
  2  ,      ename
  3  ,      sal
  4  ,      deptno
  5  ,      mgr
  6  FROM   emp
  7  WHERE  empno = 7654
  8  /

     EMPNO ENAME             SAL     DEPTNO        MGR
---------- ---------- ---------- ---------- ----------
      7654 MARTIN           1600         20       7566

All went according to plan. The changes took just a fraction of a second to process from when the messages were queued, which certainly satisfies the criterion of "near real time" to me.


The full set of scripts for the above is available for download.

DML within DML

As noted in my previous post, a recent code review revealed a couple of things that I wasn't aware of… which is one of the pleasures of working in IT: learning new things. Aggregation within the RETURNING INTO clause of a DML statement was a good thing to find out about. I'm having a hard time thinking of a scenario that would legitimately justify the other feature: performing DML within DML…

To demonstrate, let's create two simple tables, TAB_A and TAB_B, each with two columns: ID and VAL. Both tables start off with a small set of data:

CREATE TABLE tab_a
   (id    NUMBER (6) NOT NULL
   ,val   NUMBER (6) NOT NULL
   ,CONSTRAINT tab_a_pk PRIMARY KEY (id))
/

CREATE TABLE tab_b
   (id    NUMBER (6) NOT NULL
   ,val   NUMBER (6) NOT NULL
   ,CONSTRAINT tab_b_pk PRIMARY KEY (id))
/

INSERT INTO tab_a
SELECT ROWNUM
,      ROWNUM
FROM   dual
CONNECT BY ROWNUM <= 10
/

INSERT INTO tab_b
SELECT ROWNUM
,      ROWNUM
FROM   dual
CONNECT BY ROWNUM <= 10
/

COMMIT
/

Next we’ll create a package with two routines. The first routine is a function that sets the VAL value in TAB_A to a random value and returns that value. The second routine is a procedure that sets the VAL value in TAB_B using the return value of the first function:

CREATE OR REPLACE PACKAGE proc_pkg
AS
   FUNCTION val_a (p_id IN tab_a.id%TYPE)
      RETURN tab_a.val%TYPE;
   PROCEDURE update_b (p_id IN tab_b.id%TYPE);
END proc_pkg;
/

CREATE OR REPLACE PACKAGE BODY proc_pkg
AS
   FUNCTION val_a (p_id IN tab_a.id%TYPE)
      RETURN tab_a.val%TYPE
   AS
      l_return_value tab_a.val%TYPE;
   BEGIN
      UPDATE tab_a
      SET    val = TRUNC(dbms_random.value (100,200))
      WHERE  id = p_id
      RETURNING val INTO l_return_value;
      RETURN l_return_value;
   END val_a;
   
   PROCEDURE update_b  (p_id IN tab_b.id%TYPE)
   AS
   BEGIN
      UPDATE tab_b
      SET    val = val_a (id)
      WHERE  id = p_id;
   END update_b;
END proc_pkg;
/

If you look at the UPDATE_B routine, we're updating TAB_B but since we invoke function VAL_A we're also updating TAB_A. My initial thought when seeing code like this was "hey, that won't work"… and it seems I was wrong:

SQL> EXEC proc_pkg.update_b (5)

PL/SQL procedure successfully completed.

SQL> SELECT *
  2  FROM   tab_a
  3  WHERE  id = 5
  4  /

        ID        VAL
---------- ----------
         5        163

SQL> SELECT *
  2  FROM   tab_b
  3  WHERE  id = 5
  4  /

        ID        VAL
---------- ----------
         5        163

Okay, so it works; performing DML on one table can invoke a function that performs DML on another table. We can insert, update and delete data on a table from a PL/SQL function invoked via DML on a different table. If we were to try running a simple SELECT query using a function that does DML then we get ORA-14551:

SQL> SELECT id
  2  ,      proc_pkg.val_a (id)
  3  FROM   tab_a
  4  /
,      proc_pkg.val_a (id)
       *
ERROR at line 2:
ORA-14551: cannot perform a DML operation inside a query
ORA-06512: at "DEVELOPER.PROC_PKG", line 8

Yes, you can get around this particular error by making the function an autonomous transaction… but don't do that, it's not what autonomous transactions are designed for.

My problem is that I have difficulty thinking of a scenario where doing DML from within DML might be a good idea instead of splitting it into separate operations. In some ways it's similar to a row level trigger, and I dislike that approach to data processing as it leads to spaghetti code. Sure, I can understand something like a demo where I want to log details about the execution of the function as it runs, but in normal business-type processing…? Furthermore, keep in mind that we're essentially doing row by row processing here; our TAB_A DML function will be invoked for each row modified in TAB_B, so we're not doing bulk processing on TAB_A. This is likely to be a performance inhibitor for the TAB_A changes.

I guess this is something that I would classify as a nice feature but use only with caution when no other option exists… although I'm happy to take advice on this.

Deferrable Constraints

As someone who has done a lot of data manipulation work I strongly support anything that goes towards improving the quality of data. Within the database that means I encourage the use of declarative constraints. In Oracle, these are:

  • Primary key constraints
  • Unique key constraints
  • Foreign key constraints
  • Check constraints
  • Not Null constraints

As anyone who has dealt with databases would know, data inserted or updated must abide by whatever constraints have been declared on a table. Normally this check is made at the point the data modification is made. However, deferrable constraints offer us the ability to delay the checking to the commit point.

So why might we need deferrable constraints? Depending on the data processes involved, it might be that data obtained early on is incomplete and gets supplemented as the process progresses. Deferrable constraints allow us to gradually build up the data knowing that at the end of the transaction the data will be complete. It's certainly a better situation than dropping the constraints completely!

Let’s look at a very simple example of a deferrable NOT NULL constraint. We’ll create a table with two columns, COL_1 and COL_2, both with NOT NULL constraints but the COL_2 constraint will be deferrable:

CREATE TABLE t1
   (col_1   NUMBER CONSTRAINT col_1_not_null NOT NULL
   ,col_2   NUMBER CONSTRAINT col_2_not_null NOT NULL DEFERRABLE INITIALLY DEFERRED)
/

I've taken the liberty of naming the NOT NULL constraints in the above DDL, which is something that I would not normally do for NOT NULL constraints, so that the names appear in the exceptions thrown.

If we insert a row into our table with NULL for COL_1 we get:

SQL>INSERT INTO t1
  2  VALUES (NULL, 1)
  3  /
VALUES (NULL, 1)
        *
ERROR at line 2:
ORA-01400: cannot insert NULL into ("DEVELOPER"."T1"."COL_1")

However if the COL_2 value is set to NULL:

SQL>INSERT INTO t1
  2  VALUES (1, NULL)
  3  /

1 row created.

we find that the row gets created. If we were to commit at this point then the NOT NULL constraint check on COL_2 kicks in and we get:

SQL>COMMIT;
COMMIT
*
ERROR at line 1:
ORA-02091: transaction rolled back
ORA-02290: check constraint (DEVELOPER.COL_2_NOT_NULL) violated

However, if we initially insert NULL into COL_2 and then provide a value prior to the commit we find all goes through without error:

SQL>INSERT INTO t1
  2  VALUES (1, NULL)
  3  /

1 row created.

SQL>
SQL>UPDATE t1
  2  SET    COL_2 = 1
  3  WHERE  COL_1 = 1
  4  /

1 row updated.

SQL>COMMIT
  2  /

Commit complete.

You'll note that the deferrable constraint was declared with the keywords DEFERRABLE INITIALLY DEFERRED. The DEFERRABLE keyword notes that the constraint is capable of being deferred, while the INITIALLY DEFERRED part specifies that the constraint defaults to deferred checking. The alternative to this is DEFERRABLE INITIALLY IMMEDIATE, which means the constraint check is not deferred by default but can be switched to deferred checking when required.
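
An INITIALLY IMMEDIATE constraint can be switched to deferred checking for the current transaction using the SET CONSTRAINT statement. For example, had COL_2_NOT_NULL been declared INITIALLY IMMEDIATE:

-- defer checking of a single named deferrable constraint
SET CONSTRAINT col_2_not_null DEFERRED;

-- or defer every deferrable constraint in the transaction
SET CONSTRAINTS ALL DEFERRED;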

To make all deferrable constraints deferred, issue the following:

ALTER SESSION SET CONSTRAINTS = DEFERRED

and to reverse this, i.e. turn off deferred checking:

ALTER SESSION SET CONSTRAINTS = IMMEDIATE

Using:

ALTER SESSION SET CONSTRAINTS = DEFAULT

puts deferrable constraints into DEFERRED or IMMEDIATE mode according to how they were declared. A session can change its constraint setting in the middle of a transaction without impacting the DML that has occurred prior to the change.

Note, the above commands only impact constraints that are declared to be DEFERRABLE. By default constraints are NOT DEFERRABLE, which means they can never be deferred. The only way to enable deferred checking on a constraint that has been declared as not deferrable is to drop it and recreate it with one of the deferrable options (at least, as of Oracle 12.1.0.2; I guess future versions may support this).
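
As an illustration, if COL_2_NOT_NULL had originally been declared without the deferrable options, making it deferrable would involve something along these lines:

ALTER TABLE t1
   DROP CONSTRAINT col_2_not_null
/

ALTER TABLE t1
   MODIFY (col_2 CONSTRAINT col_2_not_null NOT NULL DEFERRABLE INITIALLY DEFERRED)
/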

If you have read my earlier post on unique constraints and indexes then you'll know that by default Oracle will create a unique index to enforce the constraint. If the unique constraint is declared as DEFERRABLE then the index Oracle creates will not be unique. On consideration this makes sense: during the course of a transaction Oracle might have duplicate entries on the unique column(s), and a unique index would prevent this scenario.
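
A quick way to confirm this is to create a deferrable unique constraint and check the index Oracle builds for it; USER_INDEXES should report it as NONUNIQUE:

CREATE TABLE t2
   (col_1 NUMBER CONSTRAINT t2_uk UNIQUE DEFERRABLE INITIALLY IMMEDIATE)
/

SELECT index_name
,      uniqueness
FROM   user_indexes
WHERE  table_name = 'T2'
/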

Index Compression

Whilst preparing this post I noticed that I’ve been beaten to it by Connor McDonald’s recent post on index compression. Since my post was virtually ready I’m going to post it anyway but make sure to read Connor’s and follow the links to Richard Foote’s articles too.

Index compression is a feature that's been around since something like Oracle 8. With index compression, if the index has repeating values for the leading columns (or "attributes", since we can index expressions… I'll continue to use the term "columns" for the rest of the post though) then Oracle is able to optimise the storage of the index entries and fit more entries per block, thereby producing an index with a smaller on-disk footprint.

The syntax for compressing an index is:

CREATE INDEX <index_name> 
   ON <table_name> (<columns>)
   COMPRESS <n>

where n is the number of leading columns to compress. Obviously n cannot be greater than the number of columns in the index, and it may not be worth compressing on all of them, so n is often less than the number of columns in the index.

“How much smaller will the index be?” I hear you ask. Well, as with many things results will vary. Thankfully Oracle will tell you what you can expect to achieve so you can make a decision as to whether to proceed or not before actually performing the operation.

Let’s take a look at how we would go about assessing an index for compression. We’ll start with a table containing 4 million rows:

CREATE TABLE index_demo
    (id         NUMBER(10)  NOT NULL
    ,val1       NUMBER(4) NOT NULL
    ,val2       NUMBER(4) NOT NULL)
/

INSERT INTO index_demo
WITH r AS (SELECT ROWNUM rn FROM dual CONNECT BY ROWNUM <= 2000)
SELECT ROWNUM
,      MOD(ROWNUM,100)
,      TRUNC(ROWNUM/1001)
FROM   r, r
/

COMMIT
/

EXEC dbms_stats.gather_table_stats ('','index_demo')

With this table, ID is unique, VAL1 has 100 distinct values and VAL2 has roughly 4,000 distinct values. Next we create 4 different indexes: two non-unique and two unique:

CREATE INDEX index_demo_i1
   ON index_demo (val1)
/

CREATE INDEX index_demo_i2
   ON index_demo (val2, val1)
/

CREATE UNIQUE INDEX index_demo_i3
   ON index_demo (val1, id)
/

CREATE UNIQUE INDEX index_demo_i4
   ON index_demo (id, val2)
/

Next we’ll get some basic stats for our indexes:

SELECT index_name
,      blevel
,      leaf_blocks
,      distinct_keys
,      avg_leaf_blocks_per_key AS avg_lf_key
,      num_rows
FROM   user_ind_statistics
WHERE  table_name = 'INDEX_DEMO'
ORDER  BY
       index_name
/

On my 12.1.0.2 database with an 8 kb block size, this gave the following:

INDEX_NAME       BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LF_KEY   NUM_ROWS
--------------- ------- ----------- ------------- ---------- ----------
INDEX_DEMO_I1         2        7792           100         77    4000000
INDEX_DEMO_I2         2       10006        399605          1    4000000
INDEX_DEMO_I3         2       10431       4000000          1    4000000
INDEX_DEMO_I4         2       10982       4000000          1    4000000

The key metric we're interested in is LEAF_BLOCKS. When we compress an index we want to see a reduction in that figure. Oracle will tell us what to expect if we analyse the index and look at the index statistics gathered:

ANALYZE INDEX index_demo_i1 VALIDATE STRUCTURE
/

SELECT opt_cmpr_count
,      opt_cmpr_pctsave
FROM   index_stats
/

The two measures from INDEX_STATS are the recommended number of columns to compress and the percent saving in space that can be expected. Repeating the above for each index we end up with:

INDEX_NAME       OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
--------------- --------------- ----------------
INDEX_DEMO_I1                 1               21
INDEX_DEMO_I2                 2               31
INDEX_DEMO_I3                 1               15
INDEX_DEMO_I4                 0                0

So, from the above, Oracle is recommending we use COMPRESS 1 for index INDEX_DEMO_I1 and we should see a 21% reduction in space used by our index. Compression on the first of our unique indexes, INDEX_DEMO_I3, is still possible as the first column has repeating values. Oracle is saying we should see a 15% reduction in space usage. Our second unique index, INDEX_DEMO_I4, leads with a column that is unique and so Oracle says that compression isn’t worthwhile on that index (note, it’s actually still possible to apply compression to the index but you will end up increasing the size of the index!).

Now let’s test out Oracle’s recommendations and see how accurate they are. We could simply drop and recreate the indexes with the compress option but we can also rebuild them on-line, which is great for a system with high availability requirements.

ALTER INDEX index_demo_i1
   REBUILD
   COMPRESS 1
   ONLINE
/   

ALTER INDEX index_demo_i2
   REBUILD
   COMPRESS 2
   ONLINE
/   

ALTER INDEX index_demo_i3
   REBUILD
   COMPRESS 1
   ONLINE
/   

Rerunning our query against USER_IND_STATISTICS we get:

INDEX_NAME       BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LF_KEY   NUM_ROWS
--------------- ------- ----------- ------------- ---------- ----------
INDEX_DEMO_I1         2        6145           100         61    4000000
INDEX_DEMO_I2         2        6863        399605          1    4000000
INDEX_DEMO_I3         2        8782       4000000          1    4000000
INDEX_DEMO_I4         2       10982       4000000          1    4000000

Some quick calculations show the compression actually achieved:

INDEX_NAME       PCT COMPRESSION
--------------- ----------------
INDEX_DEMO_I1                 21
INDEX_DEMO_I2                 30
INDEX_DEMO_I3                 16
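
These figures are simply the relative reduction in LEAF_BLOCKS between the two listings, e.g. for INDEX_DEMO_I1:

-- (leaf blocks before - leaf blocks after) / leaf blocks before
SELECT ROUND ((7792 - 6145) / 7792 * 100) AS pct_compression
FROM   dual
/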

We can see that Oracle’s estimates of compression were quite accurate.

Index compression is a feature that can result in a significant space saving for any moderately large database. By the time the space saving is multiplied over development and test environments and back-up space, it's pretty much a no-brainer to use it wherever possible.

Flashback Data Archive

I recently started work on a small APEX application. The owner of the application requested that all data changes be tracked, which was not an unreasonable request given the nature of the application. In days of yore, i.e. pre-Oracle 12c, this would have involved something like custom PL/SQL code within a data access API or a (*shudder*) trigger-based strategy. However, recalling Bjoern Rost's presentation at the 2014 AusOUG conference I decided to look into using Flashback Data Archive (FDA) instead… and I'm so glad I did!

Flashback Data Archive essentially involves defining a retention policy, e.g. something like “1 year”, and associating tables with that policy. Oracle then handles the rest. Let’s take a look at how easy it is to set this up. We will:

  • define a new tablespace
  • create a FDA policy (specifying the new tablespace as the storage)
  • create an FDA application (which is nothing more than a container for tables to allow easy FDA management; 12c only)
  • associate the application with the FDA policy
  • enable the FDA

An example is in order and we'll use the standard Oracle DEPT and EMP tables for it (on a 12.1.0.2 database). Assuming you've already got the tables set up in the SCOTT account, we'll create an FDA to track 2 years of data changes:

-- Create the FDA tablespace
CREATE TABLESPACE scott_fda_data 
   DATAFILE '<<path>>/scott_fda_data.dbf'
   SIZE 1M 
   AUTOEXTEND ON NEXT 1M
   MAXSIZE 200M
/

-- Grant SCOTT access to the new tablespace
ALTER USER scott 
   QUOTA UNLIMITED ON scott_fda_data
/
   
-- Create the flashback data archive policy
CREATE FLASHBACK ARCHIVE scott_fda 
   TABLESPACE scott_fda_data
   QUOTA 200M 
   RETENTION 2 YEAR
/   

-- Set up an Application so we can control FDA for all tables (Oracle 12c only)
EXEC dbms_flashback_archive.register_application('SCOTT','SCOTT_FDA')

EXEC dbms_flashback_archive.add_table_to_application ('SCOTT','DEPT','SCOTT')
EXEC dbms_flashback_archive.add_table_to_application ('SCOTT','EMP','SCOTT')

EXEC dbms_flashback_archive.enable_application ('SCOTT')

Now that we’ve got our FDA in place, let’s make some data changes. The script below makes use of DBMS_LOCK.SLEEP to insert delays of 1 minute. This is done so that later on we can query the tables as they looked back in time using semi-realistic examples.

UPDATE emp
SET    sal = sal + 10;
COMMIT;

EXEC dbms_lock.sleep (60)

INSERT INTO EMP VALUES
(8000,'MARK','MANAGER',7839,to_date('01-01-2015','dd-mm-yyyy'),2000,NULL,40);
COMMIT;

EXEC dbms_lock.sleep (60)

INSERT INTO EMP VALUES
(8010,'JAMES','DEVELOPER',8000,to_date('15-01-2015','dd-mm-yyyy'),2500,NULL,40);
COMMIT;

EXEC dbms_lock.sleep (60)

INSERT INTO EMP VALUES
(8010,'HEATHER','DEVELOPER',8000,to_date('20-01-2015','dd-mm-yyyy'),2200,NULL,40);
COMMIT;

EXEC dbms_lock.sleep (60)

UPDATE emp
SET    sal = sal + 500
WHERE  empno = 8000;
COMMIT;

Of course, there's little point tracking data changes if you can't easily access them. Simple flashback query clauses on SELECT allow us to query the source tables either as they looked at a specific point in time or to show all the changes that have occurred over time.

For example, to view the data in a table at a specific point in time we use:

SELECT *
FROM   <table_name> AS OF TIMESTAMP <timestamp_expr>

So, for our example table we could run something like the following to find out what the EMP data looked like 5 minutes ago, just before we did the operations shown above:

SELECT *
FROM   emp AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '5' MINUTE)
ORDER  BY
       empno;

If you want a full version history of the changes made then you can use the VERSIONS BETWEEN clause:

SELECT e.*
,      versions_starttime
,      versions_endtime
FROM   emp VERSIONS BETWEEN TIMESTAMP MINVALUE AND MAXVALUE e
WHERE  empno = 8000
ORDER  BY
       versions_starttime;

The above query includes the pseudo-columns versions_starttime and versions_endtime, which provide the date range during which a specific row version was effective.

Now for some FDA details not mentioned above:

  • Flashback Data Archive is available in all versions of Oracle from 11.2.0.4 without additional licensing. In earlier 11g releases it required the Advanced Compression licence, but that restriction has been removed now that the archive tables are, by default, not compressed.
  • Only Oracle 12c supports the ability to create flashback applications. In 11g you associate tables directly with the FDA one by one using ALTER TABLE <<table>> FLASHBACK ARCHIVE <<fda>>. This means the FDA tracking won't be started/stopped at precisely the same moment in time for all tables.
  • You'll need to disable FDA whilst making DDL changes to the tracked tables, via dbms_flashback_archive.disable_application (see the sketch after this list)
  • In order to query the tables as at an earlier time period you require FLASHBACK privilege on the table, or the FLASHBACK ANY TABLE privilege.
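
By way of a (hypothetical) example, a column addition to a tracked table would be wrapped like this:

EXEC dbms_flashback_archive.disable_application ('SCOTT')

ALTER TABLE scott.emp
   ADD (phone_no VARCHAR2(20))
/

EXEC dbms_flashback_archive.enable_application ('SCOTT')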

In summary, Flashback Data Archive provides an effective means of tracking data changes on a table. Not only is the implementation simple, but the interface for accessing the historical data, via extensions to the SELECT statement, is easy to harness and intuitive.

TOP n Queries

So you’ve been given a relatively simple task of getting the last 5 modified rows from a table, which contains a column that holds the modified date. How do you write the query to do that?

Back in pre-10 versions of Oracle it was likely that the query was written:

SELECT *
FROM   (SELECT *
        FROM   my_table
        ORDER  BY
               created_date DESC)
WHERE  ROWNUM <= 5

… and this works pretty well. If you had written it:

SELECT *
FROM   my_table
WHERE  ROWNUM <= 5
ORDER  BY
       created_date DESC

you would (hopefully) have observed some strange output during your testing and adjusted your query accordingly. For the Oracle newcomers, in the above query Oracle selects 5 rows and then sorts them, not the other way around, so it’s the equivalent of telling Oracle “get any 5 rows from the table and give them to me sorted descending by the created date”.

Somewhere around Oracle 10g the recommendation was to use the ROW_NUMBER analytic function in place of ROWNUM, i.e.

SELECT *
FROM   (SELECT t.*
        ,      ROW_NUMBER() OVER (ORDER BY created_date DESC) rn
        FROM   my_table t)
WHERE  rn <= 5

Now in version 12c of the database Oracle has introduced the FETCH FIRST ROWS syntax for doing exactly this kind of query. It makes things quite simple and clear:

SELECT *
FROM   my_table
ORDER  BY
       created_date DESC
FETCH FIRST 5 ROWS ONLY

Now let’s take a peek under the covers and see what Oracle is actually doing when faced with these queries. To start with we’ll create a simple table with 1,000,000 rows:

CREATE TABLE big_table
   (id            NUMBER(8)   NOT NULL
   ,cat           NUMBER(4)   NOT NULL
   ,padding       CHAR(30)    NOT NULL
   ,last_mod_date DATE        NOT NULL)
/

INSERT INTO big_table
SELECT ROWNUM
,      MOD(ROWNUM,1000) AS cat
,      'x' AS padding
,      TO_DATE('01/01/2000','dd/mm/yyyy') + dbms_random.value(0,5000) AS last_mod_date
FROM   (SELECT 'x'
        FROM   dual
        CONNECT BY ROWNUM <= 1000) x
,      (SELECT 'x'
        FROM   dual
        CONNECT BY ROWNUM <= 1000) y
/        

COMMIT
/

EXEC dbms_stats.gather_table_stats ('','big_table')

CREATE UNIQUE INDEX  big_table_pk
   ON big_table (id)
/

ALTER TABLE big_table
   ADD CONSTRAINT big_table_pk
   PRIMARY KEY (id)
/

Obviously we want to access our most recently modified rows as quickly as possible so we'll index that column:

CREATE INDEX big_table_ix1
   ON big_table(last_mod_date)
/   

We’ll run each query against our table with AUTOTRACE enabled to see the execution plan and cost. As with anything related to Oracle performance it’s important to keep in mind the version that you’re using as things can change across versions. In light of that statement, the following examples were run against a 12.1.0.2 database. First up is our ROWNUM approach:

SQL>SELECT r.id
  2  ,      r.cat
  3  ,      r.padding
  4  ,      r.last_mod_date
  5  FROM   (SELECT b.id
  6          ,      b.padding
  7          ,      b.last_mod_date
  8          ,      b.cat
  9          FROM   big_table b
 10          ORDER  BY
 11                 b.last_mod_date DESC) r
 12  WHERE  ROWNUM <= 5
 13  /

        ID        CAT PADDING                        LAST_MOD_DATE
---------- ---------- ------------------------------ -------------------
    545616        616 x                              08/09/2013 23:59:05
    557331        331 x                              08/09/2013 23:57:45
      5220        220 x                              08/09/2013 23:54:28
    874232        232 x                              08/09/2013 23:50:34
    610984        984 x                              08/09/2013 23:39:15


Execution Plan
----------------------------------------------------------
Plan hash value: 2877194421

-----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |               |     5 |   335 |     8   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |               |       |       |            |          |
|   2 |   VIEW                        |               |     5 |   335 |     8   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| BIG_TABLE     |  1000K|    45M|     8   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN DESCENDING| BIG_TABLE_IX1 |     5 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          9  consistent gets
          0  physical reads
          0  redo size
        954  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

As we can see from the execution plan, Oracle traversed the index on LAST_MOD_DATE in a descending fashion, returning each table row found until (this is the COUNT STOPKEY bit) it had returned the requested number of rows. All up, the query required just 9 logical IO operations so it's very efficient.

Next up we do the same with the ROW_NUMBER query:

SQL>SELECT r.id
  2  ,      r.cat
  3  ,      r.padding
  4  ,      r.last_mod_date
  5  FROM   (SELECT b.id
  6          ,      b.cat
  7          ,      b.padding
  8          ,      b.last_mod_date
  9          ,      ROW_NUMBER() OVER (ORDER BY b.last_mod_date DESC) AS rn
 10          FROM   big_table b) r
 11  WHERE  rn <= 5
 12  /

        ID        CAT PADDING                        LAST_MOD_DATE
---------- ---------- ------------------------------ -------------------
    545616        616 x                              08/09/2013 23:59:05
    557331        331 x                              08/09/2013 23:57:45
      5220        220 x                              08/09/2013 23:54:28
    874232        232 x                              08/09/2013 23:50:34
    610984        984 x                              08/09/2013 23:39:15


Execution Plan
----------------------------------------------------------
Plan hash value: 2679878340

----------------------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |           |     5 |   400 |       | 13907   (1)| 00:00:01 |
|*  1 |  VIEW                    |           |     5 |   400 |       | 13907   (1)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|           |  1000K|    45M|    57M| 13907   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | BIG_TABLE |  1000K|    45M|       |  1984   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN"<=5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("B"."LAST_MOD_DATE")
              DESC )<=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       7302  consistent gets
          0  physical reads
          0  redo size
        954  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          5  rows processed

Well, at least we got back the same rows so our query worked but Oracle completely ignored our index and chose to perform a full table scan, costing a total of 7,302 logical IOs. Ouch!

Moving onto the FETCH FIRST ROWS syntax:

SQL>SELECT r.id
  2  ,      r.cat
  3  ,      r.padding
  4  ,      r.last_mod_date
  5  FROM   big_table r
  6  ORDER  BY
  7         r.last_mod_date DESC
  8  FETCH FIRST 5 ROWS ONLY
  9  /

        ID        CAT PADDING                        LAST_MOD_DATE
---------- ---------- ------------------------------ -------------------
    545616        616 x                              08/09/2013 23:59:05
    557331        331 x                              08/09/2013 23:57:45
      5220        220 x                              08/09/2013 23:54:28
    874232        232 x                              08/09/2013 23:50:34
    610984        984 x                              08/09/2013 23:39:15


Execution Plan
----------------------------------------------------------
Plan hash value: 2679878340

----------------------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |           |     5 |   445 |       | 13907   (1)| 00:00:01 |
|*  1 |  VIEW                    |           |     5 |   445 |       | 13907   (1)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|           |  1000K|    45M|    57M| 13907   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | BIG_TABLE |  1000K|    45M|       |  1984   (1)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("R"."LAST_MOD_DATE")
              DESC )<=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       7302  consistent gets
          0  physical reads
          0  redo size
        954  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          5  rows processed

We can see from the execution plan that the FETCH FIRST ROWS syntax has used exactly the same execution plan as the ROW_NUMBER approach; a full table scan. Indeed, the predicate information of the plan shows that it has used the ROW_NUMBER function.

So far we have seen that the ROW_NUMBER and FETCH FIRST ROWS approaches have ignored a suitable index. Let’s look at another scenario that might be a bit more realistic. The table contains a CAT column (for CATegory), which contains 1,000 distinct values. The query we’ll run is “Get the last 5 modified rows for category 42”. For this exercise we’ll create a new index on the category and the last modified date:

CREATE INDEX big_table_ix2
   ON big_table(cat, last_mod_date)
/

First up the ROWNUM approach:

SQL>SELECT r.id
  2  ,      r.cat
  3  ,      r.padding
  4  ,      r.last_mod_date
  5  FROM   (SELECT b.id
  6          ,      b.padding
  7          ,      b.last_mod_date
  8          ,      b.cat
  9          FROM   big_table b
 10          WHERE  cat = 42
 11          ORDER  BY
 12                 b.last_mod_date DESC) r
 13  WHERE  ROWNUM <= 5
 14  /

        ID        CAT PADDING                        LAST_MOD_DATE
---------- ---------- ------------------------------ -------------------
    156042         42 x                              17/08/2013 23:57:57
    118042         42 x                              17/08/2013 12:44:38
    266042         42 x                              11/08/2013 20:13:13
    805042         42 x                              04/08/2013 08:45:18
    151042         42 x                              30/07/2013 06:46:54


Execution Plan
----------------------------------------------------------
Plan hash value: 200163764

------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |               |     5 |   335 |     9   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                 |               |       |       |            |          |
|   2 |   VIEW                         |               |     5 |   335 |     9   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | BIG_TABLE     |  1000 | 48000 |     9   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| BIG_TABLE_IX2 |     5 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   4 - access("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          9  consistent gets
          0  physical reads
          0  redo size
        941  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

Similar to our previous query, the ROWNUM approach does use the new index, keeping the number of logical IO operations down to just 9. Let's see how the ROW_NUMBER version of our query fares:

SQL>SELECT r.id
  2  ,      r.cat
  3  ,      r.padding
  4  ,      r.last_mod_date
  5  FROM   (SELECT b.id
  6          ,      b.cat
  7          ,      b.padding
  8          ,      b.last_mod_date
  9          ,      ROW_NUMBER() OVER (ORDER BY b.last_mod_date DESC) AS rn
 10          FROM   big_table b
 11          WHERE  cat = 42) r
 12  WHERE  rn <= 5
 13  /

        ID        CAT PADDING                        LAST_MOD_DATE
---------- ---------- ------------------------------ -------------------
    156042         42 x                              17/08/2013 23:57:57
    118042         42 x                              17/08/2013 12:44:38
    266042         42 x                              11/08/2013 20:13:13
    805042         42 x                              04/08/2013 08:45:18
    151042         42 x                              30/07/2013 06:46:54

Execution Plan
----------------------------------------------------------
Plan hash value: 1296513801

------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |               |     5 |   400 |  1006   (0)| 00:00:01 |
|*  1 |  VIEW                          |               |     5 |   400 |  1006   (0)| 00:00:01 |
|*  2 |   WINDOW NOSORT STOPKEY        |               |  1000 | 48000 |  1006   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | BIG_TABLE     |  1000 | 48000 |  1006   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| BIG_TABLE_IX2 |  1000 |       |     6   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN"<=5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("B"."LAST_MOD_DATE") DESC
              )<=5)
   4 - access("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
        941  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

Well, that's a bit better. With the ROW_NUMBER query Oracle did elect to use the new index. The presence of the INDEX RANGE SCAN DESCENDING and WINDOW NOSORT STOPKEY lines in the execution plan shows that Oracle is only accessing the minimum number of entries it needs to satisfy the query, as opposed to all the entries with a CAT value of 42. This is confirmed by the statistics report showing that the logical IO count is just 10.

Now to see if the FETCH FIRST ROWS syntax does the same:

SQL>SELECT r.id
  2  ,      r.cat
  3  ,      r.padding
  4  ,      r.last_mod_date
  5  FROM   big_table r
  6  WHERE  cat = 42
  7  ORDER  BY
  8         r.last_mod_date DESC
  9  FETCH FIRST 5 ROWS ONLY
 10  /

        ID        CAT PADDING                        LAST_MOD_DATE
---------- ---------- ------------------------------ -------------------
    156042         42 x                              17/08/2013 23:57:57
    118042         42 x                              17/08/2013 12:44:38
    266042         42 x                              11/08/2013 20:13:13
    805042         42 x                              04/08/2013 08:45:18
    151042         42 x                              30/07/2013 06:46:54


Execution Plan
----------------------------------------------------------
Plan hash value: 1296513801

------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |               |     5 |   445 |  1006   (0)| 00:00:01 |
|*  1 |  VIEW                          |               |     5 |   445 |  1006   (0)| 00:00:01 |
|*  2 |   WINDOW NOSORT STOPKEY        |               |  1000 | 48000 |  1006   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID | BIG_TABLE     |  1000 | 48000 |  1006   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN DESCENDING| BIG_TABLE_IX2 |  1000 |       |     6   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY INTERNAL_FUNCTION("R"."LAST_MOD_DATE") DESC
              )<=5)
   4 - access("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads
          0  redo size
        941  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

As per our previous query, the FETCH FIRST ROWS approach appears to be nothing more than a rewrite of the ROW_NUMBER one. The plan and the statistics are all the same.

Admittedly both of the above query scenarios are somewhat contrived. It is unlikely that the last modified date column would be indexed unless there is a driving need for top n type queries against the data. A more likely scenario would be to have an index on the CAT column only.
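
The script for this step isn't shown here; based on the BIG_TABLE_IX3 name in the plan below, the setup presumably ran along these lines, with the earlier LAST_MOD_DATE indexes dropped first:

DROP INDEX big_table_ix1
/

DROP INDEX big_table_ix2
/

CREATE INDEX big_table_ix3
   ON big_table (cat)
/

With just this index in place all three queries produced near-identical plans: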

--------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |               |     5 |   335 |  1006   (1)| 00:00:01 |
|*  1 |  COUNT STOPKEY                         |               |       |       |            |          |
|   2 |   VIEW                                 |               |  1000 | 67000 |  1006   (1)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY               |               |  1000 | 48000 |  1006   (1)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| BIG_TABLE     |  1000 | 48000 |  1005   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                  | BIG_TABLE_IX3 |  1000 |       |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

and had statistics of:

Statistics
---------------------------------------------------
          0  recursive calls
          0  db block gets
       1005  consistent gets
          0  physical reads
          0  redo size
        941  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          5  rows processed

In this scenario any approach to finding the last modified rows is the same from a resource consumption perspective. However, it is particularly interesting how much more efficient a suitable index makes the query. Without the LAST_MOD_DATE column being indexed, the queries now require just over 1,000 logical IO operations, roughly 100 times what they required when the column was included in the index. This provides a nice example of creating indexes that are appropriate to the queries being run against the data.

In summary, we have multiple approaches to writing a TOP n type query: a sorted in-line view with ROWNUM filtering, an in-line view with a ROW_NUMBER filter and, with 12c, the FETCH FIRST ROWS syntax. If a suitable access path index is in place then all approaches seem roughly equivalent in terms of cost, except for the case where the query was run against the entire table. In that situation only the ROWNUM approach made use of the index on LAST_MOD_DATE. As with many things related to the query optimiser, check that what you expect is actually happening and adjust accordingly.