Point in time extracts

Extracts from a database are a common activity for any system. In our information driven world we are constantly extracting data for reporting systems, data warehouses, other systems and simple reports. Naturally, getting data out of an Oracle database is as simple as writing one or more SQL queries. Things can get a bit tricky when we need to consider the consistency of the data we’re extracting.

As an example of data extract inconsistency, one place I worked at had a requirement to extract data into CSV files for a third party on a daily basis. Primarily the data consisted of account details and transactions on those accounts. The third party started to complain that the account balances did not match up with the transactions that were reported. The cause for the difference was easily diagnosed. The overall process simply ran one query after another dumping the results to a CSV file. The transaction extract occurred after the account details one. By the time the account details extract had finished new transactions had made their way into the system and were picked up by the transaction extract. These new transactions were not reflected in the account balances of the account extract. What the extract process required was to do all the extracts as at the same point in time and Oracle provides this functionality via flashback query.

Flashback query has been part of Oracle since version 9. In its simplest form it’s a matter of specifying the point in time that you want to query a table, e.g.:

SELECT *
FROM   transactions AS OF TIMESTAMP CAST(TRUNC(SYSDATE) AS TIMESTAMP)

which will display the details of the TRANSACTIONS table as at the start of the day. Activity on the table since the start of the day will be rolled back for the purpose of the query. Using the above approach it’s possible to construct a set of queries that are consistent at the same point in time, which is ideal for extracts that need to be data consistent with each other.
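Flashback query can also be expressed in terms of a system change number (SCN) rather than a timestamp. The timestamp form is internally mapped to an SCN with a granularity of a few seconds, so when precision matters the SCN can be captured up front (this assumes the EXECUTE privilege on DBMS_FLASHBACK):

SELECT dbms_flashback.get_system_change_number AS current_scn
FROM   dual

SELECT *
FROM   transactions AS OF SCN :captured_scn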

Using PKG_CSV, creating a data consistent set of extracts is as simple as:

DECLARE
   l_directory         VARCHAR2(30) := 'EXTRACT_DIR';
   l_extract_timestamp TIMESTAMP     := CAST(TRUNC(SYSDATE) AS TIMESTAMP);
   l_emp_cur           SYS_REFCURSOR;
   l_dept_cur          SYS_REFCURSOR;
BEGIN
   OPEN l_emp_cur
   FOR
   SELECT * FROM emp AS OF TIMESTAMP l_extract_timestamp;
   
   OPEN l_dept_cur
   FOR
   SELECT * FROM dept AS OF TIMESTAMP l_extract_timestamp;
   
   pkg_csv.to_file (p_cursor      => l_emp_cur
                   ,p_directory   => l_directory
                   ,p_filename    => 'emp_' || TO_CHAR(l_extract_timestamp,'yyyymmddhh24miss') || '.csv');
   pkg_csv.to_file (p_cursor      => l_dept_cur
                   ,p_directory   => l_directory
                   ,p_filename    => 'dept_' || TO_CHAR(l_extract_timestamp,'yyyymmddhh24miss') || '.csv');
END;

So how far back can you go with this type of query? Oracle uses the undo segments (formerly known as rollback segments) to work out what the data looked like in the past. The obvious question is then “how much undo does Oracle retain?”. Unfortunately the answer to this isn’t clear cut. Generally speaking, Oracle will retain undo for at least the duration specified by the initialisation parameter UNDO_RETENTION, which is specified in seconds. This parameter defaults to 900, or 15 minutes, although many systems will have it set to a larger value. The Oracle documentation contains this warning on the way Oracle manages the undo available:

The UNDO_RETENTION parameter can only be honored if the current undo tablespace has enough space. If an active transaction requires undo space and the undo tablespace does not have available space, then the system starts reusing unexpired undo space. This action can potentially cause some queries to fail with a “snapshot too old” message.
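To see how a given database is configured you can query the relevant parameters directly (assuming you have access to V$PARAMETER), e.g.:

SELECT name
,      value
FROM   v$parameter
WHERE  name IN ('undo_retention','undo_management')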

You can read more about flashback query in the Oracle documentation.

Strongly typed cursor variables

Arguably the easiest way to return a set of data back to a client application from PL/SQL is via a cursor variable. Oracle simply opens a cursor to a query and the application pulls data from the cursor on demand. Conveniently Oracle provides a generic SYS_REFCURSOR data type that can be used for any cursor variable, as shown below to retrieve employee details:

CREATE OR REPLACE PACKAGE pkg_emp
AS
   PROCEDURE get_emps (p_emp_cur OUT SYS_REFCURSOR);
END pkg_emp;
/
   
CREATE OR REPLACE PACKAGE BODY pkg_emp
AS

PROCEDURE get_emps (p_emp_cur OUT SYS_REFCURSOR)
AS
BEGIN
   OPEN p_emp_cur
   FOR
   SELECT id
   ,      first_name
   ,      last_name
   ,      dept_id
   FROM   emp;
END get_emps;

END pkg_emp;
/

While the above is very quick to code it suffers from a significant flaw: an ill-defined contract between the provider of the data (the database via the PL/SQL routine) and the consumer of the data (whatever application is calling the routine). The key problem here is that the SYS_REFCURSOR type can be used with any query. In this regard it’s like having an “any data” type in an application programming language.

If I were to modify the query inside the get_emps routine to return the department name then I could simply compile a new version as:

CREATE OR REPLACE PACKAGE BODY pkg_emp
AS

   PROCEDURE get_emps (p_emp_cur OUT SYS_REFCURSOR)
   AS
   BEGIN
      OPEN p_emp_cur
      FOR
      SELECT e.id
      ,      e.first_name
      ,      e.last_name
      ,      e.dept_id
      ,      d.dept_name
      FROM   emp e
      ,      dept d
      WHERE  d.id = e.dept_id;
   END get_emps;

END pkg_emp;
/

All done, right? Well, although this code will compile and run correctly I have probably inadvertently broken any application that was using it. The code does nothing to enforce the structure of the cursor that I am returning so any query is deemed acceptable. From Oracle’s perspective this is fine but it won’t be okay for the consumers of the cursor, which will be expecting a cursor of a particular structure.

A better way to code the above routine is to use a strongly typed cursor, which declares in advance the attributes and data types that the cursor will contain. If the cursor is opened using a query that does not match the definition then a compile time error is thrown. Sure, the definition of the strongly typed cursor could be altered to match the new query but the error acts as a warning to the database developer that they might be about to break the application.

So, how do we create a strongly typed cursor variable? My preferred approach is to use a record type defined in the package specification as shown below:

CREATE OR REPLACE PACKAGE pkg_emp
AS
   TYPE t_emp_rec IS RECORD (id         emp.id%TYPE
                            ,first_name emp.first_name%TYPE
                            ,last_name  emp.last_name%TYPE
                            ,dept_id    emp.dept_id%TYPE);
   TYPE t_emp_cur IS REF CURSOR RETURN t_emp_rec;                          

   PROCEDURE get_emps (p_emp_cur OUT t_emp_cur);
END pkg_emp;
/
   
CREATE OR REPLACE PACKAGE BODY pkg_emp
AS

PROCEDURE get_emps (p_emp_cur OUT t_emp_cur)
AS
BEGIN
   OPEN p_emp_cur
   FOR
   SELECT id
   ,      first_name
   ,      last_name
   ,      dept_id
   FROM   emp;
END get_emps;

END pkg_emp;
/

The record type is useful for “self documenting” the code as it shows what data will be returned by the cursor without having to go into the package body and read the query. It’s a little bit of extra typing but now when I try to modify the package body and replace DEPT_ID with DEPT_NAME I get the following compilation error:

CREATE OR REPLACE PACKAGE BODY pkg_emp
AS

   PROCEDURE get_emps (p_emp_cur OUT t_emp_cur)
   AS
   BEGIN
      OPEN p_emp_cur
      FOR
      SELECT e.id
      ,      e.first_name
      ,      e.last_name
      ,      e.dept_id
      ,      d.dept_name
      FROM   emp e
      ,      dept d
      WHERE  d.id = e.dept_id;
   END get_emps;

END pkg_emp;
/

Warning: Package Body created with compilation errors.

SQL> show errors
Errors for PACKAGE BODY PKG_EMP:

LINE/COL ERROR
-------- ---------------------------------------------
7/7      PL/SQL: SQL Statement ignored
9/7      PLS-00382: expression is of wrong type

The “expression is of wrong type” is telling me that my query no longer matches the record declaration in the package specification. Time for me to revise my code… or change the record type and go searching for the application code that calls this routine and ensure it will continue to work with the new cursor. In a large, complex multi-tier system additional checks and balances like strongly typed cursors are invaluable to prevent inadvertent mistakes.

Proxy Connections

Managing development environments is a fickle business. On one hand you want to be able to grant developers whatever privileges they require to make the development process easy. On the other hand you don’t want the environment to become so disorganised that it’s hard to figure out where the hacking ends and the real application begins.

Database applications are typically structured with database objects in one or more accounts (users). Sharing around the password for the application account can lead to trouble, especially in non-development environments where there are supposed to be deployment controls in place. Once an account password has been given out then it can be assumed that it is generally known by all developers. Proxy connections allow access to be given without having to compromise the application account password.

The ALTER USER command is used to configure the proxy connection details. Suppose your application account is named APP_OWNER and you want to connect to it through a developer’s account named MARK. The following is the DDL required:

ALTER USER app_owner GRANT CONNECT THROUGH mark
/

So, now the user MARK is able to connect to the APP_OWNER account as if they had the credentials for that account. Access can be granted for a particular task and then easily revoked later once the task is complete.
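The revocation is the mirror-image command, and (assuming you can query it) the PROXY_USERS data dictionary view shows which proxy connections are currently in place:

ALTER USER app_owner REVOKE CONNECT THROUGH mark
/

SELECT proxy
,      client
FROM   proxy_users
/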

So, once set up how do you go about logging into the database using the proxy connection details? The following describes how to do it using the SQL*Plus command line, Oracle’s SQL Developer and Dell’s (formerly Quest) TOAD.

SQL*Plus
Proxy account connections using SQL*Plus involve specifying the account you intend to connect to inside square brackets as part of the connection details, e.g.:

CONNECT mark[app_owner]

This can all be done direct from the OS command prompt when invoking SQL*Plus:

> sqlplus mark[app_owner]@<<database>>

You’ll be prompted for a password. Enter the password for the MARK account and you’ll be connected to the APP_OWNER account.

Oracle SQL Developer
Using Oracle’s SQL Developer, in the connection window there is a check-box near the bottom labelled “Proxy Connection”.

SQL Developer connection window

Checking this will cause a new window to open, into which you enter the proxy connection details. Using the basic set up described earlier, all that would be required in this second window is the proxy account name in the Proxy Client field.

SQL Developer proxy connection window

TOAD
Unlike SQL Developer, TOAD doesn’t have any special settings on its connection window for proxy connections. Instead the details are entered directly into the user/schema field, using the same square bracket notation as SQL*Plus, i.e. mark[app_owner], as shown below.

TOAD connection window

Package: pkg_delay

Working on a traditional OLTP system means dealing with lots of short transactions. Handling concurrency correctly is a significant concern and application design is tailored to avoid blocking where possible. Having said that, occasionally concurrency problems do arise. Identifying the source of the problem can be tricky but resolution is often simple. The largest problem often lies in the testing of the solution.

In a system where the duration of transactions is measured in milliseconds it can be near impossible to deliberately create the same scenario that gave rise to a concurrency issue. For example, a software tester recently encountered a deadlock whilst running a trivial test one day, unrelated to the functionality that they were testing. After looking into the trace files generated by Oracle it was found that two separate code paths obtained locks on resources in a reverse manner, i.e. one process obtained locks on A and then B, whilst the other did B and then A. This problem had never occurred in production as the probability of it occurring was very, very small… but it had occurred by chance in the test environment. Fixing the code was a trivial task but in order to test that the problem had been resolved we had to slow down the code to widen the window of opportunity for it to occur. This was done by inserting a sleep between the points where the two locks were obtained. Hence PKG_DELAY was born…

Hard coding a DBMS_LOCK.SLEEP call into the PL/SQL code is a crude but effective way of delaying code execution at some critical point. PKG_DELAY has several advantages over a simple call to DBMS_LOCK.SLEEP:

  • The delay period is controlled via a global context. This permits the delay to be set by a separate session when the test is ready to start, rather than have it always on.
  • The delay period can be adjusted at run time, again due to it being controlled via a global context. This has proved useful in tests that have sought to assess the impact of slow processes within the system.
  • Any code calling out to PKG_DELAY will have a dependency on this package. Since PKG_DELAY is never deployed to the production environment, if any test code containing a call out to PKG_DELAY is deployed then it will fail compilation. Not the best scenario perhaps but better than deploying code with a sleep in it…

PKG_DELAY has just two routines:

  • set_delay
    This routine takes a name for a delay and a duration for that delay
  • delay
    Performs a sleep of a duration specified by the set_delay routine for the name specified
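The full implementation is available from the download link below. Purely as an illustration, a cut-down version along these lines would behave as described (the context name and internals here are my own sketch, not necessarily those of PKG_DELAY):

CREATE OR REPLACE CONTEXT delay_ctx USING pkg_delay ACCESSED GLOBALLY
/

CREATE OR REPLACE PACKAGE pkg_delay
AS
   PROCEDURE set_delay (p_name IN VARCHAR2, p_seconds IN NUMBER);
   PROCEDURE delay (p_name IN VARCHAR2);
END pkg_delay;
/

CREATE OR REPLACE PACKAGE BODY pkg_delay
AS

PROCEDURE set_delay (p_name IN VARCHAR2, p_seconds IN NUMBER)
AS
BEGIN
   -- store the delay duration in a global context so any session can read it
   dbms_session.set_context ('delay_ctx', p_name, TO_CHAR(p_seconds));
END set_delay;

PROCEDURE delay (p_name IN VARCHAR2)
AS
   l_seconds NUMBER := TO_NUMBER(NVL(SYS_CONTEXT('delay_ctx', p_name),'0'));
BEGIN
   -- sleep only if a delay has been set; near zero overhead otherwise
   IF (l_seconds > 0) THEN
      dbms_lock.sleep (l_seconds);
   END IF;
END delay;

END pkg_delay;
/

Setting the delay back to 0 via set_delay switches the sleep off again without touching the code under test.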

So, a delay call might be inserted into code as:

...
pkg_delay.delay ('BEFORE_OPERATION');
...

The code will now run as per usual with virtually no overhead from the call to PKG_DELAY. Once the test is ready to be performed, requiring, say, a 2 second delay, the following is executed in a separate session:

EXEC pkg_delay.set_delay ('BEFORE_OPERATION',2)

Download PKG_DELAY

skip_unusable_indexes

Back in some version of Oracle 10 the session modifiable parameter skip_unusable_indexes changed its default value from FALSE to TRUE. This meant that operations on tables with an unusable index no longer failed with an error; instead Oracle avoids execution plans that utilise the index. For all intents and purposes Oracle ignores that the index is present… well, nearly ignores it, as we will see shortly.

First of all, let’s take a quick look at what skip_unusable_indexes does using a 1,000,000 row table named LOTS_OF_ROWS:

SQL>CREATE TABLE lots_of_rows
  2     (id         NUMBER (10) NOT NULL
  3     ,cat        NUMBER (4)  NOT NULL
  4     ,padding    CHAR(30)    NOT NULL)
  5  /

Table created.

SQL>ALTER TABLE lots_of_rows
  2     ADD CONSTRAINT lots_of_rows_pk
  3     PRIMARY KEY (id)
  4  /

Table altered.

SQL>CREATE INDEX lots_of_rows_ix1
  2     ON lots_of_rows (cat)
  3  /

Index created.

SQL>INSERT INTO lots_of_rows
  2  WITH row_src AS
  3     (SELECT ROWNUM r FROM dual CONNECT BY ROWNUM <= 1000)
  4  SELECT ROWNUM
  5  ,      MOD(ROWNUM,1000)
  6  ,      'X'
  7  FROM   row_src r1
  8  ,      row_src r2
  9  /

1000000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','lots_of_rows')

PL/SQL procedure successfully completed.

SQL>ALTER SESSION SET skip_unusable_indexes = TRUE
  2  /

Session altered.

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT *
  2  FROM   lots_of_rows
  3  WHERE  cat = 42
  4  /

Execution Plan
----------------------------------------------------------
Plan hash value: 3680026695

--------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                  |  1000 | 40000 |  1005   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| LOTS_OF_ROWS     |  1000 | 40000 |  1005   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | LOTS_OF_ROWS_IX1 |  1000 |       |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CAT"=42)

SQL>SET AUTOTRACE OFF

In the above we created a table, LOTS_OF_ROWS, with a non-unique index on the CAT column, which has 1,000 distinct values, and populated it with 1,000,000 rows. We then ran a query selecting out the rows with a CAT value of 42. Not surprisingly, Oracle elected to use the index when running the query.

Now let’s see what happens if we make the index unusable and rerun the query:

SQL>ALTER INDEX lots_of_rows_ix1 UNUSABLE
  2  /

Index altered.

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT *
  2  FROM   lots_of_rows
  3  WHERE  cat = 42
  4  /

Execution Plan
----------------------------------------------------------
Plan hash value: 3711500201

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |  1000 | 40000 |  1687   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| LOTS_OF_ROWS |  1000 | 40000 |  1687   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CAT"=42)

SQL>SET AUTOTRACE OFF

The previously generated execution plan for our query, using the index, clearly could not be used but Oracle did not fail the query. Instead Oracle came up with a new plan, using a table scan. Perhaps not the best of plans but our query did run without error.

Now let’s see what happens if SKIP_UNUSABLE_INDEXES is set to FALSE:

SQL>ALTER SESSION SET skip_unusable_indexes = FALSE
  2  /

Session altered.

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT *
  2  FROM   lots_of_rows
  3  WHERE  cat = 42
  4  /
SELECT *
*
ERROR at line 1:
ORA-01502: index 'DEVELOPER.LOTS_OF_ROWS_IX1' or partition of such index is in unusable state

SQL>SET AUTOTRACE OFF

Not surprisingly, with SKIP_UNUSABLE_INDEXES set to FALSE Oracle failed the query.

However, can ORA-01502 be thrown even if SKIP_UNUSABLE_INDEXES is set to TRUE? The answer is yes, so let’s look at how this might happen:

SQL>ALTER SESSION SET skip_unusable_indexes = TRUE
  2  /

Session altered.

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT /*+ index (lots_of_rows lots_of_rows_ix1) */
  2         *
  3  FROM   lots_of_rows
  4  WHERE  cat = 42
  5  /
SELECT /*+ index (lots_of_rows lots_of_rows_ix1) */
*
ERROR at line 1:
ORA-01502: index 'DEVELOPER.LOTS_OF_ROWS_IX1' or partition of such index is in unusable state

SQL>SET AUTOTRACE OFF

In the above example, we explicitly hinted the query to use the unusable index. In this situation Oracle will attempt to abide by the hint and fail the statement when it finds the index is unusable. This also applies when using the variation of the INDEX hint that specifies the column name, rather than the index name:

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT /*+ index (lots_of_rows (cat)) */
  2         *
  3  FROM   lots_of_rows
  4  WHERE  cat = 42
  5  /
SELECT /*+ index (lots_of_rows (cat)) */
*
ERROR at line 1:
ORA-01502: index 'DEVELOPER.LOTS_OF_ROWS_IX1' or partition of such index is in unusable state

SQL>SET AUTOTRACE OFF

To summarise this post then:

  • When the SKIP_UNUSABLE_INDEXES parameter is set to TRUE, Oracle is permitted to ignore any indexes in an unusable state, and this is the default setting
  • Skipping unusable indexes can lead to poor execution plans, which should not be surprising since those indexes were created for a reason
  • Queries explicitly hinted to use an unusable index will not skip it but will instead throw an ORA-01502 error
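As an aside, once the maintenance that left the index unusable is complete, the remedy is simply to rebuild it:

ALTER INDEX lots_of_rows_ix1 REBUILD
/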

Existence checks using COUNT(*)

One thing guaranteed to draw my ire in a code review is the use of COUNT(*) to test for the existence of data. This can be done in a number of ways. For example, a simple count via a SELECT statement and then checking if the result was 1 or greater:

...
SELECT COUNT(*)
INTO   l_rows_exist
FROM   some_table
WHERE  ... blah,blah,blah;

IF (l_rows_exist >= 1) THEN
   ... blah,blah,blah;
END IF;

or as some sort of logical sub-query construct inside a query:

SELECT ... blah,blah,blah
FROM   some_table
WHERE  (SELECT COUNT(*) FROM some_other_table WHERE ... blah,blah,blah) >= 1

Unless there’s some constraint involved that guarantees the query will return at most one row, counting all entries simply to detect whether one exists is asking for trouble. Oracle will cheerfully table scan a billion row table looking for all rows that satisfy the query even after it has found a matching entry in the first block examined.

Let’s see that in action using a 1,000,000 row table, using AUTOTRACE to obtain some statistics:

SQL>CREATE TABLE lots_of_rows
  2     (id      NUMBER(10) NOT NULL
  3     ,cat     NUMBER(2)  NOT NULL
  4     ,padding CHAR(30)   NOT NULL)
  5  /

Table created.

SQL>-- generate 1,000,000 rows
SQL>INSERT INTO lots_of_rows
  2  WITH row_src AS
  3     (SELECT ROWNUM r FROM dual CONNECT BY ROWNUM <= 1000)
  4  SELECT ROWNUM
  5  ,      MOD(ROWNUM,100)
  6  ,      'X'
  7  FROM   row_src r1
  8  ,      row_src r2
  9  /

1000000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','lots_of_rows')

PL/SQL procedure successfully completed.

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  (SELECT COUNT(*)
  4          FROM   lots_of_rows
  5          WHERE  cat = 42) >= 1
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 3376271402

------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |     1 |       |  1652   (1)| 00:00:01 |
|*  1 |  FILTER             |              |       |       |            |          |
|   2 |   FAST DUAL         |              |     1 |       |     2   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |              |     1 |     3 |            |          |
|*  4 |    TABLE ACCESS FULL| LOTS_OF_ROWS | 10000 | 30000 |  1650   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( (SELECT COUNT(*) FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42)>=1)
   4 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6069  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The execution plan above clearly shows a full table scan and the consistent gets statistic would confirm that the entire table was indeed scanned. The solution to this situation is trivial: rewrite the query to use EXISTS. In our example above, this would be to use:

SELECT ... blah,blah,blah
FROM   some_table
WHERE  EXISTS (SELECT 'X' FROM some_other_table WHERE ... blah,blah,blah)

The impact of this simple change can be seen using AUTOTRACE:

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  EXISTS(SELECT 'X'
  4                FROM   lots_of_rows
  5                WHERE  cat = 42)
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 4135382906

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |       |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   FAST DUAL        |              |     1 |       |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LOTS_OF_ROWS |     2 |     6 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42))
   3 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

While the execution plan still shows a full table scan, the consistent gets figure has dropped to just 3. This would have been the number of gets Oracle performed before it found the first “42” entry and stopped scanning the table. The query also has the advantage of being better documented, as EXISTS describes perfectly what is being sought.
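For the PL/SQL variant shown at the start of this post, if a count variable really is wanted then capping the query with ROWNUM achieves the same early termination, e.g.:

SELECT COUNT(*)
INTO   l_rows_exist
FROM   some_table
WHERE  ... blah,blah,blah
AND    ROWNUM = 1;

IF (l_rows_exist = 1) THEN
   ... blah,blah,blah;
END IF;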

However, perhaps my ire at the use of COUNT(*) is somewhat misplaced. The Oracle optimiser is capable of detecting subqueries in the form of:

(SELECT COUNT(*) FROM ....) > 0

and automatically translating them into the equivalent EXISTS form:

EXISTS (SELECT 'X' FROM ....)

We can see this in action using our million row table:

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  (SELECT COUNT(*)
  4          FROM   lots_of_rows
  5          WHERE  cat = 42) > 0
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 4135382906

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |       |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   FAST DUAL        |              |     1 |       |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LOTS_OF_ROWS |     2 |     6 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42))
   3 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

So, while we wrote the query to use SELECT COUNT(*) the predicate information section in the above AUTOTRACE output shows that Oracle actually used EXISTS. Note, this rewrite only occurs if the test is “> 0”. The rewrite will not take place using a test of “>= 1”, which is what the original example above used.

I’m not sure in which version of Oracle this automatic rewrite appeared; it certainly works in 11.2 and above. It’s a nice feature but I would still argue that it is not good practice to rely on this kind of thing, and to explicitly write the query using EXISTS instead.

Oracle performance problem case study 2

I had just settled down for a bite to eat for lunch when two developers approached me with rather guilty expressions. Clearly something was up and, given that there had been a significant deployment the night before for the part of the system they were responsible for, I immediately suspected a production performance problem.

This was confirmed when they told me that one of the production processes was taking far too long; 20 to 30 seconds instead of sub-second. Their suspicion was that they had some sort of query performance problem and they needed help to identify and resolve it.

They gave me the name of the service and the two PL/SQL routines that were taking too long. Each routine invoked multiple SQL statements. Once again I turned to Active Session History, which tends to be my first tool of choice for this type of problem. I ran a query similar to the following to see what long running statements had originated from the service for the past hour:

SELECT session_id
,      session_serial#
,      program
,      machine
,      module
,      action
,      client_id
,      sql_id
,      sql_exec_id
,      COUNT(*) AS duration_secs
,      MIN(sample_time) AS start_exec
FROM   v$active_session_history
WHERE  sample_time >= TO_TIMESTAMP(:start_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sample_time <= TO_TIMESTAMP(:end_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    session_type = 'FOREGROUND'
AND    program = :service_name
GROUP  BY
       program
,      session_id
,      session_serial#
,      machine
,      module
,      action
,      client_id
,      sql_id
,      sql_exec_id
ORDER  BY
       COUNT(*) DESC

The query result showed a lot of 2 and 3 second queries with just two SQL_IDs listed. Taking these two SQL_IDs, I ran a query like:

SELECT *
FROM   v$active_session_history
WHERE  sample_time >= TO_TIMESTAMP(:start_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sample_time <= TO_TIMESTAMP(:end_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sql_id    = :sql_id
ORDER  BY
       sample_time

The event details showed that the queries were spending their time on CPU, so there was no contention or other wait problem that would point to a system-wide issue. More telling, though, were the SQL_PLAN_OPERATION and SQL_PLAN_OPTIONS columns, which showed that full table scans were being done.

With this information it was time to find out what the SQL statements were and where they resided, using the following query:

SELECT s.sql_text
,      (SELECT o.object_name FROM dba_objects o WHERE s.program_id = o.object_id) AS program_name
,      s.program_line#
FROM   v$sql s
WHERE  s.sql_id = :sql_id

and what their execution plans were:

SELECT *
FROM   TABLE(dbms_xplan.display_cursor(:sql_id,:sql_child_number))

As it turned out both SQL statements were relatively simple SELECTs along the lines of:

SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    (:param1 IS NULL OR t1.p1 = :param1)
AND    (:param2 IS NULL OR t2.p2 = :param2)

The tables had indexes on the ID, P1 and P2 columns. While there were two variables used by the queries the developers explained that only one would be specified and the other would be NULL. The execution plan Oracle had chosen, running under version 11.2.0.4, was to table scan both tables and hash join them.

The queries were rather nice examples of a developer trying to do too much in a single query and performance suffering as a result. Each query was designed to satisfy two distinct scenarios. While those two scenarios result in very similar queries, combining them forced Oracle into a position where it had no choice but to scan the tables involved. The developer had hoped that Oracle would choose the relevant index, on P1 or P2, depending on what parameters were provided, i.e. that Oracle would separate out the two different scenarios at run time.

The solution to this problem was to rewrite the queries separating out the different scenarios. One way was to use a UNION query:

SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    t1.p1 = :param1
UNION
SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    t2.p2 = :param2

In the above, whichever parameter is NULL will result in no rows being returned from that part of the query, so the result will come only from the part where the parameter is specified. Best of all, Oracle is able to optimise each part of the query to use indexes and so the performance problem is overcome.

Once I pointed out the problem to the developers they elected to go with a slightly different solution though. The query was a SELECT INTO contained within the PL/SQL package so they simply split it using conditional logic:

IF (l_param1 IS NOT NULL) THEN
   SELECT t1.col_1
   ,      t1.col_2
   ,      t2.col_1
   ,      t2.col_2
   INTO   l_var_1
   ,      l_var_2
   ,      l_var_3
   ,      l_var_4
   FROM   t1
   ,      t2
   WHERE  t1.id = t2.id
   AND    t1.p1 = l_param1;
ELSE
   SELECT t1.col_1
   ,      t1.col_2
   ,      t2.col_1
   ,      t2.col_2
   INTO   l_var_1
   ,      l_var_2
   ,      l_var_3
   ,      l_var_4
   FROM   t1
   ,      t2
   WHERE  t1.id = t2.id
   AND    t2.p2 = l_param2;
END IF;   

The change was a simple rewrite to the two problematic routines. As the developers had heeded advice about avoiding session state the change was easily deployed into the production system without requiring an outage. Performance for both routines dropped from 2 to 3 seconds to around 1 millisecond.

I went back to lunch…