skip_unusable_indexes

Back in some version of Oracle 10 the session modifiable parameter skip_unuable_indexes changed its default value from FALSE to TRUE. This meant that operations on tables with an unusable index did not fail with an error if Oracle arrived at an execution plan that utilised the index. For all intents and purposes Oracle ignores that the index is present… well, nearly ignores them as we will see shortly.

First of all, let’s take a quick look of what skip_unusable_indexes does using a 1,000,000 row table named LOTS_OF_ROWS:

SQL>CREATE TABLE lots_of_rows
  2     (id         NUMBER (10) NOT NULL
  3     ,cat        NUMBER (4)  NOT NULL
  4     ,padding    CHAR(30)    NOT NULL)
  5  /

Table created.

SQL>ALTER TABLE lots_of_rows
  2     ADD CONSTRAINT lots_of_rows_pk
  3     PRIMARY KEY (id)
  4  /

Table altered.

SQL>CREATE INDEX lots_of_rows_ix1
  2     ON lots_of_rows (cat)
  3  /

Index created.

SQL>INSERT INTO lots_of_rows
  2  WITH row_src AS
  3     (SELECT ROWNUM r FROM dual CONNECT BY ROWNUM <= 1000)
  4  SELECT ROWNUM
  5  ,      MOD(ROWNUM,1000)
  6  ,      'X'
  7  FROM   row_src r2
  8  ,      row_src r2
  9  /

1000000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','lots_of_rows')

PL/SQL procedure successfully completed.

SQL>ALTER SESSION SET skip_unusable_indexes = TRUE
  2  /

Session altered.

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT *
  2  FROM   lots_of_rows
  3  WHERE  cat = 42
  4  /

Execution Plan
----------------------------------------------------------
Plan hash value: 3680026695

--------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                  |  1000 | 40000 |  1005   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| LOTS_OF_ROWS     |  1000 | 40000 |  1005   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | LOTS_OF_ROWS_IX1 |  1000 |       |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CAT"=42)

SQL>SET AUTOTRACE OFF

In the above we created a table, LOTS_OF_ROWS, with a non-unique index on the CAT column, which has 1,000 distinct values, and populated it with 1,000,000 rows. We then ran a query selecting out the rows with a CAT value of 42. Not surprisingly Oracle elected to use the index when running the index.

Now let’s see what happens if we make the index unusable and rerun the query:

SQL>ALTER INDEX lots_of_rows_ix1 UNUSABLE
  2  /

Index altered.

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT *
  2  FROM   lots_of_rows
  3  WHERE  cat = 42
  4  /

Execution Plan
----------------------------------------------------------
Plan hash value: 3711500201

----------------------------------------------------------------------------------
| Id  | Operation         | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |              |  1000 | 40000 |  1687   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| LOTS_OF_ROWS |  1000 | 40000 |  1687   (1)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CAT"=42)

SQL>SET AUTOTRACE OFF

The previously generated execution plan for our query, using the index, clearly could not be used but Oracle did not fail the query. Instead Oracle came up with a new plan, using a table scan. Perhaps not the best of plans but our query did run without error.

Now let’s see what happens if SKIP_UNUSABLE_INDEXES is set to FALSE:

SQL>ALTER SESSION SET skip_unusable_indexes = FALSE
  2  /

Session altered.

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT *
  2  FROM   lots_of_rows
  3  WHERE  cat = 42
  4  /
SELECT *
*
ERROR at line 1:
ORA-01502: index 'DEVELOPER.LOTS_OF_ROWS_IX1' or partition of such index is in unusable state

SQL>SET AUTOTRACE OFF

Not surprisingly, with the SKIP_UNUSABLE_INDEXES set to FALSE Oracle failed the query.

However, can ORA-01502 be throw even if SKIP_UNUSABLE_INDEXES is set to TRUE? The answer is yes so let’s look at how this might happen:

SQL>ALTER SESSION SET skip_unusable_indexes = TRUE
  2  /

Session altered.

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT /*+ index (lots_of_rows lots_of_rows_ix1) */
  2         *
  3  FROM   lots_of_rows
  4  WHERE  cat = 42
  5  /
SELECT /*+ index (lots_of_rows lots_of_rows_ix1) */
*
ERROR at line 1:
ORA-01502: index 'DEVELOPER.LOTS_OF_ROWS_IX1' or partition of such index is in unusable state

SQL>SET AUTOTRACE OFF

In the above example, we explicitly hinted the query to use the unusable index. In this situation Oracle will attempt to abide by the hint and fail the statement when it finds the index is unusable. The also applies using the variation of the INDEX hint that specifies the column name, rather than the index name:

SQL>SET AUTOTRACE TRACEONLY EXPLAIN

SQL>SELECT /*+ index (lots_of_rows (cat)) */
  2         *
  3  FROM   lots_of_rows
  4  WHERE  cat = 42
  5  /
SELECT /*+ index (lots_of_rows (cat)) */
*
ERROR at line 1:
ORA-01502: index 'DEVELOPER.LOTS_OF_ROWS_IX1' or partition of such index is in unusable state

SQL>SET AUTOTRACE OFF

To summarise this post then:

  • When the SKIP_UNUSABLE_INDEXES parameter is set to TRUE it allows Oracle to ignore any indexes in an unusable state, and this the standard default setting
  • Skipping disabled indexes can lead to poor execution plans, which should not be surprising since those indexes were created for a reason
  • Queries specifically hinted to use an index will not ignore the unusable index and will throw an ORA-01502 error

Existence checks using COUNT(*)

One thing guaranteed to draw my ire in a code review is the use of COUNT(*) to test for the existence of data. This can be done in a number of ways. For example, a simple count via a SELECT statement and then checking if the result was 1 or greater:

...
SELECT COUNT(*)
INTO   l_rows_exist
FROM   some_table
WHERE  ... blah,blah,blah;

IF (l_rows_exist >= 1) THEN
   ... blah,blah,blah;
END IF;

or as some sort of logical sub-query construct inside a query:

SELECT ... blah,blah,blah
FROM   some_table
WHERE  (SELECT COUNT(*) FROM some_other_table WHERE ... blah,blah,blah) >= 1

Unless there’s some constraint involved that will guarantee that the query will return at the most one row counting all entries to simply detect if one entry exists is asking for trouble. Oracle will cheerfully table scan a billion row table looking for all rows that satisfy the query even after it found a matching entry in the first block examined.

Let’s see that in action using a 1,000,000 row table, using AUTOTRACE to obtain some statistics:

SQL>CREATE TABLE lots_of_rows
  2     (id      NUMBER(10) NOT NULL
  3     ,cat     NUMBER(2)  NOT NULL
  4     ,padding CHAR(30)   NOT NULL)
  5  /

Table created.

SQL>-- generate 1,000,000 rows
SQL>INSERT INTO lots_of_rows
  2  WITH row_src AS
  3     (SELECT ROWNUM r FROM dual CONNECT BY ROWNUM <= 1000)
  4  SELECT ROWNUM
  5  ,      MOD(ROWNUM,100)
  6  ,      'X'
  7  FROM   row_src r2
  8  ,      row_src r2
  9  /

1000000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','lots_of_rows')

PL/SQL procedure successfully completed.

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  (SELECT COUNT(*)
  4          FROM   lots_of_rows
  5          WHERE  cat = 42) >= 1
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 3376271402

------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |     1 |       |  1652   (1)| 00:00:01 |
|*  1 |  FILTER             |              |       |       |            |          |
|   2 |   FAST DUAL         |              |     1 |       |     2   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |              |     1 |     3 |            |          |
|*  4 |    TABLE ACCESS FULL| LOTS_OF_ROWS | 10000 | 30000 |  1650   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( (SELECT COUNT(*) FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42)>=1)
   4 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6069  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The execution plan above clearly shows a full table scan and the consistent gets statistic would confirm that the entire table was indeed scanned. The solution to this situation is trivial: rewrite the query to use EXISTS. In our example above, this would be to use:

SELECT ... blah,blah,blah
FROM   some_table
WHERE  EXISTS (SELECT COUNT(*) FROM some_other_table WHERE ... blah,blah,blah)

The impact of this simple change can be seen using AUTOTRACE:

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  EXISTS(SELECT 'X'
  4                FROM   lots_of_rows
  5                WHERE  cat = 42)
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 4135382906

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |       |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   FAST DUAL        |              |     1 |       |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LOTS_OF_ROWS |     2 |     6 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42))
   3 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

While the execution plan still shows a full table scan the consistent gets has dropped to just 3. This would have been the number that Oracle performed before it found the fist “42” entry and stopped scanning the table. The query also has be advantage of being better documented as EXISTS describes perfectly what if being sought.

However, perhaps my ire at the use of COUNT(*) is somewhat misplaced. The Oracle optimiser is capable of detecting subqueries in the form of:

(SELECT COUNT(*) FROM ....) > 0

and automatically translate it into the equivalent EXISTS form:

EXISTS (SELECT 'X' FROM ....)

We can see this in action using our million row table:

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  (SELECT COUNT(*)
  4          FROM   lots_of_rows
  5          WHERE  cat = 42) > 0
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 4135382906

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |       |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   FAST DUAL        |              |     1 |       |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LOTS_OF_ROWS |     2 |     6 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42))
   3 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

So, while we wrote the query to use SELECT COUNT(*) the predicate information section in the above AUTOTRACE output shows that Oracle actually used EXISTS. Note, this rewrite only occurs if the test is “> 0”. The rewrite will not take place using a test of “>= 1”, which is what the original example above used.

I’m not sure what version of Oracle that this automatic rewrite appeared. It certainly works in 11.2 and above. It’s a nice feature but I would still argue that it would not be good practice to rely of this kind of thing and to explicitly write the query using EXISTS.

Oracle performance problem case study 2

I had just settled down for a bite to eat for lunch when two developers approached me with rather guilty expressions. Clearly something was up and, given that there had been a significant deployment for the part of the system that they were responsible for the night before, I immediately suspected a production performance problem.

This was confirmed when they told me that one of the production processes was taking far too long; 20 to 30 seconds instead of sub-second. Their suspicion was that they had some sort of query performance problem and they needed help to identify and resolve it.

They gave me the name of the service and the two PL/SQL routines that were taking too long. Each routine invoked multiple SQL statements. Once again I turned to Active Session History, which tends to be my first tool of choice for this type of problem. I ran a query similar to the following to see what long running statements had originated from the service for the past hour:

SELECT session_id
,      session_serial#
,      program
,      machine
,      module
,      action
,      client_id
,      sql_id
,      sql_exec_id
,      COUNT(*) AS duration_secs
,      MIN(sample_time) AS start_exec
FROM   v$active_session_history
WHERE  sample_time >= TO_TIMESTAMP(:start_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sample_time <= TO_TIMESTAMP(:end_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    session_type = 'FOREGROUND'
AND    program = :service_name
GROUP  BY
       program
,      session_id
,      session_serial#
,      machine
,      module
,      action
,      client_id
,      sql_id
,      sql_exec_id
ORDER  BY
       COUNT(*) DESC

The query result showed a lot of 2 and 3 second queries with just two SQL_IDs listed. Taking these two SQL_IDs, I ran a query like:

SELECT *
FROM   v$active_session_history
WHERE  sample_time >= TO_TIMESTAMP(:start_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sample_time <= TO_TIMESTAMP(:end_date_time,'dd/mm/yyyy hh24:mi:ss')
AND    sql_id    = :sql_id
ORDER  BY
       sample_time

The event details showed that the queries were spending their time on CPU so no contention or other wait problems that would point to some system-wide issue. More telling though was the SQL_PLAN_OPERATION and SQL_PLAN_OPTIONS. These details showed full table scans were being done.

With this information it was time to find out where the SQL statements were and where they resided, using the following query:

SELECT s.sql_text
,      (SELECT o.object_name FROM dba_objects o WHERE s.program_id = o.object_id) AS program_name
,      s.program_line#
FROM   v$sql
WHERE  sql_id = :sql_id

and what their execution plans were:

SELECT *
FROM   TABLE(dbms_xplan.display_cursor(:sql_id,:sql_child_number))

As it turned out both SQL statements were relatively simple SELECTs along the lines of:

SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    (:param1 IS NULL OR t1.p1 = :param1)
AND    (:param2 IS NULL OR t2.p2 = :param2)

The tables had indexes on the ID, P1 and P2 columns. While there were two variables used by the queries the developers explained that only one would be specified and the other would be NULL. The execution plan Oracle had chosen, running under version 11.2.0.4, was to table scan both tables and hash join them.

The queries were rather nice examples of a developer trying to do too much in a single query and performance suffering as a result. Each query were designed to satisfy two distinct scenarios. While those two scenarios result in very similar queries it had forced Oracle into a position where it has no choice but to scan the tables involved. The developer had hoped that Oracle would choose the relevant index, on P1 or P2, depending on what parameters were provided, i.e. Oracle would separate out the two different scenarios at run time.

The solution to this problem was to rewrite the queries separating out the different scenarios. One way was to use a UNION query:

SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    t1.p1 = :param1
UNION
SELECT t1.col_1
,      t1.col_2
,      t2.col_1
,      t2.col_2
FROM   t1
,      t2
WHERE  t1.id = t2.id
AND    t2.p2 = :param2

In the above whichever parameter is NULL will result in no rows being returned from that part of the query and so the result will only be from the part where the parameters is specified. Best of all Oracle would be able to optimise the query to use indexes for each part and so the performance problem would be overcome.

Once I pointed out the problem to the developers they elected to go with a slightly different solution though. The query was SELECT INTO query contained within the PL/SQL package so they simply split it using conditional logic:

IF l_param1 IS NOT NULL) THEN
   SELECT t1.col_1
   ,      t1.col_2
   ,      t2.col_1
   ,      t2.col_2
   INTO   l_var_1
   ,      l_var_2
   ,      l_var_3
   ,      l_var_4
   FROM   t1
   ,      t2
   WHERE  t1.id = t2.id
   AND    t1.p1 = l_param1;
ELSE
   SELECT t1.col_1
   ,      t1.col_2
   ,      t2.col_1
   ,      t2.col_2
   INTO   l_var_1
   ,      l_var_2
   ,      l_var_3
   ,      l_var_4
   FROM   t1
   ,      t2
   WHERE  t1.id = t2.id
   AND    t2.p2 = l_param2;
END IF;   

The change was a simple rewrite to the two problematic routines. As the developers had heeded advice about avoiding session state the change was easily deployed into the production system without requiring an outage. Performance for both routines dropped from 2 to 3 seconds to around 1 millisecond.

I went back to lunch…