More Implicit Date Conversions

A DBA colleague recently approached me with a rather confusing query execution plan. The query itself was relatively simple, somewhat similar to:

SELECT /*+ index (some_table some_table_i1) */
       *
FROM   some_table
WHERE  date_col >= :X

The column DATE_COL was indexed and the query was hinted to use the index as the bind variable value supplied would return a small subset of rows from the table. The execution plan desired was:

-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| SOME_TABLE    |
|*  2 |   INDEX RANGE SCAN          | SOME_TABLE_I1 |
-----------------------------------------------------

but instead Oracle was using:

-----------------------------------------------------
| Id  | Operation                   | Name          |
-----------------------------------------------------
|   0 | SELECT STATEMENT            |               |
|   1 |  TABLE ACCESS BY INDEX ROWID| SOME_TABLE    |
|*  2 |   INDEX FULL SCAN           | SOME_TABLE_I1 |
-----------------------------------------------------

Simulating the query in SQL*Plus showed the index range scan, not the index full scan, which was puzzling. The solution to the problem, as resolved by Connor McDonald, was in the Predicate Information of the plan:

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(INTERNAL_FUNCTION("DATE_COL")>=:X)

Oracle wasn’t using a range scan on the date column as it was applying a function to the column in order to compare it to the bind variable. As it turned out, the bind variable data type as provided by the external application was not a DATE but a TIMESTAMP. So, in doing the comparison Oracle was casting the DATE column to TIMESTAMP, which defeated the effective use of the index. At this point, I kicked myself for not looking closely enough at the detail, especially after I had diagnosed and resolved a near identical problem a year ago on a different system.

We can construct a simple test to illustrate the problem using the following set-up:

CREATE TABLE date_test
   (id        NUMBER(10) NOT NULL
   ,log_date  DATE       NOT NULL
   ,padding   CHAR(30)   NOT NULL
   )
/

INSERT INTO date_test
SELECT ROWNUM
,      TRUNC(SYSDATE) - 100 + (ROWNUM/100)
,      'X'
FROM   dual
CONNECT BY ROWNUM <= 10000
/

EXEC dbms_stats.gather_table_stats ('','date_test',method_opt=>'for all indexed columns size auto')

CREATE UNIQUE INDEX date_test_pk
   ON date_test (id)
/

CREATE INDEX date_test_i1
   ON date_test (log_date)
/

So, we have a simple table with an indexed date column, LOG_DATE, containing 10,000 rows. The values of LOG_DATE evenly span the last 100 days. The execution plan for a simple query using a DATE bind variable can be obtained with:

EXPLAIN PLAN
FOR
SELECT *
FROM   date_test
WHERE  log_date >= TO_DATE(:some_date)
/

SELECT *
FROM   TABLE(dbms_xplan.display())
/

and this results in the following output:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 2693408247

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              | 50000 |  2148K|    87   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DATE_TEST    | 50000 |  2148K|    87   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | DATE_TEST_I1 |  9000 |       |    26   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("LOG_DATE">=TO_DATE(:SOME_DATE))

In the above query, note how we explicitly cast the bind as a DATE data type. If we repeat the test but cast the bind variable to a TIMESTAMP, using:

EXPLAIN PLAN
FOR
SELECT /*+ index (date_test date_test_i1) */
       *
FROM   date_test
WHERE  log_date >= TO_TIMESTAMP(:some_date)
/

SELECT *
FROM   TABLE(dbms_xplan.display())
/

we get the following output:

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
Plan hash value: 2054711204

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              | 50000 |  2148K|  3036   (2)| 00:00:37 |
|   1 |  TABLE ACCESS BY INDEX ROWID| DATE_TEST    | 50000 |  2148K|  3036   (2)| 00:00:37 |
|*  2 |   INDEX FULL SCAN           | DATE_TEST_I1 | 50000 |       |  2697   (2)| 00:00:33 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(INTERNAL_FUNCTION("LOG_DATE")>=TO_TIMESTAMP(:SOME_DATE))

We can see that the Predicate Information section is performing a filter operation via the index. So, Oracle is scanning the all entires in the index, casting the column date found for each one to a timestamp and using that to perform the bind variable comparison. The earlier query using the DATE data type contains an access operation, meaning that Oracle used the index branch structure to traverse directly down to the start of the entries that would satisfy the comparison criteria. From there it can move through the index entries knowing that all entries encountered would satisfy the search criteria. No data type casting is required to perform the comparisons. When the bind variable used will filter out the majority of the rows in the table the index range scan operation will be far faster.

In the TIMESTAMP example I had to explicitly hint the query to use the index, otherwise Oracle would choose a full table scan. However, this is exactly what the problematic application query had done so perhaps the original developers had noticed the query was not using the index and had hinted it without realising why Oracle had avoided the use of the index in the first place.

As noted earlier, I had previously encountered a similar bug in a different application. In that situation the problem was more apparent from a performance perspective as the table in question was partitioned on the date column. Altering our example table we can see the impact:

DROP TABLE date_test PURGE
/

CREATE TABLE date_test
   (id        NUMBER(10) NOT NULL
   ,log_date  DATE       NOT NULL
   ,padding   CHAR(30)   NOT NULL
   )
PARTITION BY RANGE (log_date) INTERVAL (INTERVAL '7' DAY)
   (PARTITION p1 VALUES LESS THAN (TO_DATE('01/01/2012','dd/mm/yyyy')))
/

INSERT INTO date_test
SELECT ROWNUM
,      TRUNC(SYSDATE) - 100 + (ROWNUM/100)
,      'X'
FROM   dual
CONNECT BY ROWNUM <= 10000
/

EXEC dbms_stats.gather_table_stats ('','date_test',method_opt=>'for all indexed columns size auto')

CREATE UNIQUE INDEX date_test_pk
   ON date_test (id)
/

CREATE INDEX date_test_i1
   ON date_test (log_date)
   LOCAL
/

Repeating our EXPLAIN PLAN using a TIMESTAMP, I’ve modified the plan output to exclude the row and byte details to keep the overall width down:

EXPLAIN PLAN
FOR
SELECT /*+ INDEX (date_test date_test_i1) */
       *
FROM   date_test
WHERE  log_date >= TO_TIMESTAMP(:some_date)
/

SELECT *
FROM   TABLE(dbms_xplan.display(format=>'-BYTES,-ROWS'))
/

gives the following output:

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 600373160

---------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name         | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |              |    35   (3)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL               |              |    35   (3)| 00:00:01 |     1 |1048575|
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| DATE_TEST    |    35   (3)| 00:00:01 |     1 |1048575|
|*  3 |    INDEX FULL SCAN                 | DATE_TEST_I1 |    30   (0)| 00:00:01 |     1 |1048575|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(INTERNAL_FUNCTION("LOG_DATE")>=TO_TIMESTAMP(:SOME_DATE))

The key detail in this plan are the Pstart and Pstop columns, which have values 1 and 1048575 respectively. The Pstop value is simply Oracle’s way of saying “the last partition” as the table is INTERVAL partitioned. So, the Pstart and Pstop details tell us that Oracle will read all partitions of the local index. The actual operation performed, INDEX FULL SCAN, remains the same as the non-partitioned table.

Contrast this to the plan that would be displayed if the query utilised a DATE value:

EXPLAIN PLAN
FOR
SELECT *
FROM   date_test
WHERE  log_date >= TO_DATE(:some_date)
/

SELECT *
FROM   TABLE(dbms_xplan.display(format=>'-BYTES,-ROWS'))
/

gives the following output:

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 146305644

---------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name         | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |              |     3   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR          |              |     3   (0)| 00:00:01 |   KEY |1048575|
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| DATE_TEST    |     3   (0)| 00:00:01 |   KEY |1048575|
|*  3 |    INDEX RANGE SCAN                | DATE_TEST_I1 |     2   (0)| 00:00:01 |   KEY |1048575|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("LOG_DATE">=TO_DATE(:SOME_DATE))

In this plan the Pstart column now displays KEY, which means that Oracle will use the value of the bind variable supplied to eliminate partitions that it knows will not contain relevant data. Once again, the operation performed is the same as the non-paritioned table; an INDEX RANGE SCAN. So, in this example, Oracle is not only using the index more efficiently to access the data in each partition it is also ensuring that it only accesses partitions that might contain relevant data. For the system in question where this situation was occuring, the process execution time went from a couple of minutes to sub-second.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s