Implicit Date Conversion

A DBA colleague at work recently approached me with an interesting problem. He was investigating a query run by one of the OLTP applications, but when he ran it himself it resulted in a full table scan of a rather large table and a result-set that ran into a couple of million rows.

The query in question resembled the following:

SELECT col_a
,      col_b
,      col_c
FROM   table_owner.table_x
WHERE  col_d = '13-APR-12'
GROUP  BY
       col_a
,      col_b
,      col_c

Looking into V$SQL and using DBMS_XPLAN, we quickly determined that the query being run by the application was not performing a full table scan but was using the index on COL_D. We also confirmed that the query run by my colleague, under a different login, did indeed perform a full table scan. TABLE_X was fully qualified with the owner, so both queries were addressing the same table. Most interesting of all, the result sets differed between the two queries: the application query using the index returned only a couple of hundred rows, and yet the query did not contain bind variables.
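
For anyone wanting to reproduce that kind of check, a minimal sketch follows; the LIKE filter and the sql_id shown are placeholders rather than the real values from the application:

-- locate the application's cursor in the shared pool
SELECT sql_id, child_number, plan_hash_value
FROM   v$sql
WHERE  sql_text LIKE 'SELECT col_a%table_x%'
/

-- display the plan that cursor actually used ('0abcd1efgh23j' is a made-up sql_id)
SELECT *
FROM   TABLE(dbms_xplan.display_cursor('0abcd1efgh23j', 0))
/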

As it turned out, my colleague discovered that the NLS_DATE_FORMAT setting for his session had been set to ‘DD/MM/YYYY’, and not the more common ‘DD-MON-RR’. Problem solved…

… but a lingering question remained in my mind. If the NLS_DATE_FORMAT setting is ‘DD/MM/YYYY’ and Oracle is presented with a string in ‘DD-MON-RR’ format, what date value does it use? Since the query did not fail, Oracle must have performed a conversion, even if it wasn’t the one desired.

For those of you unfamiliar with NLS_DATE_FORMAT, it’s a parameter that defines the default date format for a session. So, in the above query, the string literal ‘13-APR-12’ is implicitly converted into a DATE value of 13 April 2012. NLS_DATE_FORMAT is set at the database level but is session modifiable.
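
You can check what your own session is currently using, and change it, with something along these lines (NLS_SESSION_PARAMETERS is a standard dictionary view):

SELECT value
FROM   nls_session_parameters
WHERE  parameter = 'NLS_DATE_FORMAT'
/

ALTER SESSION SET nls_date_format = 'DD-MON-RR'
/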

From a programming perspective, implicit date conversion and its reliance on NLS_DATE_FORMAT are deplorable. I would have no hesitation in insisting on an explicit conversion, using the TO_DATE function, in any code that I was asked to review that contained an implicit data-type conversion like this. As my colleague was able to illustrate, a different NLS_DATE_FORMAT setting did not cause the query to fail but instead produced a different, and totally unexpected, result-set.
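
By way of example, an explicit version of the original query might look something like the following, assuming the literal really is meant to be 13 April 2012; spelling out a four-digit year removes any ambiguity, and an ANSI date literal such as DATE '2012-04-13' would do just as well:

SELECT col_a
,      col_b
,      col_c
FROM   table_owner.table_x
WHERE  col_d = TO_DATE('13-APR-2012','DD-MON-YYYY')
GROUP  BY
       col_a
,      col_b
,      col_c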

We can easily create a simple scenario to illustrate the problem with the following script:

CREATE TABLE test_tab
   (id        NUMBER(10) NOT NULL
   ,tstamp    DATE       NOT NULL
   ,padding   CHAR(50)   NOT NULL
   )
/

CREATE INDEX test_tab_i1
   ON test_tab (tstamp)
/

INSERT INTO test_tab
SELECT ROWNUM AS id
,      TO_DATE('01/01/2010','dd/mm/yyyy') + (ROWNUM/10)
,      'X'
FROM   dual
CONNECT BY level <= 10000
/

COMMIT
/

EXEC dbms_stats.gather_table_stats(USER, 'TEST_TAB')

ALTER SESSION SET nls_date_format = 'DD-MON-RR'
/

EXPLAIN PLAN
FOR
SELECT *
FROM   test_tab
WHERE  tstamp >= '01-SEP-12'
/

SELECT *
FROM   TABLE(dbms_xplan.display)
/

ALTER SESSION SET nls_date_format = 'DD/MM/YYYY'
/

EXPLAIN PLAN
FOR
SELECT *
FROM   test_tab
WHERE  tstamp >= '01-SEP-12'
/

SELECT *
FROM   TABLE(dbms_xplan.display)
/

The above script creates a simple table with an indexed date column, populates it with 10,000 rows, and displays the execution plan of a query with a filter on the date column, once with an NLS_DATE_FORMAT setting of DD-MON-RR and once with DD/MM/YYYY. On my 11.2.0.3 database the execution plans are:

DD-MON-RR:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 899007780

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   261 | 16443 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST_TAB    |   261 | 16443 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | TEST_TAB_I1 |   261 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TSTAMP">='01-SEP-12')

14 rows selected.

DD/MM/YYYY:

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 3962208483

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 10000 |   615K|    27   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TEST_TAB | 10000 |   615K|    27   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("TSTAMP">='01-SEP-12')

13 rows selected.

The plan tells us not only that the index will not be used when the NLS_DATE_FORMAT is set to DD/MM/YYYY, but also that the number of rows expected to be returned is 10,000, which is the entire table. Running the following query tells us what date value Oracle converted '01-SEP-12' into:

SELECT TO_DATE('01-SEP-12')
FROM   dual
/

and the response given was:

01/09/0012

Ah, so now the execution plan makes sense. A table scan is clearly the best way to return all rows from the table.
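
For completeness, the same literal can be converted implicitly and with an explicit DD-MON-RR mask side by side; with the session still set to DD/MM/YYYY I would expect the explicit version to come back as 01/09/2012 rather than 01/09/0012:

SELECT TO_DATE('01-SEP-12')              AS implicit_conversion
,      TO_DATE('01-SEP-12','DD-MON-RR')  AS explicit_rr_mask
FROM   dual
/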

However, it’s somewhat misleading that Oracle will convert the string in the first place, given that it does not in any way resemble DD/MM/YYYY, which is what the NLS_DATE_FORMAT specifies. Still, I guess that if you are foolish enough to rely on implicit string-to-date conversion and specify a two-digit year then you deserve slow queries that probably do not return the result you were expecting!