Min/Max Range Partitioning Query Optimisation

One day I’m told about a query optimisation strategy on partitioned tables by Connor McDonald and the following week a friend asks how to optimise a query that can make use of that very same strategy. Coincidence is just plain creepy!

So, what is the optimisation strategy? When dealing with a range partitioned table, if a query is obtaining the maximum value of the partitioning key then Oracle will search the table backwards through the partitions until it finds a result. Once it has a result from a partition it stops, as it no longer needs to search the remaining partitions. I’m not sure, but this optimisation may have been introduced in Oracle 11g release 2.
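As a rough mental model (plain Python, purely illustrative; this is the idea, not Oracle’s implementation), the search looks something like this:

```python
# Toy model of the MAX() partition optimisation described above.
# Each "partition" is just a list of (meter_id, reading_type, tstamp) rows,
# ordered by partition key range. The names and sample data are made up.

def max_key_descending(partitions, meter_id, reading_type):
    """Scan partitions from highest range to lowest; stop at the first hit."""
    for partition in reversed(partitions):          # highest partition first
        hits = [ts for (m, t, ts) in partition
                if m == meter_id and t == reading_type]
        if hits:
            return max(hits)                        # no need to look further
    return None

partitions = [
    [(1, 'A', '2012-10-05')],                          # Oct 2012
    [(1, 'A', '2012-11-12'), (2, 'A', '2012-11-30')],  # Nov 2012
    [(1, 'A', '2012-12-29'), (1, 'X', '2012-12-31')],  # Dec 2012
]
print(max_key_descending(partitions, 1, 'A'))       # → 2012-12-29
```

With the December partition last in the range, only that partition is examined before the loop exits, which is exactly the behaviour we’ll observe below.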

How does this work? Using Oracle 11g release 2 (11.2.0.3.0), let’s start off with a table based on one that might be used by a utility supply company to hold meter readings (gas, electricity, water, etc):

CREATE TABLE readings
   (reading_id     NUMBER(20)  NOT NULL
   ,meter_id       NUMBER(10)  NOT NULL
   ,reading_tstamp TIMESTAMP   NOT NULL
   ,reading_type   VARCHAR2(1) NOT NULL
   ,reading        NUMBER)
PARTITION BY RANGE (reading_tstamp) INTERVAL (INTERVAL '1' MONTH)
 (PARTITION readings_201012 VALUES LESS THAN (TO_DATE('01/01/2011','dd/mm/yyyy'))
 )
/

INSERT INTO readings
SELECT /*+ APPEND */
       ROWNUM
,      MOD(ROWNUM,2000) -- spread across 2000 meters
,      TO_TIMESTAMP('01/01/2011','dd/mm/yyyy') + NUMTODSINTERVAL(ROWNUM-1,'MINUTE')
       -- every 11th reading is considered a different type of reading, which
       -- simulates the actual data scenario
,      CASE WHEN MOD(ROWNUM,11) = 3 
            THEN 'X' 
            ELSE 'A' 
       END
,      1
FROM  dual
CONNECT BY ROWNUM <= 1051200
/

EXEC dbms_stats.gather_table_stats('','readings',granularity=>'ALL')

CREATE UNIQUE INDEX readings_pk
   ON readings (reading_id, reading_tstamp)
   LOCAL
/   

CREATE INDEX readings_ix1
   ON readings (meter_id, reading_tstamp)
   LOCAL
/   

ALTER TABLE readings
  ADD CONSTRAINT readings_pk
  PRIMARY KEY (reading_id, reading_tstamp)
/

So, we have a READINGS table that is partitioned into months according to the READING_TSTAMP. The table has been populated for the years 2011 and 2012 with a reading every minute spread across 2000 meters. Overall we have just over 1,000,000 readings. Using meter 1234, let’s see how many readings of type A that we’re dealing with:

SQL>VARIABLE l_meter_id NUMBER
SQL>EXEC :l_meter_id := 1234

PL/SQL procedure successfully completed.

SQL>SELECT COUNT(*)
  2  FROM   readings
  3  WHERE  meter_id = :l_meter_id
  4  AND    reading_type = 'A'
  5  /

  COUNT(*)
----------
       477

and if we look at the distribution of the rows we find that they are well spread, with each row in a different block, which is not really surprising given the manner in which the table was populated:

SQL>SELECT COUNT(DISTINCT dbms_rowid.rowid_block_number(ROWID)) AS no_blocks
  2  FROM   readings
  3  WHERE  meter_id = :l_meter_id
  4  AND    reading_type = 'A'
  5  /

 NO_BLOCKS
----------
       477

So, enough background… let’s get onto the optimisation. We want to get the timestamp associated with the last A type reading for our meter, i.e.:

SELECT MAX(reading_tstamp)
FROM   readings
WHERE  meter_id = :l_meter_id
AND    reading_type = 'A'

When we run this using autotrace, we get:

Execution Plan
----------------------------------------------------------
Plan hash value: 3273213074

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |     1 |    17 |   553   (0)| 00:00:07 |    |          |
|   1 |  PARTITION RANGE ALL                |              |     1 |    17 |            |          |1048575|     1 |
|   2 |   SORT AGGREGATE                    |              |     1 |    17 |            |          |    |          |
|*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID| READINGS     |   263 |  4471 |   553   (0)| 00:00:07 |1048575|     1 |
|*  4 |     INDEX RANGE SCAN                | READINGS_IX1 |   526 |       |    27   (0)| 00:00:01 |1048575|     1 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("READING_TYPE"='A')
   4 - access("METER_ID"=TO_NUMBER(:L_METER_ID))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         23  consistent gets
          0  physical reads
          0  redo size
        542  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

A casual look at the execution plan shows that Oracle is choosing to use index READINGS_IX1 to access the rows in the table for the specified meter, filtering out the non-A type readings and aggregating the output. It does this for each partition in the table… or does it? Looking at the Pstart and Pstop columns shows that Pstart is 1048575, which is the standard value for “the maximum defined partition in an interval partitioned table”, and Pstop is 1. So, Oracle is saying that it is traversing the table partitions in descending order as defined by the partitioning key READING_TSTAMP.

The most telling thing from the autotrace output is the consistent gets statistic, which is just 23. We already know that the readings for our meter reside on 477 blocks so there’s no way we accessed all of those rows with just 23 consistent gets.

If I were the Oracle optimiser then, with my knowledge of the data, I would access the rows for the meter in descending reading timestamp order via READINGS_IX1, checking each row to see if it were of reading type A and stopping as soon as I found one. However, that strategy would result in just a handful of consistent gets, so a value of 23 means that Oracle is doing a bit more than that.

Let’s see what statistics we get if we restrict our query to just the December 2012 partition, which is where the value we want resides:

SELECT MAX(reading_tstamp)
FROM   readings
WHERE  meter_id = :l_meter_id
AND    reading_type = 'A'
AND    reading_tstamp >= TO_TIMESTAMP('01/12/2012','dd/mm/yyyy')
AND    reading_tstamp <  TO_TIMESTAMP('01/01/2013','dd/mm/yyyy')

Autotrace gives us:

Execution Plan
----------------------------------------------------------
Plan hash value: 1620825272

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |     1 |    17 |    24   (0)| 00:00:01 |    |          |
|   1 |  PARTITION RANGE SINGLE             |              |     1 |    17 |            |          | 25 |       25 |
|   2 |   SORT AGGREGATE                    |              |     1 |    17 |            |          |    |          |
|*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID| READINGS     |    11 |   187 |    24   (0)| 00:00:01 | 25 |       25 |
|*  4 |     INDEX RANGE SCAN                | READINGS_IX1 |     1 |       |     2   (0)| 00:00:01 | 25 |       25 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("READING_TYPE"='A')
   4 - access("METER_ID"=TO_NUMBER(:L_METER_ID) AND "READING_TSTAMP">=TIMESTAMP' 2012-12-01
              00:00:00.000000000' AND "READING_TSTAMP"<TIMESTAMP' 2013-01-01 00:00:00.000000000')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         23  consistent gets
          0  physical reads
          0  redo size
        542  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The execution plan is similar to the first one but it is now restricted to a single partition. More interestingly, the consistent gets figure is 23, exactly the same as for our non-timestamp-restricted query. So, we can infer that in the original query Oracle only accessed the relevant rows in the final partition of our table.

So, how did we get 23 consistent gets? A check against the table shows that the last partition contains 21 entries for our meter, which accounts for 21 of the 23 consistent gets. The local index READINGS_IX1 has a BLEVEL of 1, so traversing that index took another 2 consistent gets, giving us our total of 23. This suggests that while the optimisation only accessed one partition it still accessed all relevant rows in that partition, which is something to keep in mind if you have many relevant rows per partition.
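As a back-of-the-envelope check (my arithmetic, not a figure Oracle reports directly), the observed statistic fits a simple model:

```python
# Rough model of the consistent gets for the single-partition MAX() query.
# Assumes the index traversal costs BLEVEL + 1 gets and that each matching
# row sits in its own table block, as shown earlier for this data set.
blevel = 1                   # BLEVEL of the local READINGS_IX1 partition
index_gets = blevel + 1      # branch level plus leaf block
table_gets = 21              # rows for our meter in the December 2012 partition

print(index_gets + table_gets)   # → 23, matching the autotrace statistic
```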

We can get confirmation that only one partition was accessed by disabling the READINGS_IX1 index partition for November 2012. If we try to run a query that requires the index on that specific partition we’ll get an error:

ALTER SESSION SET skip_unusable_indexes = FALSE
/

ALTER INDEX readings_ix1 
   MODIFY PARTITION SYS_P768 UNUSABLE
/

SELECT COUNT(*)
FROM   readings
WHERE  meter_id = :l_meter_id
AND    reading_type = 'A'
/

which results in:

SELECT COUNT(*)
*
ERROR at line 1:
ORA-01502: index 'DEVELOPER.READINGS_IX1' or partition of such index is in unusable state

However if we run our maximum meter reading query:

SELECT MAX(reading_tstamp)
FROM   readings
WHERE  meter_id = :l_meter_id
AND    reading_type = 'A'

we actually get back the correct answer and not an error:

MAX(READING_TSTAMP)
----------------------------
29-DEC-12 03.13.00.000000 PM

So, we can conclude that Oracle has only accessed the December 2012 partition, which is the highest partition as defined by our READING_TSTAMP partitioning key, obtained an answer from that partition and ignored the other table partitions.

As an aside, hinting the query to perform a full table scan also shows that the optimisation is present, with Oracle accessing the last partition first:

Execution Plan
----------------------------------------------------------
Plan hash value: 1332708028

------------------------------------------------------------------------------------------------
| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |     1 |    17 |  1087   (2)| 00:00:14 |       |       |
|   1 |  PARTITION RANGE ALL|          |     1 |    17 |            |          |1048575|     1 |
|   2 |   SORT AGGREGATE    |          |     1 |    17 |            |          |       |       |
|*  3 |    TABLE ACCESS FULL| READINGS |   477 |  8109 |  1087   (2)| 00:00:14 |1048575|     1 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("METER_ID"=TO_NUMBER(:L_METER_ID) AND "READING_TYPE"='A')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        169  consistent gets
        168  physical reads
          0  redo size
        542  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The consistent gets shown above are far too low for a full table scan of the entire table, which would be nearly 4,000 on the demonstration table used in these examples.

This optimisation strategy can also be used when requesting a minimum value. In this situation Oracle will traverse the table partitions in ascending order, so it may be a bit harder to detect that it is occurring: the Pstart and Pstop values of 1 and 1048575 look just like those of a normal ascending scan.

I do see one potential problem though. If you look back to the original execution plan you’ll note that the cost of the query is 553. Let’s look at the plan for a query that returns all the READING_TSTAMP values for our meter, as opposed to just the maximum:

EXPLAIN PLAN
FOR
SELECT reading_tstamp
FROM   readings
WHERE  meter_id = :l_meter_id
AND    reading_type = 'A'
/

SELECT *
FROM   TABLE(dbms_xplan.display(format=>'-BYTES'))
/

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------
Plan hash value: 2044760240

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name         | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |              |   477 |   553   (0)| 00:00:07 |       |    |
|   1 |  PARTITION RANGE ALL               |              |   477 |   553   (0)| 00:00:07 |     1 |1048575|
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID| READINGS     |   477 |   553   (0)| 00:00:07 |     1 |1048575|
|*  3 |    INDEX RANGE SCAN                | READINGS_IX1 |   526 |    27   (0)| 00:00:01 |     1 |1048575|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("READING_TYPE"='A')
   3 - access("METER_ID"=TO_NUMBER(:L_METER_ID))

Obviously all table partitions were accessed for this query, yet its cost is the same as that of the query returning just the maximum result. Assuming a negligible cost for the sort operation that would return the maximum value this makes sense… but only if all rows were actually accessed, which they aren’t when we are after the maximum value. So, it appears we have Oracle optimising the strategy for resolving a query but not reflecting a reduction in plan cost as a result of that strategy. The logical implication is that the overstated cost might lead Oracle to avoid the optimisation strategy in preference to an alternate execution plan and therefore end up with a sub-optimal plan. Probably not likely, but yet another thing to keep in mind.


Unfortunately for me the requirement was not just to find the maximum READING_TSTAMP for one meter_id but for multiple meters, so the query required resembled:

SELECT /*+ index (readings readings_ix1) */
       meter_id
,      MAX(reading_tstamp)
FROM   readings
WHERE  meter_id IN (:l_meter_id1, :l_meter_id2, :l_meter_id3)
AND    reading_type = 'A'
GROUP  BY
       meter_id

If we run this query (hinted to use the READINGS_IX1 index, as the optimiser chose a full table scan by default against this demo table, something that wouldn’t happen with the real table) using meters 123, 345 and 567 then we get the following:

Execution Plan
----------------------------------------------------------
Plan hash value: 2355415513

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |     3 |    51 |  1610   (1)| 00:00:20 |       |       |
|   1 |  HASH GROUP BY                       |              |     3 |    51 |  1610   (1)| 00:00:20 |       |       |
|   2 |   PARTITION RANGE ALL                |              |   788 | 13396 |  1609   (1)| 00:00:20 |     1 |1048575|
|   3 |    INLIST ITERATOR                   |              |       |       |            |          |       |       |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| READINGS     |   788 | 13396 |  1609   (1)| 00:00:20 |     1 |1048575|
|*  5 |      INDEX RANGE SCAN                | READINGS_IX1 |  1577 |       |    31   (0)| 00:00:01 |     1 |1048575|
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("READING_TYPE"='A')
   5 - access("METER_ID"=TO_NUMBER(:L_METER_ID1) OR "METER_ID"=TO_NUMBER(:L_METER_ID2) OR
              "METER_ID"=TO_NUMBER(:L_METER_ID3))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1679  consistent gets
          0  physical reads
          0  redo size
        696  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed

From the Pstart and Pstop columns in the execution plan, Oracle is reporting that it is starting from the first partition and moving upwards through the data. Our consistent gets figure is also far more than 3 times our previous value of 23. So, we can conclude that the maximum partition access optimisation is not being used.

The solution proposed was to perform the lookup of the maximum reading timestamp in a scalar subquery, thereby avoiding the GROUP BY when accessing the READINGS table:

WITH meters AS
   (SELECT :l_meter_id1 AS meter_id FROM dual UNION ALL
    SELECT :l_meter_id2 AS meter_id FROM dual UNION ALL
    SELECT :l_meter_id3 AS meter_id FROM dual)
SELECT m.meter_id
,      (SELECT MAX(reading_tstamp)
        FROM   readings
        WHERE  meter_id = m.meter_id
        AND    reading_type = 'A'
       ) AS max_reading_tstamp
FROM   meters m
/
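The difference between the two query shapes can be modelled in a few lines of Python (again just a sketch: lists stand in for partitions, and early exit stands in for the descending MIN/MAX partition optimisation):

```python
# Toy comparison of the two query shapes, counting partitions visited.
# Purely illustrative; sample data and function names are made up.

def group_by_max(partitions, meter_ids, reading_type):
    """IN-list + GROUP BY shape: every partition is visited once."""
    visited, result = 0, {}
    for partition in partitions:
        visited += 1
        for (m, t, ts) in partition:
            if m in meter_ids and t == reading_type:
                result[m] = max(result.get(m, ts), ts)
    return result, visited

def scalar_subquery_max(partitions, meter_ids, reading_type):
    """Scalar-subquery shape: an independent descending, early-exit probe per meter."""
    visited, result = 0, {}
    for m in meter_ids:
        for partition in reversed(partitions):
            visited += 1
            hits = [ts for (pm, t, ts) in partition
                    if pm == m and t == reading_type]
            if hits:
                result[m] = max(hits)
                break
    return result, visited
```

When the meters of interest all have recent readings, the scalar-subquery shape touches roughly one partition per meter while the grouped shape walks them all, which mirrors the consistent gets figures in these examples.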

and we end up with the optimisation being applied for each meter:

Execution Plan
----------------------------------------------------------
Plan hash value: 5970687

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |     3 |     6 |     6   (0)| 00:00:01 |    |          |
|   1 |  PARTITION RANGE ALL                |              |     1 |    17 |            |          |1048575|     1 |
|   2 |   SORT AGGREGATE                    |              |     1 |    17 |            |          |    |          |
|*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID| READINGS     |   263 |  4471 |   553   (0)| 00:00:07 |1048575|     1 |
|*  4 |     INDEX RANGE SCAN                | READINGS_IX1 |   526 |       |    27   (0)| 00:00:01 |1048575|     1 |
|   5 |  VIEW                               |              |     3 |     6 |     6   (0)| 00:00:01 |    |          |
|   6 |   UNION-ALL                         |              |       |       |            |          |    |          |
|   7 |    FAST DUAL                        |              |     1 |       |     2   (0)| 00:00:01 |    |          |
|   8 |    FAST DUAL                        |              |     1 |       |     2   (0)| 00:00:01 |    |          |
|   9 |    FAST DUAL                        |              |     1 |       |     2   (0)| 00:00:01 |    |          |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("READING_TYPE"='A')
   4 - access("METER_ID"=TO_NUMBER(:B1))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         71  consistent gets
          0  physical reads
          0  redo size
        695  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed

Obviously this solution will only scale to a certain number of meters before an alternate approach is required. Fortunately this wasn’t a problem as the number of meters would only ever be a handful. So, all in all, a very nice optimisation when your partitioned table runs into the billions of rows.

Package: pkg_log_maint

In my last post I outlined a simple application logging table and API for recording log messages. If enabled in a busy system then the size of the table will quickly grow. I noted that the table is partitioned for easy maintenance. In this post I provide a simple package for managing those maintenance tasks.

Purging the LOGS table of old data is clearly the key maintenance task. Given the table is partitioned by date, the simplest solution is to drop partitions older than the required retention period. The other maintenance task, although a relatively minor one, is to give the system-generated partitions meaningful names.

So, without further ado, I present PKG_LOG_MAINT, which exposes just two routines:

  • update_partition_names – a procedure that renames any system-generated partition to LOGS_yyyymmdd, where yyyymmdd represents the date that the data pertains to
  • clear_logs – a procedure that drops old table partitions. The procedure accepts a single optional date parameter and will purge any partition whose data is older than that date. If a date is not provided then the default retention, as specified by the package body constant c_default_retention_days, is used.
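The logic of both routines can be sketched outside the database (plain Python; the real package issues DDL via PKG_DDL, and the daily-partition assumption and sample names here are mine):

```python
from datetime import date, timedelta

# Toy model of the two maintenance tasks. Partitions are represented as
# (name, high_value_date) pairs; names starting with SYS_P stand in for
# Oracle's interval-generated partition names.

def meaningful_name(high_value):
    """Assuming daily partitions, the data pertains to the day before the
    (exclusive) high value, which gives the LOGS_yyyymmdd name."""
    return 'LOGS_' + (high_value - timedelta(days=1)).strftime('%Y%m%d')

def partitions_to_rename(partitions):
    """System-generated partitions paired with their meaningful new names."""
    return [(name, meaningful_name(hv))
            for name, hv in partitions if name.startswith('SYS_P')]

def partitions_to_drop(partitions, cutoff):
    """Partitions whose data is entirely older than the cutoff date."""
    return [name for name, hv in partitions if hv <= cutoff]
```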

PKG_LOG_MAINT has dependencies on PKG_LOG, PKG_ERR and PKG_DDL. PKG_DDL performs the actual DDL required and PKG_ERR is used as the mechanism to log any errors associated with the DDL that might arise.


Download PKG_LOG_MAINT.