On occasion I’ve needed to group time-based data into interval periods, e.g. 5 minute intervals. Oracle functions such as TRUNC and ROUND will help grouping to the hour or minute but what about other intervals? Unfortunately that’s when things get a little bit more complex. I need to do this just frequently enough to curse myself for not properly documenting the strategies I used the previous time I had to do it. So, this post is an attempt to rectify that and looks at a couple of strategies for grouping time measurements into arbitrary intervals.

**Set-up**

The interval grouping examples that follow will be based on the following table:

CREATE TABLE log_table (log_ts TIMESTAMP NOT NULL ,log_msg VARCHAR2(100) NOT NULL ) / -- insert 10000 rows into the table spread across the day INSERT INTO log_table (log_ts, log_msg) SELECT CAST((TRUNC(SYSDATE) + (dbms_random.value (0,1000)/1000)) AS TIMESTAMP) , 'X' FROM dual CONNECT BY ROWNUM <= 10000 / COMMIT /

and just so we see the necessary date precision in SQL*Plus, we’ll change the NLS_DATE_FORMAT:

ALTER SESSION SET nls_date_format = 'dd/mm/yyyy hh24:mi' /

All of the following strategies will be based on grouping the data in the table into 15 minute intervals and providing a count of the table entries for each interval.

**Strategy 1**

If the grouping is to be a relatively small number groups on an hourly basis then using CASE can be effective:

SELECT TRUNC(log_ts,'HH') + CASE WHEN EXTRACT(MINUTE FROM log_ts) < 15 THEN 0 WHEN EXTRACT(MINUTE FROM log_ts) < 30 THEN (15/60/24) WHEN EXTRACT(MINUTE FROM log_ts) < 45 THEN (30/60/24) ELSE (45/60/24) END AS interval_start , TRUNC(log_ts,'HH') + CASE WHEN EXTRACT(MINUTE FROM log_ts) < 15 THEN (15/60/24) WHEN EXTRACT(MINUTE FROM log_ts) < 30 THEN (30/60/24) WHEN EXTRACT(MINUTE FROM log_ts) < 45 THEN (45/60/24) ELSE (60/60/24) END AS interval_end , COUNT(*) AS no_msgs FROM log_table GROUP BY TRUNC(log_ts,'HH') + CASE WHEN EXTRACT(MINUTE FROM log_ts) < 15 THEN 0 WHEN EXTRACT(MINUTE FROM log_ts) < 30 THEN (15/60/24) WHEN EXTRACT(MINUTE FROM log_ts) < 45 THEN (30/60/24) ELSE (45/60/24) END , TRUNC(log_ts,'HH') + CASE WHEN EXTRACT(MINUTE FROM log_ts) < 15 THEN (15/60/24) WHEN EXTRACT(MINUTE FROM log_ts) < 30 THEN (30/60/24) WHEN EXTRACT(MINUTE FROM log_ts) < 45 THEN (45/60/24) ELSE (60/60/24) END ORDER BY TRUNC(log_ts,'HH') + CASE WHEN EXTRACT(MINUTE FROM log_ts) < 15 THEN 0 WHEN EXTRACT(MINUTE FROM log_ts) < 30 THEN (15/60/24) WHEN EXTRACT(MINUTE FROM log_ts) < 45 THEN (30/60/24) ELSE (45/60/24) END

This strategy is quite clear on what’s being achieved to anyone reading the code. However, it is really only suitable to scenarios that have a limited number of groupings so it wouldn’t be practical to use this strategy if the intervals were to be, say, 2 minutes.

As an aside, the EXTRACT function will only extract sub-day intervals from TIMESTAMP data types and not the DATE data type, despite DATE having a time component. The Oracle documentation notes that this is due to how Oracle DATE data types map to ANSI standard DATE data types. This is easily overcome by casting the DATE to a TIMESTAMP, i.e. CAST(log_ts AS TIMESTAMP). Personally I find this behaviour rather annoying though…

**Strategy 2**

The TO_CHAR function contains the format model SSSSS, which extracts the number of seconds since midnight from the DATE provided. Using this function it’s relatively easy to do some simple arithmetic to round the number of seconds to the nearest interval boundary; divide by the number of seconds in the interval, discard any fraction and multiply by the number of seconds in the interval:

WITH intvl AS (SELECT 15*60 AS no_secs FROM dual) SELECT TRUNC(log_ts) + ((TRUNC(TO_NUMBER(TO_CHAR(log_ts,'SSSSS'))/i.no_secs)*i.no_secs)*(1/24/60/60)) AS interval_start , TRUNC(log_ts) + (((TRUNC(TO_NUMBER(TO_CHAR(log_ts,'SSSSS'))/i.no_secs)+1)*i.no_secs)*(1/24/60/60)) AS interval_end , COUNT(*) AS no_logs FROM log_table , intvl i GROUP BY TRUNC(log_ts) + ((TRUNC(TO_NUMBER(TO_CHAR(log_ts,'SSSSS'))/i.no_secs)*i.no_secs)*(1/24/60/60)) , TRUNC(log_ts) + (((TRUNC(TO_NUMBER(TO_CHAR(log_ts,'SSSSS'))/i.no_secs)+1)*i.no_secs)*(1/24/60/60)) ORDER BY TRUNC(log_ts) + ((TRUNC(TO_NUMBER(TO_CHAR(log_ts,'SSSSS'))/i.no_secs)*i.no_secs)*(1/24/60/60))

The above query uses a WITH block to factor out the 15 minute interval in seconds so that it doesn’t need to be hard-coded throughout the query. This makes it easy to alter the query to group on other intervals. The primary downside to the above query is that it’s not quite so easy to understand as the CASE strategy but it is probably more flexible.

**Strategy 3**

The next interval grouping strategy uses a WITH block to generate the set of intervals that the data is to be group into and then performs a join with the data in the table:

WITH int_dtls AS (SELECT 15 * 60 AS no_secs , TRUNC(SYSDATE) AS start_time , TRUNC(SYSDATE) + 1 AS end_time FROM dual) ,intvls AS (SELECT start_time + (no_secs/24/60/60*(ROWNUM-1)) AS interval_start , start_time + (no_secs/24/60/60*(ROWNUM)) AS interval_end FROM int_dtls CONNECT BY start_time + (no_secs/24/60/60*(ROWNUM-1)) < end_time) SELECT i.interval_start , i.interval_end , COUNT(*) AS no_logs FROM log_table l , intvls i WHERE l.log_ts >= i.interval_start AND l.log_ts < i.interval_end GROUP BY i.interval_start , i.interval_end ORDER BY i.interval_start , i.interval_end

A drawback of this strategy is that the date range needs to be specified in the WITH block. However a key advantage of is that it’s possible to outer join the interval ranges to the table so as to generate an entry for all interval ranges even if there doesn’t exist an entry in the table for that particular period, i.e.:

-- remove the entries between 1pm adn 2pm DELETE FROM log_table WHERE EXTRACT(HOUR FROM log_ts) = 13 / COMMIT / WITH int_dtls AS (SELECT 15 * 60 AS no_secs , TRUNC(SYSDATE) AS start_time , TRUNC(SYSDATE) + 1 AS end_time FROM dual) ,intvls AS (SELECT start_time + (no_secs/24/60/60*(ROWNUM-1)) AS interval_start , start_time + (no_secs/24/60/60*(ROWNUM)) AS interval_end FROM int_dtls CONNECT BY start_time + (no_secs/24/60/60*(ROWNUM-1)) < end_time) SELECT i.interval_start , i.interval_end , COUNT(l.log_ts) AS no_logs FROM log_table l , intvls i WHERE l.log_ts(+) >= i.interval_start AND l.log_ts(+) < i.interval_end GROUP BY i.interval_start , i.interval_end ORDER BY i.interval_start , i.interval_end

Note that in the above query we cannot count the number of log entries using COUNT(*) and had to change this to COUNT(l.log_ts), i.e. the number of entries in the log table itself.

I’m sure that others have different strategies for grouping data into intervals not noted above so if none of the above are suitable then a Google search should reveal other strategies. All of the strategies are based on groupings that are X minutes. If the grouping interval were to be X days then a bit of rework would be needed.

Hopefully the next time I find myself needing to do this I will remember this post and not find myself reinventing the wheel again…

Download the scripts for the code used in this post.