ROLLUP Function

“The database seems a bit slow today…” was the comment I received upon my arrival at work recently. It wasn’t too hard to see why either with a rather large, convoluted reporting style query chugging along, much to the annoyance of the other activity in the database. Peering into the depths of the query I noticed at its core it comprised of a UNION of 3 queries. What struck me as strange was that each of the three components were nearly identical to each other:

SELECT col_1, col_2, col3, col_4
FROM <tables>
WHERE <selection criteria>
GROUP BY
      col_1, col_2, col3, col_4
UNION ALL
SELECT col_1, col_2, col3, 'TOTAL'
FROM <tables>
WHERE <selection criteria>
GROUP BY
      col_1, col_2, col3, 'TOTAL'
UNION ALL
SELECT col_1, col_2, 'TOTAL', 'TOTAL'
FROM <tables>
WHERE <selection criteria>
GROUP BY
      col_1, col_2, 'TOTAL', 'TOTAL'

The execution details of the query showed that Oracle spent 10 minutes executing each component. The tables and the WHERE clause were identical in each of the query components. This meant that the second component is nothing more than an aggregation of the first and the third component an aggregation of the second. Surely there’s a better way of obtaining the aggregation entries without rerunning the entire query again? And there is…

Lurking in the depths of the Database Data Warehousing Guide, and not the SQL Language Reference, is a chapter titled SQL for Aggregation in Data Warehouses. The particular function we want for our query is ROLLUP. The documentation has this to say about ROLLUP:

ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. It also calculates a grand total. ROLLUP is a simple extension to the GROUP BY clause, so its syntax is extremely easy to use. The ROLLUP extension is highly efficient, adding minimal overhead to a query.

By way of simple example we create a table with 4 columns: 3 of them dimension type attributes and one numeric metric, along with some test data:

CREATE TABLE rollup_test
	(col_a 		VARCHAR2(1)	NOT NULL
	,col_b		VARCHAR2(1)	NOT NULL
	,col_c		VARCHAR2(1)	NOT NULL
	,val		NUMBER(3)	NOT NULL);

INSERT INTO rollup_test
VALUES ('A','A','A',1);
INSERT INTO rollup_test
VALUES ('A','A','B',1);
INSERT INTO rollup_test
VALUES ('A','A','C',1);

INSERT INTO rollup_test
VALUES ('A','B','A',1);
INSERT INTO rollup_test
VALUES ('A','B','B',1);
INSERT INTO rollup_test
VALUES ('A','B','C',1);

INSERT INTO rollup_test
VALUES ('A','C','A',1);
INSERT INTO rollup_test
VALUES ('A','C','B',1);
INSERT INTO rollup_test
VALUES ('A','C','C',1);

INSERT INTO rollup_test
VALUES ('B','A','A',1);
INSERT INTO rollup_test
VALUES ('B','A','B',1);
INSERT INTO rollup_test
VALUES ('B','A','C',1);

INSERT INTO rollup_test
VALUES ('B','B','A',1);
INSERT INTO rollup_test
VALUES ('B','B','B',1);
INSERT INTO rollup_test
VALUES ('B','B','C',1);

INSERT INTO rollup_test
VALUES ('B','C','A',1);
INSERT INTO rollup_test
VALUES ('B','C','B',1);
INSERT INTO rollup_test
VALUES ('B','C','C',1);

COMMIT;

and we can see the impact that ROLLUP has using the following query:

SQL> SELECT col_a
  2  ,      NVL(col_b,'TOTAL') AS b
  3  ,      NVL(col_c,'TOTAL') AS c
  4  ,      SUM(val)           AS sum_val
  5  FROM   rollup_test
  6  GROUP  BY
  7         col_a
  8  ,      ROLLUP(col_b, col_c)
  9  ORDER  BY
 10         col_a
 11  ,      NVL(col_b,'TOTAL')
 12  ,      NVL(col_c,'TOTAL');

COL_A  B      C       SUM_VAL
------ ------ ------ --------
A      A      A             1
A      A      B             1
A      A      C             1
A      A      TOTAL         3
A      B      A             1
A      B      B             1
A      B      C             1
A      B      TOTAL         3
A      C      A             1
A      C      B             1
A      C      C             1
A      C      TOTAL         3
A      TOTAL  TOTAL         9
B      A      A             1
B      A      B             1
B      A      C             1
B      A      TOTAL         3
B      B      A             1
B      B      B             1
B      B      C             1
B      B      TOTAL         3
B      C      A             1
B      C      B             1
B      C      C             1
B      C      TOTAL         3
B      TOTAL  TOTAL         9

26 rows selected.

Applying it to our reporting query was as simple as replacing the UNION ALL with:

SELECT col_1, col_2, NVL(col3,'TOTAL'), NVL(col_4,'TOTAL')
FROM <tables>
WHERE <selection criteria>
GROUP BY
      col_1, col_2, ROLLUP(col3, col_4)

No more UNION ALL, no more execution of (nearly) the same query three times and an immediate saving of 20 minutes of wasted execution time… and all before my morning coffee.

Leave a comment