“The database seems a bit slow today…” was the comment I received upon my arrival at work recently. It wasn’t too hard to see why either with a rather large, convoluted reporting style query chugging along, much to the annoyance of the other activity in the database. Peering into the depths of the query I noticed at its core it comprised of a UNION of 3 queries. What struck me as strange was that each of the three components were nearly identical to each other:
SELECT col_1, col_2, col3, col_4 FROM <tables> WHERE <selection criteria> GROUP BY col_1, col_2, col3, col_4 UNION ALL SELECT col_1, col_2, col3, 'TOTAL' FROM <tables> WHERE <selection criteria> GROUP BY col_1, col_2, col3, 'TOTAL' UNION ALL SELECT col_1, col_2, 'TOTAL', 'TOTAL' FROM <tables> WHERE <selection criteria> GROUP BY col_1, col_2, 'TOTAL', 'TOTAL'
The execution details of the query showed that Oracle spent 10 minutes executing each component. The tables and the WHERE clause were identical in each of the query components. This meant that the second component is nothing more than an aggregation of the first and the third component an aggregation of the second. Surely there’s a better way of obtaining the aggregation entries without rerunning the entire query again? And there is…
Lurking in the depths of the Database Data Warehousing Guide, and not the SQL Language Reference, is a chapter titled SQL for Aggregation in Data Warehouses. The particular function we want for our query is ROLLUP. The documentation has this to say about ROLLUP:
ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. It also calculates a grand total. ROLLUP is a simple extension to the GROUP BY clause, so its syntax is extremely easy to use. The ROLLUP extension is highly efficient, adding minimal overhead to a query.
By way of simple example we create a table with 4 columns: 3 of them dimension type attributes and one numeric metric, along with some test data:
CREATE TABLE rollup_test (col_a VARCHAR2(1) NOT NULL ,col_b VARCHAR2(1) NOT NULL ,col_c VARCHAR2(1) NOT NULL ,val NUMBER(3) NOT NULL); INSERT INTO rollup_test VALUES ('A','A','A',1); INSERT INTO rollup_test VALUES ('A','A','B',1); INSERT INTO rollup_test VALUES ('A','A','C',1); INSERT INTO rollup_test VALUES ('A','B','A',1); INSERT INTO rollup_test VALUES ('A','B','B',1); INSERT INTO rollup_test VALUES ('A','B','C',1); INSERT INTO rollup_test VALUES ('A','C','A',1); INSERT INTO rollup_test VALUES ('A','C','B',1); INSERT INTO rollup_test VALUES ('A','C','C',1); INSERT INTO rollup_test VALUES ('B','A','A',1); INSERT INTO rollup_test VALUES ('B','A','B',1); INSERT INTO rollup_test VALUES ('B','A','C',1); INSERT INTO rollup_test VALUES ('B','B','A',1); INSERT INTO rollup_test VALUES ('B','B','B',1); INSERT INTO rollup_test VALUES ('B','B','C',1); INSERT INTO rollup_test VALUES ('B','C','A',1); INSERT INTO rollup_test VALUES ('B','C','B',1); INSERT INTO rollup_test VALUES ('B','C','C',1); COMMIT;
and we can see the impact that ROLLUP has using the following query:
SQL> SELECT col_a 2 , NVL(col_b,'TOTAL') AS b 3 , NVL(col_c,'TOTAL') AS c 4 , SUM(val) AS sum_val 5 FROM rollup_test 6 GROUP BY 7 col_a 8 , ROLLUP(col_b, col_c) 9 ORDER BY 10 col_a 11 , NVL(col_b,'TOTAL') 12 , NVL(col_c,'TOTAL'); COL_A B C SUM_VAL ------ ------ ------ -------- A A A 1 A A B 1 A A C 1 A A TOTAL 3 A B A 1 A B B 1 A B C 1 A B TOTAL 3 A C A 1 A C B 1 A C C 1 A C TOTAL 3 A TOTAL TOTAL 9 B A A 1 B A B 1 B A C 1 B A TOTAL 3 B B A 1 B B B 1 B B C 1 B B TOTAL 3 B C A 1 B C B 1 B C C 1 B C TOTAL 3 B TOTAL TOTAL 9 26 rows selected.
Applying it to our reporting query was as simple as replacing the UNION ALL with:
SELECT col_1, col_2, NVL(col3,'TOTAL'), NVL(col_4,'TOTAL') FROM <tables> WHERE <selection criteria> GROUP BY col_1, col_2, ROLLUP(col3, col_4)
No more UNION ALL, no more execution of (nearly) the same query three times and an immediate saving of 20 minutes of wasted execution time… and all before my morning coffee.