Most developers have a workable knowledge of SQL and are quite familiar with the aggregate functions SUM, COUNT, MIN, MAX and such. Yet I still see logic similar to the following in PL/SQL code:
DECLARE l_rows NUMBER; BEGIN SELECT COUNT(*) INTO l_rows FROM some_table WHERE some_column = some_value; dbms_output.put_line ('Rows found: ' || TO_CHAR(l_rows)); EXCEPTION WHEN NO_DATA_FOUND THEN dbms_output.put_line ('No rows found'); END;
To an experienced Oracle developer the defect is quite clear: the SQL statement will never throw the NO_DATA_FOUND exception. If there are no rows in the table that meet the selection criteria then the COUNT(*) function will return a value 0 and so we will always get back a row from the above query.
However, what happens in other scenarios with other aggregate functions? This post explorers some of the basic aspects of aggregate functions that appear to trip up inexperienced application developers.
For the examples in this post I will use the following table and data:
CREATE TABLE agg_demo (id NUMBER(6) NOT NULL ,cat NUMBER(2) NOT NULL ,val NUMBER(4) ,ts TIMESTAMP ) / INSERT INTO agg_demo SELECT ROWNUM AS id , MOD(ROWNUM,5) AS cat , CASE WHEN TRUNC(ROWNUM/7) = 0 THEN NULL ELSE TRUNC(ROWNUM/7) END AS val FROM dual CONNECT BY level <= 100 / COMMIT /
So, we have a table with:
- 100 rows with a unique ID value
- a CAT column with values that range from 0 to 4
- a VAL column with values in the range of 0 to 14 that also contains NULLs
Let’s start with a simple row count:
SELECT COUNT(*) FROM agg_demo / COUNT(*) ---------- 100
The result is quite obvious: we have 100 rows in our table. Adding a WHERE clause that excludes all rows results in the following:
SELECT COUNT(*) FROM agg_demo WHERE cat > 20 / COUNT(*) ---------- 0
This is the scenario that is similar to the one that I noted at the start of this post. We still get back a result and that result is 0. So, this query will never throw the PL/SQL NO_DATA_FOUND exception. Furthermore, this type of query will never return NULL, i.e. you will always get a value and that value will be an integer of 0 or greater.
When it comes to the COUNT function, there are three common ways that developers use it:
- COUNT(*)
- COUNT(1), or some other literal constant
- COUNT(column)
So, what are the differences in the above:
SELECT COUNT(*) AS "count_*" , COUNT(1) AS "count_1" , COUNT(val) AS "count_val" FROM agg_demo / count_* count_1 count_val ---------- ---------- ---------- 100 100 94
We can see that COUNT(*) and COUNT(1) returned the same value, 100. Both of these functions are returning a count of the number of rows. As an aside, I have heard people claim that one approach is more efficient than the other but there’s no truth in these claims.
The COUNT(val) returns a different value than the other two COUNT functions, which highlights an important difference in how the COUNT function works. When the COUNT function is applied to a specific column, or any expression, then NULLs are not included in the counted result. So, the result of 94 is the number of non-NULL VAL entries.
Regardless of the way COUNT is expressed, when the WHERE clause eliminates all rows from the table the various COUNT functions will always return a value of 0:
SELECT COUNT(*) AS "count_*" , COUNT(1) AS "count_1" , COUNT(val) AS "count_val" FROM agg_demo WHERE cat > 20 / count_* count_1 count_val ---------- ---------- ---------- 0 0 0
It’s interesting to note that COUNT(val) will return a value even if all the rows that the function is applied to contain NULL for the VAL column, i.e. we don’t get back NULL:
SELECT COUNT(*) AS "count_*" , COUNT(1) AS "count_1" , COUNT(val) AS "count_val" FROM agg_demo WHERE val IS NULL / count_* count_1 count_val ---------- ---------- ---------- 6 6 0
The above illustrates that it’s not necessary to include an NVL function around the count of a column in order to force a non-NULL response, as I’ve seen some developers do. For example, the NVL in the following query is unnecessary:
SELECT NVL(COUNT(val),0) AS "count_val" FROM agg_demo WHERE val IS NULL / count_val ---------- 0
It’s also important to note that the above is very different to the following query:
SELECT COUNT(NVL(val,0)) AS "count_val" FROM agg_demo WHERE val IS NULL / count_val ---------- 6
In the above query, the NVL function is applied prior to the aggregate COUNT function. This means that the COUNT function is not counting 6 NULLs, which it would normally ignore and therefore report a result of 0. Instead it is counting 6 0’s and returns a count of 6.
Now let’s go back a few steps and add a GROUP BY into the query with no WHERE clause:
SELECT cat , COUNT(*) AS "count_*" , COUNT(1) AS "count_1" , COUNT(val) AS "count_val" FROM agg_demo GROUP BY cat ORDER BY cat / CAT count_* count_1 count_val ---------- ---------- ---------- ---------- 0 20 20 19 1 20 20 18 2 20 20 19 3 20 20 19 4 20 20 19
Ther’s probably no surprises here, we end up with a count of the rows per CAT for the COUNT(*) and COUNT(1) functions and a count of the non-NULL VAL entries for the COUNT(val) function. Let’s repeat this query with a WHERE clause that eliminates all rows:
SELECT cat , COUNT(*) AS "count_*" , COUNT(1) AS "count_1" , COUNT(val) AS "count_val" FROM agg_demo WHERE cat > 20 GROUP BY cat ORDER BY cat / no rows selected
Ah, now that’s different. This time we didn’t get any rows back, not even one indicating that the various counts are 0. This is probably where most developers go wrong in assuming that COUNT(*) can return no rows and therefore could potentially raise a NO_DATA_FOUND exception in PL/SQL. Once GROUP BY is included in the SQL then it is possible for the query to return no rows since all rows from the table can be excluded and therefore there is nothing to perform the GROUP BY on.
So far we’ve been looking at the COUNT function. What about other aggregate functions? Let’s give SUM, MIN and AVG a try.
SELECT SUM(val) AS sum_val , MIN(val) AS min_val , AVG(val) AS avg_val FROM agg_demo / SUM_VAL MIN_VAL AVG_VAL ---------- ---------- ---------- 679 1 7.22340426
Again, probably no surprises here. For the other aggregate functions we have to specify the column (or expression) on which we are operating, i.e. it makes no sense to try to calculate the average of a row, only a given column (or expression) in the row.
However, on closer consideration, this query does highlight an important point with aggregate functions: NULLs are ignored for the purpose of the calculations performed. For most functions and calculations, the inclusion of NULL will result in a NULL answer. This doesn’t apply to aggregate functions. A simple test illustrates the difference:
SELECT 1 + 2 + NULL AS result FROM dual / RESULT ---------- WITH vals AS (SELECT 1 AS val FROM dual UNION ALL SELECT 2 AS val FROM dual UNION ALL SELECT NULL AS val FROM dual) SELECT SUM(val) AS result FROM vals / RESULT ---------- 3 /
In the above test we added the values 1, 2 and NULL using the addition operator (+) and the result returned was NULL. When we used the SUM aggregate function on the same set of values we got a result of 2. So, aggregate functions ignore NULLs when performing their calculations.
Let’s see what happens when we have a WHERE clause that eliminates all rows:
SELECT SUM(val) AS sum_val , MIN(val) AS min_val , AVG(val) AS avg_val FROM agg_demo WHERE cat > 20 / SUM_VAL MIN_VAL AVG_VAL ---------- ---------- ----------
Similar to the COUNT function we get back a result but in this case the results were NULL. This makes sense. COUNT told us that there were no results, i.e. we had a count of 0, so what else should the sum, minimum and average of no values be? However, keep in mind that we did actually get a result so, again, this type of query will not raise a NO_DATA_FOUND exception.
Potentially you may decide that you do not want NULL and instead default to some other value using the NVL function:
SELECT NVL(SUM(val),0) AS sum_val , NVL(MIN(val),0) AS min_val , NVL(AVG(val),0) AS avg_val FROM agg_demo WHERE cat > 20 / SUM_VAL MIN_VAL AVG_VAL ---------- ---------- ---------- 0 0 0
However, this is definitely incorrect from a mathematical perspective and more than likely to be incorrect from a business logic perspective. In a code review I certainly would be questioning the use of this type of logic very, very closely. As an aside, I find it amusing that something that is definitely incorrect mathematically may only be unlikely to be incorrect from a business perspective.
Going back to the GROUP BY example:
SELECT cat , SUM(val) AS sum_val , MIN(val) AS min_val , AVG(val) AS avg_val FROM agg_demo GROUP BY cat ORDER BY cat / CAT SUM_VAL MIN_VAL AVG_VAL ---------- ---------- ---------- ---------- 0 141 1 7.42105263 1 130 1 7.22222222 2 133 1 7 3 136 1 7.15789474 4 139 1 7.31578947
We get the sum, minimum and average of the VAL column for each CAT value. There’s nothing unusual here. Likewise, when we add a WHERE clause that eliminates all rows from the table we get:
SELECT cat , SUM(val) AS sum_val , MIN(val) AS min_val , AVG(val) AS avg_val FROM agg_demo WHERE cat > 20 GROUP BY cat ORDER BY cat / no rows selected
This is the same as for the COUNT function so there should be no surprises here.