Aggregate Function Basics

Most developers have a workable knowledge of SQL and are quite familiar with the aggregate functions SUM, COUNT, MIN, MAX and such. Yet I still see logic similar to the following in PL/SQL code:

DECLARE
   l_rows   NUMBER;
BEGIN
   SELECT COUNT(*)
   INTO   l_rows
   FROM   some_table
   WHERE  some_column = some_value;

   dbms_output.put_line ('Rows found: ' || TO_CHAR(l_rows));
   
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      dbms_output.put_line ('No rows found');
END;

To an experienced Oracle developer the defect is quite clear: the SQL statement will never throw the NO_DATA_FOUND exception. If there are no rows in the table that meet the selection criteria then the COUNT(*) function will return a value 0 and so we will always get back a row from the above query.

However, what happens in other scenarios with other aggregate functions? This post explorers some of the basic aspects of aggregate functions that appear to trip up inexperienced application developers.

For the examples in this post I will use the following table and data:

CREATE TABLE agg_demo
   (id      NUMBER(6)    NOT NULL
   ,cat     NUMBER(2)    NOT NULL
   ,val     NUMBER(4)
   ,ts      TIMESTAMP
   )
/

INSERT INTO agg_demo
SELECT ROWNUM AS id
,      MOD(ROWNUM,5) AS cat
,      CASE WHEN TRUNC(ROWNUM/7) = 0
            THEN NULL
            ELSE TRUNC(ROWNUM/7)
       END AS val
FROM   dual
CONNECT BY level <= 100
/

COMMIT
/

So, we have a table with:

  • 100 rows with a unique ID value
  • a CAT column with values that range from 0 to 4
  • a VAL column with values in the range of 0 to 14 that also contains NULLs

Let’s start with a simple row count:

SELECT COUNT(*)
FROM   agg_demo
/

  COUNT(*)
----------
       100

The result is quite obvious: we have 100 rows in our table. Adding a WHERE clause that excludes all rows results in the following:

SELECT COUNT(*)
FROM   agg_demo
WHERE  cat > 20
/

  COUNT(*)
----------
         0

This is the scenario that is similar to the one that I noted at the start of this post. We still get back a result and that result is 0. So, this query will never throw the PL/SQL NO_DATA_FOUND exception. Furthermore, this type of query will never return NULL, i.e. you will always get a value and that value will be an integer of 0 or greater.

When it comes to the COUNT function, there are three common ways that developers use it:

  • COUNT(*)
  • COUNT(1), or some other literal constant
  • COUNT(column)

So, what are the differences in the above:

SELECT COUNT(*) AS "count_*"
,      COUNT(1) AS "count_1"
,      COUNT(val) AS "count_val"
FROM   agg_demo
/

   count_*    count_1  count_val
---------- ---------- ----------
       100        100         94

We can see that COUNT(*) and COUNT(1) returned the same value, 100. Both of these functions are returning a count of the number of rows. As an aside, I have heard people claim that one approach is more efficient than the other but there’s no truth in these claims.

The COUNT(val) returns a different value than the other two COUNT functions, which highlights an important difference in how the COUNT function works. When the COUNT function is applied to a specific column, or any expression, then NULLs are not included in the counted result. So, the result of 94 is the number of non-NULL VAL entries.

Regardless of the way COUNT is expressed, when the WHERE clause eliminates all rows from the table the various COUNT functions will always return a value of 0:

SELECT COUNT(*) AS "count_*"
,      COUNT(1) AS "count_1"
,      COUNT(val) AS "count_val"
FROM   agg_demo
WHERE  cat > 20
/

   count_*    count_1  count_val
---------- ---------- ----------
         0          0          0

It’s interesting to note that COUNT(val) will return a value even if all the rows that the function is applied to contain NULL for the VAL column, i.e. we don’t get back NULL:

SELECT COUNT(*) AS "count_*"
,      COUNT(1) AS "count_1"
,      COUNT(val) AS "count_val"
FROM   agg_demo
WHERE  val IS NULL
/

   count_*    count_1  count_val
---------- ---------- ----------
         6          6          0

The above illustrates that it’s not necessary to include an NVL function around the count of a column in order to force a non-NULL response, as I’ve seen some developers do. For example, the NVL in the following query is unnecessary:

SELECT NVL(COUNT(val),0) AS "count_val"
FROM   agg_demo
WHERE  val IS NULL
/

 count_val
----------
         0

It’s also important to note that the above is very different to the following query:

SELECT COUNT(NVL(val,0)) AS "count_val"
FROM   agg_demo
WHERE  val IS NULL
/

 count_val
----------
         6

In the above query, the NVL function is applied prior to the aggregate COUNT function. This means that the COUNT function is not counting 6 NULLs, which it would normally ignore and therefore report a result of 0. Instead it is counting 6 0’s and returns a count of 6.

Now let’s go back a few steps and add a GROUP BY into the query with no WHERE clause:

SELECT cat
,      COUNT(*) AS "count_*"
,      COUNT(1) AS "count_1"
,      COUNT(val) AS "count_val"
FROM   agg_demo
GROUP  BY
       cat
ORDER  BY
       cat
/

       CAT    count_*    count_1  count_val
---------- ---------- ---------- ----------
         0         20         20         19
         1         20         20         18
         2         20         20         19
         3         20         20         19
         4         20         20         19

Ther’s probably no surprises here, we end up with a count of the rows per CAT for the COUNT(*) and COUNT(1) functions and a count of the non-NULL VAL entries for the COUNT(val) function. Let’s repeat this query with a WHERE clause that eliminates all rows:

SELECT cat
,      COUNT(*) AS "count_*"
,      COUNT(1) AS "count_1"
,      COUNT(val) AS "count_val"
FROM   agg_demo
WHERE  cat > 20
GROUP  BY
       cat
ORDER  BY
       cat
/

no rows selected

Ah, now that’s different. This time we didn’t get any rows back, not even one indicating that the various counts are 0. This is probably where most developers go wrong in assuming that COUNT(*) can return no rows and therefore could potentially raise a NO_DATA_FOUND exception in PL/SQL. Once GROUP BY is included in the SQL then it is possible for the query to return no rows since all rows from the table can be excluded and therefore there is nothing to perform the GROUP BY on.

So far we’ve been looking at the COUNT function. What about other aggregate functions? Let’s give SUM, MIN and AVG a try.

SELECT SUM(val) AS sum_val
,      MIN(val) AS min_val
,      AVG(val) AS avg_val
FROM   agg_demo
/

   SUM_VAL    MIN_VAL    AVG_VAL
---------- ---------- ----------
       679          1 7.22340426

Again, probably no surprises here. For the other aggregate functions we have to specify the column (or expression) on which we are operating, i.e. it makes no sense to try to calculate the average of a row, only a given column (or expression) in the row.

However, on closer consideration, this query does highlight an important point with aggregate functions: NULLs are ignored for the purpose of the calculations performed. For most functions and calculations, the inclusion of NULL will result in a NULL answer. This doesn’t apply to aggregate functions. A simple test illustrates the difference:

SELECT 1 + 2 + NULL AS result
FROM   dual
/

    RESULT
----------


WITH vals AS
   (SELECT 1    AS val FROM dual UNION ALL
    SELECT 2    AS val FROM dual UNION ALL
    SELECT NULL AS val FROM dual)
SELECT SUM(val) AS result
FROM   vals
/

    RESULT
----------
         3
/

In the above test we added the values 1, 2 and NULL using the addition operator (+) and the result returned was NULL. When we used the SUM aggregate function on the same set of values we got a result of 2. So, aggregate functions ignore NULLs when performing their calculations.

Let’s see what happens when we have a WHERE clause that eliminates all rows:

SELECT SUM(val) AS sum_val
,      MIN(val) AS min_val
,      AVG(val) AS avg_val
FROM   agg_demo
WHERE  cat > 20
/

   SUM_VAL    MIN_VAL    AVG_VAL
---------- ---------- ----------

Similar to the COUNT function we get back a result but in this case the results were NULL. This makes sense. COUNT told us that there were no results, i.e. we had a count of 0, so what else should the sum, minimum and average of no values be? However, keep in mind that we did actually get a result so, again, this type of query will not raise a NO_DATA_FOUND exception.

Potentially you may decide that you do not want NULL and instead default to some other value using the NVL function:

SELECT NVL(SUM(val),0) AS sum_val
,      NVL(MIN(val),0) AS min_val
,      NVL(AVG(val),0) AS avg_val
FROM   agg_demo
WHERE  cat > 20
/

   SUM_VAL    MIN_VAL    AVG_VAL
---------- ---------- ----------
         0          0          0

However, this is definitely incorrect from a mathematical perspective and more than likely to be incorrect from a business logic perspective. In a code review I certainly would be questioning the use of this type of logic very, very closely. As an aside, I find it amusing that something that is definitely incorrect mathematically may only be unlikely to be incorrect from a business perspective.

Going back to the GROUP BY example:

SELECT cat
,      SUM(val) AS sum_val
,      MIN(val) AS min_val
,      AVG(val) AS avg_val
FROM   agg_demo
GROUP  BY
       cat
ORDER  BY
       cat
/

       CAT    SUM_VAL    MIN_VAL    AVG_VAL
---------- ---------- ---------- ----------
         0        141          1 7.42105263
         1        130          1 7.22222222
         2        133          1          7
         3        136          1 7.15789474
         4        139          1 7.31578947

We get the sum, minimum and average of the VAL column for each CAT value. There’s nothing unusual here. Likewise, when we add a WHERE clause that eliminates all rows from the table we get:

SELECT cat
,      SUM(val) AS sum_val
,      MIN(val) AS min_val
,      AVG(val) AS avg_val
FROM   agg_demo
WHERE  cat > 20
GROUP  BY
       cat
ORDER  BY
       cat
/

no rows selected

This is the same as for the COUNT function so there should be no surprises here.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s