Existence checks using COUNT(*)

One thing guaranteed to draw my ire in a code review is the use of COUNT(*) to test for the existence of data. This can be done in a number of ways. For example, a simple count via a SELECT statement and then checking if the result was 1 or greater:

...
SELECT COUNT(*)
INTO   l_rows_exist
FROM   some_table
WHERE  ... blah,blah,blah;

IF (l_rows_exist >= 1) THEN
   ... blah,blah,blah;
END IF;

or as some sort of logical sub-query construct inside a query:

SELECT ... blah,blah,blah
FROM   some_table
WHERE  (SELECT COUNT(*) FROM some_other_table WHERE ... blah,blah,blah) >= 1

Unless there’s some constraint involved that will guarantee that the query will return at the most one row counting all entries to simply detect if one entry exists is asking for trouble. Oracle will cheerfully table scan a billion row table looking for all rows that satisfy the query even after it found a matching entry in the first block examined.

Let’s see that in action using a 1,000,000 row table, using AUTOTRACE to obtain some statistics:

SQL>CREATE TABLE lots_of_rows
  2     (id      NUMBER(10) NOT NULL
  3     ,cat     NUMBER(2)  NOT NULL
  4     ,padding CHAR(30)   NOT NULL)
  5  /

Table created.

SQL>-- generate 1,000,000 rows
SQL>INSERT INTO lots_of_rows
  2  WITH row_src AS
  3     (SELECT ROWNUM r FROM dual CONNECT BY ROWNUM <= 1000)
  4  SELECT ROWNUM
  5  ,      MOD(ROWNUM,100)
  6  ,      'X'
  7  FROM   row_src r2
  8  ,      row_src r2
  9  /

1000000 rows created.

SQL>COMMIT
  2  /

Commit complete.

SQL>EXEC dbms_stats.gather_table_stats ('','lots_of_rows')

PL/SQL procedure successfully completed.

SQL>SET AUTOTRACE TRACEONLY

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  (SELECT COUNT(*)
  4          FROM   lots_of_rows
  5          WHERE  cat = 42) >= 1
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 3376271402

------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |     1 |       |  1652   (1)| 00:00:01 |
|*  1 |  FILTER             |              |       |       |            |          |
|   2 |   FAST DUAL         |              |     1 |       |     2   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |              |     1 |     3 |            |          |
|*  4 |    TABLE ACCESS FULL| LOTS_OF_ROWS | 10000 | 30000 |  1650   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( (SELECT COUNT(*) FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42)>=1)
   4 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6069  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

The execution plan above clearly shows a full table scan and the consistent gets statistic would confirm that the entire table was indeed scanned. The solution to this situation is trivial: rewrite the query to use EXISTS. In our example above, this would be to use:

SELECT ... blah,blah,blah
FROM   some_table
WHERE  EXISTS (SELECT COUNT(*) FROM some_other_table WHERE ... blah,blah,blah)

The impact of this simple change can be seen using AUTOTRACE:

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  EXISTS(SELECT 'X'
  4                FROM   lots_of_rows
  5                WHERE  cat = 42)
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 4135382906

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |       |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   FAST DUAL        |              |     1 |       |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LOTS_OF_ROWS |     2 |     6 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42))
   3 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

While the execution plan still shows a full table scan the consistent gets has dropped to just 3. This would have been the number that Oracle performed before it found the fist “42” entry and stopped scanning the table. The query also has be advantage of being better documented as EXISTS describes perfectly what if being sought.

However, perhaps my ire at the use of COUNT(*) is somewhat misplaced. The Oracle optimiser is capable of detecting subqueries in the form of:

(SELECT COUNT(*) FROM ....) > 0

and automatically translate it into the equivalent EXISTS form:

EXISTS (SELECT 'X' FROM ....)

We can see this in action using our million row table:

SQL>SELECT 'X'
  2  FROM   dual
  3  WHERE  (SELECT COUNT(*)
  4          FROM   lots_of_rows
  5          WHERE  cat = 42) > 0
  6  /


Execution Plan
----------------------------------------------------------
Plan hash value: 4135382906

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |     1 |       |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |              |       |       |            |          |
|   2 |   FAST DUAL        |              |     1 |       |     2   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| LOTS_OF_ROWS |     2 |     6 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LOTS_OF_ROWS" "LOTS_OF_ROWS" WHERE
              "CAT"=42))
   3 - filter("CAT"=42)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

So, while we wrote the query to use SELECT COUNT(*) the predicate information section in the above AUTOTRACE output shows that Oracle actually used EXISTS. Note, this rewrite only occurs if the test is “> 0”. The rewrite will not take place using a test of “>= 1”, which is what the original example above used.

I’m not sure what version of Oracle that this automatic rewrite appeared. It certainly works in 11.2 and above. It’s a nice feature but I would still argue that it would not be good practice to rely of this kind of thing and to explicitly write the query using EXISTS.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s