Qualified Naming

When supporting an application, one thing that really irritates me is being confronted by a complex query where the developer has not qualified the column references; either with the table name or, as is usually the case, a table alias. For example, if you were to take the following query:

SELECT col_1
,      col_2
,      col_3
FROM   tab_1
,      tab_2
WHERE  id1 = id2

you would have no idea which table the referenced columns come from. As a result, the first step for analysis is to cross reference the columns used by the query to the columns defined in the table and rewrite the query in the following manner:

SELECT t2.col_1
,      t2.col_2
,      t1.col_3
FROM   tab_1 t1
,      tab_2 t2
WHERE  t1.id1 = t2.id2

“Does it really matter? My query still runs!” I hear you saying. My response would be that while Oracle itself doesn’t care, anything that makes code clearer and more readable should be followed. The second query above leaves the reader with no doubt as to where the columns are originating and requires very, very little effort. In a complex query this also conveys useful information regarding the intent of the query, e.g. a column selected from the employees table implies that it is a function of the employee, as opposed to another table that might be selected in the same query.

“Yeah, yeah… so the query could be a bit more readable. Big deal.” is the type of response I typically get concerning calls for greater code clarity. However, the lack of qualified column naming has implications beyond code clarity that can lead to more serious problems. The following scenarios illustrate a couple of potential problems and are based on 2 tables, defined as follows. It should also be noted that these scenarios were generated on Oracle 11.2.0.3.

CREATE TABLE table_1
   (id      NUMBER(10) NOT NULL
   ,cat     NUMBER(10) NOT NULL
   ,val     NUMBER(10) NOT NULL
   ,CONSTRAINT table_1_pk PRIMARY KEY (id))
/

CREATE TABLE table_2
   (id        NUMBER(10) NOT NULL
   ,cat       NUMBER(10) NOT NULL
   ,val       NUMBER(10) NOT NULL
   ,CONSTRAINT table_2_pk PRIMARY KEY (id))
/

INSERT INTO table_1
SELECT ROWNUM
,      MOD(ROWNUM,5)
,      ROWNUM
FROM   dual
CONNECT BY LEVEL <= 10
/

INSERT INTO table_2
SELECT ROWNUM
,      MOD(ROWNUM,2)
,      1
FROM   dual
CONNECT BY LEVEL <= 5
/

COMMIT
/

EXEC dbms_stats.gather_table_stats ('','table_1')
EXEC dbms_stats.gather_table_stats ('','table_2')

The data in each table is:

SELECT *
FROM   table_1
ORDER  BY
       id
/

        ID        CAT        VAL
---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3
         4          4          4
         5          0          5
         6          1          6
         7          2          7
         8          3          8
         9          4          9
        10          0         10

SELECT *
FROM   table_2
ORDER  BY
       id
/

        ID        CAT        VAL
---------- ---------- ----------
         1          1          1
         2          0          1
         3          1          1
         4          0          1
         5          1          1

First up, let’s set the scene. A simple SELECT query without aliases on the columns in the SELECT list:

SELECT id
,      cat
,      val
FROM   table_1
,      table_2
WHERE  cat = cat
/

WHERE  cat = cat
             *
ERROR at line 5:
ORA-00918: column ambiguously defined

There should be no surprised here I think. Both tables contain columns named ID, CAT and VAL so Oracle simply doesn’t know which columns are being referenced in the SELECT and WHERE parts of the query when these columns are referenced. The solution is to qualify the column references with either the table names or, as is more common, table aliases:

Now, let’s take a look at a different query:

SELECT *
FROM   table_1
WHERE  cat IN (SELECT cat FROM table_2)
ORDER  BY
       id
/

        ID        CAT        VAL
---------- ---------- ----------
         1          1          1
         5          0          5
         6          1          6
        10          0         10

Here we have all the entries from TABLE_1 where the CAT column value exists in TABLE_2. No qualification of the columns is required as Oracle has correctly determined the appropriate columns to use. However, let’s suppose that some developer is making application changes and decides to drop or rename the CAT column in TABLE_2. Let’s see what happens to our query:

ALTER TABLE table_2
RENAME COLUMN cat TO new_cat
/

Table altered.

SELECT *
FROM   table_1
WHERE  cat IN (SELECT cat FROM table_2)
ORDER  BY
       id
/

        ID        CAT        VAL
---------- ---------- ----------
         1          1          1
         2          2          2
         3          3          3
         4          4          4
         5          0          5
         6          1          6
         7          2          7
         8          3          8
         9          4          9
        10          0         10

Hmmmm… not only did the query continue to run without error, even though we’ve renamed a column that was used in it, it produced a completely different result. The reason that the query did not error is that Oracle was still able to resolve the CAT reference in the subquery despite CAT being removed from TABLE_2. When a subquery contains a reference to an identifier that is not local to a subquery then Oracle will expand its search outside of the subquery to th emain query. So, Oracle was able to resolve the unqualified CAT reference in the subquery to the CAT column in TABLE_1. If we were to rewrite our query with table aliases, this is what Oracle executed:

SELECT t1.*
FROM   table_1 t1
WHERE  t1.cat IN (SELECT t1.cat FROM table_2 t2)
ORDER  BY
       t1.id

This represents one of the worst type of database errors; a query that runs successfully but produces an incorrect result. If we were to have used table aliases in our original query then we would not have encountered this problem as Oracle would have flagged an error, as shown below:

SELECT t1.*
FROM   table_1 t1
WHERE  t1.cat IN (SELECT t2.cat FROM table_2 t2)
ORDER  BY
       t1.id
/
       
WHERE  t1.cat IN (SELECT t2.cat FROM table_2 t2)
                         *
ERROR at line 3:
ORA-00904: "T2"."CAT": invalid identifier

So, if this query were contained in a package or procedure then when the developer renamed the column the package or procedure would have gone invalid, thereby flagging the problem immediately.

Now let’s take a look at another scenario, this time involving a merge statement. In this scenario we want to merge the contents of TABLE_2 into TABLE_1 based on matching ID columns. We’ll add new rows if no corresponding ID row exists in TABLE_1, otherwise we’ll increment the VAL column of TABLE_1 with the VAL column in TABLE_2. The query could be written as:

MERGE INTO table_1 dest
USING (SELECT *
       FROM   table_2) src
ON (src.id = dest.id)
WHEN MATCHED THEN UPDATE
SET val = val + src.val
WHEN NOT MATCHED THEN INSERT
  (id, cat, val)
VALUES
  (src.id, src.cat, src.val)
/

5 rows merged.

Giving the following results:

SELECT *
FROM   table_1
ORDER  BY
       id
/

        ID        CAT        VAL
---------- ---------- ----------
         1          1          2
         2          2          2
         3          3          2
         4          4          2
         5          0          2
         6          1          6
         7          2          7
         8          3          8
         9          4          9
        10          0         10

If you refer back to the initial table data earlier in this post you will easily be able to see the problem; the VAL column for the rows with ID values of 1 through 5 are wrong. These rows all have an ID value of 2, instead of 1 greater than their initial value. It seems that once again we have a scoping issue in our query. The problem is in the update part of the merge. Instead of executing:

...
WHEN MATCHED THEN UPDATE
SET val = dest.val + src.val
...

Oracle has chosen instead to do:

...
WHEN MATCHED THEN UPDATE
SET val = src.val + src.val
...

which has set the TABLE_1 VAL column to twice the TABLE_2 VAL column. To me, this behaviour is a borderline bug. The VAL columns from both tables are obviously within scope and, unlike the subquery situation, it’s not apparent that one column should have precedence over the other. So, I would have thought that Oracle would raise an exception along the lines of “ORA-00918: column ambiguously defined”. Instead, Oracle has given precedence to the TABLE_2 reference. However, given the clear ambiguity of the unqualified reference in the query, a developer should really look first at their coding practices rather than trying to place any blame for this problem on the behaviour of Oracle.

So if clarity of code isn’t enough to justify qualifying column references then I hope that the above problem scenarios will have. Having had to spend time debugging both of the scenarios noted above I can honestly state that the few seconds required to qualify the column references would have paid off many, many times over. My advice is quite simple: always use qualified references in your queries.

Advertisements

One thought on “Qualified Naming

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s