When supporting an application, one thing that really irritates me is being confronted by a complex query where the developer has not qualified the column references; either with the table name or, as is usually the case, a table alias. For example, if you were to take the following query:
SELECT col_1 , col_2 , col_3 FROM tab_1 , tab_2 WHERE id1 = id2
you would have no idea which table the referenced columns come from. As a result, the first step for analysis is to cross reference the columns used by the query to the columns defined in the table and rewrite the query in the following manner:
SELECT t2.col_1 , t2.col_2 , t1.col_3 FROM tab_1 t1 , tab_2 t2 WHERE t1.id1 = t2.id2
“Does it really matter? My query still runs!” I hear you saying. My response would be that while Oracle itself doesn’t care, anything that makes code clearer and more readable should be followed. The second query above leaves the reader with no doubt as to where the columns are originating and requires very, very little effort. In a complex query this also conveys useful information regarding the intent of the query, e.g. a column selected from the employees table implies that it is a function of the employee, as opposed to another table that might be selected in the same query.
“Yeah, yeah… so the query could be a bit more readable. Big deal.” is the type of response I typically get concerning calls for greater code clarity. However, the lack of qualified column naming has implications beyond code clarity that can lead to more serious problems. The following scenarios illustrate a couple of potential problems and are based on 2 tables, defined as follows. It should also be noted that these scenarios were generated on Oracle 11.2.0.3.
CREATE TABLE table_1 (id NUMBER(10) NOT NULL ,cat NUMBER(10) NOT NULL ,val NUMBER(10) NOT NULL ,CONSTRAINT table_1_pk PRIMARY KEY (id)) / CREATE TABLE table_2 (id NUMBER(10) NOT NULL ,cat NUMBER(10) NOT NULL ,val NUMBER(10) NOT NULL ,CONSTRAINT table_2_pk PRIMARY KEY (id)) / INSERT INTO table_1 SELECT ROWNUM , MOD(ROWNUM,5) , ROWNUM FROM dual CONNECT BY LEVEL <= 10 / INSERT INTO table_2 SELECT ROWNUM , MOD(ROWNUM,2) , 1 FROM dual CONNECT BY LEVEL <= 5 / COMMIT / EXEC dbms_stats.gather_table_stats ('','table_1') EXEC dbms_stats.gather_table_stats ('','table_2')
The data in each table is:
SELECT * FROM table_1 ORDER BY id / ID CAT VAL ---------- ---------- ---------- 1 1 1 2 2 2 3 3 3 4 4 4 5 0 5 6 1 6 7 2 7 8 3 8 9 4 9 10 0 10 SELECT * FROM table_2 ORDER BY id / ID CAT VAL ---------- ---------- ---------- 1 1 1 2 0 1 3 1 1 4 0 1 5 1 1
First up, let’s set the scene. A simple SELECT query without aliases on the columns in the SELECT list:
SELECT id , cat , val FROM table_1 , table_2 WHERE cat = cat / WHERE cat = cat * ERROR at line 5: ORA-00918: column ambiguously defined
There should be no surprised here I think. Both tables contain columns named ID, CAT and VAL so Oracle simply doesn’t know which columns are being referenced in the SELECT and WHERE parts of the query when these columns are referenced. The solution is to qualify the column references with either the table names or, as is more common, table aliases:
Now, let’s take a look at a different query:
SELECT * FROM table_1 WHERE cat IN (SELECT cat FROM table_2) ORDER BY id / ID CAT VAL ---------- ---------- ---------- 1 1 1 5 0 5 6 1 6 10 0 10
Here we have all the entries from TABLE_1 where the CAT column value exists in TABLE_2. No qualification of the columns is required as Oracle has correctly determined the appropriate columns to use. However, let’s suppose that some developer is making application changes and decides to drop or rename the CAT column in TABLE_2. Let’s see what happens to our query:
ALTER TABLE table_2 RENAME COLUMN cat TO new_cat / Table altered. SELECT * FROM table_1 WHERE cat IN (SELECT cat FROM table_2) ORDER BY id / ID CAT VAL ---------- ---------- ---------- 1 1 1 2 2 2 3 3 3 4 4 4 5 0 5 6 1 6 7 2 7 8 3 8 9 4 9 10 0 10
Hmmmm… not only did the query continue to run without error, even though we’ve renamed a column that was used in it, it produced a completely different result. The reason that the query did not error is that Oracle was still able to resolve the CAT reference in the subquery despite CAT being removed from TABLE_2. When a subquery contains a reference to an identifier that is not local to a subquery then Oracle will expand its search outside of the subquery to th emain query. So, Oracle was able to resolve the unqualified CAT reference in the subquery to the CAT column in TABLE_1. If we were to rewrite our query with table aliases, this is what Oracle executed:
SELECT t1.* FROM table_1 t1 WHERE t1.cat IN (SELECT t1.cat FROM table_2 t2) ORDER BY t1.id
This represents one of the worst type of database errors; a query that runs successfully but produces an incorrect result. If we were to have used table aliases in our original query then we would not have encountered this problem as Oracle would have flagged an error, as shown below:
SELECT t1.* FROM table_1 t1 WHERE t1.cat IN (SELECT t2.cat FROM table_2 t2) ORDER BY t1.id / WHERE t1.cat IN (SELECT t2.cat FROM table_2 t2) * ERROR at line 3: ORA-00904: "T2"."CAT": invalid identifier
So, if this query were contained in a package or procedure then when the developer renamed the column the package or procedure would have gone invalid, thereby flagging the problem immediately.
Now let’s take a look at another scenario, this time involving a merge statement. In this scenario we want to merge the contents of TABLE_2 into TABLE_1 based on matching ID columns. We’ll add new rows if no corresponding ID row exists in TABLE_1, otherwise we’ll increment the VAL column of TABLE_1 with the VAL column in TABLE_2. The query could be written as:
MERGE INTO table_1 dest USING (SELECT * FROM table_2) src ON (src.id = dest.id) WHEN MATCHED THEN UPDATE SET val = val + src.val WHEN NOT MATCHED THEN INSERT (id, cat, val) VALUES (src.id, src.cat, src.val) / 5 rows merged.
Giving the following results:
SELECT * FROM table_1 ORDER BY id / ID CAT VAL ---------- ---------- ---------- 1 1 2 2 2 2 3 3 2 4 4 2 5 0 2 6 1 6 7 2 7 8 3 8 9 4 9 10 0 10
If you refer back to the initial table data earlier in this post you will easily be able to see the problem; the VAL column for the rows with ID values of 1 through 5 are wrong. These rows all have an ID value of 2, instead of 1 greater than their initial value. It seems that once again we have a scoping issue in our query. The problem is in the update part of the merge. Instead of executing:
... WHEN MATCHED THEN UPDATE SET val = dest.val + src.val ...
Oracle has chosen instead to do:
... WHEN MATCHED THEN UPDATE SET val = src.val + src.val ...
which has set the TABLE_1 VAL column to twice the TABLE_2 VAL column. To me, this behaviour is a borderline bug. The VAL columns from both tables are obviously within scope and, unlike the subquery situation, it’s not apparent that one column should have precedence over the other. So, I would have thought that Oracle would raise an exception along the lines of “ORA-00918: column ambiguously defined”. Instead, Oracle has given precedence to the TABLE_2 reference. However, given the clear ambiguity of the unqualified reference in the query, a developer should really look first at their coding practices rather than trying to place any blame for this problem on the behaviour of Oracle.
So if clarity of code isn’t enough to justify qualifying column references then I hope that the above problem scenarios will have. Having had to spend time debugging both of the scenarios noted above I can honestly state that the few seconds required to qualify the column references would have paid off many, many times over. My advice is quite simple: always use qualified references in your queries.
I totally agree.