Sequence triggers and column defaults

Tables with simple sequence-assigned ID columns for primary keys are quite common. Unfortunately it also seems quite common to use row-level triggers to assign those sequence values to the columns. The rationale trotted out for this strategy is that it doesn’t matter what the application layer does; the database will take care of ensuring that the ID is unique and it can’t be bypassed.

The primary problem with using a trigger to assign a sequence value to an ID column is one of performance. Let’s look at a simple test case that shows the overhead incurred in using a trigger to assign the ID column value, followed by a new feature from Oracle 12c that achieves the same end but without the overhead of a trigger.

Our test is to insert 1,000,000 rows into a table. We’ll run through the scenarios of:

  • ID value is not sequence generated – this is our baseline scenario to show how fast Oracle can copy rows
  • ID value is sequence assigned as part of the insert statement
  • ID value is sequence assigned from a trigger
  • ID value is sequence assigned from a column level default – this is the new 12c feature

The set-up is to create the table we will be testing against, TEST_TABLE, and also a source data table, SOURCE_DATA, which we’ll load with 1,000,000 rows. Each test is to copy the data from SOURCE_DATA to TEST_TABLE. The tests were all conducted on Oracle 12.1.0.1.

CREATE TABLE test_table
   (id      NUMBER(10) NOT NULL
   ,padding CHAR(20)  NOT NULL)
/

CREATE UNIQUE INDEX test_table_pk
   ON test_table (id)
/

ALTER TABLE test_table
   ADD CONSTRAINT test_table_pk
   PRIMARY KEY (id)
/   

CREATE TABLE source_data
   (id, padding)
PCTFREE 0   
AS
SELECT ROWNUM AS id
,      'X' AS padding
FROM   dual
CONNECT BY ROWNUM <= 1000000
/

COMMIT
/

First of all we time how long it takes to copy across the 1,000,000 rows in SOURCE_DATA to TEST_TABLE:

SQL>SET TIMING ON

SQL>INSERT INTO test_table
  2  SELECT *
  3  FROM   source_data
  4  /

1000000 rows created.

Elapsed: 00:00:03.06

Now we measure how long it takes to copy across the rows, assigning ID values from a sequence directly from within the INSERT statement:

SQL>TRUNCATE TABLE test_table
  2  /

Table truncated.

SQL>CREATE SEQUENCE test_table_id_seq
  2     CACHE 1000
  3  /

Sequence created.

SQL>SET TIMING ON

SQL>INSERT INTO test_table
  2  SELECT test_table_id_seq.NEXTVAL
  3  ,      padding
  4  FROM   source_data
  5  /

1000000 rows created.

Elapsed: 00:00:09.68

The insert of the rows went from 3 seconds to just under 10 seconds simply by assigning new ID values from a sequence (the cache size for the sequence was 1,000).

Let’s see what happens if we use a trigger to assign the sequence value to the ID column:

SQL>TRUNCATE TABLE test_table
  2  /

Table truncated.

SQL>CREATE OR REPLACE TRIGGER trg_test_table
  2     BEFORE INSERT ON test_table
  3     FOR EACH ROW
  4  DECLARE
  5  BEGIN
  6     :NEW.id := test_table_id_seq.NEXTVAL;
  7  END trg_test_table;
  8  /

Trigger created.

SQL>SET TIMING ON

SQL>INSERT INTO test_table
  2     (padding)
  3  SELECT padding
  4  FROM   source_data
  5  /

1000000 rows created.

Elapsed: 00:01:29.65

Yikes!!! With the trigger in place it took nearly 1.5 minutes to insert the rows, roughly nine times slower than assigning the sequence directly in the INSERT statement. This emphasises a really important aspect of row-level triggers: they’re performance killers!

So, you have a system that is littered with triggers doing nothing more than assign sequence values to the ID columns… what can you do? Well, in Oracle 12c there’s a new feature that can come to your rescue. Column defaults can now reference sequence NEXTVAL and CURRVAL. Let’s see how this works:

SQL>TRUNCATE TABLE test_table
  2  /

Table truncated.

SQL>DROP TRIGGER trg_test_table
  2  /

Trigger dropped.

SQL>ALTER TABLE test_table
  2    MODIFY id DEFAULT test_table_id_seq.NEXTVAL
  3  /

Table altered.

SQL>SET TIMING ON

SQL>INSERT INTO test_table
  2     (padding)
  3  SELECT padding
  4  FROM   source_data
  5  /

1000000 rows created.

Elapsed: 00:00:07.04

From the above you can see that we’ve added a default to the ID column that is the sequence NEXTVAL. Our timing for the insert is now down to 7 seconds, which is pretty much the same as we got when calling the sequence directly in the insert statement. We’ve achieved the outcome of the trigger with the performance of the direct insert!

However, if you’re thinking “Great, I can get rid of my triggers and replace them with column defaults!”, be a little careful. While the trigger and the default perform the same function, there is a subtle difference between them: the trigger above always sets the ID column to a new sequence value, even if one is explicitly supplied by the application code, whereas the default only sets the ID value if one is not supplied. Consider the following, first with the trigger in place:

SQL>CREATE OR REPLACE TRIGGER trg_test_table
  2     BEFORE INSERT ON test_table
  3     FOR EACH ROW
  4  DECLARE
  5  BEGIN
  6     :NEW.id := test_table_id_seq.NEXTVAL;
  7  END trg_test_table;
  8  /

Trigger created.

SQL>INSERT INTO test_table
  2  VALUES (0, 'X')
  3  /

1 row created.

SQL>INSERT INTO test_table
  2  VALUES (0, 'X')
  3  /

1 row created.

Even though both INSERTs explicitly provided the same value (0) for the ID column, each row was created successfully because the trigger overrode that value with a new sequence value. Contrast that with what happens under a column default:

SQL>DROP TRIGGER trg_test_table
  2  /

Trigger dropped.

SQL>ALTER TABLE test_table
  2    MODIFY id DEFAULT test_table_id_seq.NEXTVAL
  3  /

Table altered.

SQL>INSERT INTO test_table
  2  VALUES (0, 'X')
  3  /

1 row created.

SQL>INSERT INTO test_table
  2  VALUES (0, 'X')
  3  /
INSERT INTO test_table
*
ERROR at line 1:
ORA-00001: unique constraint (DEVELOPER.TEST_TABLE_PK) violated

Since an ID value was specified, the column default was not used and the second insert violated the primary key constraint. Hopefully you do not have any code that seeks to explicitly set a column value only to have it overridden by a trigger-assigned value, as that would be illogical and confusing. Chances are you will quickly catch this sort of problem in your test environment if you do choose to replace your triggers with column defaults.
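As an aside, if what you actually want is for the sequence to kick in whenever the application supplies a NULL for the ID (something a plain default will not do), 12c also supports the DEFAULT ON NULL clause. A sketch, reusing the table and sequence from the tests above:

```sql
-- DEFAULT ON NULL assigns the sequence value both when the ID column
-- is omitted from the INSERT and when an explicit NULL is supplied
ALTER TABLE test_table
   MODIFY id DEFAULT ON NULL test_table_id_seq.NEXTVAL
/

-- both of these rows receive a sequence-assigned ID
INSERT INTO test_table (padding) VALUES ('X');
INSERT INTO test_table (id, padding) VALUES (NULL, 'X');
```

Note that it still will not override an explicitly supplied non-NULL value such as the 0 used in the example above; only a trigger can do that.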


Check constraint evaluation

Perhaps one of the most confusing aspects of working with relational databases is the concept of null. In particular, logical expressions involving null can evaluate to one of three possible outcomes: not just the traditional “true” and “false” but also “unknown”.

When we look at a typical WHERE clause in a SELECT query, a logical expression is deemed to have been successfully passed if the expression outcome is “true”. Outcomes of “false” and “unknown” are rejected. To provide an example, if we want to identify employees with a salary of $2,000 or greater we would write our query as:

SELECT ename
FROM   emp
WHERE  sal >= 2000

As well as not selecting those with a salary less than $2,000, anyone without a salary is excluded from the above query on the basis that the WHERE clause expression has a logical result of “unknown”.
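If those employees without a salary should be included, the null case has to be handled explicitly in the predicate, for example:

```sql
-- include employees with no salary by testing for NULL explicitly;
-- "sal >= 2000" alone evaluates to "unknown" when sal is null
SELECT ename
FROM   emp
WHERE  sal >= 2000
OR     sal IS NULL
```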

In contrast to the way queries operate, check constraint expressions are passed if they do not evaluate to “false”. Hence a check constraint is successfully passed if the logical expression evaluates to either “true” or “unknown”. Taking our previous example, suppose we want to add a constraint onto our employees table to prevent negative salaries, i.e. we think that people who pay for the pleasure of turning up to work are crazy and have no place in our organisation. We might then add the following:

ALTER TABLE emp
   ADD CONSTRAINT emp_chk1
   CHECK (sal >= 0)

With this check constraint in place an employee row with a null salary passes the check constraint and is therefore permitted, whilst a negative salary is prohibited:

SQL>INSERT INTO emp
  2     (empno
  3     ,ename
  4     ,job
  5     ,sal)
  6  VALUES
  7    (1234
  8    ,'MARK'
  9    ,'DEVELOPER'
 10    ,NULL -- note the NULL salary
 11    );

1 row created.

SQL>INSERT INTO emp
  2      (empno
  3      ,ename
  4      ,job
  5      ,sal)
  6  VALUES
  7     (2345
  8     ,'CARL'
  9     ,'DEVELOPER'
 10     ,-2000 -- note the negative salary
 11     );
INSERT INTO emp
*
ERROR at line 1:
ORA-02290: check constraint (DEVELOPER.EMP_CHK1) violated
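If a null salary should also be rejected, the check constraint as written won’t do it; the null case has to be handled explicitly, either with a NOT NULL constraint or inside the check expression itself. A sketch of both options (the constraint name EMP_CHK2 is simply illustrative):

```sql
-- option 1: a separate NOT NULL constraint on the column
ALTER TABLE emp
   MODIFY sal NOT NULL
/

-- option 2: fold the null test into the check expression, so a null
-- salary makes the whole expression "false" rather than "unknown"
ALTER TABLE emp
   ADD CONSTRAINT emp_chk2
   CHECK (sal IS NOT NULL AND sal >= 0)
/
```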

Fast query results in GUIs

Every now and again I get a developer asking why their query in PL/SQL takes so long to process when their GUI application returns the data so quickly. When asked to demonstrate the problem they provide a query, run it in their GUI of choice, see the first set of results appear and say “there, that was quick”.

What is misunderstood in this situation is that Oracle may not need to resolve the entire result set of a query in order to start returning rows back to the client application. The analogy I tend to use is that of a search engine. You obtain the initial results from a search engine very quickly and generally you will only look at one or two pages of results before trying to refine the search terms. The search engine provides the results back to you on an “on-demand” basis.

Contrast this situation to the processing of query results in some sort of PL/SQL batch process. In this scenario you are going through every single entry returned by the query, performing some action on it. Obtaining just the first few results is pretty much meaningless as the process is not complete until it has trawled through all the results… and you will just have to wait for that to happen.

We can use a simple SQL trick to limit the number of rows returned by a query to illustrate the speed and resource cost in retrieving a subset of rows from a query. First, we create a table with a 1,000,000 rows:

CREATE TABLE fetch_test
   (id      NUMBER(10)   NOT NULL
   ,cat     NUMBER(2)    NOT NULL
   ,padding CHAR(30)     NOT NULL)
/

INSERT INTO fetch_test
SELECT ROWNUM
,      MOD(ROWNUM,10)
,      'X'
FROM   dual
CONNECT BY ROWNUM <= 1000000
/

COMMIT
/

CREATE INDEX fetch_test_ix1
   ON fetch_test (cat)
/

EXEC dbms_stats.gather_table_stats ('','fetch_test')

Next we simply run a query that returns all the rows for a specific category value (5) and another version where we get the first 20 entries back to simulate the fetching done by a GUI application. In both situations we time the response and also track the execution statistics via AUTOTRACE:

SQL>SET ARRAYSIZE 500
SQL>SET AUTOTRACE TRACEONLY
SQL>SET TIMING ON

SQL>SELECT /*+ index (f (cat)) */
  2         *
  3  FROM   fetch_test f
  4  WHERE  cat = 5
  5  /

100000 rows selected.

Elapsed: 00:00:00.31

Execution Plan
----------------------------------------------------------
Plan hash value: 1955820726

----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |   100K|  3816K|  6274   (1)| 00:01:16 |
|   1 |  TABLE ACCESS BY INDEX ROWID| FETCH_TEST     |   100K|  3816K|  6274   (1)| 00:01:16 |
|*  2 |   INDEX RANGE SCAN          | FETCH_TEST_IX1 |   100K|       |   198   (1)| 00:00:03 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CAT"=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6645  consistent gets
          0  physical reads
          0  redo size
    4436704  bytes sent via SQL*Net to client
       2688  bytes received via SQL*Net from client
        201  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

SQL>SELECT *
  2  FROM   (SELECT /*+ index (f (cat)) */
  3                 *
  4          FROM   fetch_test f
  5          WHERE  cat = 5)
  6  WHERE  ROWNUM <= 20
  7  /

20 rows selected.

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 3438997906

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |    20 |   780 |     5   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY               |                |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| FETCH_TEST     |    21 |   819 |     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | FETCH_TEST_IX1 |   100K|       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=20)
   3 - access("CAT"=5)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1550  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         20  rows processed

SQL>SET TIMING OFF
SQL>SET AUTOTRACE OFF

It took 0.31 seconds to retrieve all 100,000 results for the query, whereas obtaining the first 20 of those entries was virtually instantaneous. A look at the consistent gets statistic reveals the reason behind the difference: only 7 consistent gets were required to obtain the first 20 results, while 6,645 were required to obtain all of them. When retrieving the first 20 entries, Oracle simply stopped retrieving any more data once it had the required 20 rows.

This example shows us that in order to return the first “batch” of results from our query Oracle only performed a small amount of the effort required to process the entire result set. This is why queries run in GUI applications might seem to be faster than when run in some sort of PL/SQL batch mode. Furthermore, I hope it also illustrates that you cannot gauge the performance of a query simply by how long it takes for results to appear in a GUI.
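As a side note, from 12c onwards the ROWNUM subquery used above can also be written with the ANSI row-limiting clause, which expresses the intent more directly. A sketch:

```sql
-- 12c row-limiting clause equivalent of the ROWNUM <= 20 wrapper
SELECT *
FROM   fetch_test f
WHERE  cat = 5
FETCH FIRST 20 ROWS ONLY
/
```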

In a GUI situation Oracle would be holding an open cursor, ready to fetch the next batch of rows. If you encounter this situation then my suggestion is to page all the way to the end of the result set and see if it really is as fast as you think. You’ll notice that as you page down the query results the GUI pauses every now and again. This is where it’s fetching more results back from the database and Oracle is having to actually figure out what those results are, i.e. continue to generate them as they are requested. Once you get to the end of the result set you’ll find that it doesn’t really differ much from the batch process that had to process all the results.

So how many rows are returned with each fetch operation a GUI performs (or any client for that matter)? This is controlled by the client application itself. If you’re using a programming API you’ll find a property to control this, such as the fetch size in JDBC and ODP.NET. Below are some details for controlling the fetch size in SQL*Plus, Quest TOAD and Oracle SQL Developer. With modern servers and networks, a reasonable setting would probably be between 100 and 500.

Quest TOAD

The fetch size in Quest TOAD is configured in the TOAD options by selecting the Oracle | General category in the left-hand options menu. The setting is named “OCI Array Buffer size” and defaults to 500.


Oracle SQL Developer

Control of the rows fetched in Oracle’s SQL Developer is done in the Preferences (found in the Tools menu), under the Database | Advanced category.


SQL*Plus

The number of rows fetched at a time in SQL*Plus is determined via the ARRAYSIZE parameter. By default 15 rows are fetched at a time but this can be changed by issuing the command:

SET ARRAYSIZE n

Where ‘n’ is a value from 1 to 5000. Keep in mind that the default value can be overridden in the glogin.sql or login.sql configuration files.
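For example, to make a larger fetch size the default for all your SQL*Plus sessions, the setting can go into your login.sql file (the value 200 here is simply an illustrative choice):

```sql
-- login.sql: run automatically at SQL*Plus start-up,
-- overriding the default ARRAYSIZE of 15
SET ARRAYSIZE 200
```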