Package: pkg_log

I’ve previously commented that the best way to monitor database processes is for the processes to log their progress. Unlike traditional applications where logs are written to simple text files database processes have at their disposal a far more useful avenue for writing logs; the database itself. This post outlines a simple logging mechanism and shows how it can be combined with other process monitoring features to provide in-depth information into what a process is doing.

I am a firm believer in logging to a database table. The reasoning is simple; at some stage you’ll want to query your logs to find out what is currently happening or what has previously happened. The key word in the previous sentence is “query”… and nothing handles querying of data better than the database itself.

Many Oracle developers seem to think that process instrumentation starts and ends with DBMS_OUTPUT. While this might be suitable for ad-hoc output during the development process it’s mostly useless in an active system where a process is not being executed from the command line.

The following are the basic requirements for the logging mechanism:

  • logs are written to a database table
  • fast and efficient so as not to create significant overhead in the recording of log messages
  • minimalistic interface so it can be called easily
  • support for the classification of different types of log message
  • captures the process attributes of MODULE, ACTION and CLIENT_IDENTIFIER

Table LOGS

The DDL for the LOGS tables is:

  log_id             NUMBER                     NOT NULL,
  log_timestamp      TIMESTAMP(4)               NOT NULL,
  msg_type           VARCHAR2(1)                NOT NULL,
  msg                VARCHAR2(500)              NOT NULL,
  sid                NUMBER                     NOT NULL,
  unique_sid         VARCHAR2(24)               NOT NULL,
  module             VARCHAR2(48),
  action             VARCHAR2(32),
  client_identifier  VARCHAR2(64),
  err_no             NUMBER(7),
  msg_detail         CLOB
 (PARTITION logs_20121231 VALUES LESS THAN (TO_DATE('01/01/2013','dd/mm/yyyy'))

The attributes used in the LOGS table are:

  • LOG_ID – a sequence generated ID. Fairly useless except for differentiating the order that entries were created for a single process
  • LOG_TIMESTAMP – date/time the entry was created
  • MSG_TYPE – contains a value of:
    • D – debug message
    • I – information message
    • W – warning message
    • E – error message
  • MSG – the log message itself
  • SID – the session ID of the session creating the message
  • UNIQUE_SID – the unique session ID, as reported by dbms_session.unique_session_id, which can be useful in a RAC environment
  • MODULE – the MODULE setting for the session
  • ACTION – the ACTION setting for the session
  • CLIENT_IDENTIFIER – the CLIENT_IDENTIFIER setting for the session
  • ERR_NO – the error number associated with any error message entry
  • MSG_DETAIL – an optional large message store. I added this for a project where the developers wanted to store an application stack dump for errors and it seemed like a good think to retain.

Unlike a number of logging mechanisms I have chosen to avoid a numeric “log level” attribute in favour of a type classification that is roughly similar to what is used in some application development environments. The key reason for this is that an outsider is unlikely to know what level 1, 2, 3, etc are meant to be and where they should be used. Sticking with defined concepts might be slightly less flexible but is far more descriptive. The API package described next goes into this further.

Attributes aside, the only other feature of the table is that it’s partitioned by the LOG_TIMESTAMP with each partition being 1 day. The idea here is that we will want to purge old data in the table and dropping partitions is the easiest way to do that. Furthermore, there are no indexes on the table and so the partitioning will help when it comes to querying the data, assuming the queries will be able to restrict themselves to a fairly narrow time period. Obviously if you are not licensed for partitioning then you will need to disable this feature. Potentially sub-partitioning by MSG_TYPE would permit selective retention of the different message types, e.g. debug messages might be purged after a couple of days but error messages might be retained for months.

As noted, there are no indexes on the table. This is deliberate so as to reduce the overhead in creating entries. Querying the LOGS table is not likely to be common activity in a production system but creating entries will be very common.

So, let’s take a look at the API for creating the log entries…

Package PKG_LOG

The primary focus is on a short and simple interface to creating log messages. A its heart PKG_LOG contains 4 logging routines:

  • DEBUG – creates a debug log entry
  • INFO – creates a information log entry. This is the “standard” type of entry
  • WARN – creates a warning log entry
  • ERR – creates an error log entry

With the exception of ERR the routines accept a log message and, optionally, an extended message detail parameter. Creating a standard information message is as simple as: ('Some text to write out to the LOGS table');

The ERR routine takes a log message and error number so calling it will probably resemble:

pkg_log.err (SQLERRM, SQLCODE);

Other routines provided by PKG_LOG include:

  • enable_logging / disable_logging – used to enable and disable message logging entirely (calls to ERR will still be logged though). By default logging is enabled.
  • enable_autonomous / disable_autonomous – used to enable or disable the writing of log message as autonomous transactions. This functionality was introduced to work around issues with autonomous transactions within distributed transactions that Oracle had in some earlier versions so may not be relevant any longer. The default is to use autonomous transactions.
  • set_level_normal / set_level_debug – By default debug messages are not logged. Calling set_level_debug causes debug messages to be logged.
  • logging_enabled – A function that returns TRUE if the logging facility is enabled
  • debug_logging – a function that returns TRUE if the logging facility is set to write out debug messages

As an example of general usage, combining PKG_LOG with PKG_APP_INFO for setting the MODULE and ACTION, PKG_DB_TYPE for standard type definitions and PKG_ERR for logging the last error generated, the normal way I would instrument a routine would be:

   c_proc  CONSTANT pkg_db_type.t_object_name := 'proc';
   pkg_app_info.set_action (c_proc); ('Entering ' || c_proc);
   pkg_log.debug ('Exiting ' || c_proc);
END proc;

So, you can see that routine entry is logged as an INFO message and exit is logged as a DEBUG message. What else would I log? Any important steps and logical flows through the code would receive an INFO message. Likewise, the row count of any bulk SQL statement would get logged. I also include DEBUG message logging for minor steps in a process, which can easily be enabled for when in-depth analysis is required. Naturally exceptions thrown are logged too.

Combining a simple logging mechanism with MODULE, ACTION and CLIENT_IDENTIFER process identifiers will allow you to determine with precision the exact series of steps a process has gone through. Furthermore, the details can be cross referenced against the process history in Active Session History to diagnose most logical and performance related problems. The overhead of setting up effective logging is very low but the benefits it provides are enormous.

Download code shown in this post.

Database Source Control

Here’s hoping that this post is less rant and more useful information as the topic is about database source control (also currently known as “the bane of my life”).

All organisations seem to approach database source control in a different way. Some places don’t even have the database code under source control! Regardless of the source control system used there are some basic principles that need to be addressed in the database source control procedures. Once these are covered then they should go a long way to keeping your database under control. First a basic observation though…

It’s about deployment!

There are many ways to structure the code and scripts in a source control system. From a database perspective it’s all about addressing the deployment of changes. Source control branching can help with concurrent development activities, tags and labels can be used to flag the what and why of changes but all these are just to help identify a set of changes that need to be deployed to the database.

Developers more familiar with non-database systems might not grasp this. After all, source control systems are about versioning code, allowing simultaneous streams of development, etc. These aspects are also applicable to the database as there is quite a bit of code in packages, procedures, functions and triggers. However, the code objects in the database interact closely with the database itself and that’s when things get complex as the database has a persistent state across deployments.

The deployment of typical application code can be seen as “compile version X of the code into an executable and deploy that, overwriting the previous version”. Yes, this is a simplification but generally speaking you’re no longer concerned about the previously deployed version of the application as it’s been overwritten by the new version. A database deployment can’t adopt this strategy. The database structure and, more importantly, the data it contains exists before and after the system change. So, this means that the database is deployed once in its life and then has incremental changes applied to it rather than being deployed from scratch for every system change.

As an example, consider the creation of a table. During the initial deployment of the database a CREATE TABLE script is run. Future deployments might add columns, modify the column data types, add constraints, etc. Each of these changes will be a separate ALTER TABLE script and not a modification to the initial CREATE TABLE script. The end result is that the structure of the table is governed by the initial CREATE TABLE script and the series of modification scripts that have been applied since then.

This incremental change process introduces complication that is easily overlooked. Consider that I have scripted the creation of a database, called version 1.0, to which a series of 5 modifications have been created, versions 1.1 through 1.5. If I have a copy of the database that is currently at version 1.2 then in order to move that instance of the database to version 1.5 I need to apply 1.3, 1.4 and then 1.5 scripts to it. I cannot simply run the 1.5 modification scripts as I will not apply the structural changes that 1.3 and 1.4 introduced. So, the changes required to bring the database up to a particular version are dependent on the starting version of the database. This might seem self-evident but in a typical development environment where there are many database instances at different versions (for development, test, production purposes) it becomes important to track what has and hasn’t been applied to any specific instance. If version tracking is haphazard then working out the changes required can be very difficult, even if everything has been scripted properly.

Code objects are subject to similar concerns. Just like any code that interacts with the database they will be dependent on a particular structure, or version, of the database. COde written for a particlar table structure might not work against a different structure so there exists a link between the code and the database structure that needs to be tracked.

The structure and processes surrounding the source control system need to support the database deployment processes. As previously noted, simply storing DDL scripts that would be used to recreate the database from scratch won’t help with deployments. The creation of scripts to alter the database are required and this is what should go into the source control system. Furthermore scripts for the code objects need to be related to the DDL scripts so that their dependencies can be tracked.

So, with all the above you might think that I will now go on to describe the perfect strategy for managing database code within a source control system, right? Sorry, I have yet to uncover that and suspect that the perfect strategy doesn’t exist. Effective processes will need to be constructed around the specifics of the development practices and the particular source control product used. I can offer some fairly generic of advice though:

  • Identify clearly the scripts needed for a particular release or system change; via tags, labels, or whatever your source control system supports. Make sure you are able to easily identify the specific files and versions of files that make up a release.
  • Try to serialise the changes going on to reduce complexity in the deployment process. As the database DDL takes the database from one version to another, the same scripts might not work against a different version of the database. This means that simultaneous development of different streams of work becomes very hard to manage. The code merging capabilities of most source control systems can help here but will not prevent problems from occurring.
  • The deployment process itself needs to be tested. Far too many developers assume that the deployment process takes care of itself. For large and complex deployments the best strategy might be to clone the target database and test against that, as opposed to using a pre-existing test database that should, in theory, be the equivalent of the target database but no-one is prepared to guarantee it.
  • If the system is to remain live whilst the change is made then it is very important to consider what will happen to objects as changes are made; will other sessions be blocked or fail completely by the deployment activities? Deployment scripts against a live system will be different to those used against a non-live system. For example, adding a constraint to a live table will probably mean creating the constraint in a ENABLED NOVALIDATE state and then validating it to reduce blocking, rather than creating it in a VALIDATED state.
  • Make sure to script all the fine details, like object grants and synonyms. Grants are easy to overlook as they are far too often done in an ad-hoc manner by DBAs during development, as opposed to planned in advance by developers as part of their changes.
  • Don’t allow the deployment scripts to generate unnecessary errors. For example, if the process is to drop a table if it exists prior to recreating it then ensure the DROP TABLE statement doesn’t throw an error message when the table doesn’t exist. A simple DDL API will take care of this kind of thing. You want to be in a position such that if any error thrown during the deployment then ensure it really is an error that needs to be investigated and resolved.
  • Don’t forget the back-out! Backing out DDL changes from a database can get complex. For example dropping a column from a table is a resource intensive process so it is often better to set the column to unusable instead. When back-outs are required there is probably not much time left in the change window so you don’t want to be scrabbling around hacking together scripts at this time. Also remember to test the back-out process too!

Yes, there may be many things in the above points that might appear to be very ideal. However, having seen too many database implementation plans that have as their first step “back up the database” and then have as the back-out strategy “restore the database from backup” it would seem that far too many people don’t give database deployments proper consideration. Database deployments are about controlling change and a source control system is pivotal for that.