Here’s hoping that this post is less rant and more useful information as the topic is about database source control (also currently known as “the bane of my life”).
All organisations seem to approach database source control in a different way. Some places don’t even have the database code under source control! Regardless of the source control system used there are some basic principles that need to be addressed in the database source control procedures. Once these are covered then they should go a long way to keeping your database under control. First a basic observation though…
It’s about deployment!
There are many ways to structure the code and scripts in a source control system. From a database perspective it’s all about addressing the deployment of changes. Source control branching can help with concurrent development activities, tags and labels can be used to flag the what and why of changes but all these are just to help identify a set of changes that need to be deployed to the database.
Developers more familiar with non-database systems might not grasp this. After all, source control systems are about versioning code, allowing simultaneous streams of development, etc. These aspects are also applicable to the database as there is quite a bit of code in packages, procedures, functions and triggers. However, the code objects in the database interact closely with the database itself and that’s when things get complex as the database has a persistent state across deployments.
The deployment of typical application code can be seen as “compile version X of the code into an executable and deploy that, overwriting the previous version”. Yes, this is a simplification but generally speaking you’re no longer concerned about the previously deployed version of the application as it’s been overwritten by the new version. A database deployment can’t adopt this strategy. The database structure and, more importantly, the data it contains exists before and after the system change. So, this means that the database is deployed once in its life and then has incremental changes applied to it rather than being deployed from scratch for every system change.
As an example, consider the creation of a table. During the initial deployment of the database a CREATE TABLE script is run. Future deployments might add columns, modify the column data types, add constraints, etc. Each of these changes will be a separate ALTER TABLE script and not a modification to the initial CREATE TABLE script. The end result is that the structure of the table is governed by the initial CREATE TABLE script and the series of modification scripts that have been applied since then.
This incremental change process introduces complication that is easily overlooked. Consider that I have scripted the creation of a database, called version 1.0, to which a series of 5 modifications have been created, versions 1.1 through 1.5. If I have a copy of the database that is currently at version 1.2 then in order to move that instance of the database to version 1.5 I need to apply 1.3, 1.4 and then 1.5 scripts to it. I cannot simply run the 1.5 modification scripts as I will not apply the structural changes that 1.3 and 1.4 introduced. So, the changes required to bring the database up to a particular version are dependent on the starting version of the database. This might seem self-evident but in a typical development environment where there are many database instances at different versions (for development, test, production purposes) it becomes important to track what has and hasn’t been applied to any specific instance. If version tracking is haphazard then working out the changes required can be very difficult, even if everything has been scripted properly.
Code objects are subject to similar concerns. Just like any code that interacts with the database they will be dependent on a particular structure, or version, of the database. COde written for a particlar table structure might not work against a different structure so there exists a link between the code and the database structure that needs to be tracked.
The structure and processes surrounding the source control system need to support the database deployment processes. As previously noted, simply storing DDL scripts that would be used to recreate the database from scratch won’t help with deployments. The creation of scripts to alter the database are required and this is what should go into the source control system. Furthermore scripts for the code objects need to be related to the DDL scripts so that their dependencies can be tracked.
So, with all the above you might think that I will now go on to describe the perfect strategy for managing database code within a source control system, right? Sorry, I have yet to uncover that and suspect that the perfect strategy doesn’t exist. Effective processes will need to be constructed around the specifics of the development practices and the particular source control product used. I can offer some fairly generic of advice though:
- Identify clearly the scripts needed for a particular release or system change; via tags, labels, or whatever your source control system supports. Make sure you are able to easily identify the specific files and versions of files that make up a release.
- Try to serialise the changes going on to reduce complexity in the deployment process. As the database DDL takes the database from one version to another, the same scripts might not work against a different version of the database. This means that simultaneous development of different streams of work becomes very hard to manage. The code merging capabilities of most source control systems can help here but will not prevent problems from occurring.
- The deployment process itself needs to be tested. Far too many developers assume that the deployment process takes care of itself. For large and complex deployments the best strategy might be to clone the target database and test against that, as opposed to using a pre-existing test database that should, in theory, be the equivalent of the target database but no-one is prepared to guarantee it.
- If the system is to remain live whilst the change is made then it is very important to consider what will happen to objects as changes are made; will other sessions be blocked or fail completely by the deployment activities? Deployment scripts against a live system will be different to those used against a non-live system. For example, adding a constraint to a live table will probably mean creating the constraint in a ENABLED NOVALIDATE state and then validating it to reduce blocking, rather than creating it in a VALIDATED state.
- Make sure to script all the fine details, like object grants and synonyms. Grants are easy to overlook as they are far too often done in an ad-hoc manner by DBAs during development, as opposed to planned in advance by developers as part of their changes.
- Don’t allow the deployment scripts to generate unnecessary errors. For example, if the process is to drop a table if it exists prior to recreating it then ensure the DROP TABLE statement doesn’t throw an error message when the table doesn’t exist. A simple DDL API will take care of this kind of thing. You want to be in a position such that if any error thrown during the deployment then ensure it really is an error that needs to be investigated and resolved.
- Don’t forget the back-out! Backing out DDL changes from a database can get complex. For example dropping a column from a table is a resource intensive process so it is often better to set the column to unusable instead. When back-outs are required there is probably not much time left in the change window so you don’t want to be scrabbling around hacking together scripts at this time. Also remember to test the back-out process too!
Yes, there may be many things in the above points that might appear to be very ideal. However, having seen too many database implementation plans that have as their first step “back up the database” and then have as the back-out strategy “restore the database from backup” it would seem that far too many people don’t give database deployments proper consideration. Database deployments are about controlling change and a source control system is pivotal for that.