Point in time extracts

Extracts from a database are a common activity for any system. In our information driven world we are constantly extracting data for reporting systems, data warehouses, extracts for other systems or simple reports. Naturally, getting data out of an Oracle database is as simple as writing one or more SQL queries. Things can get a bit tricky when we need to consider the consistency of the data we’re extracting.

As an example of data extract inconsistency, one place I worked at had a requirement to extract data into CSV files for a third party on a daily basis. Primarily the data consisted of account details and transactions on those accounts. The third party started to complain that the account balances did not match up with the transactions that were reported. The cause for the difference was easily diagnosed. The overall process simply ran one query after another dumping the results to a CSV file. The transaction extract occurred after the account details one. By the time the account details extract had finished new transactions had made their way into the system and were picked up by the transaction extract. These new transactions were not reflected in the account balances of the account extract. What the extract process required was to do all the extracts as at the same point in time and Oracle provides this functionality via flashback query.

Flashback query has been part of Oracle since version 9. In it’s simplest form it’s a matter of specifying the point in time that you want to query a table, e.g.:

SELECT *
FROM   transactions AS OF TIMESTAMP CAST(TRUNC(SYSDATE) AS TIMESTAMP)

which will display the details of the TRANSACTION table as at the start of the day. Activity on the table since the start of the day will be rolled back for the purpose of the query. Using the above approach it’s possible to construct a set of queries that are consistent at the same point in time, which is ideal for extracts that need to be data consistent with each other.

Using PKG_CSV, creating a data consistent set of extracts is as simple as:

DECLARE
   l_directory         VARCHAR2(30) := 'EXTRACT_DIR';
   l_extract_timestamp TIMESTAMP     := CAST(TRUNC(SYSDATE) AS TIMESTAMP);
   l_emp_cur           SYS_REFCURSOR;
   l_dept_cur          SYS_REFCURSOR;
BEGIN
   OPEN l_emp_cur
   FOR
   SELECT * FROM emp AS OF TIMESTAMP l_extract_timestamp;
   
   OPEN l_dept_cur
   FOR
   SELECT * FROM dept AS OF TIMESTAMP l_extract_timestamp;
   
   pkg_csv.to_file (p_cursor      => l_emp_cur
                   ,p_directory   => l_directory
                   ,p_filename    => 'emp_' || TO_CHAR(l_extract_timestamp,'yyyymmddhh24miss') || '.csv');
   pkg_csv.to_file (p_cursor      => l_dept_cur
                   ,p_directory   => l_directory
                   ,p_filename    => 'dept_' || TO_CHAR(l_extract_timestamp,'yyyymmddhh24miss') || '.csv');
END;

So how far back can you go with this type of query? Oracle uses the undo segments (or rollback) to work out what the data looked like in the past. The obvious question is then “how much undo does Oracle retain?”. Unfortunately the answer to this isn’t clear cut. Generally speaking, Oracle will retain undo for at least the duration specified by the initialisation parameter UNDO_RETENTION, which is specified in seconds. This parameter defaults to 900, or 15 minutes, although many systems will have this set to a larger value. The Oracle documentation contains this warning on the way Oracle manages the undo available:

The UNDO_RETENTION parameter can only be honored if the current undo tablespace has enough space. If an active transaction requires undo space and the undo tablespace does not have available space, then the system starts reusing unexpired undo space. This action can potentially cause some queries to fail with a “snapshot too old” message

You can read more about flashback query in the Oracle documentation.

Strongly typed cursor variables

Arguably the easiest way to return a set of data back to a client application from PL/SQL is via a cursor variable. Oracle simply opens a cursor to a query and the application pulls data from the cursor on demand. Conveniently Oracle provides a generic SYS_REFCURSOR data type that can be used for any cursor variable, as shown below to retrieve employee details:

CREATE OR REPLACE PACKAGE pkg_emp
AS
   PROCEDURE get_emps (p_emp_cur OUT SYS_REFCURSOR);
END pkg_emp;
/
   
CREATE OR REPLACE PACKAGE BODY pkg_emp
AS

PROCEDURE get_emps (p_emp_cur OUT SYS_REFCURSOR)
AS
BEGIN
   OPEN p_emp_cur
   FOR
   SELECT id
   ,      first_name
   ,      last_name
   ,      dept_id
   FROM   emp;
END get_emps;

END pkg_emp;
/

While the above is very quick to code it suffers from a significant flaw; an ill-defined contract between the provider of the data (the database via the PL/SQL routine) and the consumer of the data (whatever application is calling the routine). The key problem here is that the SYS_REFCURSOR type can be used with any query. In this regard it’s like having an “any data” type in an application programming language.

If I were to modify the query inside the get_emps routine to return the department name then I could simply compile a new version as:

CREATE OR REPLACE PACKAGE BODY pkg_emp
AS

   PROCEDURE get_emps (p_emp_cur OUT SYS_REFCURSOR)
   AS
   BEGIN
      OPEN p_emp_cur
      FOR
      SELECT e.id
      ,      e.first_name
      ,      e.last_name
      ,      e.dept_id
      ,      d.dept_name
      FROM   emp e
      ,      dept d
      WHERE  d.id = e.dept_id;
   END get_emps;

END pkg_emp;
/

All done, right? Well, although this code will compile and run correctly I have probably inadvertently broken any application that was using it. The code does nothing to enforce the structure of the cursor that I am returning so any query is deemed acceptable. From Oracle’s perspective this is fine but it won’t be okay to the consumers of the cursor, which will be expecting a cursor of a particular structure.

A better way to code the above routine is to use a strongly typed cursor, which declares in advance the attributes and data types that the cursor will contain. If the cursor is opened using a query that does not match the definition then a compile time error is thrown. Sure, the definition of the strongly type cursor could be altered to match the new query but the error acts as a warning to the database developer that they might be about to break the application.

So, how do we create a strongly typed cursor variable? My preferred approach is to use a record type defined in the package specification as shown below:

CREATE OR REPLACE PACKAGE pkg_emp
AS
   TYPE t_emp_rec IS RECORD (id         emp.id%TYPE
                            ,first_name emp.first_name%TYPE
                            ,last_name  emp.last_name%TYPE
                            ,dept_id    emp.dept_id%TYPE);
   TYPE t_emp_cur IS REF CURSOR RETURN t_emp_rec;                          

   PROCEDURE get_emps (p_emp_cur OUT t_emp_cur);
END pkg_emp;
/
   
CREATE OR REPLACE PACKAGE BODY pkg_emp
AS

PROCEDURE get_emps (p_emp_cur OUT t_emp_cur)
AS
BEGIN
   OPEN p_emp_cur
   FOR
   SELECT id
   ,      first_name
   ,      last_name
   ,      dept_id
   FROM   emp;
END get_emps;

END pkg_emp;
/

The record type is useful for “self documenting” the code as shows what data will be returned by the cursor without having to go into the package body and read the query. It’s a little bit of extra typing but now when I try to modify the package body and replace DEPT_ID with DEPT_NAME I get the following compilation error:

CREATE OR REPLACE PACKAGE BODY pkg_emp
AS

   PROCEDURE get_emps (p_emp_cur OUT t_emp_cur)
   AS
   BEGIN
      OPEN p_emp_cur
      FOR
      SELECT e.id
      ,      e.first_name
      ,      e.last_name
      ,      e.dept_id
      ,      d.dept_name
      FROM   emp e
      ,      dept d
      WHERE  d.id = e.dept_id;
   END get_emps;

END pkg_emp;
/

Warning: Package Body created with compilation errors.

SQL> show errors
Errors for PACKAGE BODY PKG_EMP:

LINE/COL ERROR
-------- ---------------------------------------------
7/7      PL/SQL: SQL Statement ignored
9/7      PLS-00382: expression is of wrong type

The “expression is of wrong type” is telling me that my query no longer matches the record declaration in the package specification. Time for me to revise my code… or change the record type and go searching for the application code that calls this routine and ensure it will continue to work with the new cursor. In a large, complex multi-tier system additional checks and balances like strongly typed cursors are invaluable to prevent inadvertent mistakes.