Package: PKG_FTP

On a previous system I worked with there were a lot of data transfers involving CSV files. Each day batch processes would load files off the file system into the database and other batch processes would extract data from the database into CSV files. FTP was the transport mechanism for getting the files onto and off the database file system. Shell scripts coordinated the processing by performing the FTP operations and then invoking database stored procedures for processing the transferred files.

Given that the shell scripts did very little other than to shuttle the data files between whatever external system lay on the other side of the FTP connection and the local file system it seemed to me that the processing could be simplified by removing the shell scripts and have the database perform the FTP operations direct. A quick search of the internet told me that others had successfully managed to script up FTP capabilities in PL/SQL so, like any developer, I set out to do something similar. I set myself the following objectives:

  • A standalone component that was not dependent on the specifics of system itself, i.e. a utility package
  • The interface needed to mimic the basic FTP commands, e.g. GET, PUT, etc.
  • The ability to GET/PUT to and from local files
  • The ability to GET into LOBs
  • The ability to stream a text file GET via pipelined function
  • The ability to PUT a cursor
  • Instrumented for traceability and debugging purposes

The end result was a package imaginatively named PKG_FTP. Unfortunately I moved onto a new job before fully completing the objective of the package and it’s been languishing in my code toolbox ever since. So, I’m releasing it into the wilds in case any Oracle developer finds it interesting or useful.

Installation

Installation of the PKG_FP is quite simple:

  • The installation account requires execute on the in-built package UTL_TCP
  • As the code performs TCP operations, the installation account requires an ACL permitting network access
  • PKG_FTP uses my previously posted PKG_CSV in order to format the output of cursors when PUTting them to remote hosts. The easiest approach is to install both packages in the same account. If you are using Oracle 10g or earlier then a version of PKG_FTP is available with slightly reduced functionality. See the end of this post for details.
  • The nested table T_TEXT_TAB object type is required by certain routines
  • If local file system operations are required then the installation account will require read and/or write on the directory objects referencing those directories

Once the above have been set up then simply compile the code into the selected account.

A series of test scripts is available to test the operation of PKG_FTP. Obviously the details of the remote host and file system details will need customising. These details are contained in variables at the start of each script for easy manipulation. I suggest trying these out first to test that the package is functioning correctly and to get an basic understanding of how the package works.

Operation

It’s probably best to describe the operation of PKG_FTP via examples. In the first example, taken from the test01_dir_listing script, we’ll simply open a connection to the remote host and obtain a directory listing. The script is:

DECLARE
   -- FTP connection details. Edit as appropriate
   c_site       CONSTANT VARCHAR2(100) := 'localhost';
   c_port       CONSTANT NUMBER        := 21;
   c_username   CONSTANT VARCHAR2(100) := 'anonymous';
   c_password   CONSTANT VARCHAR2(100) := 'anonymous';
   
   l_connection NUMBER;
   l_file_list t_text_tab; -- stores the directory listing

   -- a simple routine for outputting a T_TEXT_TAB variable
   -- to DBMS_OUTPUT
   PROCEDURE output_text_tab (p_text_tab IN t_text_tab)
   AS
   BEGIN
      IF (p_text_tab IS NOT NULL) THEN
         FOR i IN 1..p_text_tab.COUNT
         LOOP
            dbms_output.put_line (p_text_tab(i));
         END LOOP;
      ELSE
         dbms_output.put_line ('No data to output');
      END IF;
   END output_text_tab;

BEGIN
   -- Create a new FTP connection
   l_connection := pkg_ftp.create_connection (p_site      => c_site
                                             ,p_port      => c_port
                                             ,p_username  => c_username
                                             ,p_password  => c_password);

   -- Open the FTP connection                                             
   pkg_ftp.open_connection (l_connection);
   
   -- Get a file listing of the remote site and output it
   l_file_list := pkg_ftp.get_remote_file_list(l_connection);
   output_text_tab (l_file_list);

   -- Close the connection
   pkg_ftp.close_connection (l_connection);

   -- Uncomment this next line to display the FTP session log
   --output_text_tab (pkg_ftp.get_session_log (l_connection));

EXCEPTION
   WHEN OTHERS THEN
      -- Something bad happened. Dump the FTP session log, the
      -- Oracle error and close the FTP connection
      output_text_tab (pkg_ftp.get_session_log (l_connection));
      IF (pkg_ftp.connection_open (l_connection)) THEN
         pkg_ftp.close_connection (l_connection);
      END IF;
      dbms_output.put_line (dbms_utility.format_error_backtrace);
      RAISE;                                         
END;
/

PKG_FTP operates using connection handles, much like UTL_FILE operates with file handles. In the above script that’s stored in the variable l_connection and is assigned by a call to the routine create_connection. Consider this routine to be similar to a constructor routine in OO languages. Once we have our connection handle and have set the connection properties then we can open the connection, via a call to open_connection. Obtaining a directory listing is done using get_remote_file_list, which returns a collection of type t_text_tab. The local routine output_text_tab simply writes the directory listing out using dbms_output. Afterwards we clean up by calling close_connection.

PKG_FTP retains a log of the operations performed for a connection. This log can be retrieved via a call to get_session_log. The return data type is t_text_tab, just like the directory listing data type. The log is very valuable when debugging problems with PKG_FTP.

Let’s now move on to a simple GET file operation, which is an abbreviated version of the test03_file_system_get.sql script:

DECLARE
   -- FTP connection details. Edit as appropriate
   c_site       CONSTANT VARCHAR2(100) := 'localhost';
   c_port       CONSTANT NUMBER        := 21;
   c_username   CONSTANT VARCHAR2(100) := 'anonymous';
   c_password   CONSTANT VARCHAR2(100) := 'anonymous';
   
   c_file_name  CONSTANT VARCHAR2(100) := 'text_file_02.txt';
   c_remote_dir CONSTANT VARCHAR2(100) := '/public/data';
   c_local_dir  CONSTANT VARCHAR2(30)  := 'DEV_DATA';

   l_connection NUMBER;
   
   -- a simple routine for outputting a T_TEXT_TAB variable
   -- to DBMS_OUTPUT
   PROCEDURE output_text_tab (p_text_tab IN t_text_tab)
   AS
   BEGIN
      IF (p_text_tab IS NOT NULL) THEN
         FOR i IN 1..p_text_tab.COUNT
         LOOP
            dbms_output.put_line (p_text_tab(i));
         END LOOP;
      ELSE
         dbms_output.put_line ('No data to output');
      END IF;
   END output_text_tab;

BEGIN
   -- Create a new FTP connection
   l_connection := pkg_ftp.create_connection (p_site      => c_site
                                             ,p_port      => c_port
                                             ,p_username  => c_username
                                             ,p_password  => c_password);

   -- Open the FTP connection                                             
   pkg_ftp.open_connection (l_connection);
   
   -- Set the transfer to ASCII
   pkg_ftp.ascii (l_connection);
   
   -- Change the remote directory
   pkg_ftp.set_remote_dir (l_connection, c_remote_dir);
   
   pkg_ftp.get (p_connection       => l_connection
               ,p_remote_filename  => c_file_name
               ,p_local_dir        => c_local_dir
               ,p_local_filename   => c_file_name);

   -- Close the connection
   pkg_ftp.close_connection (l_connection);
END;
/

The new routines introduced here are:

  • ascii: which sets the transfer mode to ASCII (text), as opposed to binary mode
  • set_remote_dir: this changes the active directory on the remote site
  • get: this procedure is overloaded in PKG_FTP. This version specified the connection, the remote file name, the local directory object name and the local file name. The routine GETs the remote file and writes it out to the local file details specified.

PKG_FTP provides a pipelined GET function. Test script test04_plsql_pipeline_get.sql demonstrates this function. Furthermore an overloaded version of the GET function exists that permit all the remote file details to be specified as parameters so a file can be retrieved and queried direct from SQL, as the script test05_sql_pipeline_get.sql illustrates:

SELECT f.column_value AS file_text
FROM   TABLE(pkg_ftp.get ('localhost', 21, 'anonymous','anonymous','/public/data/text_file_02.txt')) f
/

The commonly used FTP command MGET is supported for file system transfers, as shown in the script test07_file_system_mget.sql. Unlike the file system GET procedure there’s no opportunity to rename the files during transfer. An SQL pipelined version of MGET can be achieved by calling get_file_list and using a lateral join to obtain the contents of each file returned, as the script test08_sql_pipeline_dir_and_get.sql demonstrates:

SELECT t.file_name
,      f.*
FROM   (SELECT d.column_value AS file_name
        FROM   TABLE(pkg_ftp.get_file_list ('localhost', 21, 'anonymous','anonymous','*.txt')) d) t
,      TABLE(pkg_ftp.get ('localhost', 21, 'anonymous','anonymous',t.file_name)) f
/

FTP PUT operations are similar to the GET operations. A local file system PUT operation can be made using:

   pkg_ftp.put (p_connection       => l_connection
               ,p_local_filename   => c_file_name
               ,p_local_dir        => c_local_dir
               ,p_remote_filename  => c_file_name);

as script test09_file_system_put.sql shows. However, PKG_FTP supports direct the transfer of cursors, as script test13_cursor_put.sql illustrates:

   -- Open the cursor we're going to output
   OPEN l_cursor
   FOR
   SELECT *
   FROM   dba_objects
   WHERE  ROWNUM <= 2000;

   -- Put the remote file   
   pkg_ftp.put (p_connection       => l_connection
               ,p_cursor           => l_cursor
               ,p_remote_filename  => c_file_name
               ,p_column_headers   => TRUE);

PKG_FTP handballs the conversion of the cursor to PKG_CSV to format each row returned by the cursor prior to sending the data to the remote host. As such all the limitations about supported data types in PKG_CSV also apply to PKG_FTP.

A series of miscellaneous FTP commands against the remote host are also handled:

  • Delete file (procedure delete)
  • Rename file (procedure rename)
  • Create directory (procedure mkdir)
  • Remove directory (procedure rmdir)
  • Get file size (procedure get_size)
  • Get file modifcation date (procedure get_mod_time)

One last point to note about the operation of PKG_FTP is that FTP transfer operations are tracked using session longops. The operation name will start with “FTP” and the target will reflect the file name if possible. This mechanism is useful for tracking the progress of large transfers. A simple query for the FTP operations would be:

SELECT * 
FROM   v$session_longops 
WHERE  opname LIKE 'FTP%';

Internals

The following attempts to describe some of the internals of PKG_FTP to help anyone trying to modify or debugging it.

The global variable g_ftp_conn_tab contains the FTP handles. This variable is an associative array of t_ftp_conn_rec, which is a record type containing all the handle details. The handle number itself is simply the element number into the associative array. The parameters of the create_connection procedure pretty much correspond to the elements of the t_ftp_conn_rec record type. “Getter” and “setter” routines have been provided for the various connection parameters, however some of them cannot be changed after the FTP connection has ben established.

Two separate TCP connections are tracked; the main FTP command connection (typically on port 21) and the data transfer connection. The data transfer connection is opened and closed according to the FTP command being processed (passive mode transfers).

When first referenced PKG_FTP initialises a collection named g_ftp_response_msg_tab, in the routine init_ftp_response_msg_tab. This is an associative array of records that contains the FTP response messages and a flag as to whether it is an error condition, i.e. a response code of 400 or greater. This collection is indexed by the FTP response code. The routine process_ftp_response, which is a generic FTP response handler, uses this collection to check for and raise FTP response exceptions.

The core of issuing an FTP command is done in the following way:

  • Command is issued by calling send_cmd, specifying the connection and the text of the command, which returns a TCP response object
  • send_cmd calls the routine write_cmd, which sends the command to the remote host
  • send_cmd then retrieves the FTP command response via a call to get_cmd_response
  • get_cmd_response in turn gets all text back from the remote host, formats the response into a TCP response object and writes out the FTP connection log before returning the TCP response object back to send_cmd
  • send_cmd then runs the TCP response object through process_ftp_response to raise any FTP exceptions before passing the response object back to the caller
  • The caller can then choose to either ignore the FTP response or it can act on the specifics of the response content

Potentially the above is over-engineered but it does mean that the interface for FTP commands quite easy. For example, the send_username routine is simply:

PROCEDURE send_username (p_connection  IN NUMBER)
AS
   l_tcp_response t_ftp_response;
BEGIN
   send_cmd (p_tcp_response  => l_tcp_response
            ,p_connection    => p_connection
            ,p_cmd           => 'USER ' || g_ftp_conn_tab(p_connection).username);
END send_username;   

as the response is ignored. In this scenario it is assumed that if no exception is raised then all must be okay. Potentially every routine could check the a correct FTP response code but I haven’t adopted that strategy (yet).

Exception handling in PKG_FTP is quite simple. A number of exceptions error numbers and messages are defined in the PKG_FTP specification. Exceptions with these details are raised in various situations. Many routines can only be invoked if the FTP connection is open or closed. Invoking a routine with the connection in the wrong state will raise an exception numbered c_connection_open_errno or c_connection_closed_errno. Attempting to perform an operation on an invalid connection will raise an exception numbered c_connection_closed_errno. If an FTP operation returns an FTP error response (response code 400 or greater) then an exception numbered c_operation_failed_errno is returned. Failure to connect to a remote site will result in an exception numbered c_connection_failed_errno.

If the calling routine wants to intercept the FTP error and process it somehow, as opposed to simply passing it up the call chain, then the functions get_last_response_code and get_last_response_msg are provided. These routines return the last FTP response code and message respectively and can be used regardless of whether there was an exception raised by PKG_FTP or not.

Bugs and Other Issues

PKG_FTP as ben tested against a handful of FTP servers only. It may be that flaws with how it operates only show against certain servers. For example, one server would not permit getting file sizes unless the trasnfer mode was set to ASCII. I haven’t confirmed if this behaviour is actually FTP standard but I simply modified the get_size routine to always operate in ASCII mode. I’m more than happy to hear of these flaws. Time permitting I will try to fix any that I am told about.

There is little logic concerning interpretation of the FTP response. The basic assumption is that if the response is not an error reponse then the operation has succeeded. The specific response is only checked in the open_connection routine to see if a username and password is required.

One area that gives trouble is the transfer time-out of UTL_TCP. It would seem that if no time-out is set when the TCP connection is opened then Oracle will wait for a period of time before deciding to give up with the utl_tcp.end_of_input exception. On my Windows notebook, this timeout is about 40 seconds. This isn’t good when you want to read all remaining data from a connection. I have tried using utl_tcp.available and also reading all lines at once with utl_tcp.get_text and all seem to have the same trouble. The easiest solution seems to be setting to time-out on the connection to a small value, such as 1 second as my test scripts do. Ideally it would be good to get an immediate response that there is no more data from the connection and the program can move on immediately. If anyone has a solution to this or can see what I’m doing wrong then I would appreciate it if you drop me a note.

PKG_FTP Test Scripts

The following PKG_FTP test scripts have been provided in the scripts download area:

  • test01_dir_listing.sql – displays a directory listing from the remote host
  • test02_clob_get – GETs a remote text file into a CLOB and displays it
  • test03_file_system_get – GETs a remote text file and saves it to a local file
  • test04_plsql_pipeline_get.sql – GETs a remote text file into a nest table collection via the pipeline GET function
  • test05_sql_pipeline_get – GETs a remote text file within a standalone SQL statement via a pipeline GET function
  • test06_file_system_binary_get – GETs a remote binary file and saves it to a local file
  • test07_file_system_mget – GETs multiple text files and saves them as local files
  • test08_sql_pipeline_dir_and_get – reads a remote text file directory listing and GETs the files all within a single SQL statement
  • test09_file_system_put – PUTs a text file onto a remote host
  • test10_file_system_binary_put – PUTs a binary file onto a remote host
  • test11_clob_put – PUTs a CLOB onto a remote host as a text file
  • test12_file_system_binary_put – reads a local binary file into a BLOB and then PUTs the BLOB onto a remote host
  • test13_cursor_put – opens a cursor and sends the content to a remote host as a CSV file
  • test14_simple_cursor_put – opens a cursor and sends to content to a remote host as a CSV file
  • test15_delete_file – deletes a file on the remote host
  • test16_rename_file – renames a file on the remote host
  • test17_create_directory – creates a directory on the remote host
  • test18_remove_directory – removes a directory on the remote host
  • test19_get_file_details – obtain the file size and last update date/time of a remote file
  • test20_ftp_exception – demonstrates a typical exception that is raised by PKG_FTP

PKG_FTP for Oracle 10g (and earlier)
As noted above, PKG_FTP requires Oracle 11g as it uses PKG_CSV, which has a dependency on 11g functionality. For earlier versions of Oracle I have modified PKG_FTP to remove the calls to PKG_CSV. This impacts the two cursor PUT routines. In the non-11g version of PKG_FTP, the cursor PUT routines are sill present but require that the cursor returns a single VARCHAR2 column. So, it is up to the caller to perform the concatenation of the columns that PKG_CSV would normally do.

The following files are in the scripts area:

  • pkg_ftp_10g – PKG_FTP package specification for pre-11g systems
  • pkg_ftp_10g – PKG_FTP package body for pre-110g systems
  • test25_simple_cursor_put_10g.sql – a test script for a cursor PUT using the pre-11g version of PKG_FTP

Download the various PKG_FTP scripts.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s