Sparse Collections

Anyone familiar with programming languages will know of the array data type. Oracle PL/SQL supports 3 different array data types: associative arrays, nested tables and varrays. The Oracle database engine recognises only two of these data types, nested tables and varrays. Associative arrays are PL/SQL constructs only.

A common way to populate collections is to SELECT…BULK COLLECT into them, as the following example illustrates:

DECLARE
   -- define a nested table type and variable of that type
   TYPE t_emp_tab IS TABLE OF emp%ROWTYPE;
   l_emp_tab t_emp_tab;
BEGIN
   -- populate the collection
   SELECT e.*
   BULK COLLECT INTO l_emp_tab
   FROM emp e;

   -- display the names in the collection
   FOR i IN 1..l_emp_tab.LAST
   LOOP
      dbms_output.put_line (l_emp_tab(i).ename);
   END LOOP;
END;
/

which simply displays the employee names from the EMP table.

SMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER

PL/SQL procedure successfully completed.

The above example uses a nested table. We can do exactly the same with an associative array:

DECLARE
   -- define an associative array type and variable of that type
   TYPE t_emp_tab IS TABLE OF emp%ROWTYPE INDEX BY PLS_INTEGER;
   l_emp_tab t_emp_tab;
BEGIN
   -- populate the collection
   SELECT e.*
   BULK COLLECT INTO l_emp_tab
   FROM emp e;

   -- display the names in the collection
   FOR i IN 1..l_emp_tab.LAST
   LOOP
      dbms_output.put_line (l_emp_tab(i).ename);
   END LOOP;
END;
/

and get the same result.

So, why this discussion on collections? Well, I nearly introduced a hard to spot bug into an application recently. In both the examples above you will notice that the FOR LOOP counter, i, iterated through the values 1 to the l_emp_tab.LAST to output the details from the collection. The logic here is that each iteration of the loop will gives an index that can be used to access an element in the collection…

… but that logic isn’t quite correct as it assumes that the collection index starts at 1 and that each element will be sequencially numbered such that l_emp_tab.LAST gives the highest index in the collection. This will only hold true when the collection is dense, i.e. has no gaps in the indicies of the collection. In this particular situation we can feel safe to make this assumption as the SELECT…BULK COLLECT populates the collection in this manner.

A collection with gaps in the indicies is known as a sparse collection. We can see what would happen to our example script if the collection was a sparse collection by simply removing an element from the collection between the SELECT and FOR loop statements:

DECLARE
   -- define a nested table type and variable of that type
   TYPE t_emp_tab IS TABLE OF emp%ROWTYPE;
   l_emp_tab t_emp_tab;
BEGIN
   -- populate the collection
   SELECT e.*
   BULK COLLECT INTO l_emp_tab
   FROM emp e;
   
   -- remove the fifth element
   l_emp_tab.DELETE(5);

   -- display the names in the collection
   FOR i IN 1..l_emp_tab.LAST
   LOOP
      dbms_output.put_line (l_emp_tab(i).ename);
   END LOOP;
END;
/

which results in:

SMITH
ALLEN
WARD
JONES
DECLARE
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at line 17

So, back to the bug I nearly introduced. The code I was dealing with was rather more complex than the scenario above. A routine in one package populated a nested table collection. This collection was part of a larger data structure and was passed back out of the package to the caller. Further along in the code the data structure was passed to another routine in yet a totally different package that iterated through the collection using a FOR loop in a manner similar to the examples above, i.e. it assumed that the collection was dense.

The code change I was required to make was to remove from the original collection elements that satisfied rather specific criteria. Generally there would not be any elements removed but in rare situations it could occur. My initial solution was to do exactly as I did in the last example above and use the DELETE method to remove the necessary elements. However, realising that this would result in a sparse collection I decided to check whether this would have unintended consequences and that was when I discovered the problem.

So, while I nearly introduced a bug, the assumption by the later routine that the collection provided would be dense is also a type of bug… something I would describe as “a bug in waiting”. Anyone familiar with the principles of coupling and cohesion as they apply to structured programming will understand this. In this situation the coupling between the routine that populated the collection and the routine that consumed the collection went beyond the collection data type itself by the assumption that the collection would be dense. This kind of assumption isn’t represented in the data structure itself and is therefore hard to detect.

So, if a collection is sparse, how can it be processed? One solution is to use the FIRST and NEXT methods to control the loop structure:

DECLARE
   -- define a nested table type and variable of that type
   TYPE t_emp_tab IS TABLE OF emp%ROWTYPE;
   l_emp_tab t_emp_tab;
   i INTEGER;
BEGIN
   -- populate the collection
   SELECT e.*
   BULK COLLECT INTO l_emp_tab
   FROM emp e;
   
   -- remove the fifth element
   l_emp_tab.DELETE(5);
   
   -- obtain the first index in the collection
   i := l_emp_tab.FIRST;

   LOOP
      -- explicitly exit when we've run out of elements in the collection
      EXIT WHEN l_emp_tab.NEXT(i) IS NULL;
      dbms_output.put_line (l_emp_tab(i).ename);
      -- move from the current element to the next one
      i := l_emp_tab.NEXT(i);
   END LOOP;
END;
/

which gives us the required output:

SMITH
ALLEN
WARD
JONES
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD

PL/SQL procedure successfully completed.

The ideal solution to the code I was dealing with would have been to rewrite the routine that consumed the collection to handle sparse collections. This approach would result in a more robust code as the unnecessary assumption that the collection needs to be dense has been removed. However, I decided to take the approach of ensuring the collection passed back by the producer was always dense (I believe “chickened out” is the appropriate term). This seemed a more prudent choice in the interests of managing the overall scale of change required, although it does mean that the “bug in waiting” is still present. Hopefully the next developer to touch that bit of code will read the comments that I have left behind though…

Advertisements

One thought on “Sparse Collections

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s