Index Compression

Whilst preparing this post I noticed that I’ve been beaten to it by Connor McDonald’s recent post on index compression. Since my post was virtually ready I’m going to post it anyway but make sure to read Connor’s and follow the links to Richard Foote’s articles too.

Index compression is a feature that’s been around since something like Oracle 8. With index compression, if the index has repeating values for the leading columns (or “attributes” since we can index expressions… I’ll continue to use the term “columns” for the rest of the post though) of the index then Oracle is able to optimise the storage of the index entries and fit more entries per block thereby producing an index that has a smaller on disk footprint.

The syntax for compressing an index is:

CREATE INDEX <index_name> 
   ON <table_name> (<columns>)
   COMPRESS <n>

where n is the number of leading columns to compress. Obviously n cannot be a number greater than there are columns in the index and it may not be worth compressing on all columns so n is often less than the number of columns in the index.

“How much smaller will the index be?” I hear you ask. Well, as with many things results will vary. Thankfully Oracle will tell you what you can expect to achieve so you can make a decision as to whether to proceed or not before actually performing the operation.

Let’s take a look at how we would go about assessing an index for compression. We’ll start with a table containing 4 million rows:

CREATE TABLE index_demo
    (id         NUMBER(10)  NOT NULL
    ,val1       NUMBER(4) NOT NULL
    ,val2       NUMBER(4) NOT NULL)
/

INSERT INTO index_demo
WITH r AS (SELECT ROWNUM rn FROM dual CONNECT BY ROWNUM <= 2000)
SELECT ROWNUM
,      MOD(ROWNUM,100)
,      TRUNC(ROWNUM/1001)
FROM   r, r
/

COMMIT
/

EXEC dbms_stats.gather_table_stats ('','index_demo')

With this table, ID is unique, VAL1 has 100 distinct values and VAL2 has 1001 distinct values. Next we create 4 different indexes; two non-unique and two unique:

CREATE INDEX index_demo_i1
   ON index_demo (val1)
/

CREATE INDEX index_demo_i2
   ON index_demo (val2, val1)
/

CREATE UNIQUE INDEX index_demo_i3
   ON index_demo (val1, id)
/

CREATE UNIQUE INDEX index_demo_i4
   ON index_demo (id, val2)
/

Next we’ll get some basic stats for our indexes:

SELECT index_name
,      blevel
,      leaf_blocks
,      distinct_keys
,      avg_leaf_blocks_per_key AS avg_lf_key
,      num_rows
FROM   user_ind_statistics
WHERE  table_name = 'INDEX_DEMO'
ORDER  BY
       index_name
/

On my 12.1.0.2 database with an 8 kb block size, this gave the following:

INDEX_NAME       BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LF_KEY   NUM_ROWS
--------------- ------- ----------- ------------- ---------- ----------
INDEX_DEMO_I1         2        7792           100         77    4000000
INDEX_DEMO_I2         2       10006        399605          1    4000000
INDEX_DEMO_I3         2       10431       4000000          1    4000000
INDEX_DEMO_I4         2       10982       4000000          1    4000000

The key metric we’re interested in is the LEAF_BLOCKS. When we compress an index we are wanting to see a reduction in that. Oracle will tell us if we analyse the index and look at the index statistics gathered:

ANALYZE INDEX index_demo_i1 VALIDATE STRUCTURE
/

SELECT opt_cmpr_count
,      opt_cmpr_pctsave
FROM   index_stats
/

The two measures from INDEX_STATS are the recommended number of columns to compress and the percent saving in space that can be expected. Repeating the above for each index we end up with:

INDEX_NAME       OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
--------------- --------------- ----------------
INDEX_DEMO_I1                 1               21
INDEX_DEMO_I2                 2               31
INDEX_DEMO_I3                 1               15
INDEX_DEMO_I4                 0                0

So, from the above, Oracle is recommending we use COMPRESS 1 for index INDEX_DEMO_I1 and we should see a 21% reduction in space used by our index. Compression on the first of our unique indexes, INDEX_DEMO_I3, is still possible as the first column has repeating values. Oracle is saying we should see a 15% reduction in space usage. Our second unique index, INDEX_DEMO_I4, leads with a column that is unique and so Oracle says that compression isn’t worthwhile on that index (note, it’s actually still possible to apply compression to the index but you will end up increasing the size of the index!).

Now let’s test out Oracle’s recommendations and see how accurate they are. We could simply drop and recreate the indexes with the compress option but we can also rebuild them on-line, which is great for a system with high availability requirements.

ALTER INDEX index_demo_i1
   REBUILD
   COMPRESS 1
   ONLINE
/   

ALTER INDEX index_demo_i2
   REBUILD
   COMPRESS 2
   ONLINE
/   

ALTER INDEX index_demo_i3
   REBUILD
   COMPRESS 1
   ONLINE
/   

Rerunning our query against USER_IND_STATISTICS we get:

INDEX_NAME       BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LF_KEY   NUM_ROWS
--------------- ------- ----------- ------------- ---------- ----------
INDEX_DEMO_I1         2        6145           100         61    4000000
INDEX_DEMO_I2         2        6863        399605          1    4000000
INDEX_DEMO_I3         2        8782       4000000          1    4000000
INDEX_DEMO_I4         2       10982       4000000          1    4000000

Some quick calculations show the compression actually achieved:

INDEX_NAME       PCT COMPRESSION
--------------- ----------------
INDEX_DEMO_I1                 21
INDEX_DEMO_I2                 30
INDEX_DEMO_I3                 16

We can see that Oracle’s estimates of compression were quite accurate.

Index compression is a feature that can result in a significant space saving for any moderately large database. By the time the space saving is multiplied over development and test environments and back-up space it’s pretty much a no-brainer to use it whereever possible.