Using Oracle Table Compression


In a data warehouse database you are likely to find a static portion of data (usually a large one). This portion will no longer be updated; it will only be queried. In this post you will see the benefits of compressing your historical segments in a data warehouse environment.

Cost of creating a compressed table

It is well known that creating compressed tables takes more time. Let me show why this is the case. We prepare a source table called big_table_new using Thomas Kyte's well-known big_table script (http://www.apress.com/book/downloadfile/2351). Then we create two identical tables, except that one is compressed and the other is not, and we use the runstats_pkg package to compare the two CTAS executions:

hsensoy@dds> set timing on
hsensoy@dds> set autot trace
hsensoy@dds> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.73

hsensoy@dds> create table big_table_nocomp pctfree 1 nologging tablespace LMT_AUTO_NEW parallel 32 as select * from big_table_new;

Table created.

Elapsed: 00:00:59.49

hsensoy@dds> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.73

hsensoy@dds> create table big_table_comp compress pctfree 1 nologging tablespace LMT_AUTO_NEW parallel 32 as select * from big_table_new;

Table created.

Elapsed: 00:01:49.07

hsensoy@dds> exec runstats_pkg.rs_stop;

Run1 ran in 9974 hsecs
Run2 ran in 14599 hsecs
run 1 ran in 68,32% of the time

Name                                  Run1        Run2        Diff
STAT...Elapsed Time                 10,046      14,670       4,624
STAT...user I/O wait time           57,814       8,085     -49,729
STAT...CPU used by this sessio     117,697     333,567     215,870
STAT...recursive cpu usage         117,615     333,497     215,882
STAT...physical writes direct    2,653,136     956,400  -1,696,736
STAT...undo change vector size   6,478,648   4,326,472  -2,152,176
STAT...redo size                33,033,884  18,905,292 -14,128,592

You see only a small portion of the runstats_pkg results, but it is enough to explain why creating the compressed table takes longer than creating the uncompressed one: CPU time. The compression algorithm requires significant computation (roughly 2.8x the CPU time in this run). But notice that since it creates a smaller data segment, it also generates less redo, consumes less undo space, and performs fewer I/O requests.

How much to compress?

This is directly related to the level of REDUNDANCY within a data block. If the rows within a data block contain many columns with common values, the benefit you get from compression will be higher. Let's show this. To make a redundancy prediction, we first analyze BIG_TABLE_NEW, from which we will load the data. This populates the columns of the dba_tab_cols view that we use in the next step.

hsensoy@dds> begin
  dbms_stats.gather_table_stats(ownname          => 'HSENSOY',
                                tabname          => 'BIG_TABLE_NEW',
                                estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE,
                                degree           => 32);
end;
/

PL/SQL procedure successfully completed.

Remember that we will make only a rough prediction (later you will see that it works pretty well), based on the fact that sorting the data by the most redundant columns (the ones with the fewest distinct values) will create a table whose blocks contain highly redundant data. As we look at the dba_tab_cols statistics, we immediately see that some columns are significantly redundant.

hsensoy@dds> select dtc.column_name, dtc.num_distinct, dtc.avg_col_len
  from dba_tab_cols dtc
 where dtc.owner = 'HSENSOY'
   and dtc.table_name = 'BIG_TABLE_NEW'
 order by dtc.num_distinct, dtc.avg_col_len desc;

COLUMN_NAME     NUM_DISTINCT AVG_COL_LEN
--------------- ------------ -----------
STATUS                     1           6
SECONDARY                  1           2
GENERATED                  2           2
TEMPORARY                  2           2
OWNER                     17           6
OBJECT_TYPE               31          10
SUBOBJECT_NAME           650           2
CREATED                  756           8
TIMESTAMP                761          20
LAST_DDL_TIME            767           8
DATA_OBJECT_ID          5398           2
OBJECT_NAME            27059          24
OBJECT_ID              50581           5
ID                 200043516           7

14 rows selected.

Then we create the table using CTAS, sorting the data according to the column order we obtained in the previous step.

hsensoy@dds> create table big_table_careful_comp compress pctfree 1 nologging tablespace LMT_AUTO_NEW parallel 32 as
select *
  from big_table_new
 order by status,
          secondary,
          generated,
          temporary,
          owner,
          object_type,
          subobject_name,
          created,
          timestamp,
          last_ddl_time,
          data_object_id,
          object_name,
          object_id;

Table created.

And ta-da! Same data and same compression algorithm, but only about 5 GB compared to 14.67 GB for the randomly ordered data.

hsensoy@dds> column segment_name format a30
hsensoy@dds> select segment_name, trunc(bytes / 1024 / 1024 / 1024, 2) gigabytes, blocks, extents
  from dba_segments
 where owner = 'HSENSOY'
   and segment_name in ('BIG_TABLE_COMP', 'BIG_TABLE_CAREFUL_COMP');

SEGMENT_NAME                    GIGABYTES     BLOCKS    EXTENTS
------------------------------ ---------- ---------- ----------
BIG_TABLE_COMP                      14,67     961944       4129
BIG_TABLE_CAREFUL_COMP               5,27     345744       2793

So remember that in a data warehouse database, in contrast to an operational one, it may be possible to load data into tables already sorted during the ETL step. Take into consideration the fact that sorted data is much more compressible; a sketch of such a load follows.
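As an illustration only (the table and column names here are hypothetical, not from the example above), a sorted direct-path load into a compressed table could look like this:

-- Hypothetical ETL step: direct-path load into a compressed table,
-- sorted by the least selective columns to maximize block redundancy.
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(sales_hist, 8) */ into sales_hist
select *
  from sales_stage
 order by region,        -- fewest distinct values first
          product_type,
          sale_date;

commit;

Note that basic table compression is applied only on direct-path operations such as APPEND inserts, CTAS, and direct-path SQL*Loader loads; conventional inserts leave the rows uncompressed.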

Query Performance

In the previous section we saw that compressed table (or partition, subpartition, index) segments require less space on disk. One may infer that this will improve query performance. It's true. Let's see how:

hsensoy@dds> select count(*) from big_table_nocomp;

Elapsed: 00:00:28.99

hsensoy@dds> select count(*) from big_table_comp;

Elapsed: 00:00:11.75

hsensoy@dds> select count(*) from big_table_careful_comp;

Elapsed: 00:00:06.18

Remember that timings alone may be misleading. To see that compression really works, let's look closer:

                  No Compression   Compress Unsorted Data   Compress Sorted Data
Elapsed Time              873.15                   371.51                 169.13
CPU Time                  432.50                   200.30                  92.99
Wait Time                 440.64                   171.21                  76.14
Disk Reads               2653175                928355.67              342878.00

Yes, querying a compressed table really works well in terms of I/O and CPU usage.

Conclusion

To sum up, a data warehouse database can take advantage of the Oracle table compression option given a correct physical database design, because its fundamental characteristic is being redundant/denormalized. So in order to improve your query performance and storage efficiency, and in return shorten your backup & recovery timings, you HAVE TO think carefully about using the compression option.

Finally, here are some practical tips on using compression:

  • Separate your OLD data from the NEW in terms of tablespaces whenever possible.
  • Instead of loading data directly into compressed tables, load it swiftly into an uncompressed segment and then periodically recreate the segment as compressed using an ALTER TABLE ... MOVE statement (see the sketch after this list).
  • Keep your compressed segments in read-only tablespaces whenever possible to protect compressed blocks from update operations. Updates cause blocks to be decompressed, reducing the effect of compression.
  • Tables range-partitioned on a date column are usually the best candidates for partition/subpartition compression. Write scheduled scripts to perform segment compression.
  • Talk with your business, development, and administration teams to identify which portion of the data is read-only and suitable for compression (for example, data older than six months may be static).
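As a minimal sketch of the second tip (the table, index, and tablespace names are hypothetical), recreating an uncompressed segment as a compressed one could look like this:

-- Hypothetical maintenance step: rebuild the segment compressed.
-- ALTER TABLE ... MOVE rewrites the rows via direct path, so the
-- COMPRESS attribute takes effect on all existing data.
alter table sales_hist move tablespace lmt_hist compress;

-- Moving the table invalidates its indexes; rebuild them afterwards.
alter index sales_hist_pk rebuild nologging;

For a range-partitioned table the same idea applies per partition, e.g. alter table sales_hist move partition p_2007_q1 compress.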

Posted on February 1, 2008, in Datawarehousing.

  1. # Keep your compressed segments in read-only tablespaces whenever it is possible to prevent compressed blocks from update operations.

     How can I add more compressed segments to a read-only tablespace?

    • You can't. What you will do (periodically) is:
      1. Change your tablespace to read-write mode.
      2. Add the necessary segments to it.
      3. Change it back to read-only.
      4. Take a fresh backup of your tablespace.
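
      A minimal sketch of that cycle (the tablespace and table names are hypothetical):

      -- Hypothetical periodic maintenance cycle for a read-only tablespace.
      alter tablespace hist_data read write;                       -- 1. open for changes

      alter table sales_hist move tablespace hist_data compress;   -- 2. add/rebuild compressed segments

      alter tablespace hist_data read only;                        -- 3. switch back to read-only

      -- 4. Take a fresh backup of the tablespace, e.g. in RMAN:
      --    BACKUP TABLESPACE hist_data;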
