Using Oracle Table Compression
In a data warehouse database there is almost always a static portion of the data (usually a large one): data that will no longer be updated, only queried. In this post you will see the benefit of compressing such historical segments in a data warehouse environment.
Cost of creating a compressed table
Creating a compressed table usually takes more time than creating an uncompressed one. Let me show why. We first prepare a source table called big_table_new using Thomas Kyte's well-known big_table script (http://www.apress.com/book/downloadfile/2351). Then we create two identical tables, except that one is compressed and the other is not, and use the runstats_pkg package to compare the two CTAS executions:
hsensoy@dds> set timing on
hsensoy@dds> set autot trace
hsensoy@dds> exec runstats_pkg.rs_start;
PL/SQL procedure successfully completed.
hsensoy@dds> create table big_table_nocomp pctfree 1 nologging tablespace LMT_AUTO_NEW parallel 32 as select * from big_table_new;
hsensoy@dds> exec runstats_pkg.rs_middle;
PL/SQL procedure successfully completed.
hsensoy@dds> create table big_table_comp compress pctfree 1 nologging tablespace LMT_AUTO_NEW parallel 32 as select * from big_table_new;
hsensoy@dds> exec runstats_pkg.rs_stop;
Run1 ran in 9974 hsecs
Run2 ran in 14599 hsecs
run 1 ran in 68,32% of the time
Name Run1 Run2 Diff
STAT…Elapsed Time 10,046 14,670 4,624
STAT…user I/O wait time 57,814 8,085 -49,729
STAT…CPU used by this sessio 117,697 333,567 215,870
STAT…recursive cpu usage 117,615 333,497 215,882
STAT…physical writes direct 2,653,136 956,400 -1,696,736
STAT…undo change vector size 6,478,648 4,326,472 -2,152,176
STAT…redo size 33,033,884 18,905,292 -14,128,592
This is only a small portion of the runstats_pkg output, but it immediately reveals why creating the compressed table takes longer: CPU time. The compression algorithm requires significant computation (roughly 2x the CPU time). Notice, however, that because it produces a smaller data segment, the compressed CTAS generates less redo, consumes less undo space, and performs fewer I/O requests.
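As a quick sanity check, you can confirm which of the tables created above actually carries the compression attribute by querying dba_tables, which exposes a COMPRESSION column:

```sql
-- Check the compression attribute of the two tables created above.
-- DBA_TABLES.COMPRESSION reports ENABLED or DISABLED per table.
select table_name, compression
  from dba_tables
 where owner = 'HSENSOY'
   and table_name in ('BIG_TABLE_NOCOMP', 'BIG_TABLE_COMP');
```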
How much to compress?
This is directly related to the level of REDUNDANCY within a data block: the more common column values the rows within a block share, the higher the benefit you get from compression. Let's demonstrate this. To make a redundancy prediction, we first analyze BIG_TABLE_NEW, the table from which we load the data. This populates the columns of the dba_tab_cols view that we will use shortly.
hsensoy@dds> begin
               dbms_stats.gather_table_stats(ownname => 'HSENSOY',
                                             tabname => 'BIG_TABLE_NEW',
                                             degree  => 32);
             end;
             /
PL/SQL procedure successfully completed.
Remember that we will make only a rough prediction (later you will see that it works pretty well), based on the fact that sorting the data by the most redundant columns (the ones with the fewest distinct values) produces a table whose blocks contain highly redundant data. Looking at the dba_tab_cols statistics, we immediately see that some columns are significantly redundant.
hsensoy@dds> select dtc.column_name, dtc.num_distinct, dtc.avg_col_len
               from dba_tab_cols dtc
              where dtc.owner = 'HSENSOY'
                and dtc.table_name = 'BIG_TABLE_NEW'
              order by dtc.num_distinct, dtc.avg_col_len desc;
COLUMN_NAME NUM_DISTINCT AVG_COL_LEN
————— —————- ———–
STATUS 1 6
SECONDARY 1 2
GENERATED 2 2
TEMPORARY 2 2
OWNER 17 6
OBJECT_TYPE 31 10
SUBOBJECT_NAME 650 2
CREATED 756 8
TIMESTAMP 761 20
LAST_DDL_TIME 767 8
DATA_OBJECT_ID 5398 2
OBJECT_NAME 27059 24
OBJECT_ID 50581 5
ID 200043516 7
14 rows selected.
Then we create the table using CTAS, sorting the data according to the column order we obtained in the previous step.
hsensoy@dds> create table big_table_careful_comp compress pctfree 1 nologging
             tablespace LMT_AUTO_NEW parallel 32 as
             select *
               from big_table_new
              order by status, secondary, generated, temporary, owner,
                       object_type, subobject_name, created, timestamp,
                       last_ddl_time, data_object_id, object_name, object_id, id;
And ta-daa! Same data and same compression algorithm, but only about 5 GB compared to the randomly ordered version.
hsensoy@dds> column segment_name format a30
hsensoy@dds> select segment_name, trunc(bytes / 1024 / 1024 / 1024, 2) gigabytes, blocks, extents
               from dba_segments
              where owner = 'HSENSOY'
                and segment_name in ('BIG_TABLE_COMP', 'BIG_TABLE_CAREFUL_COMP');
SEGMENT_NAME GIGABYTES BLOCKS EXTENTS
—————————— ———- ———- ———-
BIG_TABLE_COMP 14,67 961944 4129
BIG_TABLE_CAREFUL_COMP 5,27 345744 2793
So remember that in a data warehouse database, in contrast to an operational one, it is often possible to load data into tables in sorted order during the ETL step. Take into consideration the fact that sorted data is much more compressible.
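A sketch of such a sorted load, following the table and column names used above: with basic table compression, blocks are compressed only during direct-path operations (APPEND inserts, CTAS, direct-path SQL*Loader), so a conventional insert would not benefit.

```sql
-- Hypothetical ETL step: direct-path insert of pre-sorted data.
-- The /*+ append */ hint requests a direct-path load, which is what
-- lets basic compression compress the newly loaded blocks.
insert /*+ append */ into big_table_careful_comp
select *
  from big_table_new
 order by status, secondary, generated, temporary, owner, object_type;
commit;
```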
In the previous section we saw that compressed table (or partition, subpartition, index) segments require less space on disk. One may infer that this also improves query performance. It's true. Let's see how:
hsensoy@dds> select count(*) from big_table_nocomp;
hsensoy@dds> select count(*) from big_table_comp;
hsensoy@dds> select count(*) from big_table_careful_comp;
Remember that wall-clock timing may be misleading. To see that compression really works, let's look more closely at the statistics:
[autotrace statistics compared for the three cases: no compression, compressed unsorted data, compressed sorted data]
Yes, querying a compressed table really pays off in terms of I/O and CPU usage.
To sum up, a data warehouse database can take advantage of the Oracle table compression option through correct physical database design, because the fundamental characteristic of warehouse data is that it is redundant/denormalized. So to improve query performance and storage efficiency, and in return shorten your backup & recovery timings, you HAVE TO think carefully about using the compression option.
Finally, some practical tips on using compression:
- Separate your OLD data from the NEW data into different tablespaces whenever possible.
- Instead of loading data directly into compressed tables, load it swiftly into an uncompressed segment and then periodically recreate the segment as compressed using an alter table ... move statement.
- Keep your compressed segments in read-only tablespaces whenever possible to protect compressed blocks from update operations. Updates cause blocks to be decompressed, reducing the benefit of compression.
- Tables range-partitioned on a date column are usually the best candidates for partition/subpartition compression. Write scheduled scripts to perform the segment compression.
- Talk with your business, development, and administration teams to identify which portion of the data is read-only and suitable for compression (for example, data older than six months may be static).
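A minimal sketch of the periodic recompression routine suggested above (the table, index, partition, and tablespace names here are hypothetical):

```sql
-- Recreate a partition holding static historical data as compressed.
alter table sales move partition sales_2007_q1 compress nologging;

-- Local index partitions on the moved partition become UNUSABLE
-- after the move and must be rebuilt.
alter index sales_local_idx rebuild partition sales_2007_q1 nologging;

-- Once the data is verified static, protect it from updates.
alter tablespace lmt_hist_2007 read only;
```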