Columnar compression in TimesTen

In this article I would like to write about Columnar Compression in TimesTen.

Oracle TimesTen 11.2.2 introduced several new features. One of them is Columnar Compression: tables can be compressed at the column level, which stores data more efficiently.

The compression unit in TimesTen is a compressed column group. A compressed column group consists of a set of columns. You can include one or more columns in a compressed column group; however, a column can be included in only one compressed column group. Each compressed column group is limited to a maximum of 16 columns.
To enable the compression feature for a particular table, specify OPTIMIZED FOR READ in the CREATE TABLE statement. For example:

Command> create table test_1 ( id number not null,
       >                       val_1   varchar2(40))
       > optimized for read;
Command> desc test_1;

Table GENA.TEST_1:
  Columns:
    ID                              NUMBER NOT NULL
    VAL_1                           VARCHAR2 (40) INLINE
  OPTIMIZED FOR READ

1 table found.
(primary key columns are indicated with *)
Command>
Additionally, specify the following:
  • COMPRESS - defines a compressed column group for a table.
  • BY DICTIONARY - defines a compression dictionary for each compressed column group.
  • MAXVALUES - defines the total number of distinct values allowed in the column(s) and sets the size of the compressed column group pointer column to 1, 2, or 4 bytes.
For example, create a compressed table:
Command> create table emp_comp (id number not null,
       >                        val_1   varchar2(40),
       >                        val_2   varchar2(40),
       >                        val_3   varchar2(40))
       > compress ((val_1,val_2) by dictionary maxvalues = 255,
       >            val_3        by dictionary maxvalues = 255) 
       > optimized for read;
Command>
Command> desc emp_comp;

Table GENA.EMP_COMP:
  Columns:
    ID                              NUMBER NOT NULL
    VAL_1                           VARCHAR2 (40) INLINE
    VAL_2                           VARCHAR2 (40) INLINE
    VAL_3                           VARCHAR2 (40) INLINE
  COMPRESS ( ( VAL_1, VAL_2 ) BY DICTIONARY MAXVALUES=255,
             VAL_3 BY DICTIONARY MAXVALUES=255 ) OPTIMIZED FOR READ

1 table found.
(primary key columns are indicated with *)
Command>
In this example I created a table with two compressed column groups. The first one consists of two columns, VAL_1 and VAL_2, and the second of the single column VAL_3. Each compressed column group can contain at most 255 distinct values (so the pointer size is 1 byte for each group). When I defined the compressed column groups, TimesTen created a dictionary table for each of them.
Command> alltables;
  GENA.CD$_1086056_2
  GENA.CD$_1086056_4
  GENA.EMP_COMP
  …
28 tables found.
Command>
The dictionary table name is constructed as follows:
"CD$_" + the table identifier (SYS.TABLES.TBLID) + "_" + the compressed column number

Command> select tblname, tblid from sys.tables where tblname = 'EMP_COMP';
< EMP_COMP                       , 1086056 >
1 row found.
Command>
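For illustration, the naming scheme can be sketched in a few lines of Python. Note that the meaning of the trailing number is my inference from the observed names CD$_1086056_2 and CD$_1086056_4; it is not documented here:

```python
def dictionary_table_name(tblid: int, col_no: int) -> str:
    """Construct a dictionary table name: "CD$_" + TBLID + "_" + column number.
    The column-number suffix is an assumption based on the names observed above."""
    return f"CD$_{tblid}_{col_no}"

print(dictionary_table_name(1086056, 2))  # CD$_1086056_2
print(dictionary_table_name(1086056, 4))  # CD$_1086056_4
```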
Here is some information from the data dictionary:
Command> VARIABLE TABNAME VARCHAR2(50) := 'EMP_COMP';
Command> select TABLE_NAME,
       >        COMPRESSION,
       >        COMPRESS_FOR
       >   from sys.all_tables
       >  where TABLE_NAME = :TABNAME;
< EMP_COMP, ENABLED, QUERY HIGH >
1 row found.
Command> select tblname,
       >        numcompress,
       >        valtblids
       >   from sys.tables
       >  where TBLNAME=:TABNAME;
< EMP_COMP                       , 3, 7092100078921000 >
1 row found.
Command>
You can see that EMP_COMP is a compressed table: it contains three compressed columns and is compressed for the "QUERY HIGH" purpose. I don't know yet how to decode the "valtblids" value, but I will find out :)
Each dictionary table contains the same columns you defined in the compressed column group plus a reference counter (##CD_REFCNT).
Command> desc CD$_1086056_2;

Table GENA.CD$_1086056_2:
  Columns:
   *VAL_1                           VARCHAR2 (40) INLINE
   *VAL_2                           VARCHAR2 (40) INLINE
    ##CD_REFCNT                     TT_INTEGER NOT NULL

1 table found.
(primary key columns are indicated with *)
Command> desc CD$_1086056_4;

Table GENA.CD$_1086056_4:
  Columns:
   *VAL_3                           VARCHAR2 (40) INLINE
    ##CD_REFCNT                     TT_INTEGER NOT NULL

1 table found.
(primary key columns are indicated with *)
Command>

For each compressed column group, a dictionary table is created that contains a column with all the distinct values of the group. The compressed column group in the table contains a pointer (1 byte in this case) to the row in the dictionary table with the appropriate value.
The width of the pointer can be 1, 2, or 4 bytes depending on the maximum number of entries you defined for the dictionary table. When the user configures the maximum number of distinct entries for the compressed column group, the pointer size is set as follows:
  • 1 byte for a maximum number of entries of 255 (2^8 - 1). When the maximum number is between 1 and 255, the dictionary size is set to 255 (2^8 - 1) values and the compressed column group pointer column is 1 byte.
  • 2 bytes for a maximum number of entries of 65,535 (2^16 - 1). When the maximum number is between 256 and 65,535, the dictionary size is set to 65,535 (2^16 - 1) values and the compressed column group pointer column is 2 bytes.
  • 4 bytes for a maximum number of entries of 4,294,967,295 (2^32 - 1). When the maximum number is between 65,536 and 4,294,967,295, the dictionary size is set to 4,294,967,295 (2^32 - 1) values and the compressed column group pointer column is 4 bytes. This is the default.
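The mechanism can be modeled in a few lines of Python. This is a conceptual sketch of dictionary encoding and of the MAXVALUES-to-pointer-width rule described above, not TimesTen's actual internals:

```python
def pointer_size(maxvalues: int) -> int:
    """Pointer column width implied by MAXVALUES, per the thresholds above."""
    if 1 <= maxvalues <= 255:
        return 1
    if maxvalues <= 65535:
        return 2
    if maxvalues <= 4294967295:
        return 4
    raise ValueError("MAXVALUES out of range")

def dictionary_encode(values):
    """Store each distinct value once; rows keep only an index (the 'pointer')."""
    dictionary, pointers, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        pointers.append(index[v])
    return dictionary, pointers

dictionary, pointers = dictionary_encode(["red", "blue", "red", "red"])
print(dictionary)  # ['red', 'blue']
print(pointers)    # [0, 1, 0, 0]
print(pointer_size(255), pointer_size(256), pointer_size(70000))  # 1 2 4
```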
Let’s create a simple test:
Command> create table test2  (id number not null,
       >                      val_1   varchar2(40),
       >                      val_2   varchar2(40),
       >                      val_3   varchar2(40));
Command> begin
       >   for i in 1 .. 1000000 loop
       >     insert into test2 
       >      values (i,'1234567890123456789012345678901234567890',
       >               '1234567890123456789012345678901234567890',       
       >               '1234567890123456789012345678901234567890');
       >   end loop;
       > end;
       > /

PL/SQL procedure successfully completed.

Command> call ttComputeTabSizes('test2');
Command> tablesize test2;

Sizes of GENA.TEST2:

  INLINE_ALLOC_BYTES:   171470416
  NUM_USED_ROWS:        1000000
  NUM_FREE_ROWS:        192
  AVG_ROW_LEN:          171
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       69544
  TOTAL_BYTES:          171539960
  LAST_UPDATED:         2012-02-20 08:10:38.000000

1 table found.
Command>

Command> create table test2_comp  (id number not null,
       >                           val_1   varchar2(40),
       >                           val_2   varchar2(40),
       >                           val_3   varchar2(40))
       > compress (val_1 by dictionary maxvalues = 255) optimized for read;
Command> begin
       >   for i in 1 .. 1000000 loop
       >     insert into test2_comp 
       >        values (i,'1234567890123456789012345678901234567890',
       >                  '1234567890123456789012345678901234567890',
       >                  '1234567890123456789012345678901234567890');
       >   end loop;
       > end;
       > /

PL/SQL procedure successfully completed.

Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of GENA.TEST2_COMP:

  INLINE_ALLOC_BYTES:   131462736
  NUM_USED_ROWS:        1000000
  NUM_FREE_ROWS:        192
  AVG_ROW_LEN:          131
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       87600
  TOTAL_BYTES:          131550336
  LAST_UPDATED:         2012-02-20 08:12:46.000000

1 table found.
Command>

In this example I created two tables (test2 is an uncompressed table and test2_comp is a compressed one). In this case I compressed only one column (val_1) in the test2_comp table. What can we see in this report? The average row length decreased from 171 to 131 bytes, because all values of the val_1 column are stored in the dictionary table. Let's calculate.

Total size test2 table: 171539960 bytes
Total size test2_comp table: 131550336 bytes

So we saved 39989624 bytes, which is an impressive result.
As you can see, in my test I inserted the same row into the test2_comp table one million times. In this situation we can save (40 * 999999) - 1 (the pointer size) = 39999959 bytes (only one copy of the value is stored in the dictionary table).
We've got: 39999959 theoretical - 39989624 real = 10335 bytes. Only 10335 bytes of metadata overhead.
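The back-of-the-envelope arithmetic above can be reproduced directly. The 40-byte value width and 1-byte pointer come from the test; the "theoretical saving" follows the author's simplified model rather than TimesTen's exact storage accounting:

```python
rows = 1_000_000
value_len = 40          # width of the repeated VARCHAR2(40) value
pointer_len = 1         # MAXVALUES = 255 -> 1-byte pointer

total_uncompressed = 171539960   # TOTAL_BYTES reported for test2
total_compressed   = 131550336   # TOTAL_BYTES reported for test2_comp
measured_saving = total_uncompressed - total_compressed

# Simplified model: all but one copy of the 40-byte value is replaced,
# minus one pointer.
theoretical_saving = value_len * (rows - 1) - pointer_len

print(measured_saving)                       # 39989624
print(theoretical_saving)                    # 39999959
print(theoretical_saving - measured_saving)  # 10335
```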
Let’s take a look at memory allocation.
Command> create table test2_comp  (id number not null,
       >                           val_1   varchar2(40),
       >                           val_2   varchar2(40),
       >                           val_3   varchar2(40))
       > compress (val_1 BY DICTIONARY MAXVALUES = 1000000) OPTIMIZED FOR READ;
Command>
Command> insert into test2_comp 
       >    values (1,        
       >            '1234567890123456789012345678901000000000',
       >            '1234567890123456789012345678901234567890',
       >            '1234567890123456789012345678901234567890' );
1 row inserted.
Command> select count(*) from CD$_1086048_2;
< 1 >
1 row found.
Command>
Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of GENA.TEST2_COMP:

  INLINE_ALLOC_BYTES:   33648
  NUM_USED_ROWS:        1
  NUM_FREE_ROWS:        255
  AVG_ROW_LEN:          205
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       18832
  TOTAL_BYTES:          52480
  LAST_UPDATED:         2012-02-20 08:14:29.000000

1 table found.
Command>
One INLINE page (33648 bytes) was allocated for the TEST2_COMP table, and a page was allocated for the CD$_1086048_2 table as well (its size is included in METADATA_BYTES). Let’s insert 256 more rows.
Command> begin
       >   for i in 2 .. 257 loop
       >     insert into test2_comp 
       >         values (i,'1234567890123456789012345678901000000000',
       >                   '1234567890123456789012345678901234567890',
       >                   '1234567890123456789012345678901234567890');
       >   end loop;
       > end;
       > /

PL/SQL procedure successfully completed.

Command> select count(*) from CD$_1086048_2;
< 1 >
1 row found.
Command>

Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of GENA.TEST2_COMP:

  INLINE_ALLOC_BYTES:   67296
  NUM_USED_ROWS:        257
  NUM_FREE_ROWS:        255
  AVG_ROW_LEN:          168
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       18832
  TOTAL_BYTES:          86128
  LAST_UPDATED:         2012-02-20 15:02:57.000000

1 table found.
Command>

One more INLINE page was allocated for the TEST2_COMP table: 33648 * 2 = 67296 bytes (METADATA_BYTES stayed the same). Let’s insert more rows.
Command> begin
       >   for i in 257 .. 511 loop
       >     insert into test2_comp 
       >        values (i,'1234567890123456789012345678901000000'||i,
       >                  '1234567890123456789012345678901234567890',
       >                  '1234567890123456789012345678901234567890');
       >   end loop;
       > end;
       > /

PL/SQL procedure successfully completed.

Command> select count(*) from CD$_1086048_2;
< 256 >
1 row found.
Command>

Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of GENA.TEST2_COMP:

  INLINE_ALLOC_BYTES:   67296
  NUM_USED_ROWS:        512
  NUM_FREE_ROWS:        0
  AVG_ROW_LEN:          168
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       18832
  TOTAL_BYTES:          86128
  LAST_UPDATED:         2012-02-20 15:05:24.000000

1 table found.
Command> insert into test2_comp 
       >    values (-1,'1234567890123456789012345678901000000000',       
       >               '1234567890123456789012345678901234567890',       
       >               '1234567890123456789012345678901234567890');
1 row inserted.
Command>
Command>  call ttComputeTabSizes('test2_comp');
Command>  tablesize test2_comp;

Sizes of GENA.TEST2_COMP:

  INLINE_ALLOC_BYTES:   100944
  NUM_USED_ROWS:        513
  NUM_FREE_ROWS:        255
  AVG_ROW_LEN:          155
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       18832
  TOTAL_BYTES:          119776
  LAST_UPDATED:         2012-02-20 15:07:26.000000

1 table found.
Command>

Command> select count(*) from CD$_1086048_2;
< 257 >
1 row found.
Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of GENA.TEST2_COMP:

  INLINE_ALLOC_BYTES:   100944
  NUM_USED_ROWS:        514
  NUM_FREE_ROWS:        254
  AVG_ROW_LEN:          178
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       36096
  TOTAL_BYTES:          137040
  LAST_UPDATED:         2012-02-20 15:08:42.000000

1 table found.
Command>

A third INLINE page was allocated for the TEST2_COMP table (33648 * 3 = 100944 bytes), and one more page was allocated for the CD$_1086048_2 table (36096 - 18832 = 17264 bytes). Let’s insert more rows.

Command> begin
       >   for i in 513 .. 767 loop
       >     insert into test2_comp 
       >            values (i,'1234567890123456789012345678901000000'||i,
       >                      '1234567890123456789012345678901234567890',
       >                      '1234567890123456789012345678901234567890');
       >   end loop;
       > end;
       > /

PL/SQL procedure successfully completed.

Command>  select count(*) from CD$_1086048_2;
< 512 >
1 row found.
Command>
Command> insert into test2_comp 
       >     values (-3,'1234567890123456789012345678901000000768',
       >                '1234567890123456789012345678901234567890',
       >                '1234567890123456789012345678901234567890');
1 row inserted.
Command> select count(*) from CD$_1086048_2;
< 513 >
1 row found.
Command>

Command> call ttComputeTabSizes('test2_comp');
Command> tablesize test2_comp;

Sizes of GENA.TEST2_COMP:

  INLINE_ALLOC_BYTES:   134592
  NUM_USED_ROWS:        770
  NUM_FREE_ROWS:        254
  AVG_ROW_LEN:          183
  OUT_OF_LINE_BYTES:    0
  METADATA_BYTES:       53360
  TOTAL_BYTES:          187952
  LAST_UPDATED:         2012-02-20 16:01:03.000000

1 table found.
Command>

One more page was allocated for the CD$_1086048_2 table (36096 + 17264 = 53360 bytes), and so on.
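The allocation pattern observed in this test can be sketched as follows. Note that the 33648-byte INLINE page, the 17264-byte dictionary page, the 256-rows-per-page figure (1 used row plus 255 free rows on a fresh page), and the base METADATA_BYTES are all inferred from this specific test run, not general TimesTen constants:

```python
import math

INLINE_PAGE_BYTES = 33648   # size of one INLINE page, from the test above
DICT_PAGE_BYTES   = 17264   # METADATA_BYTES growth per extra dictionary page
ROWS_PER_PAGE     = 256     # 1 used row + 255 free rows on a fresh page
BASE_METADATA     = 18832   # initial METADATA_BYTES (includes first dictionary page)

def inline_alloc(rows: int) -> int:
    """INLINE_ALLOC_BYTES for a given row count."""
    return math.ceil(rows / ROWS_PER_PAGE) * INLINE_PAGE_BYTES

def metadata(dict_entries: int) -> int:
    """METADATA_BYTES for a given number of dictionary entries."""
    return BASE_METADATA + (math.ceil(dict_entries / ROWS_PER_PAGE) - 1) * DICT_PAGE_BYTES

# Reproduce the numbers reported by tablesize:
print(inline_alloc(1), inline_alloc(257), inline_alloc(513), inline_alloc(770))
# 33648 67296 100944 134592
print(metadata(1), metadata(257), metadata(513))
# 18832 36096 53360
```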
Of course my example is unrealistic, but as you can see, TimesTen can store data much more efficiently; the benefit really depends on the number of distinct values in the column. The table below shows the table size for storing different numbers of distinct values (the same test2_comp structure, the same row count, different numbers of distinct values).
As you can see, in some cases (for example, when a column stores unique values) compression doesn't help. You should be aware of that.
What else should you know when you decide to use columnar compression? First of all, there are some restrictions:
  • LOB columns cannot be compressed.
  • Compression is not supported on columns in replicated tables, cache group tables, grid tables, or on global temporary tables. You cannot create a table with the CREATE TABLE AS SELECT statement when defining in-memory columnar compression for that table in that statement.
  • You cannot create materialized views and materialized view logs on tables enabled for compression.
  • Compressed column groups can be added at the time of table creation or added later using ALTER TABLE. You can drop the entire compressed column group with the ALTER TABLE statement.
  • You can create a primary or unique key where part or all of the columns included in the key are compressed. For compressed columns included in a primary or unique key, you can include columns that exist within a compressed column group, but you do not have to include all of the columns within the compressed column group. In addition, you can include columns from different compressed column groups.
  • For compressed tables, all SQL operations lock the table. Table and index scans that access the columns of any compressed column group are somewhat slower due to dictionary lookup. However, since all the operations on the table acquire table locks, you do not need to acquire and release lower level locks. INSERT, DELETE and UPDATE operations on these tables will not scale.
  • Indexes can be created on any columns in the table. This includes compressed columns and includes columns that exist in separate compression column groups.
Some of these restrictions caught my attention. Let’s start with locking. For compressed tables, all SQL operations lock the table, which means no concurrency for compressed tables. But the compression feature was developed for in-memory analytics, so maybe in that context it’s acceptable.
Additionally, I was really disappointed when I saw the restriction that cache group tables (among others) cannot be compressed. I think it dramatically reduces the opportunities for using this feature.

Conclusion

Columnar compression is a key piece of functionality introduced in TimesTen 11.2.2. Tables can be compressed at the column level, which stores data more efficiently, and as you can see it works well. But you should be aware of the restrictions and rules; knowing them helps you use this feature more effectively.

7 comments:

Mark Rittman said...

Hi Gennady,

Great article, very useful. The other thing to bear in-mind with the column-based compression you talk about, is that it's only licensable through the Oracle TimesTen for Exalytics product, and only usable on an Exalytics server. It doesn't look like you can use this feature with just the standard TimesTen license, or on hardware other than the Exalytics server :-(

Mark

GGS said...

Hi Mark,

Thank you for your remarks. They are much appreciated.

I didn't find any information about limitations for using Columnar Compression.
It would be great if you could send me some links about it.

Thank you in advance,
Gena

Darryn Hinett said...

Hi Gena

The restrictions are listed in the TimesTen licence documentation:

http://www.oracle.com/technetwork/database/timesten/documentation/licensing-info-1478940.pdf

Regards

Darryn

GGS said...

Hi Darryn,

Thank you for your response.
I've already found it :)

Best regards,
Gena

Anonymous said...

Hi,
Thanks for this useful post.

I need help here.
Is it possible to apply Columnar Compression on an already existing table in TimesTen?

If so could you provide the syntax for it?

Thanks.

GGS said...

Hi,

Unfortunately, there is no way to compress a table that already exists. You have to create a new table with the "optimized for read" clause and reload the data into the new table.

Regards,
Gena

Anonymous said...

Hi Gena,
Thanks for your quick response.