SQL query and columnar compression

I've already written about Columnar Compression in TimesTen and now I would like to share some interesting point about it.
Environment:
[oracle@nodett1 sql]$ ttversion
TimesTen Release 11.2.2.2.0 (32 bit Linux/x86) (tt1122:53392) 2011-12-23T09:21:34Z
Instance admin: oracle
Instance home directory: /u01/app/oracle/product/11.2.2/TimesTen/tt1122
Group owner: oinstall 
Daemon home directory: /u01/app/oracle/product/11.2.2/TimesTen/tt1122/info
PL/SQL enabled.
[oracle@nodett1 sql]$
Let's create a compression table and insert 1M rows.
Command> create table tab_comp (id number not null,
       >                        val_1   varchar2(40),
       >                        val_2   varchar2(40))
       > compress (val_1 by dictionary maxvalues = 255) optimized for read;

Command> desc tab_comp;

Table GENA.TAB_COMP:
  Columns:
    ID                              NUMBER NOT NULL
    VAL_1                           VARCHAR2 (40) INLINE
    VAL_2                           VARCHAR2 (40) INLINE
  COMPRESS ( VAL_1 BY DICTIONARY MAXVALUES=255 ) OPTIMIZED FOR READ

1 table found.
(primary key columns are indicated with *)

Command> alltables;
  GENA.CD$_1086048_2
  GENA.TAB_COMP
  ...
26 tables found.
Command>

Command> desc CD$_1086048_2;

Table GENA.CD$_1086048_2:
  Columns:
   *VAL_1                           VARCHAR2 (40) INLINE
    ##CD_REFCNT                     TT_INTEGER NOT NULL

1 table found.
(primary key columns are indicated with *)
Command>
Command> begin
       >   for i in 1 .. 1000000 loop
       >     insert into tab_comp
       >          values (i,
       >                  '1234567890123456789012345678901234567890',
       >                  '1234567890123456789012345678901234567890');
       >   end loop;
       > end;
       > /

PL/SQL procedure successfully completed.

Command> select count(*) from tab_comp;
< 1000000 >
1 row found.
Command> select * from CD$_1086048_2;
< 1234567890123456789012345678901234567890, 1000000 >
1 row found.
Command>
As you can see, I've inserted the same value in VAL_1 column one million times. The CD$_1086048_2 system table contains only one row and TAB_COMP table contains only one 1 byte pointer in VAL_1 column for one row in CD$_1086048_2 table. After that I execute the following simple query (select count(val_1) from tab_comp;).
Command> autocommit 0;
Command> showplan 1;
Command> select count(val_1) from tab_comp;

Query Optimizer Plan:

  STEP:                1
  LEVEL:               2
  OPERATION:           TblLkSerialScan
  TBLNAME:             TAB_COMP
  IXNAME:              
  INDEXED CONDITION:   
  NOT INDEXED:         


  STEP:                2
  LEVEL:               1
  OPERATION:           OneGroupGroupBy
  TBLNAME:             
  IXNAME:              
  INDEXED CONDITION:   
  NOT INDEXED:         

< 1000000 >
1 row found.
Command>
Command> showplan 0;
Command> timing 1;
Command> select count(val_1) from tab_comp;
< 1000000 >
1 row found.
Execution time (SQLExecute + Fetch Loop) = 0.050841 seconds.

Command> select count(val_1) from tab_comp;
< 1000000 >
1 row found.
Execution time (SQLExecute + Fetch Loop) = 0.054260 seconds.

Command> select count(val_1) from tab_comp;
< 1000000 >
1 row found.
Execution time (SQLExecute + Fetch Loop) = 0.046867 seconds.

Command> select count(val_1) from tab_comp;
< 1000000 >
1 row found.
Execution time (SQLExecute + Fetch Loop) = 0.049661 seconds.
Command>
I was confused!!! TimesTen optimizer has chosen the full TAB_COMP table scan and of course it took long time (~0.04s). In TimesTen documentation I found these sentences ‘You can compress tables at the column level, which stores data more efficiently. This eliminates redundant storage of duplicate values within columns and improves the performance of SQL queries that perform full table scans.’ So I was expecting that optimizer would rewrite the query for something like select count('##CD_REFCNT') from CD$_1086048_2, especially taking into account that all information needed for optimizer could be found in CD$_1086048_2 table.
Command> select count('##CD_REFCNT') from CD$_1086048_2;
< 1 >
1 row found.
Execution time (SQLExecute + Fetch Loop) = 0.000058 seconds.
Command> select count('##CD_REFCNT') from CD$_1086048_2;
< 1 >
1 row found.
Execution time (SQLExecute + Fetch Loop) = 0.000081 seconds.
Unfortunately, my assumptions were not confirmed. I hope TimesTen development team will include this feature in next release.

Комментариев нет: