The Sleepiest Feature of Oracle Database

on January 14, 2014

A “sleeper feature” is a little known, under-appreciated, and easily overlooked feature of a product that actually has great potential. In this article, you will hear about the sleepiest feature in Oracle Database. It’s been around for a very long time, very few people know about it, very few people use it, and yet, it has the potential to be a giant-killer, the giant in this case being NoSQL (see The Twelve Days of NoSQL).

If you’ve been holidaying in Siberia or similar places for about a year, you have probably not talked to an Oracle Sales rep a NoSQL sales rep yet about RAC NoSQL. But you will no doubt find that there’s a voice mail waiting for you when you turn your mobile phone on again after returning home from the vacation.

You might hear something like this:

“SQL databases are a really good thing but they also have some problems and the most obvious problem is one that most application developers run into as they are working with them which is that we assemble structures of objects in memory often in terms of a kind of a cohesive whole of things and then in order to save it off to the database, we have to strip out into bits so that it goes into those individual rows in individual tables. A single logical structure for our user interface and for our processing in memory ends up being splattered across lots and lots of tables. This is referred to as the impedance mismatch problem; the fact that we have these two very different models of how to look at things and the fact that we have to match them causes difficulties.” (http://www.youtube.com/watch?v=qI_g07C_Q5I#t=110s)

The TL;DR summary is that the relational model causes all your performance problems.

Wrong! Wrong I say! The relational model is a logical model and has no performance implications.

But surely “splattering data” across lots and lots of tables causes performance problems?

Wrong again! Wrong again I say! Tables are certainly the way we talk about data in the relational world but the relational model does not dictate that each table correspond to a separate bucket of physical storage.

What’s that you say? That’s how your database management system works? You’re not using Oracle Database are you? Consider the following example taken from the very first paper on the relational model by its inventor Dr. Edgar Codd.

employee (employee#, name, birthdate)
jobhistory (employee#, jobdate, title)
salaryhistory (employee#, jobdate, salarydate, salary)
children (employee#, childname, birthyear)

It certainly appears that we have to assemble employee information into a single structure in the user interface and then “splatter” it across four tables for storage.

Or do we?

Oracle Database is unique among mainstream database management systems in providing a storage construct called the “table cluster” for hierarchical schemas such as the ones talked up by NoSQL sales reps. In Dr. Codd’s example, employee# would be the cluster key, and rows corresponding to the same cluster key from all four tables could be stored in the same physical block on disk thus avoiding the “join penalty.” If the cluster was a “hash cluster,” no indexes would be required for the use case of retrieving records belonging to a single cluster key.

Even though few application developers have heard of table clusters, they are to be found in every single Oracle database on the planet because data dictionary tables like TAB$ (tables), COL$ (table columns), IND$ (indexes), and ICOL$ (index columns) are stored in table clusters.

SQL> select cluster_name, table_name from dba_tables
  2  where owner='SYS' and cluster_name is not null
  3  order by cluster_name, table_name;

CLUSTER_NAME                   TABLE_NAME
------------------------------ ------------------------------
C_COBJ#                        CCOL$
                               CDEF$

C_FILE#_BLOCK#                 SEG$
                               UET$

C_MLOG#                        MLOG$
                               SLOG$

C_OBJ#                         ASSEMBLY$
                               ATTRCOL$
                               CLU$
                               COL$
                               COLTYPE$
                               ICOL$
                               ICOLDEP$
                               IND$
                               LIBRARY$
                               LOB$
                               NTAB$
                               OPQTYPE$
                               REFCON$
                               SUBCOLTYPE$
                               TAB$
                               TYPE_MISC$
                               VIEWTRCOL$

C_OBJ#_INTCOL#                 HISTGRM$

C_RG#                          RGCHILD$
                               RGROUP$

C_TOID_VERSION#                ATTRIBUTE$
                               COLLECTION$
                               METHOD$
                               PARAMETER$
                               RESULT$
                               TYPE$

C_TS#                          FET$
                               TS$

C_USER#                        TSQ$
                               USER$

SMON_SCN_TO_TIME_AUX           SMON_SCN_TIME


37 rows selected.

I can prove that clusters have been a part of Oracle Database since at least Oracle Version 6. The Oracle Version 6 documentation is not available online but the Oracle7 Server SQL Reference Manual lists the enhancements to the cluster feature in Oracle7 Server, the implication being that clusters were a feature of Oracle Version 6 (released in 1988 according to Wikipedia).

Here is a fully-worked example of table clusters using the tables in Dr. Codd’s paper.

First we create the four normalized tables listed in Dr. Codd’s paper and prove that all the records of a single employee including job history, salary history, and children are stored in a single database block so that there is never any join-penalty when assembling employee data. The following demonstration was performed using a pre-built developer VM for Oracle VM VirtualBox. The version of Oracle Database is 11.2.0.2.

SQL*Plus: Release 11.2.0.2.0 Production on Sun Jul 28 19:44:23 2013
 
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
 
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

First, we create a table cluster and add our four tables to the cluster.

SQL> CREATE CLUSTER employees (employee# INTEGER) hashkeys 1000;
 
Cluster created.
 
SQL> CREATE TABLE employees
  2  (
  3    employee# INTEGER NOT NULL,
  4    name VARCHAR2(16),
  5    birth_date DATE,
  6    CONSTRAINT employees_pk PRIMARY KEY (employee#)
  7  )
  8  CLUSTER employees (employee#);
 
Table created.
 
SQL> CREATE TABLE job_history
  2  (
  3    employee# INTEGER NOT NULL,
  4    job_date DATE NOT NULL,
  5    title VARCHAR2(16),
  6    CONSTRAINT job_history_pk PRIMARY KEY (employee#, job_date),
  7    CONSTRAINT job_history_fk1 FOREIGN KEY (employee#) REFERENCES employees
  8  )
  9  CLUSTER employees (employee#);
 
Table created.
 
SQL> CREATE TABLE salary_history
  2  (
  3    employee# INTEGER NOT NULL,
  4    job_date DATE NOT NULL,
  5    salary_date DATE NOT NULL,
  6    salary NUMBER,
  7    CONSTRAINT salary_history_pk PRIMARY KEY (employee#, job_date, salary_date),
  8    CONSTRAINT salary_history_fk1 FOREIGN KEY (employee#) REFERENCES employees,
  9    CONSTRAINT salary_history_fk2 FOREIGN KEY (employee#, job_date) REFERENCES job_history
 10  )
 11  CLUSTER employees (employee#);
 
Table created.
 
SQL> CREATE TABLE children
  2  (
  3    employee# INTEGER NOT NULL,
  4    child_name VARCHAR2(16) NOT NULL,
  5    birth_date DATE,
  6    CONSTRAINT children_pk PRIMARY KEY (employee#, child_name),
  7    CONSTRAINT children_fk1 FOREIGN KEY (employee#) REFERENCES employees
  8  )
  9  CLUSTER employees (employee#);
 
Table created.

Then we insert data into all four tables, nine records in total. We then use the DBMS_ROWID.ROWID_BLOCK_NUMBER function to prove that all nine records have been stored in the same database block even though they belong to four different tables. Therefore the “join penalty” has been completely eliminated. Queries involving only one employee won’t need indexes since we used the hash flavor of table clusters. Therefore any query involving only one employee will have to retrieve only one database block.

SQL> INSERT INTO employees VALUES (1, 'IGNATIUS', '01-JAN-1970');
 
1 row created.
 
SQL> INSERT INTO children VALUES (1, 'INIGA', '01-JAN-2001');
 
1 row created.
 
SQL> INSERT INTO children VALUES (1, 'INIGO', '01-JAN-2002');
 
1 row created.
 
SQL> INSERT INTO job_history VALUES (1, '01-JAN-1991', 'PROGRAMMER');
 
1 row created.
 
SQL> INSERT INTO job_history VALUES (1, '01-JAN-1992', 'DATABASE ADMIN');
 
1 row created.
 
SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-FEB-1991', 1000);
 
1 row created.
 
SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-MAR-1991', 1000);
 
1 row created.
 
SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-FEB-1992', 2000);
 
1 row created.
 
SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-MAR-1992', 2000);
 
1 row created.
 
SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM employees where employee# = 1;
 
BLOCK_NUMBER
------------
       22881
 
SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM children where employee# = 1;
 
BLOCK_NUMBER
------------
       22881
 
SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM job_history where employee# = 1;
 
BLOCK_NUMBER
------------
       22881
 
SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM salary_history where employee# = 1;
 
BLOCK_NUMBER
------------
       22881

Let’s kick it up a notch. We can create an object-relational view that assembles employee information into a single unnormalized structure and perform CRUD operations on it as if it were a single table. We’ll need an INSTEAD OF trigger.

SQL> CREATE OR REPLACE TYPE children_rec AS OBJECT (child_name VARCHAR2(16), birth_date DATE)
  2  /
 
Type created.
 
SQL> CREATE OR REPLACE TYPE children_tab AS TABLE OF children_rec
  2  /
 
Type created.
 
SQL> CREATE OR REPLACE TYPE salary_history_rec AS OBJECT (salary_date DATE, salary NUMBER)
  2  /
 
Type created.
 
SQL> CREATE OR REPLACE TYPE salary_history_tab AS TABLE OF salary_history_rec
  2  /
 
Type created.
 
SQL> CREATE OR REPLACE TYPE job_history_rec AS OBJECT (job_date DATE, title VARCHAR2(16), salary_history SALARY_HISTORY_TAB)
  2  /
 
Type created.
 
SQL> CREATE OR REPLACE TYPE job_history_tab AS TABLE of job_history_rec
  2  /
 
Type created.
 
SQL> create or replace view employees_view as
  2  SELECT
  3    employee#,
  4    name,
  5    birth_date,
  6    CAST
  7    (
  8      MULTISET
  9      (
 10        SELECT
 11          child_name,
 12          birth_date
 13        FROM children
 14        WHERE employee#=e.employee#
 15      )
 16      AS children_tab
 17    ) children,
 18    CAST
 19    (
 20      MULTISET
 21      (
 22        SELECT
 23          job_date,
 24          title,
 25          CAST
 26          (
 27            MULTISET
 28            (
 29              SELECT salary_date, salary
 30              FROM salary_history
 31              WHERE employee#=e.employee#
 32              AND job_date=jh.job_date
 33            )
 34            AS salary_history_tab
 35          ) salary_history
 36        FROM job_history jh
 37        WHERE employee#=e.employee#
 38      )
 39      AS job_history_tab
 40    ) job_history
 41  FROM employees e;
 
View created.
SQL> alter session set "_rowsource_execution_statistics"=true;
 
Session altered.
 
SQL> SELECT * FROM employees_view WHERE employee# = 1;
 
 EMPLOYEE# NAME             BIRTH_DAT
---------- ---------------- ---------
CHILDREN(CHILD_NAME, BIRTH_DATE)
------------------------------------------------------------------------------------------------------------------------------------
JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY))
------------------------------------------------------------------------------------------------------------------------------------
         1 IGNATIUS         01-JAN-70
CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02'))
JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_
REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S
ALARY_HISTORY_REC('01-MAR-92', 2000))))
 
SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(null, null, 'TYPICAL IOSTATS LAST'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  aaxmaqz947aa0, child number 0
-------------------------------------
SELECT * FROM employees_view WHERE employee# = 1
 
Plan hash value: 2117652374
 
--------------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows |E-Bytes| Cost  | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |      1 |        |       |     1 |      1 |00:00:00.01 |       1 |
|*  1 |  TABLE ACCESS HASH| CHILDREN       |      1 |      1 |    32 |       |      2 |00:00:00.01 |       1 |
|*  2 |  TABLE ACCESS HASH| SALARY_HISTORY |      2 |      1 |    44 |       |      4 |00:00:00.01 |       3 |
|*  3 |  TABLE ACCESS HASH| JOB_HISTORY    |      1 |      1 |    32 |       |      2 |00:00:00.01 |       1 |
|*  4 |  TABLE ACCESS HASH| EMPLOYEES      |      1 |    845 | 27040 |       |      1 |00:00:00.01 |       1 |
--------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("EMPLOYEE#"=:B1)
   2 - access("EMPLOYEE#"=:B1)
       filter("JOB_DATE"=:B1)
   3 - access("EMPLOYEE#"=:B1)
   4 - access("EMPLOYEE#"=1)
 
Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)
 
30 rows selected.

Next, let’s create an “INSTEAD OF” trigger so that we can insert into the above view directly; that is, use a single insert statement instead of multiple insert statements. The trigger will do all the heavy-lifting behind the scenes.

SQL> CREATE OR REPLACE TRIGGER employees_view_insert
  2  INSTEAD OF INSERT ON employees_view
  3  REFERENCING NEW AS n
  4  FOR EACH ROW
  5  DECLARE
  6    i NUMBER;
  7  BEGIN
  8    INSERT INTO employees
  9    VALUES
 10    (
 11      :n.employee#,
 12      :n.name,
 13      :n.birth_date
 14    );
 15
 16    FOR i IN :n.children.FIRST .. :n.children.LAST
 17    LOOP
 18      INSERT INTO children
 19      VALUES
 20      (
 21        :n.employee#,
 22        :n.children(i).child_name,
 23        :n.children(i).birth_date
 24      );
 25    END LOOP;
 26
 27    FOR i IN :n.job_history.FIRST .. :n.job_history.LAST
 28    LOOP
 29      INSERT INTO job_history VALUES
 30      (
 31        :n.employee#,
 32        :n.job_history(i).job_date,
 33        :n.job_history(i).title
 34      );
 35      FOR j IN :n.job_history(i).salary_history.FIRST .. :n.job_history(i).salary_history.LAST
 36      LOOP
 37        INSERT INTO salary_history
 38        VALUES
 39        (
 40          :n.employee#,
 41          :n.job_history(i).job_date,
 42          :n.job_history(i).salary_history(j).salary_date,
 43          :n.job_history(i).salary_history(j).salary
 44         );
 45      END LOOP;
 46    END LOOP;
 47  END;
 48  /
 
Trigger created.
SQL> INSERT INTO employees_view
  2  VALUES
  3  (
  4    2,
  5    'YGNACIO',
  6    '01-JAN-70',
  7    CHILDREN_TAB
  8    (
  9      CHILDREN_REC('INIGA', '01-JAN-01'),
 10      CHILDREN_REC('INIGO', '01-JAN-02')
 11    ),
 12    JOB_HISTORY_TAB
 13    (
 14      JOB_HISTORY_REC
 15      (
 16        '01-JAN-91',
 17        'PROGRAMMER',
 18        SALARY_HISTORY_TAB
 19        (
 20          SALARY_HISTORY_REC('01-FEB-91', 1000),
 21          SALARY_HISTORY_REC('01-MAR-91', 1000)
 22        )
 23      ),
 24      JOB_HISTORY_REC
 25      (
 26        '01-JAN-92',
 27        'DATABASE ADMIN',
 28        SALARY_HISTORY_TAB
 29        (
 30          SALARY_HISTORY_REC('01-FEB-92', 2000),
 31          SALARY_HISTORY_REC('01-MAR-92', 2000)
 32        )
 33      )
 34    )
 35  );
 
1 row created.
 
SQL> SELECT * FROM employees_view WHERE employee# = 2;
 
 EMPLOYEE# NAME             BIRTH_DAT
---------- ---------------- ---------
CHILDREN(CHILD_NAME, BIRTH_DATE)
------------------------------------------------------------------------------------------------------------------------------------
JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY))
------------------------------------------------------------------------------------------------------------------------------------
         2 YGNACIO          01-JAN-70
CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02'))
JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_
REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S
ALARY_HISTORY_REC('01-MAR-92', 2000))))

The above example proves that you can use the relational model of data for hierarchical schemas—such as the ones talked up by NoSQL sales reps—without incurring a performance penalty. There are certainly some cases—such as Microsoft Word documents and Excel spreadsheets—where the relational model doesn’t add much value and for these cases mainstream database management systems provide the XML data type.

Oracle table clusters can be partitioned with all the benefits of partition pruning and parallel query but we’ll need to use another very sleepy feature of Oracle Database. I’ll leave that for a follow-up article.

On page 379 of Effective Oracle by Design, Tom Kyte quotes Steve Adams as saying:

“If a schema has no IOTs or clusters, that is a good indication that no thought has been given to the matter of optimizing data access.”

I’ll leave you with that thought.

P.S. If you’re trying to get a handle on the NoSQL phenomenon, I recommend that you read The Twelve Days of NoSQL.