A “sleeper feature” is a little known, under-appreciated, and easily overlooked feature of a product that actually has great potential. In this article, you will hear about the sleepiest feature in Oracle Database. It’s been around for a very long time, very few people know about it, very few people use it, and yet, it has the potential to be a giant-killer, the giant in this case being NoSQL (see The Twelve Days of NoSQL).
If you’ve been holidaying in Siberia or similar places for about a year, you have probably not talked to an Oracle Sales rep a NoSQL sales rep yet about RAC NoSQL. But you will no doubt find that there’s a voice mail waiting for you when you turn your mobile phone on again after returning home from the vacation.
You might hear something like this:
“SQL databases are a really good thing but they also have some problems and the most obvious problem is one that most application developers run into as they are working with them which is that we assemble structures of objects in memory often in terms of a kind of a cohesive whole of things and then in order to save it off to the database, we have to strip out into bits so that it goes into those individual rows in individual tables. A single logical structure for our user interface and for our processing in memory ends up being splattered across lots and lots of tables. This is referred to as the impedance mismatch problem; the fact that we have these two very different models of how to look at things and the fact that we have to match them causes difficulties.” (http://www.youtube.com/watch?v=qI_g07C_Q5I#t=110s)
The TL;DR summary is that the relational model causes all your performance problems.
Wrong! Wrong I say! The relational model is a logical model and has no performance implications.
But surely “splattering data” across lots and lots of tables causes performance problems?
Wrong again! Wrong again I say! Tables are certainly the way we talk about data in the relational world but the relational model does not dictate that each table correspond to a separate bucket of physical storage.
What’s that you say? That’s how your database management system works? You’re not using Oracle Database are you? Consider the following example taken from the very first paper on the relational model by its inventor Dr. Edgar Codd.
employee (employee#, name, birthdate) jobhistory (employee#, jobdate, title) salaryhistory (employee#, jobdate, salarydate, salary) children (employee#, childname, birthyear)
It certainly appears that we have to assemble employee information into a single structure in the user interface and then “splatter” it across four tables for storage.
Or do we?
Oracle Database is unique among mainstream database management systems in providing a storage construct called the “table cluster” for hierarchical schemas such as the ones talked up by NoSQL sales reps. In Dr. Codd’s example, employee# would be the cluster key, and rows corresponding to the same cluster key from all four tables could be stored in the same physical block on disk thus avoiding the “join penalty.” If the cluster was a “hash cluster,” no indexes would be required for the use case of retrieving records belonging to a single cluster key.
Even though few application developers have heard of table clusters, they are to be found in every single Oracle database on the planet because data dictionary tables like TAB$ (tables), COL$ (table columns), IND$ (indexes), and ICOL$ (index columns) are stored in table clusters.
SQL> select cluster_name, table_name from dba_tables 2 where owner='SYS' and cluster_name is not null 3 order by cluster_name, table_name; CLUSTER_NAME TABLE_NAME ------------------------------ ------------------------------ C_COBJ# CCOL$ CDEF$ C_FILE#_BLOCK# SEG$ UET$ C_MLOG# MLOG$ SLOG$ C_OBJ# ASSEMBLY$ ATTRCOL$ CLU$ COL$ COLTYPE$ ICOL$ ICOLDEP$ IND$ LIBRARY$ LOB$ NTAB$ OPQTYPE$ REFCON$ SUBCOLTYPE$ TAB$ TYPE_MISC$ VIEWTRCOL$ C_OBJ#_INTCOL# HISTGRM$ C_RG# RGCHILD$ RGROUP$ C_TOID_VERSION# ATTRIBUTE$ COLLECTION$ METHOD$ PARAMETER$ RESULT$ TYPE$ C_TS# FET$ TS$ C_USER# TSQ$ USER$ SMON_SCN_TO_TIME_AUX SMON_SCN_TIME 37 rows selected.
I can prove that clusters have been a part of Oracle Database since at least Oracle Version 6. The Oracle Version 6 documentation is not available online but the Oracle7 Server SQL Reference Manual lists the enhancements to the cluster feature in Oracle7 Server, the implication being that clusters were a feature of Oracle Version 6 (released in 1988 according to Wikipedia).
Here is a fully-worked example of table clusters using the tables in Dr. Codd’s paper.
First we create the four normalized tables listed in Dr. Codd’s paper and prove that all the records of a single employee including job history, salary history, and children are stored in a single database block so that there is never any join-penalty when assembling employee data. The following demonstration was performed using a pre-built developer VM for Oracle VM VirtualBox. The version of Oracle Database is 11.2.0.2.
SQL*Plus: Release 11.2.0.2.0 Production on Sun Jul 28 19:44:23 2013 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options
First, we create a table cluster and add our four tables to the cluster.
SQL> CREATE CLUSTER employees (employee# INTEGER) hashkeys 1000; Cluster created. SQL> CREATE TABLE employees 2 ( 3 employee# INTEGER NOT NULL, 4 name VARCHAR2(16), 5 birth_date DATE, 6 CONSTRAINT employees_pk PRIMARY KEY (employee#) 7 ) 8 CLUSTER employees (employee#); Table created. SQL> CREATE TABLE job_history 2 ( 3 employee# INTEGER NOT NULL, 4 job_date DATE NOT NULL, 5 title VARCHAR2(16), 6 CONSTRAINT job_history_pk PRIMARY KEY (employee#, job_date), 7 CONSTRAINT job_history_fk1 FOREIGN KEY (employee#) REFERENCES employees 8 ) 9 CLUSTER employees (employee#); Table created. SQL> CREATE TABLE salary_history 2 ( 3 employee# INTEGER NOT NULL, 4 job_date DATE NOT NULL, 5 salary_date DATE NOT NULL, 6 salary NUMBER, 7 CONSTRAINT salary_history_pk PRIMARY KEY (employee#, job_date, salary_date), 8 CONSTRAINT salary_history_fk1 FOREIGN KEY (employee#) REFERENCES employees, 9 CONSTRAINT salary_history_fk2 FOREIGN KEY (employee#, job_date) REFERENCES job_history 10 ) 11 CLUSTER employees (employee#); Table created. SQL> CREATE TABLE children 2 ( 3 employee# INTEGER NOT NULL, 4 child_name VARCHAR2(16) NOT NULL, 5 birth_date DATE, 6 CONSTRAINT children_pk PRIMARY KEY (employee#, child_name), 7 CONSTRAINT children_fk1 FOREIGN KEY (employee#) REFERENCES employees 8 ) 9 CLUSTER employees (employee#); Table created.
Then we insert data into all four tables, nine records in total. We then use the DBMS_ROWID.ROWID_BLOCK_NUMBER function to prove that all nine records have been stored in the same database block even though they belong to four different tables. Therefore the “join penalty” has been completely eliminated. Queries involving only one employee won’t need indexes since we used the hash flavor of table clusters. Therefore any query involving only one employee will have to retrieve only one database block.
SQL> INSERT INTO employees VALUES (1, 'IGNATIUS', '01-JAN-1970'); 1 row created. SQL> INSERT INTO children VALUES (1, 'INIGA', '01-JAN-2001'); 1 row created. SQL> INSERT INTO children VALUES (1, 'INIGO', '01-JAN-2002'); 1 row created. SQL> INSERT INTO job_history VALUES (1, '01-JAN-1991', 'PROGRAMMER'); 1 row created. SQL> INSERT INTO job_history VALUES (1, '01-JAN-1992', 'DATABASE ADMIN'); 1 row created. SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-FEB-1991', 1000); 1 row created. SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1991', '1-MAR-1991', 1000); 1 row created. SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-FEB-1992', 2000); 1 row created. SQL> INSERT INTO salary_history VALUES (1, '01-JAN-1992', '1-MAR-1992', 2000); 1 row created. SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM employees where employee# = 1; BLOCK_NUMBER ------------ 22881 SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM children where employee# = 1; BLOCK_NUMBER ------------ 22881 SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM job_history where employee# = 1; BLOCK_NUMBER ------------ 22881 SQL> SELECT DISTINCT DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_number FROM salary_history where employee# = 1; BLOCK_NUMBER ------------ 22881
Let’s kick it up a notch. We can create an object-relational view that assembles employee information into a single unnormalized structure and perform CRUD operations on it as if it were a single table. We’ll need an INSTEAD OF trigger.
SQL> CREATE OR REPLACE TYPE children_rec AS OBJECT (child_name VARCHAR2(16), birth_date DATE) 2 / Type created. SQL> CREATE OR REPLACE TYPE children_tab AS TABLE OF children_rec 2 / Type created. SQL> CREATE OR REPLACE TYPE salary_history_rec AS OBJECT (salary_date DATE, salary NUMBER) 2 / Type created. SQL> CREATE OR REPLACE TYPE salary_history_tab AS TABLE OF salary_history_rec 2 / Type created. SQL> CREATE OR REPLACE TYPE job_history_rec AS OBJECT (job_date DATE, title VARCHAR2(16), salary_history SALARY_HISTORY_TAB) 2 / Type created. SQL> CREATE OR REPLACE TYPE job_history_tab AS TABLE of job_history_rec 2 / Type created. SQL> create or replace view employees_view as 2 SELECT 3 employee#, 4 name, 5 birth_date, 6 CAST 7 ( 8 MULTISET 9 ( 10 SELECT 11 child_name, 12 birth_date 13 FROM children 14 WHERE employee#=e.employee# 15 ) 16 AS children_tab 17 ) children, 18 CAST 19 ( 20 MULTISET 21 ( 22 SELECT 23 job_date, 24 title, 25 CAST 26 ( 27 MULTISET 28 ( 29 SELECT salary_date, salary 30 FROM salary_history 31 WHERE employee#=e.employee# 32 AND job_date=jh.job_date 33 ) 34 AS salary_history_tab 35 ) salary_history 36 FROM job_history jh 37 WHERE employee#=e.employee# 38 ) 39 AS job_history_tab 40 ) job_history 41 FROM employees e; View created. SQL> alter session set "_rowsource_execution_statistics"=true; Session altered. SQL> SELECT * FROM employees_view WHERE employee# = 1; EMPLOYEE# NAME BIRTH_DAT ---------- ---------------- --------- CHILDREN(CHILD_NAME, BIRTH_DATE) ------------------------------------------------------------------------------------------------------------------------------------ JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY)) ------------------------------------------------------------------------------------------------------------------------------------ 1 IGNATIUS 01-JAN-70 CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02')) JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_ REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S ALARY_HISTORY_REC('01-MAR-92', 2000)))) SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor(null, null, 'TYPICAL IOSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------------------ SQL_ID aaxmaqz947aa0, child number 0 ------------------------------------- SELECT * FROM employees_view WHERE employee# = 1 Plan hash value: 2117652374 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost | A-Rows | A-Time | Buffers | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | | 1 | 1 |00:00:00.01 | 1 | |* 1 | TABLE ACCESS HASH| CHILDREN | 1 | 1 | 32 | | 2 |00:00:00.01 | 1 | |* 2 | TABLE ACCESS HASH| SALARY_HISTORY | 2 | 1 | 44 | | 4 |00:00:00.01 | 3 | |* 3 | TABLE ACCESS HASH| JOB_HISTORY | 1 | 1 | 32 | | 2 |00:00:00.01 | 1 | |* 4 | TABLE ACCESS HASH| EMPLOYEES | 1 | 845 | 27040 | | 1 |00:00:00.01 | 1 | -------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("EMPLOYEE#"=:B1) 2 - access("EMPLOYEE#"=:B1) filter("JOB_DATE"=:B1) 3 - access("EMPLOYEE#"=:B1) 4 - access("EMPLOYEE#"=1) Note ----- - cpu costing is off (consider enabling it) - dynamic sampling used for this statement (level=2) 30 rows selected.
Next, let’s create an “INSTEAD OF” trigger so that we can insert into the above view directly; that is, use a single insert statement instead of multiple insert statements. The trigger will do all the heavy-lifting behind the scenes.
SQL> CREATE OR REPLACE TRIGGER employees_view_insert 2 INSTEAD OF INSERT ON employees_view 3 REFERENCING NEW AS n 4 FOR EACH ROW 5 DECLARE 6 i NUMBER; 7 BEGIN 8 INSERT INTO employees 9 VALUES 10 ( 11 :n.employee#, 12 :n.name, 13 :n.birth_date 14 ); 15 16 FOR i IN :n.children.FIRST .. :n.children.LAST 17 LOOP 18 INSERT INTO children 19 VALUES 20 ( 21 :n.employee#, 22 :n.children(i).child_name, 23 :n.children(i).birth_date 24 ); 25 END LOOP; 26 27 FOR i IN :n.job_history.FIRST .. :n.job_history.LAST 28 LOOP 29 INSERT INTO job_history VALUES 30 ( 31 :n.employee#, 32 :n.job_history(i).job_date, 33 :n.job_history(i).title 34 ); 35 FOR j IN :n.job_history(i).salary_history.FIRST .. :n.job_history(i).salary_history.LAST 36 LOOP 37 INSERT INTO salary_history 38 VALUES 39 ( 40 :n.employee#, 41 :n.job_history(i).job_date, 42 :n.job_history(i).salary_history(j).salary_date, 43 :n.job_history(i).salary_history(j).salary 44 ); 45 END LOOP; 46 END LOOP; 47 END; 48 / Trigger created. SQL> INSERT INTO employees_view 2 VALUES 3 ( 4 2, 5 'YGNACIO', 6 '01-JAN-70', 7 CHILDREN_TAB 8 ( 9 CHILDREN_REC('INIGA', '01-JAN-01'), 10 CHILDREN_REC('INIGO', '01-JAN-02') 11 ), 12 JOB_HISTORY_TAB 13 ( 14 JOB_HISTORY_REC 15 ( 16 '01-JAN-91', 17 'PROGRAMMER', 18 SALARY_HISTORY_TAB 19 ( 20 SALARY_HISTORY_REC('01-FEB-91', 1000), 21 SALARY_HISTORY_REC('01-MAR-91', 1000) 22 ) 23 ), 24 JOB_HISTORY_REC 25 ( 26 '01-JAN-92', 27 'DATABASE ADMIN', 28 SALARY_HISTORY_TAB 29 ( 30 SALARY_HISTORY_REC('01-FEB-92', 2000), 31 SALARY_HISTORY_REC('01-MAR-92', 2000) 32 ) 33 ) 34 ) 35 ); 1 row created. SQL> SELECT * FROM employees_view WHERE employee# = 2; EMPLOYEE# NAME BIRTH_DAT ---------- ---------------- --------- CHILDREN(CHILD_NAME, BIRTH_DATE) ------------------------------------------------------------------------------------------------------------------------------------ JOB_HISTORY(JOB_DATE, TITLE, SALARY_HISTORY(SALARY_DATE, SALARY)) ------------------------------------------------------------------------------------------------------------------------------------ 2 YGNACIO 01-JAN-70 CHILDREN_TAB(CHILDREN_REC('INIGA', '01-JAN-01'), CHILDREN_REC('INIGO', '01-JAN-02')) JOB_HISTORY_TAB(JOB_HISTORY_REC('01-JAN-91', 'PROGRAMMER', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-91', 1000), SALARY_HISTORY_ REC('01-MAR-91', 1000))), JOB_HISTORY_REC('01-JAN-92', 'DATABASE ADMIN', SALARY_HISTORY_TAB(SALARY_HISTORY_REC('01-FEB-92', 2000), S ALARY_HISTORY_REC('01-MAR-92', 2000))))
The above example proves that you can use the relational model of data for hierarchical schemas—such as the ones talked up by NoSQL sales reps—without incurring a performance penalty. There are certainly some cases—such as Microsoft Word documents and Excel spreadsheets—where the relational model doesn’t add much value and for these cases mainstream database management systems provide the XML data type.
Oracle table clusters can be partitioned with all the benefits of partition pruning and parallel query but we’ll need to use another very sleepy feature of Oracle Database. I’ll leave that for a follow-up article.
On page 379 of Effective Oracle by Design, Tom Kyte quotes Steve Adams as saying:
“If a schema has no IOTs or clusters, that is a good indication that no thought has been given to the matter of optimizing data access.”
I’ll leave you with that thought.
P.S. If you’re trying to get a handle on the NoSQL phenomenon, I recommend that you read The Twelve Days of NoSQL.
Leave a Reply