Performance Costs of Oracle Database Denormalization

on May 28, 2015


Throughout the history of relational databases, the issue of denormalization has never been settled. Theoreticians still continue to argue about it, but from the practical point of view, most database systems are denormalized to some extent. For example, about 99.9 percent of address records in the United States have the ZIP code, city, and state as separate attributes in the same table, even though it is known that ZIP codes never cross state boundaries and mostly never cross city boundaries. Few recognize that this design is a violation of third normal form (3NF).

You can avoid the problem altogether by using logical denormalization and introducing views instead of tables. However, this approach has its own challenges, mostly due to the lack of developers’ experience with INSTEAD OF triggers.

If you cannot avoid physical denormalization, you should do it properly. Until recently, there have been very limited options from which to choose. Assume that you want to use “full name,” which looks like a well-structured concatenation of first, middle, and last names. The way to accomplish this is by using triggers or materialized views. The first option includes the potential danger of data becoming out of sync after some time (because of either developer mistakes or DBA mistakes). Materialized views are significantly safer, but they only help if you can separate reading from writing.

Starting with Oracle Database 11g, the notion of “virtual columns” created a viable alternative to trigger-based techniques. Oracle does not store calculated values for these columns (which are functionally the same as columns belonging to views). Oracle gathers statistics about virtual columns and allows those virtual columns to be part of different constraints or indexes. Therefore, even if you pay the price for using SELECT, you will still encounter significant optimization on WHERE and ORDER clauses, especially if you have indexes.

If your virtual columns are built based on PL/SQL functions, this could be another area where the Oracle Database 12c feature PRAGMA UDF may help. Adding this PRAGMA UDF clause causes Oracle to compile your PL/SQL units in a SQL-friendly way that has a significant effect on the user-defined functions that are being called from SQL.

The following problem is very common in legacy systems in which storing dates in text format as YYYYMMDD was normal practice. Sooner or later, columns in this format unavoidably become corrupted, and some percentage of values start to look different. For testing purposes, add a separate column LAST_DDT_TX to the existing TEST_TAB table and simulate some bad data:

p0141-01

The existence of bad data makes reporting tasks a nightmare because at any point in time, you can get an Oracle exception. Unfortunately, it is very hard to get permission to clean up existing data even when you know that it is bad. To overcome that non-technical restriction, developers often build a special function to suppress the error and then use that function in reports. This time, such a function will be built with and without the PRAGMA UDF clause:

p0141-02

Note the DETERMINISTIC clause in the previous code example. It is a requirement for user-defined PL/SQL functions. Now newly created functions can be used to define virtual columns:

p0142-01

NOTE

Oracle does not validate that your function is indeed deterministic, meaning that for the same IN parameter(s), the function will always return the same OUT value. The system will trust your statement and behave accordingly. You should not add the DETERMINISTIC clause just to overcome the formal restriction. If you create a function that is not DETERMINISTIC and you get strange results or bugs, you have only yourself to blame. You have been warned!

Now all of the pieces of the test are in place:

p0142-02

These results are compelling. SQL optimization of PL/SQL code by using a PRAGMA UDF clause once again demonstrated its effectiveness as a powerful performance tuning method. Cutting the overhead of using virtual columns is important because that could make them less cost prohibitive and more appealing. Of course, trigger-based denormalized columns are much easier to manage, but don’t forget about data quality. No matter how good your rules are, they are very hard to enforce when you start to work with real people. To err is human!

Related Posts

Leave a Reply