Why Functionally Equivalent SQL Queries In Oracle May Not Be Equally Efficient

on December 4, 2014


The inventor of the relational model, Dr. Edgar Codd, was of the opinion that “[r]equesting data by its properties is far more natural than devising a particular algorithm or sequence of operations for its retrieval. Thus, a calculus-oriented language provides a good target language for a more user-oriented source language” (“Relational Completeness of Data Base Sublanguages”). Therefore, with the exception of the union operation, the original version of SQL was based on relational calculus, although, over time, other elements of relational algebra like difference (minus), intersection, and outer join crept in.

Testing of existence in a set using subqueries is the hallmark of relational calculus. The following example has nested subqueries. It lists the locations containing a department that either contains an employee named Steven King or an employee who holds the title of President or an employee who has previously held the title of President.

SELECT l.location_id, l.city

FROM locations l

WHERE EXISTS (

  SELECT * FROM departments d, employees e, jobs j

  WHERE d.location_id = l.location_id

  AND e.department_id = d.department_id

  AND j.job_id = e.job_id

  AND (

    (e.first_name = 'Steven' AND e.last_name = 'King')

    OR j.job_title  = 'President'

    OR EXISTS (

      SELECT * FROM job_history jh, jobs j2

      WHERE jh.employee_id = e.employee_id

      AND j2.job_id = jh.job_id

      AND j2.job_title = 'President'

    )

  )

)

 

The challenge was to rewrite the above query without subqueries and in the most efficient way possible, as measured by the number of consistent gets—the Buffers column in the query execution plan—when executed in the HR sample schema. The competitors published their submissions on the NoCOUG blog nocoug.wordpress.com. Oracle ACE Directors Kyle Hailey and Tim Gorman helped judge the competition.

Backgrounder

The secret sauce of relational database management systems is composed of what Dr. Edgar Codd referred to as “relational calculus” and “relational algebra.” He was a mathematician by training so he instinctively used complex-sounding mathematical terms that normal folk don’t use a lot. Relational calculus has nothing to do with college calculus; a relational calculus expression is simply an English-like non-procedural specification of the user’s query requirements. Relational algebra, on the other hand, is a step-by-step procedural recipe that details how to meet the user’s requirements. The whole reason why there is so much interest in EXPLAIN PLAN is that it documents the sequence of relational algebra operations that Oracle Database used at runtime to execute any particular query; that is, it documents the query execution plan used by Oracle Database.

More precisely, relational algebra is a collection of operations that can be used to combine tables. Just as you can combine numbers using the operations of addition, subtraction, multiplication, and division, you can combine tables using operations like “restriction,” “projection,” “union,” “difference,” and “cartesian join.” These five operations are provably sufficient to implement any requirement expressed in relational calculus. They are also primitive; that is, none of them is a combination of the others. Composite relational operations can be devised by combining these five operations; examples include “inner join,” “intersection,” “outer join,” “semi-join,” “anti-join,” and “division.” For example, inner join is obviously a combination of cartesian join and selection. Another example is A INTERSECTION B = A MINUS (A MINUS B).

In “Relational Completeness of Data Base Sublanguages,” Codd showed how to mechanically convert a relational calculus expression into a relational algebra expression. Refer to http://goo.gl/SHcWLE for a worked example. In the case of the challenge query in this article, a little thought—or a lot of thought—will convince you that the EXISTS clauses can be eliminated by the simple expedient of selecting distinct values of location_id and city from the join of all the tables in the original query, as shown below. Note that the jobs table has to be joined twice because it occurs twice in the challenge query.

SELECT DISTINCT l.location_id, l.city
FROM locations l, departments d, employees e, jobs j, job_history jh, jobs j2
WHERE EXISTS (
  SELECT * FROM departments d, employees e, jobs j
  WHERE d.location_id = l.location_id
  AND e.department_id = d.department_id
  AND j.job_id = e.job_id
  AND (
    (e.first_name = 'Steven' AND e.last_name = 'King')
    OR j.job_title  = 'President'
    OR EXISTS (
      SELECT * FROM job_history jh, jobs j2
      WHERE jh.employee_id = e.employee_id
      AND j2.job_id = jh.job_id
      AND j2.job_title = 'President'
    )
  )
)

However, such a mechanical conversion from relational calculus to relational algebra may not be your best bet. In general, there is more than one way to write a SQL query, and the query optimizer may choose a different query execution plan for each version. In other words, functionally equivalent SQL queries may not be equally efficient in practice. This is the biggest unsolved problem of the relational era.

How the mini-challenge was judged

The criteria used by the judges were correctness and efficiency. Efficiency was measured by the number of consistent gets, that is, the Buffers column in the query execution plan. The GATHER_PLAN_STATISTICS hint instructs Oracle Database to keep track of “rowsource execution statistics,” which can then be printed out by DBMS_XPLAN.DISPLAY_CURSOR. Judging correctness is a much harder task, because no tools are available for determining the equivalence of two queries. In fact, the reason why the query optimizer may choose different query execution plans for two functionally equivalent queries is that it cannot tell if they are equivalent. What is needed is a systematic method of converting a SQL query into a “canonical” version such that all functionally equivalent queries would be converted into the same canonical version. In the absence of automated tools, we have to resort to thinking very hard and writing test cases.

Announcing the winner!

Kim Berg Hansen from the Netherlands wins the mini-challenge. Not only did his solution survive the test cases written by the judges but he reduced the number of consistent gets to the theoretical minimum of 1 by using a UNION ALL materialized view. The first branch of the UNION ALL finds employees named Steven King or who hold the title of President, while the second branch finds employees who previously held the title of President. Materialized views can be indexed just like tables, so Kim created an index consisting only of the location_id and city columns. He also took pains to ensure that the materialized view was fast-refreshable on commit. Therefore, his solution continues to return correct results even after the tables are updated.

CREATE MATERIALIZED VIEW LOG ON employees WITH ROWID, COMMIT SCN
   (employee_id, job_id, first_name, last_name, department_id)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON departments WITH ROWID, COMMIT SCN
   (department_id, location_id)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON locations WITH ROWID, COMMIT SCN
   (location_id, city)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON jobs WITH ROWID, COMMIT SCN
   (job_id, job_title)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON job_history WITH ROWID, COMMIT SCN
   (job_id, employee_id)
   INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW employees_mv
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
ENABLE QUERY REWRITE
AS
SELECT 'CURRENT' record_type, l.location_id, l.city, j.rowid j_rowid, NULL jh_rowid, e.rowid e_rowid, d.rowid d_rowid, l.rowid l_rowid
FROM locations l, departments d, employees e, jobs j
WHERE (
  d.location_id = l.location_id
  AND e.department_id = d.department_id
  AND j.job_id = e.job_id
  AND (
    (e.first_name = 'Steven' AND e.last_name = 'King')
    OR j.job_title  = 'President'
  )
)
UNION ALL
SELECT 'HISTORY' record_type, l.location_id, l.city, j2.rowid j_rowid, jh.rowid jh_rowid, e.rowid e_rowid, d.rowid d_rowid, l.rowid l_rowid
FROM locations l, departments d, employees e, job_history jh, jobs j2
WHERE (
  d.location_id = l.location_id
  AND e.department_id = d.department_id
  AND (
      jh.employee_id = e.employee_id
      AND j2.job_id = jh.job_id
      AND j2.job_title = 'President'
  )
);

CREATE INDEX employees_mv_ix on employees_mv (
   location_id, city
);

 

Kim’s solution to the mini-challenge mimics the definition of the materialized view in the hope that “query rewrite” will kick in. Because Oracle Database does not index null values, the redundant clause “WHERE location_id IS NOT NULL” is added to ensure that the query optimizer uses the index.

SELECT /*+ GATHER_PLAN_STATISTICS */ DISTINCT location_id, city
FROM (
  SELECT l.location_id, l.city
  FROM locations l, departments d, employees e, jobs j
  WHERE (
    d.location_id = l.location_id
    AND e.department_id = d.department_id
    AND j.job_id = e.job_id
    AND (
      (e.first_name = 'Steven' AND e.last_name = 'King')
      OR j.job_title  = 'President'
    )
  )
  UNION ALL
  SELECT l.location_id, l.city
  FROM locations l, departments d, employees e,
    job_history jh, jobs j2
  WHERE (
    d.location_id = l.location_id
    AND e.department_id = d.department_id
    AND (
        jh.employee_id = e.employee_id
        AND j2.job_id = jh.job_id
        AND j2.job_title = 'President'
    )
  )
)
WHERE location_id IS NOT NULL;

 

WHERE location_id IS NOT NULL;

On checking the query execution plan, we find that the data was indeed obtained from the index on the materialized view. Only one consistent get operation was needed.

SELECT * FROM
TABLE(dbms_xplan.display_cursor(format => 'TYPICAL IOSTATS LAST'));

Plan hash value: 4093995400

--------------------------------------------------------------------------
| Id  | Operation          | Name            | Starts | A-Rows | Buffers |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |      1 |      1 |       1 |
|   1 |  SORT UNIQUE NOSORT|                 |      1 |      1 |       1 |
|*  2 |   INDEX FULL SCAN  | EMPLOYEES_MV_IX |      1 |      1 |       1 |

 

————————————————————————–

Kim wins a $75 Amazon gift certificate for his most excellent effort. The entries of the other competitors have been published on the NoCOUG blog nocoug.wordpress.com.

Related Posts

Comments

  1. Undeniably consider that thatt you stated. Your
    favourite justification seemed to bee on the web the easijest thing to ear in mind of.
    I say to you, I certainly get irked even as othewr folks think about concerns that they plainly don’t understand about.
    You managed to hit the nail upon the highest and defined out the entire thing with no need
    side-effects , other folks can take a signal. Will probably be back to get more.

    Thank you

Leave a Reply