Extended Optimizer Statistics in Oracle 11g Improve Performance

on December 21, 2012

Better execution plans for faster running SQL is a common goal. Unfortunately good index choices may often be missed in an execution plan. Tables joined in the wrong order can lead to slower, more cumbersome joins as excess data is carried through a query. To circumvent, programmers have tried to insert the LEADING or ORDERED hints. But even after histograms have been run, Oracle still may not generate an optimal execution plan because it doesn’t have enough understanding of the data.

One piece of missing information was the understanding of relationships between columns on a given table. If two or more columns were related, Oracle sometimes had difficulty calculating the cardinality or selectivity when the columns were in a query together. Additionally, the optimizer needed more information to get exact cardinality when there were expressions or functions used on a column in a query. Oracle 11g now has a mechanism for extending the statistics so that better execution paths can be chosen.

Gathering Statistics

To demonstrate the gathering of the additional statistics, a small table CARS has been created. The columns MAKE and MODEL are interrelated and the column COLOR has mixed case, so that the function UPPER can be used to affect the cardinality.

1. Begin by gathering column statistics without any histograms. Then run a query against user_tab_col_statistics to look at the cardinality, density and histogram data. Through the process of adding extended statistics, we will see this column statistics view change as the new extended statistics are added.

exec dbms_stats.gather_table_stats(null,’cars’, method_opt => ‘for all columns size 1?);
 
select table_name table, column_name column, num_distinct distinct, density, num_buckets buckets, histogram
 from user_tab_col_statistics where table_name = ‘CARS’;
 
TABLE     COLUMN       DISTINCT       DENSITY          BUCKETS      HISTOGRAM
 CARS      CAR_ID         34                    .043478261        1                    NONE
 CARS      MAKE              2                    .5                        1                    NONE
 CARS      MODEL            4                   .25                       1                    NONE
 CARS      COLOR          15                   .111111111           1                    NONE

2. For the skewed columns, let us now gather histograms. This makes changes to the density, number of buckets and the histogram type.

exec dbms_stats.gather_table_stats(null,’cars’, method_opt => ‘for all columns size skewonly’);
 
select table_name table, column_name column, num_distinct distinct, density, num_buckets buckets, histogram
 from user_tab_col_statistics where table_name = ‘CARS’;
 
TABLE     COLUMN_NAME     DISTINCT   DENSITY       BUCKETS      HISTOGRAM
 CARS      CAR_ID                    34               .029411765     1                    NONE
 CARS      MAKE                         2               .014705882     2                    FREQUENCY
 CARS      MODEL                       4               .014705882    4                     FREQUENCY
 CARS      COLOR                     15               .014705882  15                     FREQUENCY

3. Because MODEL is dependant upon MAKE, the data is correlated. The extended statistics are used by Oracle to estimate the combined selectivity of the predicates. This will create a column statistics entry for the column group in the Data Dictionary. The statistics for this entry will be generated whenever statistics are gathered. Here we gather them manually immediately to see the result.

Executing the CREATE_EXTENDED_STATS function will return the system generated name for the statistics entry. The system generated name is 30 characters long and has been truncated for this query below.

select dbms_stats.create_extended_stats(null,’cars’, ‘(make, model)’) from dual;
 
DBMS_STATS.CREATE_EXTENDED_STATS(NULL,’CARS’,'(MAKE,MODEL)’)
 SYS_STU8QPK2S$PEWHARK2CP3#1F#G
 
exec dbms_stats.gather_table_stats(null,’cars’, method_opt => ‘for all columns size skewonly’);
 
select table_name table, column_name column, num_distinct distinct, density, num_buckets buckets, histogram
 from user_tab_col_statistics where table_name = ‘CARS’;
 
TABLE    COLUMN_NAME   DISTINCT    DENSITY        BUCKETS   HISTOGRAM
 CARS    CAR_ID                   34                .029411765       1                 NONE
 CARS    MAKE                        2                .014705882       2                 FREQUENCY
 CARS    MODEL                      4                .014705882       4                 FREQUENCY
 CARS    COLOR                   15                 .014705882     15                 FREQUENCY
 CARS    SYS_STU8QPK2      4                .014705882        4                 FREQUENCY

Optionally, the procedures that take column names will accept extension names in place of the column name. Thus both the steps of create_extended_stats and gather_table_stats can be combined using gather_table_stats with the column group specified.

exec dbms_stats.gather_table_stats(null,’cars’, method_opt => ‘for all columns size skewonly for columns (MAKE,MODEL) skewonly’);

4. Extended statistics can be used to gather statistics that will be used to estimate the selectivity of the predicates on expressions and functions. Here we replace the column name in GATHER_TABLE_STATS with the UPPER function. This creates a second column statistics entry for the function UPPER(COLOR) in the Data Dictionary. Because we used the GATHER_TABLE_STATS procedure, the statistics are gathered simultaneously.

exec dbms_stats.gather_table_stats(null,’cars’, method_opt =>  ‘for all columns size skewonly for columns (upper(color))’);
 
select table_name, column_name, num_distinct, density, num_buckets, histogram
 from user_tab_col_statistics where table_name = ‘CARS’;
 
TABLE   COLUMN_NAME     DISTINCT     DENSITY      BUCKETS    HISTOGRAM
 CARS    CAR_ID                    34                 .029411765    1                  NONE
 CARS    MAKE                         2                 .014705882    2                  FREQUENCY
 CARS    MODEL                       4                .014705882     4                  FREQUENCY
 CARS    COLOR                    15                 .014705882   15                 FREQUENCY
 CARS    SYS_STU8QPK2       4                 .014705882     4                 FREQUENCY
 CARS    SYS_STUW$NUF      9                 .014705882     9                 FREQUENCY

5. Another query to examine the extended statistics queries the USER_STAT_EXTENSIONS table. This shows all the extensions available by table name, extension name or creator.

select extension_name, extension from user_stat_extensions where table_name=’CARS’;
 
EXTENSION_NAME 
SYS_STU8QPK2S$PEWHARK2CP3#1F#G (“MAKE”,”MODEL”)
 
EXTENSION
 SYS_STUW$NUF6P4X10O$CZLDEXPWTU (UPPER(“COLOR

New Subprograms

DBMS_STATS has three new related subprograms. The DBMS_STATS package has a function called CREATE_EXTENDED_STATS which creates its own entry in the column statistics and returns the name of the entry.

There are some limitations for this function. There is a maximum of 32 columns in a column group and a column cannot appear more than once in a column group. The extension cannot contain a virtual column. An expression must contain at least one column and cannot contain a subquery. These extensions cannot be created on tables owned by SYS.

SELECT DBMS_STATS.CREATE_EXTENDED_STATS(
 OWNNAME   => NULL,
 TABNAME   => ‘CARS’,
 EXTENSION => ‘(MAKE,MODEL)’) from dual;
 
DBMS_STATS.CREATE_EXTENDED_STATS(NULL,’CARS’,'(MAKE,MODEL)’)
 SYS_STU8QPK2S$PEWHARK2CP3#1F#G

The function SHOW_EXTENDED_STATS_NAME will retrieve the system generated name of the entry from Oracle.

SELECT DBMS_STATS.SHOW_EXTENDED_STATS_NAME(
 OWNNAME   => NULL,
 TABNAME   => ‘CARS’,
 EXTENSION => ‘(MAKE,MODEL)’)
 from dual;
 
DBMS_STATS.SHOW_EXTENDED_STATS_NAME()
 SYS_STU8QPK2S$PEWHARK2CP3#1F#G

DROP_EXTENDED_STATS is a procedure and not a function as the other two subprograms. It will drop the statistics entry that was created and the extended statistics will no longer be available.

BEGIN DBMS_STATS.DROP_EXTENDED_STATS(
 OWNNAME   => NULL,
 TABNAME   => ‘CARS’,
 EXTENSION => ‘(MAKE,MODEL)’);
 END;

Using these new subprograms, Oracle now has the ability to gather statistics that will more accurately reflect the selectivity of the predicates than ever before. The new statistics will provide additional information that the optimizer needs to choose better execution plans using more accurate counts of rows.