The clustering factor is a measure of the ordered-ness of an index in comparison to the table that it is based on. It is used to check the cost of a table lookup following an index access (multiplying the clustering factor by index’s selectivity gives you the cost of the operation).
The clustering factor records the number of blocks that will be read when scanning the index. If the index being used has a large clustering factor, then more table data blocks have to be visited to get the rows in each index block (because adjacent rows are in different blocks). If the clustering factor is close to the number of blocks in the table, then the index is well ordered, but if the clustering factor is close to the number of rows in the table, then the index is not well ordered. The clustering factor is computed by the following (explained briefly):
- The index is scanned in order.
- The block portion of the ROWID pointed at by the current indexed valued is compared to the previous indexed value (comparing adjacent rows in the index).
- If the ROWIDs point to different TABLE blocks, the clustering factor is incremented (this is done for the entire index).
The CLUSTERING_FACTOR column in the USER_INDEXES view gives an indication as to how organized the data is compared to the indexed columns. If the value of the CLUSTERING_FACTOR column value is close to the number of leaf blocks in the index, the data is well ordered in the table. If the value is not close to the number of leaf blocks in the index, then the data in the table is not well ordered. The leaf blocks of an index store the indexed values as well as the ROWIDs to which they point.
For example, say the CUSTOMER_ID for the CUSTOMERS table is generated from a sequence generator, and the CUSTOMER_ID is the primary key on the table. The index on CUSTOMER_ID would have a clustering factor very close to the number of leaf blocks (well ordered). As the customers are added to the database, they are stored sequentially in the table in the same way the sequence numbers are issued from the sequence generator (well ordered). An index on the CUSTOMER_NAME column would have a very high clustering factor, however, because the arrangement of the customer names is random throughout the table.
The clustering factor can impact SQL statements that perform range scans. With a low clustering factor (relative to the number of leaf blocks), the number of blocks needed to satisfy the query is reduced. This increases the possibility that the data blocks are already in memory. A high clustering factor relative to the number of leaf blocks may increase the number of data blocks required to satisfy a range query based on the indexed column.
The clustering of data within the table can be used to improve the performance of statements that perform range scan–type operations. By determining how the column is being used in the statements, indexing these column(s) may provide a great benefit.
Leave a Reply