Storage indexes build on the smart scan functionality by dividing the physical table data into storage regions and tracking the highest and lowest values for each storage region. When a SELECT query is run on the table, the smart scan code can compare the saved high and low values with the where clause from the SQL, and if the WHERE clause excludes all the rows of the storage region, it can avoid reading the rows entirely.
Unlike row and column projections, storage indexes have an additional advantage in that they actually avoid disk I/O entirely, as opposed to simply reducing data transfers to database servers.
Because storage indexes are based on physical regions on disk, they depend on the on-disk data having some sort of physical order. Fortunately, many real-world data sets have this ordering naturally, especially based on the timestamp of the data load. Table 1 shows an example of how storage indexes might be laid out for BIG_TABLE if each storage region contained 100 rows.
Storage Indexes in Action
Again here we build on the previous example, but we look only at values of NUM over 900,000. Because all the rows with these row numbers are physically located at the end of the table, we can use storage indexes to help us. There’s no special action required to create storage indexes. They were already created when we selected the first count from BIG_TABLE.
The count should return 10000 because we’re selecting 10 percent of the last 100,000 rows of the table:
The data volume returned to storage servers has dropped again, from 14MB to 2.5MB.
Leave a Reply