Clustering is a method of storing tables that are intimately related and often joined together into the same area on disk. For example, instead of the BOOKSHELF table being in one section of the disk and the BOOKSHELF_AUTHOR table being somewhere else, their rows could be interleaved together in a single area, called a cluster. The cluster key is the column or columns by which the tables are usually joined in a query (for example, TITLE for the BOOKSHELF and BOOKSHELF_AUTHOR tables). To cluster tables, you must own the tables you are going to cluster together.
The following is the basic format of the CREATE CLUSTER command:
The clustername follows table-naming conventions, and column datatype is the name and datatype you will use as the cluster key. The column name may be the same as one of the columns of a table you will put in this cluster, or it may be any other valid name. Here’s an example:
This creates a cluster (disk space is set aside, as it would be for a table) with nothing in it. The use of COL1 for the cluster key is irrelevant; you’ll never use it again. However, its definition should match the primary key of the table to be added. Next, tables are created to be included in this cluster:
Prior to inserting rows into BOOKSHELF, you must create a cluster index:
Recall that the presence of a CLUSTER clause here precludes the use of a TABLESPACE or STORAGE clause. Note how this structure differs from a standard CREATE TABLE statement:
In the first CREATE TABLE statement, the cluster BOOKANDAUTHOR (TITLE) clause follows the closing parenthesis of the list of columns being created in the table. BOOKANDAUTHOR is the name of the cluster previously created. TITLE is the column in this table that will be stored in the cluster key COL1. It is possible to have multiple cluster keys in the CREATE CLUSTER statement and to have multiple columns stored in those keys in the CREATE TABLE statement. Notice that nowhere does either statement say explicitly that the TITLE column goes into the COL1 cluster key. The match-up is done by position only: COL1 and TITLE were both the first objects mentioned in their respective cluster statements. Multiple columns and cluster keys are matched first to first, second to second, third to third, and so on. Now a second table is added to the cluster:
When these two tables are clustered, each unique title is actually stored only once, in the cluster key. The columns from both of these tables are attached to each title. The data from both of these tables is actually stored in a single location, almost as if the cluster were a big table containing data drawn from both of the tables that make it up.
An additional cluster option, a hash cluster, uses the cluster column values to determine the physical location in which the row is stored. See the entry for the CREATE CLUSTER command in the Alphabetical Reference.