Family Trees and CONNECT BY in Oracle 12c

on July 30, 2015


One of Oracle’s more interesting but little used or understood facilities is its CONNECT BY clause. Put simply, this method is used to report, in order, the branches of a family tree. Such trees are encountered often—the genealogy of human families, livestock, horses; corporate management, company divisions, manufacturing; literature, ideas, evolution, scientific research, theory; and even views built on views.

The CONNECT BY clause provides a means to report on all of the family members in any of these many trees. It lets you exclude branches or individual members of a family tree, and it allows you to travel through the tree either up or down, reporting on the family members encountered during the trip.

The earliest ancestor in the tree is technically called the root node. In everyday English, this is called the trunk. Extending from the trunk are branches, which have other branches, which have still other branches. The forks where one or more branches split away from a larger branch are called nodes, and the very end of a branch is called a leaf, or a leaf node. Figure 1 shows a picture of such a tree.

f0267-01

FIGURE 1. Eve’s descendants

The following is a table of cows and bulls born between January 1900 and October 1908. As each offspring is born, it is entered as a row in the table, along with its sex, parents (the cow and bull), and birth date. If you compare the cows and offspring in this table with Figure 1, you’ll find they correspond. Eve has no recorded cow or bull parent because she was born on a different farm, and Adam and Bandit are bulls brought in for breeding, again with no parents in the table.

p0268-01

Next, a query is written to illustrate family relationships visually. You do this using LPAD and a special column, known as a pseudo-column, called LEVEL, that you use with CONNECT BY. LEVEL is a number, from 1 for Eve to 5 for Della, that is really the generation. If Eve is the first generation of cattle, then Della is the fifth generation. Whenever the CONNECT BY clause is used, the LEVEL column can be used in the SELECT statement to discover the generation of each row. LEVEL is a pseudo-column like SYSDATE and USER. It’s not really a part of the table, but it is available under specific circumstances. The next listing shows an example of using LEVEL.

The results of this query are apparent in the following table, but why did the SELECT statement produce this? How does it work?

p0268-02

Note that this is really Figure 1 turned clockwise onto its side. Eve isn’t centered, but she is the root node (trunk) of this tree. Her children are Betsy, Cindy, Greta, Mandy, and Poco. Betsy’s children are Ginny, Novi, and Teddi. Ginny’s children are Ruth and Suzy. And Suzy’s child is Della. Mandy also has two children, Duke and Paula.

This tree started with Eve as the first “offspring.” If the SQL statement had said start with Mandy, only Mandy, Duke, and Paula would have been in the output. START WITH defines the beginning of that portion of the tree that will be displayed, and it includes only branches stretching out from the individual that START WITH specifies. START WITH acts just as its name implies.

The LPAD in the SELECT statement is probably somewhat confusing.

p0269-01

That is, take the specified string and left-pad it for the specified length with the specified set of characters. If no set is specified, left-pad the string with blanks.

Compare this syntax to the LPAD in the SELECT statement shown earlier:

p0269-02

In this case, the string is a single character (space, indicated by the literal space enclosed in single quotation marks). Also, 6*(LEVEL–1) is the length, and because the set is not specified, spaces are used. In other words, this tells Oracle to take this string of one space and left-pad it to the number of spaces determined by 6*(LEVEL–1), a calculation made by first subtracting 1 from the LEVEL and then multiplying this result by 6. For Eve, the LEVEL is 1, so 6*(1–1), or 0 spaces, is used. For Betsy, the LEVEL (her generation) is 2, so an LPAD of 6 is used. Thus, for each generation after the first, six additional spaces are concatenated to the left of the OFFSPRING column. The effect is obvious in the result just shown. The name of each OFFSPRING is indented by left-padding with the number of spaces corresponding to its LEVEL or generation.

Why is this done, instead of simply applying the LPAD directly to OFFSPRING? There are two reasons. First, a direct LPAD on OFFSPRING would cause the names of the offspring to be right-justified. The names at each level would end up having their last letters lined up vertically. Second, if LEVEL–1 is equal to 0, as it is for Eve, the resulting LPAD of Eve will be 0 characters wide, causing Eve to vanish:

p0270-01

Therefore, to get the proper spacing for each LEVEL, to ensure that EVE appears, and to make the names line up vertically on the left, use LPAD with the concatenation operator or function not directly on the OFFSPRING column.

Now, how does CONNECT BY work? Look again at Figure 1. Starting with Novi and traveling downward, which cows are the offspring prior to Novi? The first is Betsy, and the offspring just prior to Betsy is Eve. Even though it is not instantly readable, the clause

p0270-02

tells Oracle to find the next row in which the value in the COW column is equal to the value in the OFFSPRING column in the prior row. Look at the contents of the BREEDING table and you’ll see that this is true.

Related Posts

Leave a Reply