Implementing Oracle RAC on Extended Distance Clusters
A special implementation of Oracle RAC lets you build an extended distance cluster, also called a stretched cluster, metro cluster, campus cluster, or geo cluster. With an extended distance cluster, components are deployed across two or more data center locations, allowing them to continue to function if one location fails. In normal operation, all nodes at all locations are active. The distance for an extended Oracle RAC is determined by the type of failure against which the Oracle RAC should be protected.
To make the Oracle RAC survive a fire in the server room, for example, it would be sufficient to have half of the nodes and the shared storage mirror in another server room, in another fire protection sector. To protect the Oracle RAC against an airplane crash or a terrorist attack, the distance between the Oracle RAC parts should be increased to at least a couple of miles. When making the decision regarding how far to stretch the cluster, remember that every mile adds latency for the network and the I/O operations, which can result in slow performance on a daily basis; this might not be a worthy tradeoff against the rare chance of losing the entire data center.
In most cases, Oracle Data Guard might be the better choice for protection. The new features of Oracle Data Guard integrated with Fast-Start Failover (FSFO) provide near-seamless failover. In this section, we will look at the challenges of implementing an extended distance cluster.
Stretching a cluster
In considering how far you can stretch a cluster, think back to physics class and recall how fast light travels in a vacuum. For our calculation, we don't need the exact number (186,282 miles per second), because in optical fiber the speed is reduced by approximately 30 percent, and additional delays are introduced by electronic switches, for example. In addition, consider that any message requires a confirmation, so round-trip distances must be used for the calculation.
We round up to 200 miles per millisecond (note we are now using milliseconds rather than seconds) and divide by two because of the round trip. The result shows that 100 miles of separation between nodes adds 1 millisecond of delay. Being conservative, we add our 30 percent optical fiber penalty, plus the additional delay for the switches and so on, and end up using a value of 50 percent of that round-trip figure; this results in a rule of thumb of 1 millisecond of delay for every 50 miles of distance. So, for example, stretching a cluster over 1,000 miles would add 20 milliseconds of latency to network and disk I/O. Normal disk access latency is around 10 milliseconds, so the extended design would triple the disk access latency, resulting in poor performance and other issues. Therefore, we recommend that an extended distance cluster architecture fits best when the two data centers are relatively close (20 miles apart or less).
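As a back-of-the-envelope check using the rule of thumb above (an approximation only, not a precise model):

additional latency (ms) ≈ distance in miles / 50
    20 miles  ≈  0.4 ms
   100 miles  ≈  2 ms
 1,000 miles  ≈ 20 ms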
When selecting the data centers to host an extended Oracle RAC, only locations with existing direct cable connections with dedicated channels should be considered (which can be extremely costly).
Stretching network connections
Interconnect, Storage Area Network (SAN), and IP networking need to be kept on separate channels, each with required redundancy. Redundant connections must not share the same dark fiber (if used), switch, path, or even building entrances. Remember that cables can be cut.
Traditional networks can cope with a distance of approximately 6 miles before a repeater has to be installed. Because the use of repeaters introduces latency, network distances greater than 6 miles require dark fiber with Wavelength Division Multiplexing (WDM). Dark fiber is so named because it emits no light until a signal is added to it. WDM enables a single optical fiber to carry multiple signals simultaneously, thus increasing the capacity of the fiber. Each signal travels within its own color band, which is modulated by the data. WDM systems are divided into Dense Wavelength Division Multiplexing (DWDM) and Coarse Wavelength Division Multiplexing (CWDM): CWDM systems have fewer than eight active wavelengths per fiber, whereas DWDM systems provide more active wavelengths and can support more than 150 wavelengths, each carrying up to 10 Gbps.
The SAN and interconnect connections need to be on dedicated point-to-point connections. For SAN networks, make sure you are using SAN buffer credits if the distance is more than 10 km. SAN buffer credits are a storage distance extension mechanism that limits the impact of distance at the I/O layer.
NOTE
Do not implement the Oracle RAC Interconnect over a WAN. This is the same as doing it over the public network, which is not supported.
Shared storage
Although for the network considerations distance is the only factor that distinguishes a "local" Oracle RAC from an "extended" Oracle RAC, when it comes to storage, other aspects must be considered. Because the cluster needs to be able to survive the complete loss of one data center, the storage must be mirrored to both sites. If the storage were hosted at only one location, the cluster would survive a data center failure only if that location survived, which defeats the purpose of having an extended Oracle RAC; mirroring the storage is therefore essential for an extended cluster. The most common technologies for clusters are host-based mirroring, array-based mirroring, and ASM redundancy. ASM redundancy is the only supported storage mirroring for extended Oracle RAC in Oracle Database 11g Release 2.
Array-based mirroring
Array-based mirroring is a primary-secondary storage solution. All I/O goes to one site and is mirrored to the other. If the primary storage fails, all nodes will crash and will have to be restarted after the secondary storage is made active. Be aware as well that the instances at the secondary location will see slower I/O, because their writes must travel the distance to the primary array.
Host-based mirroring
From an availability viewpoint, host-based mirroring is preferred over array-based mirroring because it does not require a manual restart if one location fails. From a performance perspective, host-based mirroring requires CPU cycles from the host machines. Using host-based (software) mirroring for ASM disks can result in major performance degradation and is not supported. Hardware mirroring of disks is not affected, so disk arrays that are controlled not by the host OS but by the array controller itself are supported.
ASM
As mentioned, ASM redundancy is the only supported option for extended cluster shared storage mirroring from Oracle Database 11g Release 2 onward. One failure group has to exist at each site. For a two-site setup, this means normal-redundancy disk groups have to be used; a high-redundancy disk group would require a third failure group, which would have to reside at one of the two sites and could therefore keep a mirror copy locally rather than in the failure group at the remote site. High-redundancy disk groups should be used only when the stretched cluster spans three sites. With 11g Release 1, Oracle introduced two ASM improvements that are beneficial for extended cluster configurations: the preferred read feature and fast mirror resync.
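For illustration, a data disk group for a two-site stretched cluster could be created with one failure group per site (the disk group and failure group names here are only examples):

SQL> CREATE DISKGROUP DATA NORMAL REDUNDANCY
       FAILGROUP sitea DISK '<disks in the SAN at site A>'
       FAILGROUP siteb DISK '<disks in the SAN at site B>'
       ATTRIBUTE 'compatible.asm' = '11.2.0.0';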
ASM Preferred Read Prior to 11g Release 1, ASM always read the primary copy of a mirrored extent set. For stretched clusters, this could result in reads from the remote site even though a copy of the extent existed on local storage. To overcome this performance issue, Oracle introduced the ASM_PREFERRED_READ_FAILURE_GROUPS initialization parameter, which allows you to specify a list of failure groups from which each local instance should read.
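For example, each ASM instance can be pointed at the failure group on its own site (the instance SIDs, disk group, and failure group names here are only examples):

SQL> ALTER SYSTEM SET ASM_PREFERRED_READ_FAILURE_GROUPS = 'DATA.SITEA'
       SCOPE=BOTH SID='+ASM1';
SQL> ALTER SYSTEM SET ASM_PREFERRED_READ_FAILURE_GROUPS = 'DATA.SITEB'
       SCOPE=BOTH SID='+ASM2';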
ASM Fast Mirror Resync If the connection between the two sites is lost, one of the failure groups will be marked invalid. Restoring the redundancy of an entire ASM failure group is usually a time-consuming process. ASM fast mirror resync significantly reduces the time to resynchronize a failed disk in such situations. ASM fast resync keeps track of pending changes to extents on an offline disk during an outage. The extents are resynced when the disk is brought back online.
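A minimal sketch of how this is typically configured (the disk group and failure group names are only examples): the repair window is set through the DISK_REPAIR_TIME disk group attribute, and the offlined disks are brought back online once the link between the sites is restored.

SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '12h';
SQL> ALTER DISKGROUP DATA ONLINE DISKS IN FAILGROUP siteb;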
Voting disks
The number and location of clusterware voting disks have a much bigger impact in an extended cluster than in a local cluster. You may be aware of the best practice of having an odd number of voting disks, so that a tie-breaking disk exists when the interconnect between nodes fails; in a local cluster, however, all voting disks reside on the same storage.
Because most extended clusters span only two sites, the odd number of voting disks raises the question of where to place the tie-breaking voting disk. If one site hosts two of them, a primary-secondary setup is effectively introduced: if the site with the majority of the voting disks fails, the entire cluster goes down.
To address this issue and follow MAA best practices, a third, independent site is recommended for the stretched cluster. This site hosts the tie-breaking voting disk, which can reside on NFS. Beginning with Oracle Database 11g Release 2, along with the recommendation to store voting disks on ASM, Oracle introduced a failure group type called quorum. Disks in a quorum failure group do not contain user data and are not considered when determining redundancy requirements. The clusterware installation does not offer the option of creating a quorum failure group, so to create a disk group with a failure group on the second site and a quorum failure group at a third location, you must do the following:
SQL> CREATE DISKGROUP VOTE NORMAL REDUNDANCY
       FAILGROUP fg1 DISK '<a disk in SAN1>'
       FAILGROUP fg2 DISK '<a disk in SAN2>'
       QUORUM FAILGROUP fg3 DISK '<disk or file on a third location>'
       ATTRIBUTE 'compatible.asm' = '11.2.0.0';
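Once the disk group exists, the existing voting disks can be relocated into it with the clusterware tools; an illustrative command (run as root from the Grid Infrastructure home) is:

# crsctl replace votedisk +VOTE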
NOTE
For more information on stretch clusters, consult the Oracle white paper, “Oracle Real Application Clusters on Extended Distance Clusters,” by Erik Peterson, et al. In addition, refer to the paper “Extended Distance Oracle RAC,” by Mai Cutler, Sandy Gruver, and Stefan Pommerenk. These papers served as a reference for the above section.
Bobby says
Thank you for the article.
One query regarding Extended RAC recovery: I have two sites (A and B) and one quorum site (C).
My first site, Site A, failed, and the cluster kept running on Site B. After a few hours, Site B also went down, and later Site C.
Now I have managed to bring Site A back up. How will Oracle perform recovery, given that this site was already down when Sites B and C went down?