Increase Oracle Database Availability with Fast-Start Failover

By: Scott Jesse, Bill Burton, Bryan Vongray


Fast-Start Failover is a feature that allows the Oracle Data Guard broker to fail over a failed primary database automatically to a predetermined standby database. This feature increases the availability of the database by eliminating the need for DBA involvement as part of the failover process.

Let’s consider what happens in a typical failover scenario: First, the DBA receives a notification that the primary database has failed. At that point, the DBA must log into the system and execute the commands required to fail over the database and, if possible, reinstate the failed primary as a standby. At best, we are looking at a 15-minute outage, and that is only if a DBA is on site, monitoring every move of the standby. Fast-Start Failover enables the broker to perform these tasks automatically in the time it would typically take a DBA just to log into the system.

The key to this feature is a monitoring process appropriately named the Observer. The Observer is a component of the DGMGRL interface that runs on a system outside the systems actually hosting the Oracle Data Guard configuration and monitors the availability of the primary database. Should it detect that the primary database (all instances in an Oracle RAC environment) has become unavailable, or that a connection to the primary database cannot be made, it will initiate a failover after waiting the number of seconds specified by the FastStartFailoverThreshold property. After this type of failover, the Oracle Data Guard broker will attempt to reinstate the failed primary database as a standby once connectivity is re-established. Note that Fast-Start Failover will not be initiated when a database is shut down using the normal, transactional, or immediate options.
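
The timing rule above can be pictured with a small shell function. This is a toy model of the behavior described, not Oracle code: the function name fsfo_decision and the state labels are made up for illustration, and the real Observer's checks are more involved.

```shell
#!/bin/sh
# Toy model of the Observer's decision; names and labels are hypothetical.
# $1 = state of the primary: "unreachable", or a clean shutdown mode
#      ("normal", "transactional", "immediate")
# $2 = seconds the primary has been unavailable
# $3 = FastStartFailoverThreshold, in seconds
fsfo_decision() {
    state=$1
    elapsed=$2
    threshold=$3
    case "$state" in
        normal|transactional|immediate)
            # Clean shutdowns do not trigger Fast-Start Failover.
            echo "no-failover"
            return 0
            ;;
    esac
    if [ "$elapsed" -ge "$threshold" ]; then
        # The primary has been unavailable for the full threshold.
        echo "failover"
    else
        # Keep waiting; the primary may still come back.
        echo "wait"
    fi
}
```

For example, fsfo_decision unreachable 10 30 prints wait, while the same call with 30 elapsed seconds prints failover.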

Oracle 11g introduced user-configurable failover conditions. When one of the following user-configurable conditions is met, the broker will bypass the FastStartFailoverThreshold, immediately initiating failover:

  • Datafile Offline: Failover is initiated if a datafile on the primary database experiences an I/O error that results in the datafile being taken offline. This option is enabled by default.
  • Corrupted Dictionary: Failover is initiated if corruption of a critical database object is found. This option is enabled by default.
  • Corrupted Controlfile: Detection of controlfile corruption results in immediate failover. This option is enabled by default.
  • Inaccessible Logfile: Failover is initiated if LGWR is unable to write to a member of a log group. This option is disabled by default.
  • Stuck Archiver: Failover is initiated if the archiver on the primary database becomes hung. This option is disabled by default.
  • Application Induced Failover: This type of failover is induced by calling the dbms_dg.initiate_fs_failover function, which allows applications to invoke the failover operation.

When failover is induced from one of these user-configurable conditions, the failed primary database will be left in the down state. Reinstatement of the failed primary database will not be attempted by the broker.
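
An application-induced failover comes down to a single PL/SQL call. The sketch below is one way to wrap it, under stated assumptions: the shell function fsfo_request_sql is hypothetical, the reason text is free-form, and the block simply prints whatever status value DBMS_DG.INITIATE_FS_FAILOVER returns rather than interpreting it.

```shell
#!/bin/sh
# Prints a PL/SQL block that asks the broker for an immediate failover.
# $1 = free-text reason passed along with the request
fsfo_request_sql() {
    # Double any single quotes so the reason is a valid SQL string literal.
    reason=$(printf '%s' "$1" | sed "s/'/''/g")
    cat <<EOF
SET SERVEROUTPUT ON
DECLARE
  status BINARY_INTEGER;
BEGIN
  status := DBMS_DG.INITIATE_FS_FAILOVER('$reason');
  DBMS_OUTPUT.PUT_LINE('initiate_fs_failover returned ' || status);
END;
/
EOF
}
```

The output can be reviewed first and then piped to SQL*Plus, for example fsfo_request_sql "middle tier lost storage" | sqlplus -s app_user/app_password@pitt, where the account is a placeholder for a user allowed to execute DBMS_DG.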

To enable Fast-Start Failover, your Oracle Data Guard configuration must meet the following requirements:

  • Flashback Database must be enabled for both the primary and standby database target.
  • An observer server (server outside of the Oracle Data Guard configuration) should be established to monitor the environment. This requires installation of the Oracle Database 11g Release 2 RDBMS binaries or the 11g Release 2 Administrator Client binaries on the monitoring server.
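
The Flashback Database requirement is easy to verify before going any further. The following is a minimal sketch, assuming SQL*Plus is available on the host: the helper names flashback_check_sql and flashback_ready are hypothetical, and the check relies on the FLASHBACK_ON column of V$DATABASE reporting YES when the feature is enabled.

```shell
#!/bin/sh
# Prints a query that reports whether Flashback Database is on.
flashback_check_sql() {
    cat <<'EOF'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT flashback_on FROM v$database;
EXIT
EOF
}

# Interprets the value returned by the query above.
# $1 = database name (used only in the message), $2 = FLASHBACK_ON value
flashback_ready() {
    if [ "$2" = "YES" ]; then
        echo "$1: Flashback Database enabled"
    else
        echo "$1: Flashback Database NOT enabled (FLASHBACK_ON=$2)"
        return 1
    fi
}
```

Run the query against both pitt and cosp and pass each result to flashback_ready. Where the answer is not YES, Flashback Database is turned on with ALTER DATABASE FLASHBACK ON.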

One notable change from 10g is that the requirement to run in Maximum Availability or Maximum Protection mode has been lifted. To control the amount of data loss when in a Maximum Performance configuration, set the FastStartFailoverLagLimit property to the number of seconds of data you are willing to lose. (I know, you are not willing to lose any data. If that were truly the case, you wouldn’t be running in Maximum Performance mode.)

The following are the high-level steps necessary to implement Fast-Start Failover for your Oracle Data Guard configuration:

  1. Establish connectivity to primary and standby database sites from the system that will be running the Observer.
  2. If you have multiple standby databases in your Oracle Data Guard configuration, you must set the FastStartFailoverTarget property on the primary and standby databases. We are promoting good habits here, so even though we have a single standby database in our configuration, we will set this parameter. The setting for the primary database will be the target standby database. The setting for the target standby database will be the primary database. This property can be set as follows:
     DGMGRL> edit database 'pitt' set property 'FastStartFailoverTarget'='cosp';
     DGMGRL> edit database 'cosp' set property 'FastStartFailoverTarget'='pitt';
  3. Configure the Fast-Start Failover properties to meet the needs of your application. The available properties are as follows:
  • FastStartFailoverThreshold: Specifies the number of seconds to delay failover after the detection of a primary database failure. This property defaults to 30 seconds.
  • FastStartFailoverPmyShutdown: In its default setting of true, this property causes the primary database to shut down when FastStartFailoverThreshold has been reached for a particular database. This setting is ignored in the case of a user-configurable condition failover.
  • FastStartFailoverLagLimit: Defines the number of seconds the standby database is allowed to fall behind the primary. When this limit is exceeded, automatic failover will not be allowed.
  • FastStartFailoverAutoReinstate: When set to its default of true, this property enables the automatic reinstatement of a failed primary database as a standby. Automatic reinstatement is not possible for user-configurable failover conditions, regardless of this setting.

These properties apply to the configuration as a whole, so they are set with the edit configuration set property command; here’s an example:

DGMGRL> edit configuration set property FastStartFailoverLagLimit=60;
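
Steps 2 and 3 lend themselves to scripting. The following is a minimal sketch rather than part of the procedure itself: a hypothetical shell function, fsfo_property_script, that prints the DGMGRL commands for a two-database configuration so they can be reviewed before being run. It assumes the threshold and lag limit are configuration-level properties set with edit configuration; the values used in the usage line are examples only.

```shell
#!/bin/sh
# Prints DGMGRL commands for a two-database configuration.
# $1 = primary, $2 = standby, $3 = threshold seconds, $4 = lag limit seconds
fsfo_property_script() {
    cat <<EOF
edit database '$1' set property FastStartFailoverTarget='$2';
edit database '$2' set property FastStartFailoverTarget='$1';
edit configuration set property FastStartFailoverThreshold=$3;
edit configuration set property FastStartFailoverLagLimit=$4;
show fast_start failover;
EOF
}
```

Once the output looks right, it can be piped to the command-line interface, for example fsfo_property_script pitt cosp 30 60 | dgmgrl -silent sys/oracle123@pitt.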

4. Configure the desired Fast-Start Failover user-configurable conditions. As previously stated, these conditions include Datafile Offline, Corrupted Dictionary, Corrupted Controlfile, Inaccessible Logfile, and Stuck Archiver. These user-configurable conditions can be set as follows:

DGMGRL> ENABLE FAST_START FAILOVER CONDITION 'Inaccessible Logfile';

To unset a user-configurable condition, do this:

DGMGRL> DISABLE FAST_START FAILOVER CONDITION 'Inaccessible Logfile';

To view the current settings for the user-configurable conditions, use the show command as follows:

DGMGRL> show fast_start failover

5. Start the Observer on the designated server. The Observer should run on a system within the same network segment as the application or application middle tiers, so that the Observer and the application have the same view of the database (in terms of connectivity). To run the Observer on a server other than the primary or standby systems, that server needs the 11g Release 2 RDBMS binaries or the 11g Release 2 Administrator Client binaries, along with tnsnames.ora entries for the databases participating in the configuration. Multiple Observers can run from a single 11g Release 2 (Admin Client or RDBMS) installation as long as each one is started with its own uniquely named configuration file.

DGMGRL> start observer file='/tmp/fsfo_mydg.dat'

The Observer is a foreground process; therefore, control will not be returned to the user until the Observer has been stopped. For this reason, it is recommended that the Observer be run in the background and that its actions be logged to a file. On a Linux/Unix system, you can do the following:

# nohup dgmgrl -logfile /tmp/fsfo_mydg.log sys/oracle123@pitt "start observer file='/tmp/fsfo_mydg.dat'" &

How do we make the Observer resilient to failure in support of our MAA goals? The answer is Grid Control (or custom scripts if you are so inclined), which gives us out-of-the-box functionality to restart a failed Observer as well as the ability to fail over an Observer to an alternate host. Should you not be running Grid Control to provide high availability for the Observer, don’t worry, because a failed Observer is not a show-stopper for your overall Oracle Data Guard configuration. In such a situation, the broker will report an ORA-16658 or ORA-16820, letting you (the MAA DBA) know that Fast-Start Failover is not operational and manual intervention is required.
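
For those inclined toward custom scripts, a restart check can be quite small. The following is a sketch under stated assumptions, not a replacement for Grid Control: the function names, the PID file, and the OBSERVER_CMD variable are all hypothetical, the default command simply mirrors the example above, and the check does not guard against the PID being reused by an unrelated process.

```shell
#!/bin/sh
# Command used to launch the Observer; override OBSERVER_CMD to change it.
if [ -z "${OBSERVER_CMD:-}" ]; then
    OBSERVER_CMD="dgmgrl -logfile /tmp/fsfo_mydg.log sys/oracle123@pitt \"start observer file='/tmp/fsfo_mydg.dat'\""
fi

# Returns 0 if the PID recorded in the given file belongs to a live process.
observer_alive() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    kill -0 "$pid" 2>/dev/null
}

# Starts the Observer in the background unless it is already running.
# $1 = file in which the Observer's PID is recorded
ensure_observer() {
    pidfile=$1
    if observer_alive "$pidfile"; then
        echo "observer running"
        return 0
    fi
    # exec keeps the recorded PID pointing at the launched command itself.
    nohup sh -c "exec $OBSERVER_CMD" >/dev/null 2>&1 &
    echo $! > "$pidfile"
    echo "observer started"
}
```

Calling ensure_observer /tmp/fsfo_mydg.pid from a scheduler such as cron would restart the Observer on the same host after a failure. Moving it to an alternate host is a separate problem that this sketch does not address.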

6. Finally, we enable Fast-Start Failover for the configuration:

DGMGRL> enable fast_start failover
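
Once Fast-Start Failover is enabled, it is worth confirming that the configuration is actually ready to fail over. The sketch below goes beyond the steps above and rests on assumptions: it reads the FS_FAILOVER_STATUS column of V$DATABASE on the primary and treats SYNCHRONIZED and TARGET UNDER LAG LIMIT as the ready states. The helper names are hypothetical.

```shell
#!/bin/sh
# Prints the query to run on the primary after enabling Fast-Start Failover.
fsfo_status_sql() {
    cat <<'EOF'
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT fs_failover_status FROM v$database;
EXIT
EOF
}

# Classifies an FS_FAILOVER_STATUS value; returns 0 only when failover is possible.
fsfo_status_ready() {
    case "$1" in
        "SYNCHRONIZED"|"TARGET UNDER LAG LIMIT")
            echo "ready"
            return 0
            ;;
        "DISABLED")
            echo "disabled"
            return 1
            ;;
        *)
            # Any other state means automatic failover cannot happen right now.
            echo "not ready: $1"
            return 1
            ;;
    esac
}
```

Any result other than ready right after enabling is a cue to review the output of show fast_start failover and confirm that the Observer is running.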

You are probably wondering why we took the time and effort to dive so deeply into Oracle Data Guard without the broker when the broker clearly makes things much simpler while enhancing availability. Understanding the details of how Oracle Data Guard works is a valuable asset when you’re troubleshooting issues in high-pressure situations. If we had explained the switchover process as just “switchover to standby,” you would never have seen what steps are actually performed. Should the broker not be available for one reason or another, you will know what to do. Should the broker throw an error, you will know how to resolve it. Those who are successful are those who have prepared.
