The Importance of Media Backups in an Oracle MAA

By: Scott Jesse, Bill Burton, Bryan Vongray

An analysis of backup and recovery service requests that are opened with Oracle Support Services revealed an interesting point: When backups existed for recovery, a full restore of the entire database was initiated 40 percent of the time. In other words, when a hardware failure or a data corruption occurred, the DBA initiated a restore of every datafile in the database. Certainly, in some cases, a full restore of the database may be required due to the nature of the problem. But a survey of the reported issues showed that, in most cases, recovery of a single datafile, a subset of datafiles, or a set of data blocks (yes, data blocks) would have sufficed to resolve the issue, saving the hours lost while the entire database was down.

Why restore the entire database? Well, typically, this is a trained reaction by the DBA who has been conditioned to believe that there is only one method to recover the database, and that is to recover the entire database. Quite often, these database restores are scripted to restore the entire database, and the DBA is simply dot-slashing a shell script that was built back in the days of Oracle 7. (To be honest, I have a few sets of my own scripts from back in the old days.)

A defense of this technique is very simple. No “gray area” surrounds restore and recover decisions. A clear roadmap exists to completion of the recovery, after which the database will be up and running. But if that were really the case, if it were that simple, why do DBAs keep calling Oracle Support for help?

First, we must accept the fact that recovery situations are sticky, sweaty, nervous times for the DBA. Business is at a standstill, and dollars are being lost by the minute. When caught in this situation, a DBA will often go with what he or she understands best: a vanilla, full database restore looks very appealing, regardless of downtime expectations. So what is to be done?

First, we must realize that media backups are a required component of any MAA implementation. Oracle RAC provides scalability, performance, and protection against node failure, but Oracle RAC nodes still share the same datafiles, so a corrupted datafile affects every node in the cluster. Oracle Data Guard protects our system against site failure and complete disaster scenarios, but an Oracle Data Guard failover is expensive, and we generally try to avoid it at all costs.

Do we fail over when a single datafile goes belly up or when a single database block has become corrupt? The correct answer here is absolutely not. Why burden ourselves and our coworkers with repointing application servers, reconfiguring database jobs (such as backups), reconfiguring system monitoring, and so on, not to mention that we would probably have to take a full outage for the majority of these tasks to occur? This is the exact niche where a proper database backup and recovery strategy fits, like that last missing piece of the puzzle. For that piece to fit properly, we must first instill in our minds that a full database restore and recovery must be avoided at all costs!

For MAA environments requiring those famous “5 9s,” a full database restore from backup is the kiss of death. Better to use Oracle Data Guard to fail over to the standby system, and then reinstate the primary, than to waste valuable time with a full database restore and recovery. But having a sound backup strategy means having access to files for restoration of individual datafiles or even single data blocks. Given a specific recovery scenario, the best approach may be a datafile or block-level recovery instead of an Oracle Data Guard failover. Given the recoverability enhancements in 11g, database availability can be restored faster than ever before, which leads us to Oracle Recovery Manager.

Oracle Recovery Manager (RMAN) is a requirement for any true MAA implementation. RMAN is no longer the painful little utility best eschewed by seasoned DBAs empowered with tried-and-true shell scripts. RMAN now comes equipped with the kind of functionality that makes it a critical component in an overall MAA implementation. No other backup utility packs the features that RMAN has to offer. RMAN has been developed specifically to assist in MAA environments, so it not only integrates with Oracle RAC, Oracle Data Guard, and Flashback, but it complements the entire MAA stack, making the whole greater than the sum of its parts. Did I mention it is free?
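To make the idea of a narrower recovery concrete, here is a minimal sketch in Python that prints the RMAN commands for the smallest recovery covering a given problem, rather than reaching for a full database restore. The `Damage` class and the `rman_script` function are hypothetical names invented for this illustration; they are not part of any Oracle tool. The generated text uses the 11g forms `RESTORE DATAFILE`, `RECOVER DATAFILE`, and `RECOVER DATAFILE ... BLOCK`, and it assumes ordinary datafiles that can be taken offline while the rest of the database stays open (this is not true of SYSTEM datafiles). Treat the output as a starting point to review, not as a script to run blindly.

```python
from dataclasses import dataclass, field


@dataclass
class Damage:
    """Hypothetical description of the failure; not an Oracle structure."""
    # (file_number, block_number) pairs reported as corrupt
    corrupt_blocks: list = field(default_factory=list)
    # datafile numbers that are lost or unusable as a whole
    lost_datafiles: list = field(default_factory=list)


def rman_script(damage):
    """Return RMAN command text for the narrowest recovery covering the damage."""
    lines = []
    lost = sorted(set(damage.lost_datafiles))

    # Whole-datafile loss: take only that file offline, restore and
    # recover it, then bring it back online. Other datafiles stay available.
    for file_no in lost:
        lines += [
            f"SQL 'ALTER DATABASE DATAFILE {file_no} OFFLINE';",
            f"RESTORE DATAFILE {file_no};",
            f"RECOVER DATAFILE {file_no};",
            f"SQL 'ALTER DATABASE DATAFILE {file_no} ONLINE';",
        ]

    # Block corruption: recover just the damaged blocks. Blocks in a file
    # that is already being restored in full are skipped as redundant.
    for file_no, block_no in sorted(set(damage.corrupt_blocks)):
        if file_no not in lost:
            lines.append(f"RECOVER DATAFILE {file_no} BLOCK {block_no};")

    return "\n".join(lines)


# Example: one corrupt block in datafile 7 (made-up numbers).
print(rman_script(Damage(corrupt_blocks=[(7, 1234)])))
# prints: RECOVER DATAFILE 7 BLOCK 1234;
```

Note that nothing in the sketch ever emits `RESTORE DATABASE`. That is the point: for a lost datafile or a handful of corrupt blocks, the recovery is scoped to the damage, and the rest of the database remains open for business.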
