What does high availability mean to the business? What is the level of risk tolerance? How much data loss is acceptable? Are there current issues with backups or reporting? These are all questions that need to be asked to start mapping out the components that are needed. You may decide that absolutely no data loss can be tolerated or, alternatively, that it is fine if the application is down for a day or two.
It also helps to look at the kinds of outages that can happen and then build in fault tolerance for those situations. Examples of unplanned outages are hardware failures, such as disk or server failures; human error, such as dropping a data file or making a bad change; and network and site failures. Add to these the planned outages needed for applying patches, for database changes and migrations, and for application changes that might include table and database object changes and upgrades. Look for the areas in the system with single points of failure and then match up solutions to start eliminating them.
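As a rough illustration of hunting for single points of failure, the sketch below flags every component in a small inventory that has only one instance and pairs it with a candidate solution from this chapter. The inventory, the function name, and the pairing table are all hypothetical and only illustrative; a real assessment would come from the actual architecture.

```python
# Hypothetical inventory: component name -> number of redundant instances.
inventory = {
    "database server": 1,
    "disk group": 2,
    "site": 1,
}

# Illustrative pairing of failure areas with components discussed in this chapter.
candidate_solutions = {
    "database server": "Real Application Clusters",
    "disk group": "Automatic Storage Management",
    "site": "Data Guard",
}


def single_points_of_failure(components):
    """Return the names of components that have fewer than two instances."""
    return sorted(name for name, count in components.items() if count < 2)


for name in single_points_of_failure(inventory):
    print(f"{name}: consider {candidate_solutions.get(name, 'further review')}")
```

The list is only a starting point for discussion with the business, since the cost of removing each single point of failure still has to be weighed against the risk tolerance.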
This chapter touches on only a few of the areas necessary for building a highly available environment: Real Application Clusters, Automatic Storage Management, Data Guard, and Transaction Guard. Transaction Guard is the newest of these components, introduced with Oracle Database 12c. Understanding these components, plus researching other Oracle options such as Flashback Query, Flashback Transaction, and Flashback Database, the Flash Recovery Area (renamed the Fast Recovery Area in Oracle Database 11g Release 2), Data Recovery Advisor, and Oracle Secure Backup, will help align the environment with the availability needs of the business. The goal of these options is to achieve Maximum Availability Architecture (MAA), so it is important to understand which options are available and what each can provide.
Depending on the application and the business needs, rolling patches might not be much of a concern if planned maintenance outages already allow downtime for patching the environment. In that case, testing application changes as well as the patches might be possible via Flashback technologies or a production-like server. If the business doesn’t allow for downtime or a regular maintenance window, and you know each minute of downtime will cost the company a serious amount of money, you can combine several components for the solution: rolling patches, protection against outages from hardware failures, failover servers through clusters, and Data Guard.
Architecting a solution that meets both budget restrictions and business needs takes discussion and planning with the business teams, along with some understanding of the different options available. Policies can also be defined to manage the server pools and allow for flexibility in the highly available architecture.
Working with the application team enables you to take advantage of Application Continuity and the logical transaction identifier (LTXID). Because the LTXID is globally unique, it allows a transaction to be submitted only once. If the database fails during a transaction, the LTXID can be used to determine the outcome of that transaction, and the application can be coded to decide, based on the returned state, whether the transaction should be resubmitted.
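The submit-only-once idea can be sketched with a small simulation. This is not Oracle's API: `FakeDatabase`, `submit_once`, and the use of a UUID as a stand-in LTXID are all hypothetical, and the in-memory dictionary only imitates a database that remembers commit outcomes by identifier. The point is the decision the application makes after a failure: check the outcome of the failed identifier first, and resubmit only if the work was not committed.

```python
import uuid


class FakeDatabase:
    """Hypothetical in-memory stand-in that records commit outcomes by LTXID."""

    def __init__(self):
        self.committed = {}              # ltxid -> payload
        self.fail_before_commit = False  # simulate an outage before the commit
        self.fail_after_commit = False   # simulate a lost commit acknowledgment

    def commit(self, ltxid, payload):
        if self.fail_before_commit:
            self.fail_before_commit = False
            raise ConnectionError("session lost before commit")
        self.committed[ltxid] = payload
        if self.fail_after_commit:
            self.fail_after_commit = False
            raise ConnectionError("session lost before commit was acknowledged")

    def was_committed(self, ltxid):
        return ltxid in self.committed


def submit_once(db, payload, max_attempts=3):
    """Submit payload, resubmitting only when the failed attempt did not commit."""
    for _ in range(max_attempts):
        ltxid = uuid.uuid4().hex  # stand-in for a new logical transaction identifier
        try:
            db.commit(ltxid, payload)
            return "committed"
        except ConnectionError:
            if db.was_committed(ltxid):
                # The work is already in the database; resubmitting would duplicate it.
                return "committed (outcome recovered)"
            # Not committed: it is safe to try again with a fresh identifier.
    raise RuntimeError("transaction could not be committed")


db = FakeDatabase()
db.fail_after_commit = True
print(submit_once(db, "order-1"), len(db.committed))
```

Without the outcome check, the lost acknowledgment in this run would lead the application to resubmit and store the order twice.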