An Oracle RAC environment should first be created as a test environment for proving concepts and seeing how failover works for different applications. The RAC test environment should remain available even after production systems go live, because different workloads and failover behavior can't be tested against a single node. After installing Oracle Clusterware and the database nodes, it's useful to test several failover scenarios, both to verify the installation and to determine how the application fails over.
Create a test list and establish the pieces of the application that need to be part of that testing. Testing should include hardware, interconnect, disk, and operating system failures. Simulations of most of these failures are possible in the environment. Here is an example test list:
- Interconnect: Disconnect the network cable from the network adapter.
- Transaction failover when a node fails: Connect to one node, run a transaction, and then shut down the node; try selects, inserts, updates, and deletes.
- Backups and restores: Perform an RMAN backup of the database and verify that the database can be restored.
- Loss of a data file or disk: Delete a data file or simulate losing access to the storage. This tests how the system reacts, and the backup can be used to restore the database.
- Load balancing of transactions: Verify that the services are valid and allow workload balancing.
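The node-failure and failover items on this list can be sketched as commands. This is a minimal, illustrative sequence assuming a database named orcl with instances orcl1 and orcl2 on nodes racnode1 and racnode2; all of those names, and the interconnect interface name, are placeholders for your environment:

```shell
# Confirm both instances are up before the test.
srvctl status database -db orcl

# Simulate losing a node: abort the instance on racnode1 while a test
# transaction (select/insert/update/delete) runs against it in another session.
srvctl stop instance -db orcl -instance orcl1 -stopoption abort

# Verify that services relocated and the surviving instance is serving work.
srvctl status service -db orcl
crsctl stat res -t

# Simulate an interconnect failure at the OS level on the node itself
# (eth1 is an assumed private interconnect interface name):
# ip link set eth1 down

# Restore the instance when the test is complete.
srvctl start instance -db orcl -instance orcl1
```

Keep a record of how long each failover took and what the application experienced; that record becomes the baseline for later regression testing.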
Using RAC databases on the back end of applications doesn't necessarily mean that the application is RAC aware. The application as a whole may not fail over, even though the database is failing over current transactions and connections. There might be a small outage when one of the nodes fails over. However, because the server sends out notifications about the failover, these events can be used to trigger automated processes for reconnecting or restarting application pieces. In previous releases, these were Fast Application Notification (FAN) events, which can be used for failover and for workload balancing. In Oracle 12c, Transaction Guard builds on these events to handle transactions on failover, providing the application with the information it needs to know the state of an in-flight transaction and how it should be handled.
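As a sketch of how FAN events can trigger automation: a server-side callout is simply an executable placed in the Grid home's racg/usrco directory, which Clusterware invokes with the event payload as arguments. The script name, log message, and restart hook below are hypothetical:

```shell
#!/bin/sh
# Hypothetical FAN server-side callout:
# saved as $GRID_HOME/racg/usrco/restart_app.sh and made executable.
# Clusterware passes the FAN event text as the script's arguments.

EVENT="$*"

case "$EVENT" in
  *NODE*status=nodedown*)
    # A node went down: log it and trigger an application-tier
    # reconnect (the reconnect script path is an assumption).
    logger "FAN callout: node down event received, triggering app reconnect"
    # /opt/app/bin/reconnect_pool.sh
    ;;
esac
```

Because every executable in that directory is run for every event, callouts should filter quickly on the event text and return, leaving any heavy work to background processes.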
In earlier RAC releases, connecting different pieces of the application, such as reporting tables or materialized views, to different nodes helped with load balancing in a limited way. Oracle Clusterware and the Load Balancing Advisor now distribute the workload across the RAC environment more effectively. Along with better workload distribution, Oracle 12c RAC environments provide simplified ways to provision new nodes. You can create policy-managed databases around the workloads and use the nodes in the cluster for off-loading work such as backups and maintenance, for high-demand periods for an application, or for region-based time-zone activity. Server pools allow different-sized hardware to be used and added as demand grows in the RAC environment. Categories are assigned to each server so that servers can be allocated to the appropriate server pools. A minimum number of servers can be required for a pool, and servers can be moved from one pool to another. These resources remain transparent to the applications; an application connects via a service to the available server pools. The services are designed to integrate with several areas of the database, not just CPU resources.
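A sketch of how server pools might be defined with SRVCTL; the pool names (oltp, batch), the server category (bigmem), and the node name are illustrative assumptions, not values from this environment:

```shell
# A pool for the main application workload: at least 2 servers,
# up to 4, with a higher importance so it wins contention for servers.
srvctl add srvpool -serverpool oltp -min 2 -max 4 -importance 10

# A pool for off-loaded work (backups, maintenance, batch), restricted
# to servers matching an assumed "bigmem" category.
srvctl add srvpool -serverpool batch -min 1 -max 2 -category bigmem

# A policy-managed database is then placed in a pool rather than on
# named nodes (database name and remaining options assumed):
# srvctl add database -db orcl -serverpool oltp ...

# Move a server between pools as demand shifts (server name assumed).
srvctl relocate server -servers racnode3 -serverpool oltp
```

Applications never reference the pools directly; they connect through a service, and the service follows whichever servers the pool currently holds.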
What is even better about server pools is that you can run "what if" scenarios to assess the impact of downtime on different nodes, validate policies, and plan for nodes being added, moved, or removed. This is done through the Clusterware Control utility (CRSCTL) or the Server Control utility (SRVCTL) using the eval option.
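A sketch of such "what if" evaluations; the node, pool, and database names are placeholders, and each command reports what would happen without making any change:

```shell
# Preview the impact of removing a node from the cluster.
crsctl eval delete server racnode2

# Preview relocating a server into a different pool.
srvctl relocate server -servers racnode3 -serverpool oltp -eval

# Preview stopping a database to see which services and resources
# would be affected.
srvctl stop database -db orcl -eval
```

Running the evaluation before a maintenance window shows which services would relocate and which pools would drop below their minimum, so surprises surface on paper instead of in production.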
As mentioned with the RAC environment, Automatic Storage Management (ASM) is both the file and disk manager for the Oracle database files. Now you might be thinking that this seems like a more advanced topic, and possibly too detailed for a beginner’s guide. However, ASM is an important component in a highly available database environment and for addressing performance issues and management of
Ask the Expert
Q: Is tuning a RAC environment different from tuning a single instance?
A: When tuning the RAC environment, the beginning steps are the same as with a single instance. The next step is to look at interconnect performance and issues. The tools available to tune single instances are RAC aware, such as Automatic Workload Repository (AWR) reports, which detail the information collected about performance statistics, waits, and long-running SQL statements, along with other details about how the database resources are being used. Oracle Enterprise Manager also provides good insight into issues, enabling you to see things at a cluster level. The dictionary views are available at the node level as well as at the global level, so for RAC environments, look for both the V$ and GV$ views.
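For example, a quick look at interconnect-related waits across all instances can use a GV$ view, which adds an INST_ID column to the familiar V$ view. The connection method and the focus on "gc" (global cache) waits are illustrative:

```shell
# Top global cache waits across every instance in the cluster.
sqlplus -s / as sysdba <<'SQL'
SELECT inst_id, event, total_waits, time_waited
FROM   gv$system_event
WHERE  event LIKE 'gc%'        -- global cache (interconnect) waits
ORDER  BY time_waited DESC
FETCH FIRST 10 ROWS ONLY;      -- 12c row-limiting clause
SQL
```

The same query against v$system_event would show only the instance you are connected to; the GV$ version is what makes the comparison across nodes possible.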
Oracle files. Even after years of working with applications and databases, there is always I/O contention, and reading and writing to disk is a main part of what a database does. So, database administrators end up understanding more about disks, mirroring, and striping than they might really want to know. Debating different RAID strategies and optimizing I/O may seem like intermediate topics, but ASM handles many of these areas and has simplified the management of Oracle disk and file needs.
Prior to Oracle 12c, every database node needed its own ASM instance sharing the storage connectivity, the Global Lock Manager generated intense communication during node failure, and each node carried the overhead of an ASM instance. Oracle 12c Flex ASM provides a more stable architecture: a smaller set of clustered ASM instances serves storage to ASM clients. The clients access ASM over the network, and if an ASM instance fails, another instance in the ASM cluster takes over, allowing all of the databases to continue to access storage. There is no longer a need to have an ASM instance on every database node.
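A sketch of checking and sizing Flex ASM with ASMCMD and SRVCTL; the instance count shown is the common default cardinality, not a recommendation for any particular cluster:

```shell
# Report whether the cluster is running in Flex ASM mode.
asmcmd showclustermode

# Show which nodes are currently running ASM instances.
srvctl status asm -detail

# Flex ASM runs a fixed number (cardinality) of ASM instances across
# the cluster; adjust that count with srvctl.
srvctl modify asm -count 3
```

If an ASM instance dies, Clusterware starts another one elsewhere to maintain the configured count, which is why database nodes without a local ASM instance keep running.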
With this new flexible architecture, there are more reasons to have an ASM cluster supporting the databases. As you can see, there is even more stability in providing highly available systems. The database data files are more easily managed and are more centralized in the ASM cluster. In large databases, the number of data files for an instance can grow out of control; even a single tablespace in a large environment can become unmanageable. Any disk migration or movement of tablespaces then becomes a very difficult task and leaves areas of vulnerability open due to the sheer number of data files. ASM manages the files using disk groups, and disks can be added to a disk group even while the database is open and running and the ASM instance is being accessed.
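For illustration, adding a disk to a hypothetical disk group named data while the database remains open; the disk path is a placeholder for whatever device naming your storage uses:

```shell
# Connect to the ASM instance with the SYSASM privilege and add a disk.
sqlplus -s / as sysasm <<'SQL'
ALTER DISKGROUP data ADD DISK '/dev/oracleasm/disks/DISK5';

-- ASM rebalances data onto the new disk in the background;
-- progress can be watched in V$ASM_OPERATION.
SELECT group_number, operation, est_minutes
FROM   v$asm_operation;
SQL
```

The rebalance runs without taking the disk group or the databases offline, which is exactly the kind of task that was risky and disruptive when hundreds of data files had to be moved by hand.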