Internal Workings of the Oracle RAC Systems

By Richard Niemiec on April 21, 2013

Oracle 11g uses Global Cache Services to coordinate activity. A lock is treated as a held resource. RAC is a multi-instance database. Multiple instances access the same database concurrently. In terms of structure, the difference between a RAC instance and a stand-alone Oracle instance is miniscule. Besides all the usual Oracle background processes, many special processes are spawned to coordinate interinstance communication and to facilitate resource sharing among nodes in a cluster. The Oracle documentation goes through all of the processes if you are interested in knowing more. Here is a brief description of some of the main ones:

ACMS The Atomic Controlfile to Memory Service (ACMS) is an agent on a per-instance basis that helps to ensure a distributed SGA memory update is globally committed on success and globally aborted on failure.
LMON The Global Enqueue Service Monitor (LMON) monitors the entire cluster to manage global enqueues and resources. LMON manages instance and process expirations and the associated recovery for the Global Cache Service.
LMD The Global Enqueue Service Daemon (LMD) is the lock agent process that manages enqueue manager service requests for Global Cache Service enqueues to control access to global enqueues and resources. The LMD process also handles deadlock detection and remote enqueue requests.
LMSn These Global Cache Service processes (LMSn) are processes for the Global Cache Service (GCS). RAC software provides for up to ten Global Cache Service processes. The number of LMSn processes varies depending on the amount of messaging traffic among nodes in the cluster. The LMSn processes do these things:
- Handle blocking interrupts from the remote instance for Global Cache Service resources.
- Manage resource requests and cross-instance call operations for shared resources.
- Build a list of invalid lock elements and validate lock elements during recovery.
- Handle global lock deadlock detection and monitor lock conversion timeouts.
LCK0 process The Instance Enqueue Process manages global enqueue requests and cross-instance broadcast. Manages non-cache fusion and library/row cache requests.
RMSn RAC management processes include tasks like the creation of resources as nodes are added.
RSMN Remote Slave Monitor (RSMN) performs remote instance tasks for a coordinating process.
GTX0-j The Global Transaction Process supports global XA transactions.

Global Cache Service (GCS) and Global Enqueue Service (GES)

GCS and GES (which are basically RAC processes) play the key role in implementing Cache Fusion. GCS ensures a single system image of the data even though the data is accessed by multiple instances. The GCS and GES are integrated components of Real Application Clusters that coordinate simultaneous access to the shared database and to shared resources within the database and database cache. GES and GCS together maintain a Global Resource Directory (GRD) to record information about resources and enqueues. GRD remains in memory and is stored on all the instances. Each instance manages a portion of the directory. This distributed nature is a key point for fault tolerance of the RAC.

The coordination of concurrent tasks within a shared cache server is called synchronization. Synchronization uses the private interconnect and heavy message transfers. The following types of resources require synchronization: data blocks and enqueues. GCS maintains the modes for blocks in the global role and is responsible for block transfers between instances. LMS processes handle the GCS messages and do the bulk of the GCS processing.

An enqueue is a shared memory structure that serializes access to database resources. It can be local or global. Oracle uses enqueues in three modes: null (N) mode, share (S) mode, and exclusive (X) mode. Blocks are the primary structures for reading and writing into and out of buffers. An enqueue is often the most requested resource.

GES maintains or handles the synchronization of the dictionary cache, library cache, transaction locks, and DDL locks. In other words, GES manages enqueues other than data blocks. To synchronize access to the data dictionary cache, latches are used in exclusive (X) mode and in single-node cluster databases. Global enqueues are used in cluster database mode.

Cache Fusion and Resource Coordination

Because each node in a Real Application Cluster has its own memory (cache) that is not shared with other nodes, RAC must coordinate the buffer caches of different nodes while minimizing additional disk I/O that could reduce performance. Cache Fusion is the technology that uses high-speed interconnects to provide cache-to-cache transfers of data blocks between instances in a cluster. Cache Fusion functionality allows direct memory writes of dirty blocks to alleviate the need to force a disk write and reread (or ping) of the committed blocks. This is not to say that disk writes do not occur; disk writes are still required for cache replacement and when a checkpoint occurs. Cache Fusion addresses the issues involved in concurrency between instances: concurrent reads on multiple nodes, concurrent reads and writes on different nodes, and concurrent writes on different nodes.

Oracle only reads data blocks from disk if they are not already present in the buffer caches of any instance. Because data block writes are deferred, they often contain modifications from multiple transactions. The modified data blocks are written to disk only when a checkpoint occurs. Before I go further, you need to be familiar with a couple of concepts introduced in Oracle 9i RAC: resource modes and resource roles. Because the same data blocks can concurrently exist in multiple instances, there are two identifiers that help to coordinate these blocks:

Resource mode The modes are null, shared, and exclusive. The block can be held in different modes, depending on whether a resource holder intends to modify data or merely read it.
Resource role The roles are locally managed and globally managed.

Global Resource Directory (GRD) is not a database. It is a collection of internal structures and is used to find the current status of the data blocks. Whenever a block is transferred out of a local cache to another instance’s cache, the GRD is updated. The following information about a resource is available in GRD:

Data Block Identifiers (DBA)
Location of most current versions
Data blocks modes (N, S, X)
Data block roles (local or global)

Past Image

To maintain data integrity, a new concept of past image was introduced in the 9i version of RAC. A past image (PI) of a block is kept in memory before the block is sent and serves as an indication of whether it is a dirty block. In the event of failure, GCS can reconstruct the current version of the block by reading PIs. This PI is different from a CR block, which is needed to reconstruct read-consistent images. The CR version of a block represents a consistent snapshot of the data at a point in time.

For example, Transaction-A of Instance-A has updated row-2 on block-5, and later another Transaction-B of Instance-B has updated row-6 on the same block-5. Block-5 has been transferred from Instance-A to B. At this time, the past image (PI) for block-5 is created on Instance-A.