Investigating Oracle RAC Interconnect Performance

on October 1, 2013


The interconnect in an Oracle RAC environment is the backbone of your cluster. A highly performing, reliable interconnect is a crucial ingredient in making Cache Fusion perform well. Remember that the assumption in most cases is that a read from another node’s memory via the interconnect is much faster than a read from disk—except perhaps for Solid State Disks (SSDs). The interconnect is used to transfer data and messages among the instances. An Oracle RAC cluster requires a high-bandwidth solution with low latency wherever possible. If you find that the performance of the interconnect is subpar, your Oracle RAC cluster performance will also most likely be subpar. In the case of subpar performance in an Oracle RAC environment, the interconnect configuration, including both hardware and software, should be one of the first areas you investigate.

UDP buffers

You want the fastest possible network to be used for the interconnect. To maximize your speed and efficiency on the interconnect, you should ensure that the User Datagram Protocol (UDP) buffers are set to the correct values. On Linux, you can check this via the following command:

   sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default net
   .core.wmem_default
   net.core.rmem_max = 4194304
   net.core.wmem_max = 1048576
   net.core.rmem_default = 262144
   net.core.wmem_default = 262144

Alternatively, you can read the associated values directly from the respective files in the directory /proc/sys/net/core. These values can be increased via the following SYSCTL commands:

   sysctl -w net.core.rmem_max=4194304
   sysctl -w net.core.wmem_max=1048576
   sysctl -w net.core.rmem_default=262144
   sysctl -w net.core.wmem_default=262144

The numbers in this example are the recommended values for Oracle RAC on Linux and are more than sufficient for the majority of configurations. Nevertheless, let’s talk about some background of the UDP buffers. The values determined by rmem_max and wmem_max are on a “per-socket” basis. So if you set rmem_max to 4MB, and you have 400 processes running, each with a socket open for communications in the interconnect, then each of these 400 processes could potentially use 4MB, meaning that the total memory usage could be 1.6GB just for this UDP buffer space. However, this is only “potential” usage. So if rmem_default is set to 1MB and rmem_max is set to 4MB, you know for sure that at least 400MB will be allocated (1MB per socket). Anything more than that will be allocated only as needed, up to the max value. So the total memory usage depends on the rmem_default, rmem_max, the number of open sockets, and the variable piece of how much buffer space each process is actually using. This is an unknown—but it could depend on the network latency or other characteristics of how well the network is performing and how much network load there is altogether. To get the total number of Oracle-related open UDP sockets, you can execute this command:

   netstat -anp -udp | grep ora | wc -l

NOTE

Our assumption here is that the UDP is being used for the interconnect. Although that will be true in the vast majority of cases, there are some exceptions. For example, on Windows, TCP is used for Cache Fusion traffic. When InfiniBand is in use (more details on InfiniBand are provided later in the section “Interconnect Hardware”), the Reliable Datagram Sockets (RDS) protocol may be used to enhance the speed of Cache Fusion traffic. However, any other proprietary interconnects protocols are strongly discouraged, so starting with Oracle Database 11g, your primary choices are UDP, TCP (Windows), or RDS (with InfiniBand).

Jumbo frames

Another option to increase the performance of your interconnect is the use of jumbo frames. When you use Ethernet, a variable frame size of 46–1500 bytes is the transfer unit used between all Ethernet participants. The upper bound is 1500 MTU (Maximum Transmission Unit). Jumbo frames allows the Ethernet frame to exceed the MTU of 1500 bytes up to a maximum of 9000 bytes (on most platforms—though platforms will vary). In Oracle RAC, the setting of DB_BLOCK_SIZE multiplied by the MULTI_BLOCK_READ_COUNT determines the maximum size of a message for the global cache, and the PARALLEL_EXECUTION_MESSAGE_SIZE determines the maximum size of a message used in Parallel Query. These message sizes can range from 2K to 64K or more, and hence will get fragmented more so with a lower/ default MTU. Increasing the frame size (by enabling jumbo frames) can improve the performance of the interconnect by reducing the fragmentation when shipping large amounts of data across that wire. A note of caution is in order, however: Not all hardware supports jumbo frames. Therefore, due to differences in specific server and network hardware requirements, jumbo frames must be thoroughly tested before implementation in a production environment.

Interconnect hardware

In addition to the tuning options, you have the opportunity to implement faster hardware such as InfiniBand or 10 Gigabit Ethernet (10 GigE). InfiniBand is available and supported by two options. Reliable Datagram Sockets (RDS) protocol is the preferred option because it offers up to 30 times the bandwidth advantage and 30 times the latency reduction over Gigabit Ethernet. IP over InfiniBand (IPoIB) is the other option, which does not do as well as RDS, since it uses the standard UDP or TCP, but it does still provide much better bandwidth and much lower latency than Gigabit Ethernet.

Another option to increase the throughput of your interconnect is the implementation of 10 GigE technology, which represents the next level of Ethernet. Although it is becoming increasingly common, note that 10 GigE does require specific certification on a platform-by-platform basis, and as of the writing of this book, it was not yet certified on all platforms. Check with Oracle Support to resolve any certification questions that you may have on your platform.

Related Posts

Leave a Reply