How to Determine If the Oracle Redo Log File Size Is a Problem

By Richard Niemiec on April 19, 2013


Two potential problems should be addressed when considering whether to increase the size of Oracle log files: batch jobs that don’t have enough total redo space to complete, and long-running jobs that spend a large amount of time switching online redo logs.

The first concerns batch jobs that do not have enough total redo space to complete, or that run so fast that the online redo logs wrap (cycle through all the logs and start writing to the first one again) before they are archived to the offline redo logs. Because an online redo log cannot be overwritten until it is archived (when archiving is enabled), DML and DDL activity has to wait until an online log becomes available. By listing the online redo logs with their last update date and time at the operating system level, you can determine how often they are switching. You can also query V$LOG_HISTORY for the last 100 log switches, as sketched below. Increasing the size of the online redo logs may provide the space needed for large batch jobs doing large INSERT, UPDATE, and DELETE transactions. A better solution may be to increase the number of online redo logs, so that the additional space is available while log switches stay frequent (smaller but more numerous online redo logs).
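For example, a minimal sketch of that V$LOG_HISTORY check might look like the following (standard V$LOG_HISTORY columns; the 100-row cap is arbitrary):

select  recid, thread#, sequence#,
        to_char(first_time, 'dd-mon-yy hh24:mi:ss') switch_time
from    (select recid, thread#, sequence#, first_time
         from   v$log_history
         order  by first_time desc)
where   rownum <= 100
order   by first_time;

Closely spaced switch times during a batch window point to logs that are too small or too few.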

The second potential problem concerns long-running jobs that are spending a large amount of time switching online redo logs. Long-running jobs are often much faster when the entire job fits into a single online redo log. For the online transaction processing (OLTP) type of environment, smaller online redo logs are usually better. My rule of thumb is for online redo logs to switch every half hour (not counting the long-running batch jobs that shorten this time). By monitoring the date and time of the online redo logs at the operating system level (or querying V$LOG_HISTORY), you can determine whether to increase the size or number of online redo logs to reach an optimum switching interval.
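Before resizing or adding logs, it helps to confirm how the online redo log groups are currently configured. A quick sketch against the standard V$LOG view follows; the ALTER DATABASE line is only a hypothetical illustration, with a placeholder file name and size:

select  group#, thread#, sequence#,
        bytes/1024/1024 size_mb,
        members, archived, status
from    v$log
order   by group#;

-- Hypothetical example of adding another, larger group (placeholder file name and size):
-- alter database add logfile group 4 ('/u01/oradata/orcl/redo04.log') size 512m;

With the current sizes in hand, you can decide whether to grow the existing groups or simply add more of them.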

Here is a query that shows you the time between log switches. It can be handy in determining if you have a problem:

select  b.recid,
        to_char(b.first_time, 'dd-mon-yy hh:mi:ss') start_time,
        a.recid,
        to_char(a.first_time, 'dd-mon-yy hh:mi:ss') end_time,
        round(((a.first_time - b.first_time)*24)*60, 2) minutes
from    v$log_history a, v$log_history b
where   a.recid = b.recid + 1
order   by a.first_time asc;

Comments

  1. Everything in this post is text that can be copied, except the SQL … the only thing one would want to copy.
    Thanks

  2. select b.recid,
    to_char(b.first_time, 'dd-mon-yy hh:mi:ss') start_time,
    a.recid,
    to_char(a.first_time, 'dd-mon-yy hh:mi:ss') end_time,
    round(((a.first_time - b.first_time)*24)*60, 2) minutes
    from v$log_history a, v$log_history b
    where a.recid = b.recid + 1
    order by a.first_time asc

  3. Not sure why you are getting negative numbers. What version of Oracle are you running? What is the size of your log files? Here is another query that you can try that will tell you how many log switches occur per hour and per day. You will need to run alter session set nls_date_format = 'mm/dd/yy hh24' before you run it:

    WITH row_source AS
    ( select
    trunc(first_time) DAY,
    trunc(first_time, 'HH24') DAY_HOUR,
    count(*) LOG_SWITCHES_PER_HOUR
    from v$log_history
    group by trunc(first_time), trunc(first_time, 'HH24') )
    select DAY, DAY_HOUR, LOG_SWITCHES_PER_HOUR,
    sum(LOG_SWITCHES_PER_HOUR) OVER (PARTITION BY DAY ORDER BY DAY_HOUR) "TOTAL_SWITCHES_SOFAR_TODAY"
    from row_source
    order by 1,2
    /

  4. Thanks for the article. The query to obtain the time between log file switches could be re-written in a more efficient way (with analytic functions, thus getting rid of the self join):

    select
    recid,
    to_char(first_time, 'dd-mon-yy hh:mi:ss') start_time,
    to_char(lead(first_time) over (order by recid), 'dd-mon-yy hh:mi:ss') as end_time,
    round((lead(first_time) over (order by recid) - first_time)*24*60, 2) as minutes
    from v$log_history a
    order by a.first_time asc

  5. Hello,
    I’m a bit off topic here, but I’ll take the chance to ask.
    I am running an environment where I have configured Streams. My problem is that when the connection between the sites goes down, the archive logs grow exponentially while the connection is down. I have attributed this to the Capture Queue Spill table, which is starting to fill, but I’m not sure. Is this possible?

  6. Hi Dan,
    If you are using the Capture Queue Spill table to buy yourself some time in the event of a failure, you’ll need to ensure that you’ve sized the buffered queue appropriately and that you have enough space to spill to the table if the outage lasts a long time. The number of spilled messages should be kept as low as possible for performance purposes. Also, a high number of spilled messages can result in a problem with an apply process that applies messages captured by the capture process; when this happens, messages can build up in the queue because they are not being consumed. In the case of an outage, make sure the relevant propagations and apply processes are enabled. If you haven’t already, you may want to create an alert that watches the queue to make sure the apply processes are running.

    Also, in the event of a failure the streams pool may be too small to hold the captured messages. In that case, increase the size of the streams pool, and consider letting Automatic Shared Memory Management manage the Streams pool size for you by setting the SGA_TARGET initialization parameter if you haven’t already done so. Hope this helps.
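    For reference, here is a rough sketch of the kinds of checks and changes described above (these are the standard Streams views and parameters, but verify the names and sizing against your own environment; the ALTER SYSTEM values are placeholders):

    -- Spilled vs. buffered messages per queue (a high SPILL_MSGS count suggests the streams pool is too small)
    select queue_schema, queue_name, num_msgs, spill_msgs
    from v$buffered_queues;

    -- Confirm the relevant propagations and apply processes are enabled
    select propagation_name, status from dba_propagation;
    select apply_name, status from dba_apply;

    -- Hypothetical sizing changes (placeholder values); with SGA_TARGET set, ASMM manages the streams pool automatically
    -- alter system set sga_target = 2g scope=both;
    -- alter system set streams_pool_size = 256m scope=both;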
