The Oracle “log file sync” wait event is triggered when a user session issues a commit (or a rollback). The user session will signal or post the LGWR to write the log buffer to the redo log file. When the LGWR has finished writing, it will post the user session. The wait is entirely dependent on LGWR to write out the necessary redo blocks and send confirmation of its completion back to the user session. The wait time includes the writing of the log buffer and the post, and is sometimes called “commit latency”.
The P1 parameter in <View:V$SESSION_WAIT> is defined as follows for the log file sync wait event:
P1 = buffer# All changes up to this buffer number (in the log buffer) must be flushed to disk and the writes confirmed to ensure that the transaction is committed and will be kept on an instance crash. The wait is for LGWR to flush up to this buffer#.
Reducing Oracle waits / wait times
If a SQL statement is encountering a significant amount of total time for this event, the average wait time should be examined. If the average wait time is low, but the number of waits is high, then the application might be committing after every row, rather than batching COMMITs. Oracle applications can reduce this wait by committing after “n” rows so there are fewer distinct COMMIT operations. Each commit has to be confirmed to make sure the relevant REDO is on disk. Although commits can be “piggybacked” by Oracle, reducing the overall number of commits by batching transactions can be very beneficial.
If the SQL statement is a SELECT statement, review the Oracle Auditing settings. If Auditing is enabled for SELECT statements, Oracle could be spending time writing and commit data to the AUDIT$ table.
If the average wait time is high, then examine the other log related waits for the session, to see where the session is spending most of its time. If a session continues to wait on the same buffer# then the SEQ# column of V$SESSION_WAIT should increment every second. If not then the local session has a problem with wait event timeouts. If the SEQ# column is incrementing then the blocking process is the LGWR process. Check to see what LGWR is waiting on as it may be stuck.
If the waits are because of slow I/O, then try the following:
- Reduce other I/O activity on the disks containing the redo logs, or use dedicated disks.
- Try to reduce resource contention. Check the number of transactions (commits + rollbacks) each second, from V$SYSSTAT.
- Alternate redo logs on different disks to minimize the effect of the archiver on the log writer.
- Move the redo logs to faster disks or a faster I/O subsystem (for example, switch from RAID 5 to RAID 1).
- Consider using raw devices (or simulated raw devices provided by disk vendors) to speed up the writes.
- See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options in order to reduce the amount of redo being written.
- See if any of the processing can use the COMMIT NOWAIT option (be sure to understand the semantics of this before using it).
- Check the size of the log buffer as it may be so large that LGWR is writing too many blocks at one time.
Log file sync wait event: other considerations
There may be a problem with LGWR’s ability to flush redo out quickly enough if Oracle “log file sync” waits are significant for the entire system. The overall wait time for “log file sync” can be broken down into several components. If the system still shows high “log file sync” wait times after completing the general tuning tips above, break down the total Oracle wait time into the individual components. Then, tune those components that take up the largest amount of time.
The “log file sync” wait event may be broken down into the following components:
1. Wakeup LGWR if idle
2. LGWR gathers the redo to be written and issues the I/O
3. Wait time for the log write I/O to complete
4. LGWR I/O post processing
5. LGWR posting the foreground/user session that the write has completed
6. Foreground/user session wakeup
Tune the system based on the “log file sync” component with the most wait time. Steps 2 and 3 are accumulated in the “redo write time” statistic. (i.e. as found under STATISICS section of Statspack) Step 3 is the “log file parallel write” wait event. (See Metalink Note 34583.1:”log file parallel write”) Steps 5 and 6 may become very significant as the system load increases. This is because even after the foreground has been posted it may take some time for the OS to schedule it to run.
Data Guard note
If Data Guard with synchronous transport and commit WAIT defaults is used, the above tuning steps will still apply. However step 3 will also include the network write time and the redo write to the standby redo logs. The “log file sync” wait event and how it applies to Data Guard is explained in detail in the MAA OTN white paper – Note 387174.1:MAA – Data Guard Redo Transport and Network Best Practices.
When a user session waits on the “log file sync” event, it is actually waiting for the LGWR process to write the log buffer to the redo log file and return confirmation/control back to it. If the total wait time is significant, review the average wait time. If the average wait time is low but the number of waits is high, reduce the number of commits by batching (or committing after “n”) rows.
If slow I/O, investigate the following:
- Reduce contention on existing disks.
- Put log files on faster disks.
- Put alternate redo logs on different disks to minimize the effect archive processes (log files switches).
- Review application design, use NOLOGGING operations where appropriate, and avoid changing more data than required.
If wait times are still significant, review each component of the “log file sync” and tune separately.