This article covers the logs provided with Oracle 11g Release 2. In Oracle 11g Release 1, RDBMS instances, listeners, and so on moved to using the Automatic Diagnostic Repository (ADR). The default repository home location is the ORACLE_BASE/diag directory, and the various process types have directories under that, such as asm, rdbms, and tnslsnr. When an error occurs, the ADR framework creates an incident and gives you the option of packaging up that incident with the ADR Command Interpreter (ADRCI) to send to Oracle Support.
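Purely for reference, the general ADRCI flow for packaging an incident looks something like the following sketch; the homepath and incident number are illustrative only, so substitute the values your own show homes and show incident commands report.

$ adrci
adrci> show homes
adrci> set homepath diag/rdbms/orcl/orcl1
adrci> show incident
adrci> ips pack incident 6745 in /tmp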
Although I will not go into ADR here, ASM is now under our GI home and may be a critical part of the clusterware, so you need to know where to go for troubleshooting. The clusterware has not yet followed the ADR model; however, the tracing has evolved significantly since 10g Release 1. Most of the trace files are fairly low level and intended for Oracle Support to diagnose problems; only the cluster alert log is meant for user consumption, in the same way the RDBMS alert log is.
It is, of course, never a bad thing to look at the traces to get an understanding of what is going on so long as you don’t get paranoid about the contents and log Support Requests for every line that does not look right. (I guess I’m saying don’t go looking too hard for problems in these logs because many of the messages look like a problem but are quite normal.)
The cluster alert log is a good place to start when looking for problems. It is found at $GI_HOME/log/<hostname>/alert<hostname>.log and includes only top-level information such as startups, shutdowns, Cluster Verification Utility (CVU) checks, evictions, and reconfigurations due to eviction or shutdown of other nodes, as well as actual problems encountered by other processes. Even some of these reported problems may not be of concern; for example, errors from processes on startup may occur simply because they need resources that are not yet online. Having said that, you may find it helpful to check out these logs.
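As a quick illustration, tailing the cluster alert log on the node used in the example below would look something like this (the Grid home and host name are taken from that example, so substitute your own):

$ tail -f /u01/app/11.2.0/grid/log/racassur-pfix01/alertracassur-pfix01.log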
Here is an example of a message in the cluster alert log:
2010-09-22 02:35:11.474
[/u01/app/11.2.0/grid/bin/oraagent.bin(4696)]CRS-5016: Process "/u01/app/11.2.0/grid/bin/lsnrctl" spawned by agent "/u01/app/11.2.0/grid/bin/oraagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0/grid/log/racassur-pfix01/agent/crsd/oraagent_grid/oraagent_grid.log"
You can see that CRSD's oraagent tried to run a check on the listener using the lsnrctl command and that the check failed. Let's take a look at the file mentioned to see what happened. (I show only the more interesting lines and have removed the date and time stamps.)
NOTE
If you do not find the timestamp you expect in the file, check the other files in the directory; the log files are aged out when they reach a certain size, so there can be up to ten old versions of the file, named (in this case) oraagent_grid.l01 through oraagent_grid.l10.
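A quick way to hunt for a timestamp across the current log and its aged-out copies is a sketch like the following, using the directory and timestamp from the CRS-5016 message above:

$ cd /u01/app/11.2.0/grid/log/racassur-pfix01/agent/crsd/oraagent_grid
$ grep "2010-09-22 02:35" oraagent_grid.log oraagent_grid.l??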
[ora.LISTENER_SCAN3.lsnr][1579252032] {1:63693:62933} [check] lsnrctl status LISTENER_SCAN3
[ora.LISTENER_SCAN3.lsnr][1579252032] {1:63693:62933} [check] (:CLSN00010:)Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN3)))
[ora.LISTENER_SCAN3.lsnr][1579252032] {1:63693:62933} [check] (:CLSN00010:)TNS-12541: TNS:no listener
[ora.LISTENER_SCAN3.lsnr][1579252032] {1:63693:62933} [check] (:CLSN00010:) TNS-12560: TNS:protocol adapter error
[ora.LISTENER_SCAN3.lsnr][1579252032] {1:63693:62933} [check] (:CLSN00010:) TNS-00511: No listener
[ora.LISTENER_SCAN3.lsnr][1579252032] {1:63693:62933} [check] (:CLSN00010:) Linux Error: 2: No such file or directory
These errors show that the listener was down when the check ran, but the trace file also shows that, just before this, a stop for that listener had been called, so it is expected that it would be down:
2010-09-22 02:35:11.365: [ AGFW][1646369088] {1:63693:62933} Command: stop for resource: ora.LISTENER_SCAN3.lsnr 1 1 completed with status: SUCCESS
Nothing to worry about there, so long as the listener can be restarted when needed.
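If you want to confirm that it did come back, one simple check (a sketch; run as the Grid Infrastructure owner) is to ask the clusterware for the status of the SCAN listeners, or of that specific resource:

$ srvctl status scan_listener
$ crsctl status resource ora.LISTENER_SCAN3.lsnr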
If you do happen to find a real problem that requires the help of Oracle Support, you can zip up all the required trace files using $GI_HOME/bin/diagcollection.sh, which is a wrapper for diagcollection.pl. It can collect ADR incident data and Cluster Health Monitor (OS) data, as well as clusterware trace files:
[root]# /u01/app/11.2.0/grid/bin/diagcollection.sh -h
Production Copyright 2004, 2010, Oracle. All rights reserved
Cluster Ready Services (CRS) diagnostic collection tool
diagcollection
   --collect
      [--crs] For collecting crs diag information
      [--adr] For collecting diag information for ADR; specify ADR location
      [--chmos] For collecting Cluster Health Monitor (OS) data
      [--all] Default.For collecting all diag information.
      [--core] UNIX only. Package core files with CRS data
      [--afterdate] UNIX only. Collects archives from the specified Date. Specify in mm/dd/yyyy format
      [--aftertime] Supported with -adr option. Collects archives after the specified time. Specify in YYYYMMDDHHMISS24 format
      [--beforetime] Supported with -adr option. Collects archives before the specified date. Specify in YYYYMMDDHHMISS24 format
      [--crshome] Argument that specifies the CRS Home location
      [--incidenttime] Collects Cluster Health Monitor (OS) data from the specified time. Specify in MM/DD/YYYY24HH:MM:SS format
         If not specified, Cluster Health Monitor (OS) data generated in the past 24 hours are collected
      [--incidentduration] Collects Cluster Health Monitor (OS) data for the duration after the specified time. Specify in HH:MM format.
         If not specified, all Cluster Health Monitor (OS) data after incidenttime are collected
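Based on that help output, a typical collection run might look like the following sketch; the flags are the ones listed above, and the Grid home path comes from this example cluster, so adjust both for your environment:

[root]# /u01/app/11.2.0/grid/bin/diagcollection.sh --collect --crs --crshome /u01/app/11.2.0/grid

Using --all instead of --crs (or simply omitting --crs, since --all is the default) would also gather the ADR and Cluster Health Monitor (OS) data described in the help text.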
NOTE
For 11.2.0.1, check the valid arguments with the -h flag, because the arguments shown in this example are the 11.2.0.2 options.