Surprisingly, many Oracle database administrators skip two critical checks when they start troubleshooting a problem: the alert log and v$resource_limit. Database administrators should always start with these fundamentals and work methodically through a troubleshooting process. Getting caught up in the emotional “panic” of others, or focusing on vague user reports, leads database administrators away from the root cause and the solution, costing critical time.
I’m not advocating ignoring the error messages users provide; those messages will be essential to rectifying the problem. I am recommending caution when “slowness” or some other vague failure report is what gets ticketed. How many times have database administrators heard, “The database is down and all users are not working,” only to discover that it really means three people at a remote location, just hit by a storm-caused power outage, can no longer work?
First, check the alert log
The alert log is a fountain of critical information, captured chronologically so that finding important data is quick and easy. The alert log journals general configuration facts such as instance startup settings; run-time events such as redo log switches; and warnings covering deadlocks, segment expansions that failed for lack of tablespace storage, and ORA-nnnnn errors. That last group of events is what database administrators must seek out to find the problem and begin working toward resolution.
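On recent releases you don’t even have to leave SQL*Plus to scan the alert log for those events. The sketch below assumes Oracle 12.2 or later, where the alert log is exposed through the V$DIAG_ALERT_EXT view; the row limit of 20 is an arbitrary choice for illustration.

```sql
-- Hedged sketch (assumes Oracle 12.2+): pull the most recent
-- ORA- messages from the alert log without touching the file system.
select originating_timestamp,
       message_text
  from v$diag_alert_ext
 where message_text like '%ORA-%'
 order by originating_timestamp desc
 fetch first 20 rows only;
```

On older releases, or when the instance itself is down, reading the alert log file directly on the server remains the fallback.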
Second, check the v$resource_limit values
Next, connect to the database from a client workstation and check the v$resource_limit values. This single step lets database administrators quickly assess three operational metrics: the status of the listener, the accessibility of the database and resource availability. Running ‘select * from v$resource_limit;’ yields vital database health metrics (see below). To keep the output neat, use ‘set linesize 200 pages 40’ first.
SQL> l
  1* select * from v$resource_limit
SQL> /

RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_AL LIMIT_VALU
------------------------------ ------------------- --------------- ---------- ----------
processes                                      234             296        800        800
sessions                                       243             307        885        885
enqueue_locks                                  220             298      10832      10832
enqueue_resources                               51              95       4112  UNLIMITED
ges_procs                                      231             292        801        801
ges_ress                                         0               0      17202  UNLIMITED
ges_locks                                        0               0      26471  UNLIMITED
ges_cache_ress                                3585            6986          0  UNLIMITED
ges_reg_msgs                                   271             809       2350  UNLIMITED
ges_big_msgs                                    42             530       2350  UNLIMITED
ges_rsv_msgs                                     0               0       1000       1000
gcs_resources                               120338          396511     414309     414309
gcs_shadows                                  80088          276865     414309     414309
dml_locks                                       16              16       3892  UNLIMITED
temporary_table_locks                            0               3  UNLIMITED  UNLIMITED
transactions                                   102             144        973  UNLIMITED
branches                                         0               0        973  UNLIMITED
cmtcallbk                                        2              10        973  UNLIMITED
max_rollback_segments                           21              22        973      65535
sort_segment_locks                               0              50  UNLIMITED  UNLIMITED
k2q_locks                                        0               0       1770  UNLIMITED
max_shared_servers                               1               1  UNLIMITED  UNLIMITED
parallel_max_servers                             2               9         80       3600

23 rows selected.
Any resource shortage will inhibit optimal performance, and in many cases cause significant problems for users. It’s a good practice to check these values weekly so that you can recognize abnormalities quickly.
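For the weekly check, a short filter saves eyeballing all 23 rows. The sketch below flags any resource whose high-water mark has reached 80% or more of its hard limit; the 80% threshold is my own assumption, not an Oracle recommendation. Note that LIMIT_VALUE is a character column, so UNLIMITED rows must be excluded before converting to a number.

```sql
-- Hedged sketch: flag resources whose high-water mark (MAX_UTILIZATION)
-- is at 80% or more of the configured limit. The 80% cutoff is an
-- illustrative assumption; tune it to your own comfort level.
select resource_name,
       max_utilization,
       limit_value
  from v$resource_limit
 where trim(limit_value) not in ('UNLIMITED', '0')
   and max_utilization >= 0.8 * to_number(trim(limit_value));
```

An empty result set is the answer you want; any row returned deserves a closer look before it becomes a ticket.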
On one system, for example, the max_rollback_segments value had never exceeded 200, but one day after hearing of a performance problem, I found the value was at 1500. The root cause of the problem ended up being a report that wasn’t really a report but a batch job with lots of data changes followed by a report. After tracking down the submitting user and asking about the job, the response was, “We knew that report was not working but we decided to run it anyway.”
The good news is that by checking v$resource_limit immediately, finding the problem was relatively easy. Note the rollback level, check for a process consuming a lot of rollback, find out who submitted the job and then work with the submitter concerning what to do about the offending job. In this case the job was terminated and the submitter was “asked” to not resubmit the job until after the performance team reviewed the code.
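The “check for a process consuming a lot of rollback” step can itself be a single query. The sketch below joins v$transaction to v$session (via the transaction address) to rank active transactions by undo blocks held, which points straight at the session and user to follow up with.

```sql
-- Hedged sketch: rank active transactions by undo (rollback) consumption.
-- USED_UBLK is undo blocks held; USED_UREC is undo records.
select s.sid,
       s.serial#,
       s.username,
       s.program,
       t.used_ublk,
       t.used_urec
  from v$transaction t
  join v$session     s on s.taddr = t.addr
 order by t.used_ublk desc;
```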
Remember: focus on the fundamentals first. These two critical checks take five minutes and will save you the major headache that comes from overlooking these initial troubleshooting steps.