Critical elements of an Oracle maintenance plan

By: SolarWinds on May 19, 2014


What does it take to maintain an Oracle database? Anything that keeps the database stable and reliable, from patching to regularly checking error logs to disaster recovery planning and testing.

To find out more about Oracle maintenance, we convened a Twitter chat (#datachat) with Oracle and performance experts, including: Dominic DelMolino (Oaktable member and Vice President of Systems Architecture at Agilex Technologies),  Arup Nanda (Oracle ACE and Principal Database Architect at Starwood Hotels & Resorts), Craig Mullins (IBM Champion and Founder at Mullins Consulting), Leighton Nelson (Oracle ACE Director and Principal DBA at Sisters of Mercy ) and Dean Richards (Manager of Sales Engineering, SolarWinds).

Read the full Twitter chat transcript here

Start with baseline metrics

Our expert panel agreed that first and foremost, you should start by understanding the database’s baseline metrics, including settings, versions, patch levels and more, s guest host Dominic DelMolino summarized:

Delmolino DB Stability

 

Make sure maintenance starts in development not production

Several on our panel pointed out that maintenance must start well before the production environment, all the way back in development as DelMolino pointed out:

 

Delmolino Backwards DB Maintenance

 

Use canned jobs or custom scripts, but understand how they work

Though many DBAs prefer custom scripts for maintenance jobs, Leighton Nelson noted that both canned jobs and custom scripts can aid in database maintenance, depending on requirements:

Nelson Canned Jobs Fine

 

Arup Nanda concurred and reiterated that the central focus of maintenance should be to script it, test it in non-production environments, monitor it, understand its funcitons well and, above all else, use the right tools for the right jobs:

Nanda Recovery Strategy Critical

Make sure you test your recovery strategy

Finally, the most critical, and often overlooked, task of maintenance should be to develop and test (and retest) a recovery plan that is tied to business requirements, as Arup Nanda noted:

Nanda Right Tools For DB Maintenance

 

Full transcript of Twitter chat about Oracle Database Maintenance

 

SWI_Database Hello #oracle fans! Today we’re talking about “Maintenance” and will bring Q1 next – Always use #datachat for your tweets.
SWI_Database Q1. What does the term “maintenance” mean? #datachat #oracle
ddelmoli A1. I think maintenance refers to what it takes to keep an Oracle database stable and reliable #datachat
ddelmoli A1. This might mean patching, backing up, restoring, tuning, implementing policies #datachat
arupnanda #datachat q1 maintenance is same as eating, brishing, flushing, showering as you wud do to your body
SWI_Database Q2. Who is responsible for deciding what (and when) maintenance tasks are performed? #datachat #oracle
arupnanda #datachat q2 DBAs and application maintenance folks jointly, depending on the sweet spot of timing. Focus on “joint”
ddelmoli q2 “Joint” is key — especially if maint activities could require an outage #datachat
SWI_Database Are those clear responsibilities? RT @ddelmoli: q2 “Joint” is key — especially if maint activities could require an outage #datachat
arupnanda @SWI_Database @ddelmoli #datachat it depends on the clear policy of the org. Most orgs don’t have a clear policy. and that *is* the problem.
SWI_Database @arupnanda @ddelmoli What best practices would you recommend to have clear policies in a ever changing environment? #datachat
ddelmoli q2 Maintenance changes may need to be treated like application changes in terms of who needs to be informed #datachat
ddelmoli Well publicized schedules of maintenance activities #datachat
SWI_Database RT @arupnanda: @SWI_Database @ddelmoli it’s just common sense. As an org matures, these policies are better cast in stone. #datachat
ddelmoli Maintenance activities should have some form of business value #datachat
SWI_Database Q3. Which do you prefer and why: canned database jobs for things like stats collection, or do you use custom jobs #datachat
arupnanda #datachat q3 custom scripts: flexible, understandable, no surprises (we know what precisely will happen)
arupnanda @ddelmoli #datachat not necesaarily. Trimming logs has no business value; but it’s important for a db’s health. clear objective is the key.
ddelmoli As long as you understand what the canned job is doing and can control it. I’m fine with them #datachat
arupnanda @ddelmoli #datachat and *that* is the problkm. Most folks have little clue on those “canned” jobs.
arupnanda @ddelmoli #datachat and burn themselves real bad and blame it on those jobs
ddelmoli If custom scripts are objectively verifiable (hint: source code control) then they’re very useful #datachat
ddelmoli Are there good ways to automate / orchestrate custom db maint scripts #datachat #database #chef #cron #puppet
SWI_Database #datachat RT @leight0nn If canned jobs work fine. If not use custom scripts. Seen folks rewrite existing canned jobs due to lack of know.
arupnanda @SWI_Database @leight0nn #datachat true; but then we need to define it. EM scripts are canned; something I wrote and put on the internet is not.
arupnanda #datachat Q3 I think we are confusing between canned and reusable scripts. they are different. The latter is transparent; former is not.
ddelmoli Is it easy to have a framework for custom scripts that does retries and notifications? #datachat
arupnanda @ddelmoli #datachat I would think one exists already altho I don’t know.
leight0nn @arupnanda @SWI_Database Understanding of the job regardless of source is crucial. I consider EM/database jobs canned. #datachat.
SWI_Dean #datachat Q3: how many people use DBMS_SCHEDULER as a framework. Not many as it is way to laborious IMHO
SWI_Dean #datachat Q3: any good tools on top that more poeple could use
arupnanda @SWI_Dean #datachat I do. It’s diffy but once used to, it’s fine. the biggest adv: it doesn’t run when the DB is down. No need to check.
SWI_Database Q4. What maintenance items you would advise DBAs start doing immediately? #datachat
arupnanda #datachat q4 1. check for ITL shortages, easy to fix. http://t.co/GqHb2dVDk1
arupnanda #datachat q4 2. check and increase the cache of sequences and (in RAC) consider NOORDER
arupnanda #datachat q4 3. trim all logs-listener.log, alert.log, CRS, CSS logs, etc.
arupnanda #datachat q4 4. set the admin_restriction to on in listener. http://t.co/WlkaDF1c5B
arupnanda @SWI_Dean #datachat Adv #2: no need for a password.
leight0nn Q4 @SWI_Database Collect performance and other metrics. Increase default AWR retention settings #datachat
ddelmoli Q4. Understand your DB baselines: settings, versions, patch levels, landscape. Helpful when deciding what to do next #datachat
arupnanda @SWI_Database #datachat Q4 just thought about it: it could fill a book, easily 🙂
SWI_Database @arupnanda Any good books recommendations or should someone write it? #datachat
ddelmoli @arupnanda @SWI_Database #datachat — lots of stuff to do, hard to prioritize
arupnanda @SWI_Database #datachat Seems such a book should exist already, altho I don’t know of one.
craigmullins @SWI_Database Q4.  DBAs should start testing their backup/recovery plans  immediately? #datachat #dba
ddelmoli #datachat I think the Expert OEM12c book is a good resource for maintenance ideas http://t.co/du1TXMpN8s
ddelmoli @craigmullins @SWI_Database #datachat most importantly their recovery processes 🙂
arupnanda @craigmullins @SWI_Database #datachat Completely agree. Particualrly recovery plans
ddelmoli q5 #datachat I think recovery requirements should drive backup strategies
SWI_Database Q5. Which is more important and why: backup strategy or recovery strategy? #datachat
arupnanda #datachat q5 recovery strategy is more important. Backup is driven off that; not the other way around.
SWI_Dean #datachat Q5. Both backup and recovery are important. Can’t have one without the other
SWI_Dean #datachat Q5 However, recovery requirements should drive backup strategy
ddelmoli q5 #datachat definitely recovery strategy
SWI_Database RT @ddelmoli: #datachat I think the Expert OEM12c book is a good resource for maintenance ideas http://t.co/du1TXMpN8s
craigmullins “@ddelmoli: q5 #datachat I think recovery requirements should drive backup strategies” <– totally agree
arupnanda #datachat Q5 thinking about possible failures and having a plan for all possibilities is important
ddelmoli q5.1 How creative can you get with a backup strategy? #datachat
arupnanda @ddelmoli #datachat if recovery is not important, I can be very creative. No backup at all. performs well.
ddelmoli @arupnanda #datachat Ah yes, the old /dev/null backup 🙂
SWI_Dean #datachat Q5: Sometimes export/import is valid backup strategy. Depends on recoverability needs
ddelmoli Q5.2 How important is selective or targeted recovery? #datachat
arupnanda @ddelmoli #datachat very, very important. your business could come up on a subset. plan for that
adnanmkm RT @arupnanda: #datachat q4 2. check and increase the cache of sequences and (in RAC) consider NOORDER
ddelmoli @arupnanda #datachat Agreed — especially if you’ve been consolidating apps into bigger databases
SWI_Dean #datachat Q5: selective recovery could be very important, based on business needs.
arupnanda @ddelmoli #datachat or, if you have histirical data not importnat for immediate business
ddelmoli @arupnanda #datachat Exactly — might have a phased recovery approach
arupnanda #datachat have a follow up question: which one is the most important to protect: datafile, db backup, archivedlogs?
ddelmoli Q5.3 How important is it to routinely test your recovery plans and approaches? #datachat
ddelmoli @arupnanda #datachat Do I have to pick only one? 🙂
arupnanda @ddelmoli #datachat Very important; but the exact approach may not be feasible in some cases. RMAN validate and PREVIEW restore are useful.
arupnanda @ddelmoli #datachat. Yes, Budget.
ddelmoli @arupnanda Ok — I’ll go with datafiles. #datachat
arupnanda #datachat Q5A I think archive log is. You can recreate a datafile; but a missing archlog will stop recovery cold.
arupnanda #datachat Q5 Everyone should make it it s point of using RMAN validate and restore preview after every backup. I do it religiously.
SWI_Database Great #datachat -Here is our last question for today: Q6 What are common mistakes made with regards to maintenance? (besides not doing any!)
arupnanda #datachat q6 mis-correlating cause and effect, e.g. increase buffer cache->perf goes up
ddelmoli @arupnanda #datachat Agreed — make every effort to validate the integrity of your backup.
ddelmoli q6 Assuming the change “shouldn’t affect that” #datachat
craigmullins @SWI_Database q6 failure to automate #datachat
arupnanda #datachat q6 overdoing it e.g. index rebuilds; but not checking for bottlenecks, e.g. wait events
arupnanda #datachat q6 assuming that only one person or group is responsible for maintenance. It’s always a joint effort.
ddelmoli q6 Failure to validate that the desired effect was achieved #datachat
arupnanda #datachat q6 assuming it will work, e.g. not testing backups or recovery strategies. Not scripting it.
ddelmoli q6 Doing maintenance “backwards” — implementing in Prod first #datachat
kevinclosson .@SWI_Database Q4: Do whatever it takes to ensure you don’t get screwed in license audit (resulting in more software spend quid pro quo) #datachat
SWI_Database Wrapping up another great #datachat – Thank you everyone for participating! and welcome @KevinClosson
ddelmoli @arupnanda #datachat No testing to see the impact of maintenance change if it’s done in Prod first
arupnanda #datachat I think the central points of maintenance: script it, test in lower envs, monitor, understand it well and use the right tool.
oracle_em We couldn’t agree more RT @ddelmoli #datachat I think the Expert #EM12c book is a good resource for maintenance ideas http://t.co/W1DPAiyCcA

Leave a Reply