What does it take to maintain an Oracle database? Anything that keeps the database stable and reliable, from patching to regularly checking error logs to disaster recovery planning and testing.
To find out more about Oracle maintenance, we convened a Twitter chat (#datachat) with Oracle and performance experts, including: Dominic DelMolino (Oaktable member and Vice President of Systems Architecture at Agilex Technologies), Arup Nanda (Oracle ACE and Principal Database Architect at Starwood Hotels & Resorts), Craig Mullins (IBM Champion and Founder at Mullins Consulting), Leighton Nelson (Oracle ACE Director and Principal DBA at Sisters of Mercy ) and Dean Richards (Manager of Sales Engineering, SolarWinds).
Read the full Twitter chat transcript here
Start with baseline metrics
Our expert panel agreed that first and foremost, you should start by understanding the database’s baseline metrics, including settings, versions, patch levels and more, s guest host Dominic DelMolino summarized:
Make sure maintenance starts in development not production
Several on our panel pointed out that maintenance must start well before the production environment, all the way back in development as DelMolino pointed out:
Use canned jobs or custom scripts, but understand how they work
Though many DBAs prefer custom scripts for maintenance jobs, Leighton Nelson noted that both canned jobs and custom scripts can aid in database maintenance, depending on requirements:
Arup Nanda concurred and reiterated that the central focus of maintenance should be to script it, test it in non-production environments, monitor it, understand its funcitons well and, above all else, use the right tools for the right jobs:
Make sure you test your recovery strategy
Finally, the most critical, and often overlooked, task of maintenance should be to develop and test (and retest) a recovery plan that is tied to business requirements, as Arup Nanda noted:
Full transcript of Twitter chat about Oracle Database Maintenance
SWI_Database | Hello #oracle fans! Today we’re talking about “Maintenance” and will bring Q1 next – Always use #datachat for your tweets. |
SWI_Database | Q1. What does the term “maintenance” mean? #datachat #oracle |
ddelmoli | A1. I think maintenance refers to what it takes to keep an Oracle database stable and reliable #datachat |
ddelmoli | A1. This might mean patching, backing up, restoring, tuning, implementing policies #datachat |
arupnanda | #datachat q1 maintenance is same as eating, brishing, flushing, showering as you wud do to your body |
SWI_Database | Q2. Who is responsible for deciding what (and when) maintenance tasks are performed? #datachat #oracle |
arupnanda | #datachat q2 DBAs and application maintenance folks jointly, depending on the sweet spot of timing. Focus on “joint” |
ddelmoli | q2 “Joint” is key — especially if maint activities could require an outage #datachat |
SWI_Database | Are those clear responsibilities? RT @ddelmoli: q2 “Joint” is key — especially if maint activities could require an outage #datachat |
arupnanda | @SWI_Database @ddelmoli #datachat it depends on the clear policy of the org. Most orgs don’t have a clear policy. and that *is* the problem. |
SWI_Database | @arupnanda @ddelmoli What best practices would you recommend to have clear policies in a ever changing environment? #datachat |
ddelmoli | q2 Maintenance changes may need to be treated like application changes in terms of who needs to be informed #datachat |
ddelmoli | Well publicized schedules of maintenance activities #datachat |
SWI_Database | RT @arupnanda: @SWI_Database @ddelmoli it’s just common sense. As an org matures, these policies are better cast in stone. #datachat |
ddelmoli | Maintenance activities should have some form of business value #datachat |
SWI_Database | Q3. Which do you prefer and why: canned database jobs for things like stats collection, or do you use custom jobs #datachat |
arupnanda | #datachat q3 custom scripts: flexible, understandable, no surprises (we know what precisely will happen) |
arupnanda | @ddelmoli #datachat not necesaarily. Trimming logs has no business value; but it’s important for a db’s health. clear objective is the key. |
ddelmoli | As long as you understand what the canned job is doing and can control it. I’m fine with them #datachat |
arupnanda | @ddelmoli #datachat and *that* is the problkm. Most folks have little clue on those “canned” jobs. |
arupnanda | @ddelmoli #datachat and burn themselves real bad and blame it on those jobs |
ddelmoli | If custom scripts are objectively verifiable (hint: source code control) then they’re very useful #datachat |
ddelmoli | Are there good ways to automate / orchestrate custom db maint scripts #datachat #database #chef #cron #puppet |
SWI_Database | #datachat RT @leight0nn If canned jobs work fine. If not use custom scripts. Seen folks rewrite existing canned jobs due to lack of know. |
arupnanda | @SWI_Database @leight0nn #datachat true; but then we need to define it. EM scripts are canned; something I wrote and put on the internet is not. |
arupnanda | #datachat Q3 I think we are confusing between canned and reusable scripts. they are different. The latter is transparent; former is not. |
ddelmoli | Is it easy to have a framework for custom scripts that does retries and notifications? #datachat |
arupnanda | @ddelmoli #datachat I would think one exists already altho I don’t know. |
leight0nn | @arupnanda @SWI_Database Understanding of the job regardless of source is crucial. I consider EM/database jobs canned. #datachat. |
SWI_Dean | #datachat Q3: how many people use DBMS_SCHEDULER as a framework. Not many as it is way to laborious IMHO |
SWI_Dean | #datachat Q3: any good tools on top that more poeple could use |
arupnanda | @SWI_Dean #datachat I do. It’s diffy but once used to, it’s fine. the biggest adv: it doesn’t run when the DB is down. No need to check. |
SWI_Database | Q4. What maintenance items you would advise DBAs start doing immediately? #datachat |
arupnanda | #datachat q4 1. check for ITL shortages, easy to fix. http://t.co/GqHb2dVDk1 |
arupnanda | #datachat q4 2. check and increase the cache of sequences and (in RAC) consider NOORDER |
arupnanda | #datachat q4 3. trim all logs-listener.log, alert.log, CRS, CSS logs, etc. |
arupnanda | #datachat q4 4. set the admin_restriction to on in listener. http://t.co/WlkaDF1c5B |
arupnanda | @SWI_Dean #datachat Adv #2: no need for a password. |
leight0nn | Q4 @SWI_Database Collect performance and other metrics. Increase default AWR retention settings #datachat |
ddelmoli | Q4. Understand your DB baselines: settings, versions, patch levels, landscape. Helpful when deciding what to do next #datachat |
arupnanda | @SWI_Database #datachat Q4 just thought about it: it could fill a book, easily 🙂 |
SWI_Database | @arupnanda Any good books recommendations or should someone write it? #datachat |
ddelmoli | @arupnanda @SWI_Database #datachat — lots of stuff to do, hard to prioritize |
arupnanda | @SWI_Database #datachat Seems such a book should exist already, altho I don’t know of one. |
craigmullins | @SWI_Database Q4. DBAs should start testing their backup/recovery plans immediately? #datachat #dba |
ddelmoli | #datachat I think the Expert OEM12c book is a good resource for maintenance ideas http://t.co/du1TXMpN8s |
ddelmoli | @craigmullins @SWI_Database #datachat most importantly their recovery processes 🙂 |
arupnanda | @craigmullins @SWI_Database #datachat Completely agree. Particualrly recovery plans |
ddelmoli | q5 #datachat I think recovery requirements should drive backup strategies |
SWI_Database | Q5. Which is more important and why: backup strategy or recovery strategy? #datachat |
arupnanda | #datachat q5 recovery strategy is more important. Backup is driven off that; not the other way around. |
SWI_Dean | #datachat Q5. Both backup and recovery are important. Can’t have one without the other |
SWI_Dean | #datachat Q5 However, recovery requirements should drive backup strategy |
ddelmoli | q5 #datachat definitely recovery strategy |
SWI_Database | RT @ddelmoli: #datachat I think the Expert OEM12c book is a good resource for maintenance ideas http://t.co/du1TXMpN8s |
craigmullins | “@ddelmoli: q5 #datachat I think recovery requirements should drive backup strategies” <– totally agree |
arupnanda | #datachat Q5 thinking about possible failures and having a plan for all possibilities is important |
ddelmoli | q5.1 How creative can you get with a backup strategy? #datachat |
arupnanda | @ddelmoli #datachat if recovery is not important, I can be very creative. No backup at all. performs well. |
ddelmoli | @arupnanda #datachat Ah yes, the old /dev/null backup 🙂 |
SWI_Dean | #datachat Q5: Sometimes export/import is valid backup strategy. Depends on recoverability needs |
ddelmoli | Q5.2 How important is selective or targeted recovery? #datachat |
arupnanda | @ddelmoli #datachat very, very important. your business could come up on a subset. plan for that |
adnanmkm | RT @arupnanda: #datachat q4 2. check and increase the cache of sequences and (in RAC) consider NOORDER |
ddelmoli | @arupnanda #datachat Agreed — especially if you’ve been consolidating apps into bigger databases |
SWI_Dean | #datachat Q5: selective recovery could be very important, based on business needs. |
arupnanda | @ddelmoli #datachat or, if you have histirical data not importnat for immediate business |
ddelmoli | @arupnanda #datachat Exactly — might have a phased recovery approach |
arupnanda | #datachat have a follow up question: which one is the most important to protect: datafile, db backup, archivedlogs? |
ddelmoli | Q5.3 How important is it to routinely test your recovery plans and approaches? #datachat |
ddelmoli | @arupnanda #datachat Do I have to pick only one? 🙂 |
arupnanda | @ddelmoli #datachat Very important; but the exact approach may not be feasible in some cases. RMAN validate and PREVIEW restore are useful. |
arupnanda | @ddelmoli #datachat. Yes, Budget. |
ddelmoli | @arupnanda Ok — I’ll go with datafiles. #datachat |
arupnanda | #datachat Q5A I think archive log is. You can recreate a datafile; but a missing archlog will stop recovery cold. |
arupnanda | #datachat Q5 Everyone should make it it s point of using RMAN validate and restore preview after every backup. I do it religiously. |
SWI_Database | Great #datachat -Here is our last question for today: Q6 What are common mistakes made with regards to maintenance? (besides not doing any!) |
arupnanda | #datachat q6 mis-correlating cause and effect, e.g. increase buffer cache->perf goes up |
ddelmoli | @arupnanda #datachat Agreed — make every effort to validate the integrity of your backup. |
ddelmoli | q6 Assuming the change “shouldn’t affect that” #datachat |
craigmullins | @SWI_Database q6 failure to automate #datachat |
arupnanda | #datachat q6 overdoing it e.g. index rebuilds; but not checking for bottlenecks, e.g. wait events |
arupnanda | #datachat q6 assuming that only one person or group is responsible for maintenance. It’s always a joint effort. |
ddelmoli | q6 Failure to validate that the desired effect was achieved #datachat |
arupnanda | #datachat q6 assuming it will work, e.g. not testing backups or recovery strategies. Not scripting it. |
ddelmoli | q6 Doing maintenance “backwards” — implementing in Prod first #datachat |
kevinclosson | .@SWI_Database Q4: Do whatever it takes to ensure you don’t get screwed in license audit (resulting in more software spend quid pro quo) #datachat |
SWI_Database | Wrapping up another great #datachat – Thank you everyone for participating! and welcome @KevinClosson |
ddelmoli | @arupnanda #datachat No testing to see the impact of maintenance change if it’s done in Prod first |
arupnanda | #datachat I think the central points of maintenance: script it, test in lower envs, monitor, understand it well and use the right tool. |
oracle_em | We couldn’t agree more RT @ddelmoli #datachat I think the Expert #EM12c book is a good resource for maintenance ideas http://t.co/W1DPAiyCcA |
Leave a Reply