The Importance of Performance Certainty

By: Gerardo Dada


Uptime is assumed. Performance is the new black. You can’t just hope for performance; you need to be certain of it. You need to know what drives performance and how any change to the system changes (or could change) the performance profile. This is, in summary, what performance certainty means.

It’s a new world, performance matters

Not too long ago, operations teams were focused on system or sub-system uptime. Today, the focus has shifted to applications and application performance, reflected in two areas: end-user experience and system efficiency (that is, cost).

In the cloud and in the world of chargebacks, there is a more obvious correlation between performance, resource utilization, and cost. For some companies, especially SaaS startups, infrastructure costs represent a significant percentage of expenses, so system performance can have a direct impact on company profitability, health, and sustainability.

We have been operating without performance certainty

Operations teams too often base decisions on estimations and hope, not because they want to operate that way, but because they have had no choice. Let me illustrate with two examples:

The first one is databases and virtualization. For many years, DBAs resisted moving databases to virtualized environments because of uncertainty about how a database server would perform in a VM. They were uncomfortable with the potential performance impact of running on a hypervisor versus bare metal, and also with the new metrics and operational unknowns of optimizing a VM. There was no performance certainty.

Today, 80% of databases are running in virtual environments. In 2015, Amazon shared publicly that it was making a billion dollars a year on DBaaS (RDS, Aurora, Dynamo, etc.), not including databases running on EC2 instances. Now that databases run in dynamic environments, performance can change at any time: a noisy neighbor appears, an administrator moves the database to another VM, and so on.

The second example is a personal story from when I was working at one of the top cloud service providers, helping a customer that operated a SaaS application. The customer told us they added a cluster every time they launched a certain number of customers. I asked why.

The customer said it was the model that was working. I asked what the driver of performance was; maybe we could help double the number of customers each cluster could handle. Was the cluster running out of memory, database performance, storage space, or something else? The customer did not know. They just knew what worked; they had neither the time to experiment nor the visibility to know what drove performance and where their bottlenecks were.

With so many changes and so many variables, the old way of figuring this out, trial-and-error, does not work anymore. What we need, what the business expects from our systems, is performance certainty.

Performance Visibility is a Prerequisite to Performance Certainty

Systems are dynamic, always changing. You move VMs around, replace hosts, and add applications to the storage system, while the load on each application keeps changing. Take the ongoing shifts to flash-based storage systems, converged architectures, and migration to the cloud as examples. It is important to have a baseline of performance and to understand how much each component impacts performance.

The 7 DevOps Principles establish not only a performance orientation but also the requirement to monitor everything, so that every team gets the visibility it needs into how each change impacts the system, specifically from a performance and throughput perspective.

You can’t fix what you can’t see. Before talking to finance about that AFA storage device, you should know exactly how much your existing storage system contributes to response time, and what improvement in resource utilization and end-user experience will result from the shift to faster storage.
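As a rough illustration of that reasoning, the sketch below projects the total response time after one component is made faster. The function name `projected_response_time`, the per-component breakdown, and the 5x storage speedup are all hypothetical values chosen for the example, not measurements from a real system.

```python
def projected_response_time(breakdown_ms, component, speedup):
    """Project total response time if one component becomes `speedup` times faster.

    breakdown_ms: mapping of component name -> time spent there per request (ms)
    component:    the component being upgraded
    speedup:      assumed improvement factor (e.g. 5 means five times faster)
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    total = sum(breakdown_ms.values())
    # Only the time spent in the upgraded component shrinks.
    saved = breakdown_ms[component] * (1 - 1 / speedup)
    return total - saved


# Hypothetical breakdown of one request, in milliseconds.
breakdown = {"cpu": 40.0, "storage": 50.0, "network": 10.0}
before = sum(breakdown.values())
after = projected_response_time(breakdown, "storage", speedup=5)
reduction_pct = 100 * (before - after) / before
print(f"{before:.0f} ms -> {after:.0f} ms ({reduction_pct:.0f}% lower)")
```

With these made-up numbers, storage accounts for half of the response time, so a 5x faster storage tier takes the request from 100 ms to 60 ms. If storage had contributed only 5 ms, the same purchase would barely move the total, which is the case to make (or not make) to finance.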

Before you move your next workload to the cloud, you must understand which aspects of the system drive performance and have a reasonable expectation of how they will perform in the cloud, what resources they need, and how much those resources will cost per hour.

7 Steps to Achieve Performance Certainty

How do you get to performance certainty? Here are a few ideas, each of which will take you one step closer:

  1. Embrace performance as a discipline. This means uptime is no longer the key metric for measuring the quality of your work. Uptime is assumed. How fast can you make the system work? How often do the teams talk about performance? What tools do you have to understand and improve performance?
  2. Adopt a response-time analysis mindset. The focus must shift from resource metrics, logs and health to time: the time spent on every process, query, and wait state, and the time contributed by storage (I/O, latency), networking and the other components supporting the database and the application. Here’s how response time applies to database performance.
  3. Establish a baseline. Define the key metrics that matter, ideally centered on application throughput and end-user experience (again, not CPU utilization or theoretical IOPS). Statistical baselines help you understand what is normal and how and when performance has changed. Alerts built on baselines of relevant performance metrics then allow you to focus on what matters.
  4. Don’t guess. Before moving to faster hardware or provisioning more, understand how much each component contributes to performance today, which also indicates how much it could contribute to an improvement.
  5. Become the performance guru of your team. Knowledge is power. With the shift in IT towards performance, he (or she) who better understands performance, what drives it and how to improve it, quickly becomes more valuable to the organization.
  6. Share performance dashboards. Take credit for the performance improvements you have achieved. Educate management about cost savings resulting from reclaimed hardware or delayed investments. Share performance data. Be the authority. Report the performance impact and improvement (or not) of each infrastructure component, and each team member. “Joe, the code you wrote this week sucks; it is 25% slower than last week’s. Here’s the data.”
  7. Plan performance changes. You will know you have performance certainty when you can accurately predict application performance before the changes occur, and when you can guide your organization towards better performance.
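To make step 3 more concrete, here is a minimal sketch of a statistical baseline with a deviation-based alert. It assumes a simple mean and standard deviation model with a threshold of three standard deviations; the function names and the sample response times are hypothetical, and a real monitoring tool would likely also account for time of day and seasonality.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples as (mean, sample standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Return True if value is more than `threshold` deviations from the mean."""
    avg, sd = baseline
    if sd == 0:
        # Perfectly flat history: any difference counts as a change.
        return value != avg
    return abs(value - avg) / sd > threshold


# Hypothetical end-user response times (ms) collected during normal operation.
history_ms = [118, 122, 120, 125, 119, 121, 123, 117, 120, 124]
baseline = build_baseline(history_ms)

print(is_anomalous(121, baseline))  # within the normal range
print(is_anomalous(180, baseline))  # far outside the normal range
```

The point of the design is that the alert fires on a change in a metric users feel (response time) relative to what is normal for this system, rather than on a fixed resource threshold such as CPU utilization.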

In summary: It’s all about the application. Performance is the new black. Performance certainty, which means you are not guessing but know how a system will perform and how to fix it, will very soon be a job requirement.
