A blocked session is a blocked session, regardless of the database solution.
Last week, a client asked me to assist with a performance issue impacting the help desk ticketing system. Long wait times were causing increased call handle times to the point where the caller’s data was being written down and then applied to a ticket after the call ended. Since company team members from all of the help desk locations were impacted and the infrastructure included multiple web and application servers driving traffic to a single database server, it made sense to start looking at the database tier. Besides, the school of hard knocks has taught me to always make sure my backyard looks sharp before I start complaining about my neighbor’s yard.
Using a tool the client had recently installed – the fact that the tool was available had not been communicated yet – and had just been configured to include the production servers for the ticketing platform, we quickly discovered that 5 – 10% of nearly 1000 sessions (Figure 1.) were being blocked. Sessions become blocked when a needed resource is not available and in many cases, including this case, results in high CPU levels. The database server was running at nearly 100% CPU used.
Figure 1. Blocked Sessions.
Next step, identify which wait events were occurring most frequently to determine root cause. One of the company’s database administrators quickly found that the LATCH_EX and CXPACKET wait types needed to be assessed first. See Figure 2.
Figure 2. Wait Types Needing Assessment.
To move this story along, here’s how the triage team summarized the situation: highly-parallelized queries were encountering contention for objects (LATCH_EX)—tables and indexes—resulting in delayed parallel processing (CXPACKET), creating a bottleneck by blocking query execution for other sessions (blocked sessions).
The team decided that the contention problem: 1) needed more analysis; and 2) could not be addressed live during production. Therefore, we all agreed that the best immediately addressable move was to minimize the impact of the waiting parallel queries by reducing their CPU footprint. The DBA was granted change approval to reduce the MAX_DOP value by one-third. See Figure 1 where the blocked session count falls off around 10:45 a.m. The CPU usage also dropped immediately to 60% used. The ticketing system users have reported that the system is performing better than ever!
You’ll read it here if this Oracle DBA has to contend again with SQL Server to bring about additional enhancement for this client.
Be sure to read Part 2 of this story, to learn how we were able to provide root cause remediation of this problem.