Troubleshooting High CPU Issues in SQL Server (Part 1)

By Pinal Dave on December 4, 2014


Performance troubleshooting is one of the unique challenges every developer and DBA has to face. One of the major complaints about performance troubleshooting is utilization of resources. Few organizations resort to putting muscle power to scale up the system without analyzing the root cause of the performance problem. For example, a typical scenario might involve a SQL Server instance with memory that is running continuously slow; frequently, the DBA responds to this by recommending that the RAM on the server be increased. Because the root cause of the memory issues is never fully investigated, this process is likely to repeat itself regularly over time. On the contrary, if the CPU is consistently running at a 70-80%, it isn’t always easy to increase the CPU instantaneously (provided we are on a physical machine).

Whenever there is a high CPU issue reported on SQL Server machine, we should first determine whether the issue was caused by SQL Server or some process outside of SQL. Essentially, it’s necessary to narrow down the high CPU problem to the lowest possible level–the component which is causing high CPU.

Exploring ring buffers to get historical data

Since SQL Server keeps historical data about CPU usage in ring buffers, we can query and find the history available on the system at any point in time. The example below is taken from Performance Dashboard reports query:

DECLARE @ms_ticks_now BIGINT

SELECT @ms_ticks_now = ms_ticks
FROM sys.dm_os_sys_info;

SELECT TOP 15 record_id
	,dateadd(ms, - 1 * (@ms_ticks_now - [timestamp]), GetDate()) AS EventTime
	,SQLProcessUtilization
	,SystemIdle
	,100 - SystemIdle - SQLProcessUtilization AS OtherProcessUtilization
FROM (
	SELECT record.value('(./Record/@id)[1]', 'int') AS record_id
		,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS SystemIdle
		,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS SQLProcessUtilization
		,TIMESTAMP
	FROM (
		SELECT TIMESTAMP
			,convert(XML, record) AS record
		FROM sys.dm_os_ring_buffers
		WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
			AND record LIKE '%<SystemHealth>%'
		) AS x
	) AS y
ORDER BY record_id DESC

 

The query above shows 15 minutes worth of data. We can change top 15 in the query if we want to get more details.

Understanding the relation between CPU and PerfMon

Once we have identified that SQL Server process is consuming CPU, we have to next find which inside SQL Server process is consuming this CPU. We need to remember that CPU consumes time in two modes:

  • User mode
  • Kernel mode

This can be seen via Performance Monitor by monitoring the “% Privileged Time” and “% User Time” counters under the “Process” node. If “% Privileged Time” value is more than 30%, it’s generally caused by faulty system drivers or antivirus software. In such situations, make sure the BIOS and filter drivers are up-to-date, and then try disabling the antivirus software temporarily to see the change.

If “% User Time” is high then there is something consuming the user mode of SQL Server. There are several known patterns which can cause high CPU for processes running in SQL Server, including:

  1. Query executing causing high CPU
  2. System tasks are consuming CPU
  3. Excessive Compilation and Recompilation of queries

Examining SQL Server processes

Now let’s examine what is happening in our system. The query below can help in finding the currently executing queries in SQL Server:If a SQL Server process is consuming high CPU, then executing the above query can help in finding the various requests currently getting executed inside SQL Server.

SELECT 
	r.session_id
	,st.TEXT AS batch_text
	,SUBSTRING(st.TEXT, statement_start_offset / 2 + 1, (
			(
				CASE 
					WHEN r.statement_end_offset = - 1
						THEN (LEN(CONVERT(NVARCHAR(max), st.TEXT)) * 2)
					ELSE r.statement_end_offset
					END
				) - r.statement_start_offset
			) / 2 + 1) AS statement_text
	,qp.query_plan AS 'XML Plan'
	,r.*
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
ORDER BY cpu_time DESC

 

The output shows the data sorted by CPU. Once the query is identified, we have several options to try in tuning the query consuming the CPU, including:

  1. Make sure that the statistics are up-to-date for the underlying tables used.
  2. Check if the optimizer is suggesting any missing indexes in XML plan. If yes, evaluate and then create them.
  3. Check if there are scan of big tables which can be avoided, and if data can be filtered based on access.
  4. Tune the query using Database Engine Tuning Advisor and evaluate the recommendations given.

Sometimes it’s good to look at all queries executed so far and get the top CPU consumers from the query plans available in plan cache. The CPU might be normal at this point, but we want to get historical data. This can be achieved using query stats dynamic management views. Below query gives us an overview of cached batches or procedures which have used most CPU historically:

select top 50 
    sum(qs.total_worker_time) as total_cpu_time, 
    sum(qs.execution_count) as total_execution_count,
    count(*) as  number_of_statements, 
    qs.plan_handle 
from 
    sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc

 

It’s important to remember that above query gets data from the cached plan. This means that, if the plan is evicted from cache for one of the top CPU consumers,  we may miss the same.

If the top CPU consumer is one of the system processes, check the kind of process and see if there is any known issue about that process on support.microsoft.com. Some of the common searches that show high CPU on SQL Server because of system processes are:

And, of course, third-party performance tuning solutions such as SolarWinds Database Performance Analyzer can be of use in determining the root cause of performance issues such as high CPU usage. More suggestions for troubleshooting high CPU issues can also be found in the second part of this article here.

Related Posts

Comments

  1. One of our systems engineers ended up in this website and found the examination query pretty useful. Although we ended up going in a different direction, take a look at how we solved a storage performance issue by identifying rogue transactions coming from an SAP business Object.

    The article can be found here:
    http://bit.ly/2nG2V1b

Leave a Reply