Monitoring SQL Server Health

The estimated reading time for this post is 15 minutes

It is important to monitor SQL Server instances to identify potential problems early, and to help with ongoing planning. SQL Server includes built-in features that aid administrators in monitoring large numbers of instances.

A healthy server has adequate resources to support the databases and applications that it hosts. If these resources are not adequate, the health of the server is diminished. The resources that are typically regarded as the main indicators of server health are CPUs, memory, and storage.

To stay in control of resources, and ultimately in control of performance, you need to monitor resources.  The main challenges of monitoring server health include:

  • The number of SQL Server instances that require monitoring, and the distribution of these instances and databases across the servers, both virtual and physical.
  • Isolating the cause of a problem beyond the level of the server or instance.
  • Identifying potential problems before they become serious.
  • Identify areas where you are making inefficient use of resources and reallocate them accordingly.

The monitoring solution that you choose will depend on various factors, including the size and type of your SQL Server infrastructure, and the level of detail that you require.

Data Collector and Management Data Warehouse

Data Collector and the Management Data Warehouse (MDW) enable administrators to remotely capture detailed performance data from multiple SQL Server instances. You can use the data in the MDW to perform diagnostic monitoring and to make comparisons against historical data.

Data Collector captures data in one or more data collection sets, and then sends it to the server that hosts the MDW. A collection set is a defined set of collection items, such as performance counters, SQL Server Profiler traces, and Transact-SQL statements. Collection sets use SQL Server Agent jobs to gather data. Each collection set has three schedules that you can define to control the following events:

  • Frequency of data capture.
  • Upload frequency.
  • Data retention.

You can use the built-in reports in SSMS to view and assess the collected data. The Management Data Warehouse Overview report is the main one, providing links to detailed reports for each monitored instance.

Data Collector enables you to simplify health monitoring by centralizing the storage and analysis of performance data. It includes a central data warehouse for holding performance data, jobs for collecting and uploading the data to the data warehouse, and a set of built-in reports that you can use to analyze the data. Data Collector also includes built-in reports that enable you to analyze the data. You can write your own custom reports by using SQL Server Reporting Services, using custom reports in SSMS, or running Transact-SQL queries. This lesson describes how to set up and configure the Data Collector, and the built-in reports that you can use to analyze the data collected by Data Collector.

A Data Collector installation includes the following components:

  • Targets. A target is the object from which data will be collected in a Data Collection configuration.
  • Target types. These define the specific type of target, which include instances of SQL Server and SQL Server databases.
  • Collection sets. Collection sets define the type of data to collect, the frequency of collection, and the data retention period.
  • Collection mode. You can configure Data Collector to use either cached or non-cached mode. For continuous collection, you should use cached mode. For periodic collection, you can use non-cached mode, which uploads all collected data directly to the MDW.

Data Collection Sets

SQL Server includes three pre-configured system data collection sets that you can use to cover most common data collection requirements include:

  • Disk Usage has two collection items, called Disk Usage – Data Files and Disk Usage – Log Files. This collection set gathers data about disk usage by database data and log files. SQL Server uploads the Disk Usage collection set to the MDW every six hours, and the MDW retains the data for 730 days by default.
  • Server Activity has two collection items, called Server Activity – DMV Snapshots and Server Activity – Performance Counters. This collection set gathers data about the resource usage by SQL Server instances and the host server. SQL Server uploads the cached Server Activity collection set to the MDW every 15 minutes, and the MDW retains the data for 14 days by default.
  • Query Statistics has a single collection item called Query Statistics – Query Activity. This collection set gathers data about queries, including statistics, query plans, and individual queries. SQL Server uploads the cached Query Statistics collection set to the MDW every 15 minutes, and the MDW retains the data for 14 days by default.
  • Transaction Performance is intended for use with the Analyze, Migrate, and Report (AMR) tool. You can use the AMR tool to help you assess which tables would be best suited for migration to the new In-Memory online transaction processing (OLTP) feature.

In addition to the System Data Collection Sets, you can extend the SQL Server Data Collector by creating user-defined Data Collection Sets. This functionality enables you to specify the data you wish to collect and to use the infrastructure provided by the SQL Server Data Collector to collect and centralize the data. The Data Collector can collect information from several locations:

  • It can query dynamic management views (DMVs) and dynamic management functions (DMFs) to retrieve detailed information about the operation of the system.
  • It can retrieve performance counters that provide metrics about the performance of both SQL Server and the entire server.
  • It can capture SQL Trace events that have occurred.

Considerations for using Data Collector

When planning to use Data Collector and the MDW, consider the following points:

  • You should create the MDW on a separate server to the ones you plan to monitor. This has several benefits, including:

o   You can access reports that combine information for all server instances in your enterprise.

o   You can offload the need to hold collected data and to report on it from the production servers. 

o   You avoid collecting statistics related to the MDW itself.

  • The MDW can grow quite quickly, although the speed of growth will depend on the frequency of data sampling and the amount of activity on the monitored servers. You should use a test system to obtain a realistic estimate of data growth for the MDW, and ensure that the SQL Server instance that hosts the MDW in the production environment has adequate storage.
  • Configure data upload and retention schedules to suit your requirements to ensure that you do not collect too much or too little data. If you need to retain data beyond the retention period, you can extract it to another location.

image-3817

Configuring data collection

You can set up and configure data collection on SQL Server by using the built-in wizards in SSMS:

  1. Configure Management Data Warehouse. Before running the wizard, you should check that you have sufficient disk space available to support the needs of the MDW. In a typical configuration, you should allow at least 300 MB per day for each managed instance.
  2. Configure Data Collector. You can use the Configure Data Collection wizard to configure each server instance to collect and upload the required data. The only processes that are run on the local instances are the jobs used to collect and upload the data to the MDW.

Security for Data Collector

There are two security aspects that you need to consider for data collector:

  • Access to the MDW.
  • Access to configure and use the Data Collector.

The roles that are associated with data collector and MDV are described as follows:

  • dc_admin. Full administrator access to configuration.
  • dc_operator. Read and update access to configuration.
  • dc_proxy. Read access to configuration.
  • mdw_admin. Full access to the MDW.
  • mdw_writer. Write and read access, required by data collectors.
  • mdw_reader. Read access, required by users accessing reports.

Options for Monitoring Data Collector

You can monitor the Data Collector itself to ensure that it is running efficiently and to troubleshoot any issues that arise. You can do this by:

  • Viewing SQL Server Agent job history
  • Accessing Data Collector information in the msdb database. You can access this information by using either the log file viewer or by querying the following functions and views:

o    fn_syscollector_get_execution_details() 

o   fn_syscollector_get_execution_stats()

o   syscollector_execution_log 

o   syscollector_execution_log_full 

o   syscollector_execution_stats

SQL Server Utility

SQL Server Utility uses the same underlying mechanisms to gather performance data as the Data Collector. However, SQL Server Utility stores the collected data in a database called the Utility Management Data Warehouse (UMDW), which has a different schema to that of the MDW used by Data Collector. The data that SQL Server Utility collects is less detailed than that in Data Collector. Data Collector is useful for looking deep into statistics to isolate the causes of performance issues. SQL Server Utility, on the other hand, is more useful for everyday monitoring of resource usage at a high level. The dashboard view makes it easy for administrators to quickly identify problems before they arise, enabling a more proactive approach to health monitoring.

You should give careful consideration to the planning of a UCP to ensure that your implementation has adequate resources and scope for growth.

Considerations for SQL Server Utility

When selecting an instance for the role of the UCP, you should consider the following factors:

  • The instance must be a Database Engine instance.
  • The minimum supported SQL Server version is SQL Server 2008 R2. If the instance is running SQL Server 2012, this must be SQL Server 2012 Enterprise Edition.
  • The SQL Server Agent account on the instance must have read permission on Active Directory User objects.
  • It is recommended that the SQL Server instance to be the UCP is configured to be case-sensitive. If it is configured to be case-insensitive, all managed instances should also be configured in that way.
  • If the selected instance has ever previously functioned as a UCP, you should remove all managed instances and all UCP components before creating the new UCP.

The UMDW database is created automatically, at the same time as the UCP. The database name is sysutility_mdw. When planning storage for the UMDW, consider the following points:

  • Data collection for the enrolled instances occurs every 15 minutes and the default retention period for the collected data is one year by default. You can change the retention period to one month, three months, six months or two years.
  • The average annual growth for a UMDW database is 2 GB. You should test the UCP in your own environment to obtain a more accurate figure.
  • If you enroll more managed instances, this will increase the amount of storage required. On average, each managed instance requires 20 MB of storage. Again, you should test this in your own environment.

Additional factors to consider are:

  • If the UCP will monitor servers across multiple Windows domains, the domains must have two-way trust relationships.
  • SQL Server Utility does not support data collection for FILESTREAM data.

image-3818

SQL Server Utility Configuration

1.      Utility Control Point

To configure SQL Server Utility, you must first create a Utility Control Point (UCP)—this is at the heart of the SQL Server Utility. UCP hosts the UMDW database and hosts the policies that you can use to specify server health criteria.

You can create a UCP by opening the Utility Explorer in SSMS, and running the Create Utility Control Point wizard from the Getting Started tab. The wizard requires you to perform the following actions:

  1. Specify the instance of SQL Server and provide a name for the UCP.
  2. Specify a Windows domain account to run the utility collection set. You can use a specially created account, which is the recommended configuration, or use the SQL Server Agent service account.
  3. The wizard then validates the SQL Server instance, and you can view the results and make any required changes. The validation tests include the version of SQL Server in use, the presence of a database called sysutility_mdw, and whether or not the server is already part of a SQL Server Utility configuration—either as a UCP or a managed instance.
  4. The wizard provides a summary of the specified configuration, and you can then create the UCP.

image-3819
2.      Server Enrollment

After you create the UCP, you can enroll servers to it. Enrolled servers are referred to as managed instances. Each managed instance uses SQL Server Agent jobs to collect data into a Utility Collection Set, which they upload to the UMDW on the UCP. These jobs are very similar to the ones that the Data Collector uses for the same purpose. Data collection occurs every 15 minutes, and you cannot configure this. Utility Collection Sets include data about CPU usage and database file storage by instances and datatier applications, as well as CPU usage by the host computer.

When planning to enroll a SQL Server instance to a UCP, the same basic requirements about the server version, the SQL Server Agent account, and case sensitivity apply. You can enroll a server to a UCP by opening the Utility Explorer in SSMS, and running the Enroll instance wizard from the Getting Started tab. The wizard requires you to perform the following actions:

  1. Specify the SQL Server instance that you want to enroll.
  2. Specify a Windows domain account to run the utility collection set. You can use a specially-created account, which is the recommended configuration, or use the SQL Server Agent service account.
  3. The wizard then validates the SQL Server instance, and you can view the results to make any required changes. The validation tests include the version of SQL Server in use, and whether or not the server is already enrolled with a UCP.
  4. The wizard provides a summary of the specified configuration, and you can then enroll the instance to the UCP. Because the wizard runs in the context of a specific UCP, you do not need to specify the name of the UCP when enrolling an instance.

image-3820
3.      Health Policies

Health policies specify the thresholds that determine when a resource is marked as overutilized or under-utilized. You can define global health policies and policies for individual instances and data-tier applications (DACs).

A DAC is a pre-packaged object including all the database and instance objects that a given application uses. DACs simplify application development, deployment, and management.

You can configure global health policies by using the Utility Explorer in SSMS. Global health policies enable you to define utilization threshold values for CPU and data storage resources by all managed instances and all DACs.

For a managed instance, you can configure the following policies:

  • CPU over-utilization.
  • CPU under-utilization.
  • Database data file over-utilization.
  • Database data file under-utilization.
  • Database log file over-utilization.
  • Database data file under-utilization.
  • CPU over-utilization by the computer.
  • CPU under-utilization by the computer.
  • Disk space of storage volumes over-utilization.
  • Disk space of storage volumes under-utilization.

For a DAC, you can configure the following policies:

  • CPU over-utilization.
  • CPU under-utilization.
  • Database data file over-utilization.
  • Database data file under-utilization.
  • Database log file over-utilization.
  • Database data file under-utilization.

image-3821
By default, all managed instances and DACs use the global health policies. You can define health policies for individual managed instances and DACs, and when you do so, these settings override the global policy settings. This enables you to fine-tune the utilization thresholds to better suit your individual requirements.

Policy evaluation for volatile resources

Disk space utilization is cumulative; when a threshold value is reached, disk space usage will not shrink without administrative intervention. On the other hand, CPU utilization is volatile; it will drop and increase depending on current conditions and workloads. To avoid situations where a single spike in CPU utilization causes a CPU resource to be labeled as under-utilized or over-utilized, you can define thresholds for two additional policies. 

The first policy defines how frequently CPU over-utilization should reach the threshold before the resource is marked as over-utilized. You can specify two settings for this policy—a time window that defines the evaluation period, and a percentage figure that outlines the proportion of policy evaluations that are allowed to reach the threshold during the evaluation period, before the resource is marked as over-utilized.

The second policy works in the same way, but defines under-utilization instead of over-utilization.

If the settings for volatile resource policy evaluation are not well-planned, you can end up recording useless and misleading policy violations, sometimes referred to as ‘noise’. To reduce unwanted noise, consider the following options:

  • Make the evaluation period longer. The default evaluation period is one hour. Data collection occurs every 15 minutes, so by default there are four data collection events per evaluation period. The default percentage of evaluations that are allowed to be in violation is 20 percent. If you increase the evaluation period to six hours, there will be 24 data collection events per evaluation period. To exceed the 20 percent threshold would require five violations.
  • Increase the percentage of evaluations that are allowed to be in violation. If this threshold is higher, the system will allow more violations, which will reduce the number of times a resource is marked as over-utilized or under-utilized.
  • Do not use policy thresholds for CPU utilization that are too low or too high. If you select lower or higher percent as the threshold figure, this will cause more violations.
  • If you do not change the default threshold value, the system will never record under-utilization.

Viewing Server Health

You can use the Utility Dashboard in Utility Explorer to view server health information. To open the Utility Dashboard, click on the UCP name in Utility Explorer, and view the Utility Explorer Content tab. The Utility Dashboard displays utilization statistics in graphical format for managed instances and DACs. The values displayed include:

  • CPU utilization for the instance of SQL Server.
  • Database file utilization.
  • Storage volume space utilization.
  • CPU utilization for the computer.

Each of these values is indicated as over-utilized, well-utilized, under-utilized, or no data available. The latter indicator is present if no managed instance (or DAC) is enrolled or if the first data collection event has not yet occurred. This value will also be displayed if the uploading of data to the UMDW has failed. You can also view storage utilization history, with options of one day, one week, one month, and one year.  To investigate reported utilization figures in more depth, you can click the specific item, such as Overutilized Database Files, that you want to investigate. This opens up the managed instance or DAC and displays CPU Utilization, Storage Utilization, Policy Details, and Property Details tabs, which you can use to investigate the issue further.

image-3822

Hamid J. Fard

I am SQL Server Data Platform Expert with more than 9 years’ of professional experience, I am currently Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP and CIW Database Design Specialist. I also do Pig, Hive and Sqoop development regarding Hadoop Big Data platform. After a few years of being a production database administrator I jumped into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve questions regarding database issues for SQL Server.

More Posts

Follow Me:
FacebookLinkedIn


Leave a Comment

avatar
1000
  Subscribe  
Notify of