SQL Server Advance Monitoring

The estimated reading time for this post is 1 minutes

SQL Server and Windows platforms provide some features for administrative alerts and notifications. SQL Server DBAs should take this advantage to perform pro-active monitoring on SQL Server, Databases, Windows Server and/or Failover Cluster setup. In this post, I will explain some of critical SQL Server and Windows Server errors.

SQL Server Alerts and Notifications

SQL Server errors are categorized between Severity 0 and Severity 24, which is the criticality of the error. By using SQL Server Database Mail feature, DBAs can monitor the instance activities near real-time via Email notifications. The followings are the critical severity errors that should be monitored actively.

Severity LevelDescription
0 - 9Informational messages that return status information or report errors that are not severe. The Database Engine does not raise system errors with severities of 0 through 9.
10Informational messages that return status information or report errors that are not severe. For compatibility reasons, the Database Engine converts severity 10 to severity 0 before returning the error information to the calling application.
11Indicates that the given object or entity does not exist.
12A special severity for queries that do not use locking because of special query hints. In some cases, read operations performed by these statements could result in inconsistent data, since locks are not taken to guarantee consistency.
13Indicates transaction deadlock errors.
14Indicates security-related errors, such as permission denied.
15Indicates syntax errors in the Transact-SQL command.
16Indicates general errors that can be corrected by the user.
17Indicates that the statement caused SQL Server to run out of resources (such as memory, locks, or disk space for the database) or to exceed some limit set by the system administrator.
18Indicates a problem in the Database Engine software, but the statement completes execution, and the connection to the instance of the Database Engine is maintained.
19Indicates that a non-configurable Database Engine limit has been exceeded and the current batch process has been terminated. Error messages with a severity level of 19 or higher stop the execution of the current batch.
20Indicates that a statement has encountered a problem. Because the problem has affected only the current task, it is unlikely that the database itself has been damaged.
21Indicates that a problem has been encountered that affects all tasks in the current database, but it is unlikely that the database itself has been damaged.
22Indicates that the table or index specified in the message has been damaged by a software or hardware problem.
23Indicates that the integrity of the entire database is in question because of a hardware or software problem.
24Indicates a media failure. The system administrator may have to restore the database. You may also have to call your hardware vendor.

SQL Server errors with severity level between 16 and 24 should be monitored and administrator needs to be notified once such error occurs.

Reference: https://msdn.microsoft.com/en-us/library/ms164086.aspx?f=255&MSPPError=-2147217396

Windows Alerts and Notifications

Windows Event Log is the source of information for us to troubleshoot SQL Server or Windows related issues. The following Events are critical to monitor and once any of them occurs, Administrator needs to respond and investigate regarding the event(s).

Event IDSourceLevelDescription
1001BugCheckErrorThe computer has rebooted from a bugcheck.
257DefragErrorThe volume was not defragmented because an error was encountered: This shrink size specified is too big.
55NTFSErrorThe filesystem structure on the disk is corrupt and unusable. Please run the CHKDSK utility on the volume %S.
1057Microsoft-Windows-FailoverClusteringErrorThe cluster database could not be loaded. The file may be missing or corrupt. Automatic repair might be attempted
1090Microsoft-Windows-FailoverClusteringErrorThe Cluster service cannot be started. An attempt to read configuration data from the Windows registry failed with error
1574Microsoft-Windows-FailoverClusteringErrorThe failover cluster database could not be unloaded. If restarting the cluster service does not fix the problem, please restart the machine.
1575Microsoft-Windows-FailoverClusteringErrorAn attempt to forcibly start the cluster service has failed because the cluster configuration data on this node is either missing or corrupt.
1593Microsoft-Windows-FailoverClusteringErrorThe failover cluster database could not be unloaded and any potentially incorrect changes in memory could not be discarded. The cluster service will attempt to repair the database by retrieving it from another cluster node.
1046Microsoft-Windows-FailoverClusteringErrorCluster IP address resource '%1' cannot be brought online because the subnet mask value is invalid. Please check your IP address resource properties.
1047Microsoft-Windows-FailoverClusteringErrorCluster IP address resource '%1' cannot be brought online because the address value is invalid. Please check your IP address resource properties.
1557Microsoft-Windows-FailoverClusteringErrorCluster service failed to update the cluster configuration data on the witness resource. Please ensure that the witness resource is online and accessible.
1558Microsoft-Windows-FailoverClusteringError The cluster service detected a problem with the witness resource. The witness resource will be failed over to another node within the cluster in an attempt to reestablish access to cluster configuration data.
1563Microsoft-Windows-FailoverClusteringError File share witness resource '%1' failed to come online. Please ensure that file share '%2' exists and is accessible by the cluster.
1573Microsoft-Windows-FailoverClusteringError Node '%1' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.
7000Service Control ManagerErrorThe %1 service failed to start due to the following error:
%2
7022Service Control ManagerErrorThe %1 service hung on starting.
7038Service Control ManagerErrorThe %1 service was unable to log on as %2 with the currently configured password due to the following error:
%3

How to Notify Administrator?!

To notify an administrator regarding SQL Server related errors, The database mail feature is super useful, for more information about Database Mail read this post. When it comes to Windows errors, Administrators are need to use some powershell scripts or some customized applications to run. The following PowerShell scripts emails the specific Event Log to administrator.

$log = get-eventlog -logname System| where-object {$_.EventID -eq 55} | Format-list
$result = $log | select-object -first 3
$filename = "C:\Temp\55.txt"
$result | out-file $filename
send-mailmessage -From "alert@Fard-Solutions.com" -To "SQLAdmin@fard-solutions.com" -Subject "SQL01-PROD Event ID 55" -SMTPServer "172.18.110.87" -Attachment $filename

By using Windows Task Scheduler, Administrator is able to create trigger on each Windows Event ID such as the following figure.

task-scheduler

 

Hamid J. Fard

I am SQL Server Data Platform Expert with more than 9 years’ of professional experience, I am currently Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP and CIW Database Design Specialist. I also do Pig, Hive and Sqoop development regarding Hadoop Big Data platform. After a few years of being a production database administrator I jumped into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve questions regarding database issues for SQL Server.

More Posts

Follow Me:
FacebookLinkedIn


Leave a Comment

Be the First to Comment!

Notify of
avatar
1000
wpDiscuz