Database Storage Options


One of the most important roles of a database administrator is working with Microsoft SQL Server to manage databases and storage. It is important to know how data is stored in databases, how to create databases, and how to manage and move database files.

SQL Server Storage Performance

I/O Performance

I/O performance has a direct impact on the overall performance of SQL Server, and it is important that the storage layer is designed and implemented so that it provides the best performance for the server’s expected workload.

When designing storage for your SQL Server, it is important to consider the expected workload. A read-heavy database has different requirements to a write-heavy database. You can use built-in dynamic management views (DMVs), such as sys.dm_io_virtual_file_stats, to understand the I/O performance of SQL Server.
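As a starting point, a query like the following reports cumulative read/write counts and average latency per database file from sys.dm_io_virtual_file_stats (a sketch; column aliases and the latency thresholds you act on are up to you):

```sql
-- Average I/O latency per database file since the instance last started
SELECT  DB_NAME(vfs.database_id)            AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_writes,
        CASE WHEN vfs.num_of_reads = 0 THEN 0
             ELSE vfs.io_stall_read_ms / vfs.num_of_reads END   AS avg_read_latency_ms,
        CASE WHEN vfs.num_of_writes = 0 THEN 0
             ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON mf.database_id = vfs.database_id
       AND mf.file_id     = vfs.file_id
ORDER BY avg_write_latency_ms DESC;
```

Because the counters are cumulative since startup, comparing two snapshots taken some minutes apart gives a better picture of current workload than a single reading.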

Number of Spindles

Hard disks are mechanical devices with platters that can only spin at a certain speed, and heads that can only move at a certain speed, reading a single piece of data at a time. These performance limiting characteristics can be an issue for SQL Server, which in a production environment, may need to deal with hundreds or even thousands of requests per minute.

RAID Disks

Many storage solutions use RAID hardware to provide fault tolerance through data redundancy, and in some cases, to improve performance.

Commonly used types of RAID include:

  • RAID 0, disk striping. A stripe set consists of space from two or more disks that is combined into a single volume. The data is distributed evenly across all of the disks, which improves I/O performance. RAID 0 offers no redundancy, and if a single disk fails, the volume becomes inaccessible.
  • RAID 1, disk mirroring. A mirror set is a logical storage volume based on space from two disks, with one disk storing a redundant copy of the data on the other. Mirroring can provide good read performance, but write performance can suffer. RAID 1 is expensive for storage because 50 percent of the available disk space is used to store redundant data.
  • RAID 5, disk striping with parity. RAID 5 offers fault tolerance using parity data that is written across all the disks in a striped volume that is comprised of space from three or more disks. RAID 5 typically performs better than RAID 1. However, if a disk in the set fails, performance degrades. In terms of disk space, RAID 5 is less costly than RAID 1 because parity data only requires the equivalent of one disk in the set to store it. For example, in an array of five disks, four would be available for data storage, which represents 80 percent of the total disk space.
  • RAID 10, mirroring with striping. In RAID 10, a non-fault tolerant RAID 0 stripe set is mirrored. This arrangement delivers the excellent read/write performance of RAID 0, combined with the fault tolerance of RAID 1. However, RAID 10 can be expensive to implement because, like RAID 1, 50 percent of the total space is used to store redundant data.

[Image: Database Storage Options - RAID]

Storage Spaces

You can use storage spaces to virtualize storage by grouping industry-standard disks into storage pools. You can then create virtual disks, called storage spaces, from the available capacity of the storage pool. You can create storage pools from hard disks and solid-state disks with IDE, SATA, SAS, or USB interfaces.

Storage spaces offer three levels of resiliency, depending on your storage needs.

  • Simple: Stripes data across all disks as a single copy with no parity and is technically similar to a RAID 0 disk set. Simple maximizes storage capacity, giving high performance but offering no resiliency. Losing a disk will mean data is lost.
  • Mirroring: Writes multiple copies of data across multiple disks in a similar way to a RAID 1 disk set. In the event of disk failure, mirroring offers maximum protection for your data, giving good read and write performance, though disk capacity is reduced.
  • Parity: Writes a single copy of data striped across the disks, along with a parity bit to assist with data recovery. Parity gives good capacity and read performance, but write performance is generally slower, due to the need to calculate parity bits. Parity is similar to a RAID 5 disk set.

Data and Log Storage

SQL Server uses two types of database file: data files and transaction log files.

  • Data Files. Every database has at least one data file. The first data file usually has the filename extension .mdf; in addition to holding data pages, the file holds pointers to the other files used by the database. Additional data files usually have the file extension .ndf, and are useful for spreading data across multiple physical disks to improve performance.
  • Transaction Log Files. Transaction log files hold details of transactions on a database. The information in a transaction log file is useful if you need to recover a database to a point in time between backups. There must be at least one transaction log file for each database. All transactions are written to the log file using the write-ahead logging (WAL) mechanism to ensure the integrity of the database in case of a failure, and to support rollbacks of transactions. The recommended extension for log files is .ldf.
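The file layout described above can be sketched with a CREATE DATABASE statement that places the primary data file, a secondary data file, and the log file on separate drives (the database name, logical names, and paths are all illustrative):

```sql
-- Hypothetical database with data and log files on separate physical disks
CREATE DATABASE SalesDB
ON PRIMARY
    ( NAME = SalesDB_data,  FILENAME = 'D:\Data\SalesDB_data.mdf' ),
    ( NAME = SalesDB_data2, FILENAME = 'E:\Data\SalesDB_data2.ndf' )
LOG ON
    ( NAME = SalesDB_log,   FILENAME = 'F:\Log\SalesDB_log.ldf' );
```

Spreading the .mdf and .ndf files across disks distributes data I/O, while keeping the .ldf file on its own volume isolates the sequential write pattern of the log.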

[Image: Database Storage Options - Data and Log Storage]

Write performance of transaction log files is critical to database performance; the files are usually placed on fast resilient disks, such as a RAID 1 or RAID 10 disk set. You should avoid putting transaction log files onto RAID 5 disk sets because the time taken to calculate parity bits can impact SQL Server performance.

tempdb Storage

The performance of the tempdb database is critical to the overall performance of most SQL Server installations. The tempdb database consists of the following objects:

  • Internal objects. SQL Server uses these objects for its own operations, including work tables for cursor or spool operations, temporary large object storage, work files for hash join or hash aggregate operations, and intermediate sort results.
  • Row versions. Transactions that are associated with snapshot-related transaction isolation levels can cause alternate versions of rows to be briefly maintained in a special row version store in tempdb. Other features can also produce row versions, such as online index rebuilds, Multiple Active Result Sets (MARS), and triggers.
  • User objects. Consist of temporary tables, table variables, result sets of multi-statement table-valued functions, and other temporary row sets.

Planning tempdb Location and Size

By default, tempdb data and log files are stored in the same location as the files for all other system databases. If your SQL Server instance must support database workloads that make extensive use of temporary objects, you should consider moving tempdb to a dedicated volume, to avoid fragmentation of data files, and set its initial size based on how much it is likely to be used. You can leave autogrowth enabled, but set the growth increment to be quite large, to ensure that performance is not interrupted by frequent growth of tempdb. You can choose the location of tempdb files during installation, or move them later if required.
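Moving tempdb after installation is done by re-pointing its files with ALTER DATABASE (the drive path is illustrative; tempdev and templog are the default logical file names, so check sys.master_files if your instance differs):

```sql
-- Re-point the tempdb files to a dedicated volume
ALTER DATABASE tempdb
MODIFY FILE ( NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf' );

ALTER DATABASE tempdb
MODIFY FILE ( NAME = templog, FILENAME = 'T:\TempDB\templog.ldf' );
-- The new location takes effect the next time the SQL Server service restarts.
```

tempdb is recreated at every restart, so the old files can simply be deleted once the instance is running from the new location.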

Running out of disk space in the tempdb database can cause significant disruptions in the SQL Server production environment, and prevent applications that are running from completing their operations. You can use the sys.dm_db_file_space_usage DMV to monitor the disk space usage of files. Additionally, to monitor the page allocation or deallocation activity in tempdb at the session or task level, you can use the sys.dm_db_session_space_usage and sys.dm_db_task_space_usage DMVs. By default, MAXSIZE of the tempdb files is set to UNLIMITED. Therefore, tempdb can continue growing until it fills all the space on the disk.
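A quick way to see how tempdb space is being consumed is to aggregate sys.dm_db_file_space_usage (page counts are multiplied by 8 to convert to KB, since a page is 8 KB):

```sql
-- Break down current tempdb space usage by object type
SELECT  SUM(user_object_reserved_page_count)     * 8 AS user_objects_kb,
        SUM(internal_object_reserved_page_count) * 8 AS internal_objects_kb,
        SUM(version_store_reserved_page_count)   * 8 AS version_store_kb,
        SUM(unallocated_extent_page_count)       * 8 AS free_space_kb
FROM    tempdb.sys.dm_db_file_space_usage;
```

If the version store dominates, look for long-running snapshot isolation transactions; if internal objects dominate, look at sorts, hashes, and spools in the workload.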

Using Multiple Files

Increasing the number of data files in tempdb can overcome I/O restrictions and avoid latch contention during page free space (PFS) scans when temporary objects are created and dropped, improving overall performance. However, you should not create too many files, because this can itself degrade performance. As a general rule, 0.25 to 1 file per processor core is advised, with the ratio decreasing as the number of cores on the system increases. Ultimately, you can only identify the optimal configuration for your system by testing with a realistic workload.
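Additional tempdb data files can be added with ALTER DATABASE; keeping all data files the same size and growth increment lets SQL Server's proportional-fill algorithm spread allocations evenly (the path and sizes below are illustrative):

```sql
-- Add a second, equally sized tempdb data file
ALTER DATABASE tempdb
ADD FILE ( NAME = tempdev2,
           FILENAME = 'T:\TempDB\tempdb2.ndf',
           SIZE = 1024MB,
           FILEGROWTH = 512MB );
```

Repeat for as many files as your core count and testing suggest, and size the existing tempdev file to match.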

Other Storage Options

Network Attached Storage (NAS) is low cost and easy to administer. While it is possible to use NAS as storage for SQL Server, it is not usually recommended because NAS can rarely offer the performance that SQL Server requires. NAS I/O is limited by network connectivity, and network bandwidth is usually shared with other traffic. Some NAS devices use RAID disk sets internally.

Direct Attached Storage (DAS) is a traditional storage solution where disks are attached directly to the physical server. DAS, when combined with appropriate RAID configuration, generally offers excellent performance and low latency. DAS is not shared, and therefore can only be accessed by or via the server to which it is attached. It can be difficult to expand.

Storage Area Network (SAN) generally comprises a separate network used specifically for attaching dedicated storage devices. SANs are fast, reliable, and scalable, but can be expensive to implement. SANs generally come in two formats, Fibre Channel and iSCSI, with the latter being cheaper and slower. SAN implementations vary by vendor and you should check their documentation carefully before choosing a SAN technology. A SAN will generally give performance that is almost as good as DAS, although latency is usually higher. SAN administration can be difficult, and it is not unusual for large organizations to employ specialists just to look after a SAN.

SMB Fileshare

Server Message Block (SMB) is a network protocol developed by Microsoft built into the Microsoft Windows operating system. SMB has traditionally been used for file sharing and printer access. However, improvements made to recent versions of SMB and advances in networking technologies have dramatically improved performance and reliability. It is a viable option for storage in areas where it may not have previously been considered, such as SQL Server data storage.

SMB file locations are addressed using standard UNC paths. For clients and servers running Windows 8, Windows Server 2012, or later, an SMB file cluster can be built, making cost-effective, highly available storage solutions possible.
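Creating a database on an SMB share is simply a matter of supplying UNC paths as the file names (the server and share names below are hypothetical, and the SQL Server service account needs full control on the share):

```sql
-- Hypothetical database stored on an SMB file share
CREATE DATABASE SmbDB
ON     ( NAME = SmbDB_data, FILENAME = '\\FileServer\SqlData\SmbDB_data.mdf' )
LOG ON ( NAME = SmbDB_log,  FILENAME = '\\FileServer\SqlData\SmbDB_log.ldf' );
```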

SMB can rival the performance of Fibre Channel storage solutions at a fraction of the cost. Using SMB storage for SQL Server offers considerable management benefits when compared to DAS and SAN solutions. Moving a database to a new SQL Server instance becomes a simple process of detaching the database from one instance of SQL Server and then attaching it to the other.

Some SMB shares cannot be used for SQL Server storage:

  • Administrative shares (for example, \\servername\c$).
  • Loopback shares (for example, \\127.0.0.1\sharename).
  • Mapped network drives.

SQL Server Storage in Microsoft Azure

With Microsoft SQL Server 2016, you can create databases that store data files in Microsoft Azure storage as Microsoft Azure page blobs. Both on-premises servers and Microsoft Azure virtual machines support this storage format, which natively provides an independent, dedicated storage location for your SQL Server data.

To use SQL Server data files in Microsoft Azure, you create a storage account, along with a container. You then create a SQL Server credential containing the policy and shared access signature required to access the container. You can store page blobs inside the container, each with a maximum size of 1 TB. There is no limit to the number of containers that a storage account can have, but the total size of all containers must be under 500 TB.

[Image: Database Storage Options - Storage in Microsoft Azure]

The main benefits of storing SQL Server data files in Microsoft Azure are:

  • Simpler Migration: The migration of databases between on-premises environments and the cloud is greatly simplified. You can move data to the cloud without changing your applications.
  • Separate Compute and Storage Nodes: Using on-premises compute nodes, along with SQL Server data files in Azure, separates compute nodes from storage nodes. If you lose a compute node, you can quickly start up another without any data movement and simply attach the data files stored in Microsoft Azure to the new compute node.
  • Simpler Disaster Recovery: When storing SQL Server data files in Microsoft Azure, you can greatly simplify disaster recovery. If a physical or virtual SQL Server instance fails, you can quickly start up a new instance and attach the SQL Server data files stored in Microsoft Azure.
  • Security: By separating the compute node from the storage node, you can have an encrypted database where decryption only occurs on the compute node. All the data in the cloud is encrypted using Transparent Data Encryption (TDE) certificates stored in the master database of the on-premises instance.
  • Backup Strategy: With SQL Server data files in Azure, you can use Azure snapshots for almost instantaneous backups of your data.

Implementing SQL Server Data Files in Azure

Implementing SQL Server data files in Microsoft Azure requires the following steps:

  1. Create an Azure storage account.
  2. Add a container to the account and apply an access policy.
  3. Generate an SAS key for accessing the data.
  4. Create a credential on the required SQL Server instance with a name that matches the Azure storage account container, as in the following example:

CREATE CREDENTIAL [https://myStorage.blob.core.windows.net/data]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'CONTAINER SAS KEY';

  5. Create your database with the data and log files in the Microsoft Azure container, as in the following example:

CREATE DATABASE myDB
ON     ( NAME = myDB_data,
         FILENAME = 'https://myStorage.blob.core.windows.net/data/myDB_data.mdf' )
LOG ON ( NAME = myDB_log,
         FILENAME = 'https://myStorage.blob.core.windows.net/data/myDB_log.ldf' );

Stretch Database

Stretch Database is a feature of SQL Server 2016 that allows data to be split between on-premises storage and cloud storage. With Stretch Database, cold, historical data is kept in the cloud, while hot, active data is kept on-premises for maximum performance. Stretch Database requires no changes to client applications or existing Transact-SQL queries, so you can implement it seamlessly for existing applications.

Stretch Database Architecture

With Stretch Database, the storage and query processing for cold data is offloaded to the Microsoft Azure cloud. When you enable Stretch Database for a database, SQL Server creates a secure linked server that uses the remote Microsoft Azure instance as the endpoint.

When you execute a query against a Stretch-enabled database, SQL Server rewrites the query so that it runs against both the local database and the secure linked server; the remote portion appears in the local execution plan as a remote query operator.

[Image: Database Storage Options - Stretch Database]

Stretch Database Security

Before you can use Stretch Database, you must enable the instance-level remote data archive configuration option. You can do this by using the sp_configure stored procedure with either sysadmin or serveradmin privileges.

EXEC sp_configure N'remote data archive', N'1';
RECONFIGURE;

Users with the CONTROL permission on a database can enable it for Stretch Database, although they also need to provide administrator credentials for the Azure remote endpoint. You can only configure a table for Stretch Database if it belongs to a Stretch-enabled database. Any user with ALTER permission on the table can configure Stretch Database for it.
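Once the database itself is Stretch-enabled, an individual table is configured with ALTER TABLE (the table name below is hypothetical):

```sql
-- Enable Stretch for a table and start migrating its rows to Azure
ALTER TABLE dbo.SalesHistory
SET ( REMOTE_DATA_ARCHIVE = ON ( MIGRATION_STATE = OUTBOUND ) );
```

MIGRATION_STATE = OUTBOUND starts moving eligible rows to the remote endpoint; PAUSED can be used instead to enable the feature without migrating data yet.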

Data Security

Only server processes can access the Stretch Database linked server definition. It is not possible for a user login to query the Stretch Database linked server definition directly.

Enabling a database for Stretch Database does not change the existing database or table permissions. A user’s permissions remain the same, regardless of where the data is physically stored.

Stretch Database Advisor

The Stretch Database Advisor helps you identify databases and tables that are candidates for Stretch Database. It will also identify any issues preventing a database or table from being eligible for Stretch Database. The Stretch Database Advisor is part of the Microsoft SQL Server Upgrade Advisor. Analyzing a database for compatibility involves the following steps:

  1. Start the Microsoft SQL Server Upgrade Advisor application.
  2. On the Scenarios tab, click Run Stretch Database Advisor.
  3. Click Select Databases to Analyze and enter appropriate server and authentication details.
  4. Click Connect, select the databases you wish to analyze, and then click Select.
  5. Click Run to start the analysis.

Implement a Stretch Database

Implementing Stretch Database involves the following steps:

  1. Start Microsoft SQL Server Management Studio and connect to the instance of SQL Server.
  2. In Object Explorer, expand Databases.
  3. Right-click the database, point to Tasks, point to Stretch, and then click Enable.
  4. Complete the steps in the Enable Database for Stretch wizard to create a Database Master Key; identify the appropriate tables and configure the Microsoft Azure deployment.

After implementing Stretch Database, you can monitor it from SQL Server Management Studio.

  • In Object Explorer, expand Databases.
  • Right-click the stretch-enabled database, point to Tasks, point to Stretch, and then click Monitor to open the Stretch Database Monitor.

This monitor shows information about both the local and Azure SQL instances, along with data migration status.

Note: If, after implementing Stretch Database, you later decide to disable it for a table, you must first migrate the remote data back to the local instance; otherwise, any data in the stretch table is lost.
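The migrate-then-disable sequence described in the note can be expressed in Transact-SQL as follows (the table name is hypothetical, and the second statement should only run once the inbound migration has completed):

```sql
-- Step 1: copy the remote data back to the local table
ALTER TABLE dbo.SalesHistory
SET ( REMOTE_DATA_ARCHIVE ( MIGRATION_STATE = INBOUND ) );

-- Step 2: after migration completes, disable Stretch for the table
ALTER TABLE dbo.SalesHistory
SET ( REMOTE_DATA_ARCHIVE = OFF_WITHOUT_DATA_RECOVERY ( MIGRATION_STATE = PAUSED ) );
```

Note that inbound migration incurs Azure data transfer costs, which is worth factoring in before stretching very large tables.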

Hamid J. Fard

I am a SQL Server Data Platform Expert with more than 9 years of professional experience. I am currently a Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP, and CIW Database Design Specialist. I also do Pig, Hive, and Sqoop development on the Hadoop Big Data platform. After a few years as a production database administrator, I moved into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve database issues for SQL Server.
