SQL Server Hardware Sizing


Planning the hardware infrastructure for an OLAP or OLTP solution based on Microsoft SQL Server requires an understanding of how the various SQL Server components work together, and how their typical workloads use hardware resources.

The core-balanced system architecture is based on the fact that most OLAP or OLTP workloads need to transfer small to large amounts of data (usually accessed by sequential or random read operations) across multiple system components, from where the data is stored to the requesting applications. Each component through which the data is transferred is a potential bottleneck that will limit the overall performance of the system. The data can only flow to the requesting application at the rate of the slowest component. Any components that can operate at a higher rate are underutilized, which unbalances the system and can represent significant wasted cost.

Maximum Consumption Rate (MCR)

[Figure: MCR diagram]

Most Database Administrators want to start estimating hardware requirements for a database by determining the volume of data to be stored and choosing the biggest and fastest possible storage solution. The core-balanced approach starts with the throughput of the CPU core, and then builds a balanced system that is based on that metric. It is important to realize that Maximum Consumption Rate (MCR) is purely a measure of SQL Server data throughput for a single core and does not include disk read operations or network I/O. The goal is to determine the core’s MCR and use that to calculate the number of cores that are required to support the target data throughput rate. You can then design the storage and network subsystem with a balanced set of components that will feed the data to the CPU cores at a sufficient rate to maintain the MCR.

In summary:

  • MCR is not definitive of actual results for a customer workload.
  • MCR provides a maximum data processing rate baseline for SQL Server.
  • MCR is specific to a CPU and server. In general, rates for a given CPU do not vary greatly by server and motherboard architecture but final MCR should be determined by actual testing.

MCR is specific to a combination of CPU, motherboard, and SQL Server. It is not a measure of pure processing speed or an indication of the performance that you can expect for all solution queries. Instead, MCR is a system-specific benchmark measure of maximum throughput per core for OLAP or OLTP query workloads. Calculating MCR requires executing a query that can be satisfied from cache while limiting execution to a single core, and reviewing the execution statistics to determine the number of megabytes of data processed per second.

This metric measures the maximum SQL Server data processing rate for a standard query and data set for a specific server and CPU combination. This is provided as a per-core rate, and it is measured as a query-based scan from memory cache. MCR is the initial starting point for system design. It represents an estimated maximum required I/O bandwidth for the server, CPU, and workload. MCR is useful as an initial design guide because it requires only minimal local storage and database schema to estimate potential throughput for a given CPU. It is important to reinforce that MCR is used as a starting point for system design – it is not a measure of system performance.

Calculating MCR

A baseline CPU consumption rate for the SQL Server application is established by running a standard SQL query. This query is designed to be a relatively simple representation of a typical query for the workload type (in this case OLTP) and is run from buffer cache. The resulting value is specific to the CPU and the server the query is being executed against.

Notes:

  1. The query needs to maintain a minimum execution time of one second.
  2. Ensure that Resource Governor settings are at their default values.
  3. Ensure that the query is executed from the buffer cache (see the verification sketch after this list).
  4. Set STATISTICS IO and STATISTICS TIME to ON to output results.
  5. Run the query multiple times with MAXDOP set to the number of cores per processor socket (for example, 4).
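
As a quick illustration of notes 2 and 3, the following sketch checks that Resource Governor is still at its default configuration and shows how many pages of each table are already resident in the buffer cache after a warm-up run of the measurement query. It assumes you are connected to the sample database used below (AdventureWorks); the DMVs are standard SQL Server views, and this is only a verification aid, not part of the MCR calculation itself.

-- Confirm Resource Governor is at its default configuration (no custom classifier function, not enabled).
SELECT classifier_function_id, is_enabled
FROM sys.resource_governor_configuration;

-- After a warm-up execution of the measurement query, check which tables are cached.
-- The large SalesOrderHeader table should be (almost) fully resident, so STATISTICS IO
-- will report "physical reads 0" on subsequent runs.
SELECT OBJECT_NAME(p.object_id) AS [table_name],
       COUNT(*)                 AS [cached_pages]
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units         AS au ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions               AS p  ON au.container_id = p.hobt_id AND au.type IN (1, 3)
WHERE bd.database_id = DB_ID()
GROUP BY p.object_id
ORDER BY [cached_pages] DESC;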

The following query is used to calculate MCR:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT pvt.[SalesPersonID], pvt.[FullName], pvt.[JobTitle], pvt.[SalesTerritory],
       pvt.[2011], pvt.[2012], pvt.[2013], pvt.[2014]
FROM (
        SELECT soh.[SalesPersonID],
               p.[FirstName] + ' ' + COALESCE(p.[MiddleName], '') + ' ' + p.[LastName] AS [FullName],
               e.[JobTitle],
               st.[Name] AS [SalesTerritory],
               soh.[SubTotal],
               YEAR(DATEADD(m, 6, soh.[OrderDate])) AS [FiscalYear]
        FROM [Sales].[SalesPerson] sp
             INNER JOIN [Sales].[SalesOrderHeader] soh ON sp.[BusinessEntityID] = soh.[SalesPersonID]
             INNER JOIN [Sales].[SalesTerritory] st ON sp.[TerritoryID] = st.[TerritoryID]
             INNER JOIN [HumanResources].[Employee] e ON soh.[SalesPersonID] = e.[BusinessEntityID]
             INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = sp.[BusinessEntityID]
     ) AS soh
     PIVOT (SUM([SubTotal]) FOR [FiscalYear] IN ([2011], [2012], [2013], [2014])) AS pvt
OPTION (MAXDOP 4);
GO 5

 

The following output shows the I/O and time statistics for the query above.

Beginning execution loop
SQL Server parse and compile time: 
   CPU time = 312 ms, elapsed time = 313 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
(14 row(s) affected)
Table 'Employee'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesTerritory'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesPerson'. Scan count 0, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 43769, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
 SQL Server Execution Times:
   CPU time = 2328 ms,  elapsed time = 2347 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
(14 row(s) affected)
Table 'Employee'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesTerritory'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesPerson'. Scan count 0, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 43769, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
 SQL Server Execution Times:
   CPU time = 2203 ms,  elapsed time = 2212 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
(14 row(s) affected)
Table 'Employee'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesTerritory'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesPerson'. Scan count 0, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 43769, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
 SQL Server Execution Times:
   CPU time = 2235 ms,  elapsed time = 2304 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
(14 row(s) affected)
Table 'Employee'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesTerritory'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesPerson'. Scan count 0, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 43769, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
 SQL Server Execution Times:
   CPU time = 2234 ms,  elapsed time = 2271 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
 
(14 row(s) affected)
Table 'Employee'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesTerritory'. Scan count 0, logical reads 28, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesPerson'. Scan count 0, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Person'. Scan count 0, logical reads 51, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 43769, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 
 SQL Server Execution Times:
   CPU time = 2266 ms,  elapsed time = 2293 ms.
Batch execution completed 5 times.

 

Let's calculate MCR using the following formula:

(Average Logical Reads / Average CPU Time (Sec) ) * 8 / 1024


( (28 + 28 + 34 + 51 + 0 + 0 + 43769) * 5 ) / 5 = 43910 Average Logical Reads

(2328 + 2203 + 2235 + 2234 + 2266) / 5 = 2253.2 Average CPU Time (Milliseconds)

( 43910 / 2.253 ) * 8 / 1024 = 152.26 MB/s per core
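
For reference, the same arithmetic can be expressed directly in T-SQL. This is just a restatement of the numbers from the statistics output above; the tiny difference from 152.26 comes from using the unrounded average CPU time of 2.2532 seconds.

-- Reproduce the MCR arithmetic above (all numbers come from the statistics output).
DECLARE @AvgLogicalReads decimal(18, 4) = 28 + 28 + 34 + 51 + 0 + 0 + 43769;                -- 43910 pages per execution
DECLARE @AvgCpuSec       decimal(18, 4) = (2328 + 2203 + 2235 + 2234 + 2266) / 5.0 / 1000;  -- 2.2532 seconds

-- Each logical read is an 8 KB page, so (pages per second) * 8 / 1024 gives MB per second per core.
SELECT CAST(@AvgLogicalReads / @AvgCpuSec * 8 / 1024 AS decimal(10, 2)) AS [MCR (MB/s per core)];   -- ~152.25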

Note that MCR is specific to the system on which it was calculated. However, similar processor and motherboard architectures are likely to exhibit comparable MCRs. If you intend to use systems from Fast Track Data Warehouse hardware partners, the MCR for these systems is published.

Determining the Processor Core Requirements

After finding the MCR value for a CPU core, you can start estimating the number of cores required to support the query workload. Remember that MCR indicates the amount of data that can be processed by one processor core in one second.

To determine the number of required processor cores, you need to know the following factors:

  • The amount of data returned by the average query in the workload.
  • The number of concurrent users.
  • The target response time for the query workload.

Apply the following formula:

( ( Average Query Result Size (MB) / MCR ) * Concurrent Users ) / Target Time (Sec) 

The MCR of the processor that we are going to use is 152.26 MB/s, and the expected average query result is 343 MB. Because this is an OLTP workload, we expect 200 concurrent users, and our target response time is 5 seconds. The following calculation finds the number of processor cores required for such an environment.

( ( 343 / 152.26 ) * 200 ) / 5 ≈ 90 Processor Cores

No CPU architecture includes exactly 90 cores, so you round up to a requirement of 96 (for example, eight 12-core processors).
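
As a sketch, the same formula can be evaluated and rounded up to a whole number of sockets. The 12-cores-per-socket figure is an illustrative assumption, not a requirement; substitute the core count of the processor you actually plan to use.

-- Core-count estimate using the formula above; @CoresPerSocket = 12 is an illustrative assumption.
DECLARE @AvgQueryResultMB decimal(18, 2) = 343;
DECLARE @MCR              decimal(18, 2) = 152.26;   -- MB/s per core, measured earlier
DECLARE @ConcurrentUsers  int            = 200;
DECLARE @TargetSeconds    decimal(18, 2) = 5;
DECLARE @CoresPerSocket   int            = 12;

DECLARE @RawCores decimal(18, 2) = (@AvgQueryResultMB / @MCR) * @ConcurrentUsers / @TargetSeconds;   -- ~90.1
SELECT CEILING(@RawCores / @CoresPerSocket) * @CoresPerSocket AS [Cores rounded up to whole sockets]; -- 96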

Remember that you need to balance the number of CPUs to closely match the number of storage arrays that will be used, which in turn may depend on the volume of data that your data warehouse must support.

Determining the Memory (RAM) Requirements

Calculating the amount of RAM that is required is difficult because many workloads can use memory to increase overall performance. As a general guideline, consider a minimum figure for a small to medium-sized OLTP or OLAP system to be 4 GB per core, or 64 to 128 GB per CPU socket. If you intend to use columnstore indexes or support tabular data models on the data warehouse server, favor the higher end of these estimates. Another way to estimate memory requirements is to consider that, in an average data warehouse workload, users regularly access approximately 20 percent of the stored data.
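
The following is a rough sketch, not a definitive sizing method, that applies these heuristics to this post's example figures. The socket count of 8 assumes the 12-cores-per-socket figure used earlier, and the 46 GB data volume comes from the storage example later in this post.

-- Rough memory sizing sketch using the heuristics above (example figures, not a prescription).
DECLARE @Cores        int            = 96;    -- from the core-count example above
DECLARE @Sockets      int            = 8;     -- assumption: 96 cores at 12 cores per socket
DECLARE @DataVolumeGB decimal(18, 2) = 46;    -- uncompressed data volume from the storage example below

SELECT @Cores * 4           AS [Min GB at 4 GB per core],        -- 384
       @Sockets * 64        AS [Low end at 64 GB per socket],    -- 512
       @Sockets * 128       AS [High end at 128 GB per socket],  -- 1024
       @DataVolumeGB * 0.20 AS [GB to cache ~20% of the data];   -- 9.2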

Determining Database Storage Requirements

Before you can fully determine CPU, memory, and storage hardware requirements, you must assess the volume of data that the system must support. Determining the volume of data that the database must store is the most significant factor in assessing overall storage requirements.

To start estimating data volumes, determine the number of rows that will be initially loaded into the database and multiply that by the average size of a row. If you don’t know the average row size at this stage, use a conservative estimate such as 100 bytes per row. For example, a database that will contain 200,000,000 rows, each 100 bytes in length, will have an initial fact data volume of approximately 20 GB.

After estimating the initial data, add approximately 30 to 40 percent to allow for indexes. So, to continue the example with 20 GB of data, you would add approximately 8 GB (40 percent of 20 GB), giving an initial data volume of approximately 28 GB.

To be sure that your storage solution will support the database in the future (say, three years from now), you must factor in the anticipated incremental data that will be loaded. For example, if the fact data in our database represents individual items ordered in sales transactions, and the company typically sells 5,000,000 items a month, you can expect to load 5,000,000 rows (each containing 100 bytes of data), or approximately 500 MB, each month. That equates to a data growth rate of 6 GB per year, so in three years the example database would need to support the initial 28 GB of data plus another 18 GB (6 GB per year multiplied by three years), giving a total of 46 GB.

You can also plan to compress the data in your database. Typically, SQL Server provides a compression factor of approximately 3:1, so the 46 GB of data should compress to approximately 15.3 GB on disk.
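
As a quick check, the following sketch reproduces the storage arithmetic above. All figures are the example assumptions from this section, and decimal gigabytes (1 GB = 10^9 bytes) are used to match the rounding in the text.

-- Worked storage estimate using the example figures above (decimal GB, i.e. 1 GB = 10^9 bytes).
DECLARE @InitialRows      bigint        = 200000000;
DECLARE @BytesPerRow      int           = 100;
DECLARE @IndexOverhead    decimal(5, 2) = 0.40;        -- 30-40% allowance for indexes (upper bound used here)
DECLARE @MonthlyRows      bigint        = 5000000;
DECLARE @Years            int           = 3;
DECLARE @CompressionRatio decimal(5, 2) = 3.0;         -- approximate 3:1 compression

DECLARE @InitialGB     decimal(18, 2) = @InitialRows * @BytesPerRow / 1000000000.0;               -- 20 GB
DECLARE @WithIndexesGB decimal(18, 2) = @InitialGB * (1 + @IndexOverhead);                        -- 28 GB
DECLARE @GrowthGB      decimal(18, 2) = @MonthlyRows * @BytesPerRow * 12 * @Years / 1000000000.0; -- 18 GB

SELECT @WithIndexesGB + @GrowthGB AS [Total uncompressed GB],                                      -- 46 GB
       CAST((@WithIndexesGB + @GrowthGB) / @CompressionRatio AS decimal(18, 2)) AS [Compressed GB]; -- ~15.3 GB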

In addition to the database, you must include other data in your storage estimation. Additional storage is required for:

  • Configuration Databases: If databases that are used by other BI services, including the SSIS Catalog and Reporting Services databases, are to be installed on the database server, you should include them in your storage estimate. In addition, the SQL Server instance includes system databases, although in practice, these are usually stored separately from the database data files.
  • Transaction Log Files: Each database requires a transaction log. Data warehouse databases are typically configured to use the simple recovery model, while OLTP databases typically use the full recovery model.
  • TempDB: Many database queries require temporary storage space. It is generally recommended to place tempdb on suitable storage and assign an appropriate initial size so that the system does not have to rely on autogrowth.
  • Staging Tables: Whether data is staged in a dedicated staging database, in tables within the data warehouse database itself, or in a combination of both, you must allocate enough space to allow for data staging during ETL processes.
  • Backups: If you intend to back up the database and other databases to disk, you must ensure that the storage design provides space for backup files.  
  • Analysis Services Models: If you intend to host multidimensional Analysis Services data models on the data warehouse server, you must allocate sufficient disk space for them.

Storage System Considerations

The optimal storage hardware solution for a data warehouse depends on several factors, including the volume of data and the aggregate MCR that the storage system's throughput must sustain. When planning a storage solution, consider the following guidelines:

  • Disk Size: Use more, smaller disks instead of fewer, larger disks. Although it is possible to create a database that stores all of its data on a single, large hard disk, it is usually possible to achieve a better balance of throughput (and therefore overall system performance) by distributing the data across multiple small disks. This enables multiple disk reads to be performed in parallel and reduces wait times for I/O operations.
  • Disk Speed: Use the fastest disks that you can afford. Disk technologies have advanced dramatically in recent years, with the speed of mechanical disks increasing and the advent of solid state disks with no moving parts. Put simply, a faster disk means greater throughput when reading data. Solid state disks are typically more expensive than mechanical disks, but if disk performance is critical, you may decide that the additional cost is worth paying. The lack of moving parts makes them particularly effective for random I/O access, which is typical of queries against a database.
  • RAID: Use RAID 10, or at minimum RAID 5. RAID 10 (in which data is both mirrored and striped) provides the best balance of read performance and protection from disk failure, and it should usually be the first choice for a database. However, the requirement for a complete set of redundant disks per array can make this an expensive option. As an alternative, you can use RAID 5, which provides striping for high read performance and parity-based data redundancy to protect against disk failure.

Hamid J. Fard

I am a SQL Server Data Platform Expert with more than 9 years of professional experience. I am currently a Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP, and CIW Database Design Specialist. I also do Pig, Hive, and Sqoop development on the Hadoop Big Data platform. After a few years as a production database administrator, I moved into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve database issues on SQL Server.
