SQL Server Partitioning

SQL Server provides a partitioning feature that divides data and distributes it across several data files, which improves manageability and can improve performance. SQL Server natively supports only range partitioning, although other partitioning techniques exist, such as:

  • List Partitioning
  • Hash Partitioning
  • Composite Partitioning

Partitioning Methods

Horizontal partitioning involves putting different rows into different tables. For example, customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.
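As a rough illustration of the ZIP code example above, the following Python sketch routes rows into two in-memory lists standing in for CustomersEast and CustomersWest, and rebuilds the full set with a union. The function names and sample rows are hypothetical; this models the idea only and is not how SQL Server stores the data.

```python
# Hypothetical in-memory stand-ins for the two partition tables.
customers_east = []  # ZIP codes below 50000
customers_west = []  # ZIP codes 50000 and above


def insert_customer(name, zip_code):
    """Route a row to the partition table that owns its ZIP code range."""
    target = customers_east if zip_code < 50000 else customers_west
    target.append({"name": name, "zip": zip_code})


def all_customers():
    """Rough equivalent of a view defined as a union over both tables."""
    return customers_east + customers_west


insert_customer("Alice", 10001)
insert_customer("Bob", 94105)
insert_customer("Carol", 50000)  # the boundary value goes to CustomersWest

print(len(customers_east), len(customers_west), len(all_customers()))  # 1 2 3
```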

Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves splitting columns across tables, but vertical partitioning goes beyond that and partitions columns even when they are already normalized. Different physical storage might be used to realize vertical partitioning as well; storing infrequently used or very wide columns on a different device, for example, is a method of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called “row splitting” (the row is split by its columns). A common form of vertical partitioning is to split dynamic data (slow to find) from static data (fast to find) in a table where the dynamic data is not used as often as the static data. Creating a view across the two newly created tables restores the original table at a performance penalty; however, performance improves when accessing only the static data, e.g. for statistical analysis.
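The row-splitting idea can be sketched in a few lines of Python. The column names, the sample rows, and the choice of which columns are read often are all assumptions made for illustration.

```python
# Hypothetical rows: frequently read columns plus one wide, rarely read column.
rows = [
    {"id": 1, "name": "Alice", "city": "Oslo", "notes": "x" * 500},
    {"id": 2, "name": "Bob", "city": "Paris", "notes": "y" * 500},
]

HOT_COLUMNS = ("name", "city")  # read often
COLD_COLUMNS = ("notes",)       # wide and rarely read

# Split every row by its columns; both halves are keyed by the primary key.
hot = {r["id"]: {c: r[c] for c in HOT_COLUMNS} for r in rows}
cold = {r["id"]: {c: r[c] for c in COLD_COLUMNS} for r in rows}


def full_row(row_id):
    """Rough equivalent of a view that joins both tables on the key."""
    return {"id": row_id, **hot[row_id], **cold[row_id]}


print(hot[1])                      # only the narrow, frequently read columns
print(full_row(2) == rows[1])      # True: the join restores the original row
```

Reads that only need the narrow columns touch `hot` alone, while `full_row` pays for the extra lookup, which mirrors the trade-off described above.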

SQL Server supports table and index partitioning. The data of partitioned tables and indexes is divided into units that can be spread across more than one filegroup in a database. The data is partitioned horizontally, so that groups of rows are mapped into individual partitions. All partitions of a single index or table must reside in the same database. The table or index is treated as a single logical entity when queries or updates are performed on the data. Partitioned tables and indexes are not available in every edition of Microsoft SQL Server. For a list of the features supported by each edition, see Microsoft's documentation on features supported by the editions of SQL Server.

Partitioning Definitions

Range partitioning: Selects a partition by determining if the partitioning key is inside a certain range. An example could be a partition for all rows where the column zip code has a value between 70000 and 79999.
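A minimal Python sketch of a range lookup, assuming sorted boundary values and 1-based partition numbers. The `boundary_side` parameter is a hypothetical name that mirrors the RANGE LEFT / RANGE RIGHT choice in a SQL Server partition function; the sketch only models the lookup.

```python
from bisect import bisect_left, bisect_right


def range_partition(value, boundaries, boundary_side="RIGHT"):
    """Return the 1-based partition number for value.

    boundaries must be sorted ascending. With "RIGHT", a boundary value
    belongs to the partition on its right; with "LEFT", to the one on its left.
    """
    if boundary_side == "RIGHT":
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


zip_boundaries = [70000, 80000]
print(range_partition(69999, zip_boundaries))          # 1
print(range_partition(70000, zip_boundaries))          # 2
print(range_partition(79999, zip_boundaries))          # 2
print(range_partition(80000, zip_boundaries))          # 3
print(range_partition(70000, zip_boundaries, "LEFT"))  # 1
```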

List partitioning: A partition is assigned a list of values. If the partitioning key has one of these values, the partition is chosen. For example all rows where the column Country is either Iceland, Norway, Sweden, Finland or Denmark could build a partition for the Nordic countries.
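List partitioning can be sketched as a lookup against explicit value sets. The partition names and the fallback partition below are assumptions for illustration; only the Nordic country list comes from the example above.

```python
# Each partition owns an explicit list of key values.
partition_lists = {
    "nordic": {"Iceland", "Norway", "Sweden", "Finland", "Denmark"},
    "benelux": {"Belgium", "Netherlands", "Luxembourg"},  # hypothetical second list
}


def list_partition(country):
    """Return the name of the partition whose list contains the key."""
    for name, values in partition_lists.items():
        if country in values:
            return name
    return "default"  # hypothetical catch-all for unlisted values


print(list_partition("Norway"))   # nordic
print(list_partition("Belgium"))  # benelux
print(list_partition("Japan"))    # default
```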

Hash partitioning: The value of a hash function determines membership in a partition. Assuming there are four partitions, the hash function could return a value from 0 to 3.
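For the four-partition case, a hash lookup might look like the sketch below. SHA-1 and the use of the first four digest bytes are arbitrary choices made here for illustration; any stable hash function would serve.

```python
import hashlib

PARTITION_COUNT = 4


def hash_partition(key: str) -> int:
    """Map a key to a partition number from 0 to PARTITION_COUNT - 1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Interpret the first four bytes as an unsigned integer, then reduce it.
    return int.from_bytes(digest[:4], "big") % PARTITION_COUNT


for key in ("customer-1", "customer-2", "customer-3"):
    print(key, hash_partition(key))
```

The same key always maps to the same partition, but unlike range partitioning, neighbouring key values are not kept together.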

Composite partitioning: Allows certain combinations of the above partitioning schemes, for example by first applying range partitioning and then hash partitioning. Consistent hashing could be considered a composite of hash and list partitioning, where the hash reduces the key space to a size that can be listed.

Partitioning Info Zone

  • To maximize performance with parallel operations, we recommend that you use the same number of partitions as processor cores, up to a maximum of 64 (which is the maximum number of parallel processors that SQL Server can utilize).
  • We recommend that you use at least 16 GB of RAM if a large number of partitions are in use. If the system does not have enough memory, Data Manipulation Language (DML) statements, Data Definition Language (DDL) statements and other operations can fail due to insufficient memory.
  • Beginning with SQL Server 2012, statistics are not created by scanning all the rows in the table when a partitioned index is created or rebuilt. Instead, the query optimizer uses the default sampling algorithm to generate statistics.

Partitioning Samples

Range Partition:

CREATE DATABASE Range_Partition;
go
ALTER DATABASE Range_Partition ADD FileGroup [Range1To1000];
ALTER DATABASE Range_Partition ADD FileGroup [Range1001To2000];
ALTER DATABASE Range_Partition ADD FileGroup [Range2001ToN];
Go
ALTER DATABASE Range_Partition ADD FILE (Name='Range1To1000',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Range1To1000.ndf') TO Filegroup [Range1To1000];
Go
ALTER DATABASE Range_Partition ADD FILE (Name='Range1001To2000',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Range1001To2000.ndf') TO Filegroup [Range1001To2000];
Go
ALTER DATABASE Range_Partition ADD FILE (Name='Range2001ToN',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Range2001ToN.ndf') TO Filegroup [Range2001ToN];
Go
USE Range_Partition;
Go
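-- RANGE LEFT: each boundary value belongs to the partition on its left, so IDs 1-1000
-- map to [Range1To1000]. RANGE RIGHT would place ID 1000 in the second partition.
CREATE Partition FUNCTION PF_RangeID (BIGINT)
AS Range LEFT FOR VALUES (1000,2000);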
Go
CREATE Partition Scheme PS_RangeID  AS Partition PF_RangeID
TO ([Range1To1000],[Range1001To2000],[Range2001ToN]);
Go
 
CREATE TABLE Items (ID BIGINT IDENTITY(1,1) PRIMARY KEY, Padding BINARY(10) DEFAULT 0xFFF) ON PS_RangeID(ID);
Go
 
INSERT INTO Items DEFAULT VALUES
Go 3000
 
Go
 
SELECT $Partition.PF_RangeID(ID), * FROM Items;
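To see how the boundary side affects where the 3000 generated IDs land, the following Python sketch models a partition function with boundaries (1000, 2000) under both RANGE LEFT and RANGE RIGHT. It assumes the IDENTITY values run from 1 to 3000 without gaps, and it only simulates the lookup; it does not query SQL Server.

```python
from bisect import bisect_left, bisect_right
from collections import Counter

boundaries = [1000, 2000]
ids = range(1, 3001)  # assumed IDENTITY values 1..3000, no gaps

# RANGE RIGHT: a boundary value belongs to the partition on its right.
right_counts = Counter(bisect_right(boundaries, i) + 1 for i in ids)
# RANGE LEFT: a boundary value belongs to the partition on its left.
left_counts = Counter(bisect_left(boundaries, i) + 1 for i in ids)

print(sorted(right_counts.items()))  # [(1, 999), (2, 1000), (3, 1001)]
print(sorted(left_counts.items()))   # [(1, 1000), (2, 1000), (3, 1000)]
```

Only RANGE LEFT yields partitions that match filegroup names such as Range1To1000 and Range1001To2000.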

 

Composite Partition (List + Range):

CREATE DATABASE List_Partition;
go
ALTER DATABASE List_Partition ADD FileGroup [RangeAToG];
ALTER DATABASE List_Partition ADD FileGroup [RangeHToN];
ALTER DATABASE List_Partition ADD FileGroup [RangeOToZ];
Go
ALTER DATABASE List_Partition ADD FILE (Name='RangeAToG',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\RangeAToG.ndf') TO Filegroup RangeAToG;
Go
ALTER DATABASE List_Partition ADD FILE (Name='RangeHToN',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\RangeHToN.ndf') TO Filegroup RangeHToN;
Go
ALTER DATABASE List_Partition ADD FILE (Name='RangeOToZ',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\RangeOToZ.ndf') TO Filegroup RangeOToZ;
Go
USE List_Partition;
Go
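-- RANGE LEFT keeps each boundary letter in the partition on its left,
-- so 'G' maps to RangeAToG, 'N' to RangeHToN and 'Z' to RangeOToZ.
CREATE Partition FUNCTION PF_City(CHAR(1))
AS Range LEFT FOR VALUES ('G','N','Z');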
Go
CREATE Partition Scheme PS_City AS Partition PF_City
TO (RangeAToG,RangeHToN,RangeOToZ,[PRIMARY]);
Go
 
CREATE TABLE Customers (ID BIGINT IDENTITY(1,1), Padding BINARY(10) DEFAULT 0xFFF, City VARCHAR(10), PFCITY AS CAST(SUBSTRING(City,1,1) AS CHAR(1)) Persisted) ON PS_City(PFCITY);
 
Go
INSERT INTO Customers (City) VALUES ('New York'),('W. DC'),('K. Lumpur'),('Tehran'),('Torrento'),('London'),('Paris'),('Arizona'),('Texas');
 
Go
 
SELECT $Partition.PF_City(PFCITY), * FROM Customers;

 

Composite Partition (Hash + Range):

CREATE DATABASE Hash_Partition;
go
ALTER DATABASE Hash_Partition ADD FileGroup [Hash100];
ALTER DATABASE Hash_Partition ADD FileGroup [Hash200];
ALTER DATABASE Hash_Partition ADD FileGroup [Hash300];
ALTER DATABASE Hash_Partition ADD FileGroup [Hash400];
ALTER DATABASE Hash_Partition ADD FileGroup [Hash500];
ALTER DATABASE Hash_Partition ADD FileGroup [Hash600];
ALTER DATABASE Hash_Partition ADD FileGroup [Hash700];
ALTER DATABASE Hash_Partition ADD FileGroup [Hash800];
ALTER DATABASE Hash_Partition ADD FileGroup [Hash900];
Go
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash100',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash100.ndf') TO Filegroup Hash100;
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash200',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash200.ndf') TO Filegroup Hash200;
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash300',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash300.ndf') TO Filegroup Hash300;
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash400',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash400.ndf') TO Filegroup Hash400;
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash500',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash500.ndf') TO Filegroup Hash500;
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash600',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash600.ndf') TO Filegroup Hash600;
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash700',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash700.ndf') TO Filegroup Hash700;
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash800',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash800.ndf') TO Filegroup Hash800;
ALTER DATABASE Hash_Partition ADD FILE (Name='Hash900',Filename='C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Hash900.ndf') TO Filegroup Hash900;
 
USE Hash_Partition;
 
Go
 
CREATE Partition FUNCTION PF_HashName (INT)
AS Range RIGHT FOR VALUES (100,200,300,400,500,600,700,800,900);
 
Go
 
CREATE Partition Scheme PS_HashName AS Partition PF_HashName
TO (Hash100,Hash200,Hash300,Hash400,Hash500,Hash600,Hash700,Hash800,Hash900,[PRIMARY]);
 
Go
 
CREATE TABLE Customers (ID BIGINT IDENTITY(1,1), Firstname VARCHAR(25), Lastname VARCHAR(25), HashName AS ABS(CONVERT(INT,(hashbytes('SHA1',Firstname+Lastname))))%999 Persisted ) ON PS_HashName(HashName);
 
Go
 
INSERT INTO Customers (Firstname,Lastname) VALUES ('Hamid','J. Fard'),('John','Smith'),('Kevin','McMorphy'),('Bryan','Jackson');
Go
 
SELECT $Partition.PF_HashName(HashName),* FROM Customers;
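The computed column and the partition function in this sample can be approximated in Python as follows. This is a sketch under assumptions: it takes the rightmost four bytes of the SHA-1 digest as a signed integer (which is how a longer binary value is understood to convert to INT in SQL Server), and it encodes the names as Latin-1, which matches a VARCHAR only for code pages compatible with these characters. The exact numbers have not been compared against a real server.

```python
import hashlib
from bisect import bisect_right

BOUNDARIES = [100, 200, 300, 400, 500, 600, 700, 800, 900]


def hash_name(firstname: str, lastname: str) -> int:
    """Approximate ABS(CONVERT(INT, HASHBYTES('SHA1', Firstname+Lastname))) % 999."""
    digest = hashlib.sha1((firstname + lastname).encode("latin-1")).digest()
    # Assumption: keep the rightmost 4 bytes and read them as a signed integer.
    as_int = int.from_bytes(digest[-4:], "big", signed=True)
    # Python's abs() never overflows; T-SQL ABS would fail on the minimum INT value.
    return abs(as_int) % 999


def partition_number(value: int) -> int:
    """RANGE RIGHT lookup: a boundary value belongs to the partition on its right."""
    return bisect_right(BOUNDARIES, value) + 1


customers = [("Hamid", "J. Fard"), ("John", "Smith"),
             ("Kevin", "McMorphy"), ("Bryan", "Jackson")]
for first, last in customers:
    h = hash_name(first, last)
    print(first, last, h, partition_number(h))
```

Because the hash is reduced modulo 999, values fall between 0 and 998, so the tenth partition (mapped to [PRIMARY]) receives the values 900 through 998.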

 

Conclusion

Partitioning large tables or indexes can have the following manageability and performance benefits:

  • You can transfer or access subsets of data quickly and efficiently, while maintaining the integrity of a data collection. For example, an operation such as loading data from an OLTP to an OLAP system takes only seconds, instead of the minutes and hours the operation takes when the data is not partitioned.
  • You can perform maintenance operations on one or more partitions more quickly. The operations are more efficient because they target only these data subsets, instead of the whole table. For example, you can choose to compress data in one or more partitions or rebuild one or more partitions of an index.
  • You may improve query performance, based on the types of queries you frequently run and on your hardware configuration. For example, the query optimizer can process equi-join queries between two or more partitioned tables faster when the partitioning columns in the tables are the same, because the partitions themselves can be joined.
  • When SQL Server performs data sorting for I/O operations, it sorts the data first by partition. SQL Server accesses one drive at a time, and this might reduce performance. To improve data sorting performance, stripe the data files of your partitions across more than one disk by setting up a RAID. In this way, although SQL Server still sorts data by partition, it can access all the drives of each partition at the same time. In addition, you can improve performance by enabling lock escalation at the partition level instead of a whole table. This can reduce lock contention on the table.

Hamid J. Fard

I am a SQL Server Data Platform Expert with more than 9 years of professional experience. I am currently a Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP and CIW Database Design Specialist. I also do Pig, Hive and Sqoop development on the Hadoop Big Data platform. After a few years as a production database administrator, I moved into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve their SQL Server database issues.

3 Comments on "SQL Server Partitioning"

Evgeniy Vorobiev

If you want to store the first 1000 rows in the first file group “Range1To1000” then you need to use
“LEFT for values (1000,2000);”
because when we specified “RIGHT for values (1000,2000)”
then ID = 1000 is stored in the second file group, and Id=2000 in the third

The same for the “Composite Partition (List + Range)”:
if we use “RIGHT for values (‘G’,’N’,’Z’)” then city with name “G” stored in the second file group “RangeHToN”

Vithor da Silva

Very nice, congratulations!
