Clustered Indexes in SQL Server

The estimated reading time for this post is 7 minutes

An index is a collection of pages associated with a table. Indexes are used to improve the performance of queries or enforce uniqueness. You can use indexes to point to the location of required data and to minimize the need for scanning entire tables.

At this point, it is useful to consider an analogy that might be easier to relate to. Consider a physical library. Most libraries store books in a given order, which is basically an alphabetical order within a set of defined categories. Note that even when you store the books in alphabetical order, there are various ways to do so. The order of the books could be based on the name of the book or the name of the author. Whichever option is chosen makes one form of access easy and other forms of access harder. For example, if books were stored in book name order, how would you locate books that were written by a particular author? Indexes assist with this type of problem.

Indexes in database systems are often based on binary tree (B-Tree) structures but SQL Server indexes can have a large number of children per node (level). The large number of children that each node in the index can have produces a very flat index structure. Indexes with only three or four layers are very common.

Index Fragmentation

Index fragmentation is the inefficient use of pages within an index. Fragmentation occurs over time as data is modified. There are two types of SQL index fragmentation, internal fragmentation and external fragmentation.

Internal fragmentation is when the records are not stored contiguously inside of the page. When there is unused space between records on a page, which occurs through data modification that is made against the table and to the indexes defined on the table, this is internal SQL index fragmentation.

The second type of SQL index fragmentation, external fragmentation, happens when physical storage of pages is un-contiguous which can cause higher disk rotations creating another form of SQL Server fragmentation.

Data Types and Indexes

Single-Column and Composite Indexes

The indexes that have been discussed so far have been based on data from single columns. Indexes can also be based on data from multiple columns and constructed in ascending or descending order. In business applications, composite indexes are often more useful than single-column indexes. The advantages of composite indexes are:

  • Higher selectivity.
  • The possibility of avoiding the need to sort the output rows.

When you are constructing composite indexes, in the absence of any other design criteria, you should typically index the most selective column first.

Each component of an index can be created in an ascending or descending order. For single-column indexes, ascending and descending indexes are equally useful. For composite indexes, specifying the order of individual columns within the index might be useful. Composite indexes can benefit from each component having a different order. Often this is used to avoid sorts.

SQL Server keeps statistics on indexes to assist when making decisions about how to access the data in a table. It is important to realize that SQL Server must make this decision before it begins to execute a query.

There are two kind of table structures in SQL:

  1. Heap: A heap is a table that has no enforced order for either the pages within the table or for the data rows within each page.
  2. Clustered Index: A table that has a clustered index has a predefined order for rows within a page and for pages within the table. The order is based on a key that consists of one or more columns. The key is commonly called a clustering key. The rows of a table can only be in a single order, so there can only be one clustered index on a table. Clustered indexes are always identified by using index id = 1

SQL Server must be able to uniquely identify any row in a table. Clustered indexes can be created as unique or non-unique. If you do not specify indexes as being unique, SQL Server will add another value to the clustering key where necessary to ensure that the values are unique for each row. This value is commonly called a “uniqueifier.”

Nonclustered Index: There are additional indexes on the tables to provide alternate ways to rapidly locate required data. These additional indexes are called nonclustered indexes.A table can have up to 999 nonclustered indexes. These indexes are assigned index IDs that are greater than or equal to 2. Nonclustered indexes can be defined on a table regardless of whether the table uses a clustered index or a heap.

Nonclustered Indexes over Heaps

Nonclustered Indexes over Clustered Indexes

Working With Clustered Indexes

Creating Clustered Indexes :

It is possible to create clustered indexes either directly by using the CREATE INDEX command or automatically in some situations where a PRIMARY KEY constraint is specified on the table. It is very important to understand the distinction between a primary key and a clustering key. Many users confuse the two terms or attempt to use them interchangeably. A primary key is a constraint. It is a logical concept that is supported by an index, but the index may or may not be a clustered index. When a PRIMARY KEY constraint is added to a table, the default action in SQL Server is to make it a clustered primary key if no other clustered index already exists on the table. You can override this action by specifying the word NONCLUSTERED when declaring the PRIMARY KEY constraint.

In the first example on the slide, the dbo.Article table was created. The ArticleID column has a PRIMARY KEY constraint associated with it. There is no other clustered index on the table, so the index that is created to support the PRIMARY KEY constraint will be created as a clustered primary key. ArticleID will be both the clustering key and the primary key for the table.

CREATE TABLE dbo.Articale
(ArticaleID INT IDENTITY(1,1) PRIMARY KEY,
ArticaleName NVARCHAR(50) not null,
PublicationDate DATE not null)

In the secocnd example on the slide, the dbo.LogData table is initially created as a heap. When the PRIMARY KEY constraint is added to the table, no other clustered index is present on the table, so SQL Server will create the index to support the PRIMARY KEY constraint as a clustered index.

CREATE TABLE dbo.LogData
(LogID INT IDENTITY(1,1) ,
LogData xml not null)
 
ALTER TABLE dbo.LogData
ADD CONSTRAINT PK_LogData
PRIMARY KEY(LogID)

If a table has been created as a heap, it can be converted to a clustered index structure by adding a clustered index to the table. In the fourth command shown in the examples on the slide, a clustered index named CL_LogTime is added to the dbo.LogTime table and the LogTimeID column is the clustering key.

CREATE CLUSTERED INDEX CL_LogTime
ON dbo.LogTime(LogTimeID)

Dropping a Clustered Index:

You can use the DROP INDEX command to drop clustered indexes that were created by using the CREATE INDEX command. Indexes that are created internally to support constraints need to be removed by removing the constraint.  

DROP INDEX CL_LogTime ON dbo.LogTime;
 
ALTER TABLE dbo.LogData
DROP CONSTRAINT PK_LogData;

Altering a Clustered Index:

A few maintenance operations are possible by using the ALTER INDEX statement, such as rebuilding or reorganizing an index. (Reorganizing an index only affects the leaf level of the index.). The CREATE INDEX command includes a WITH DROP_EXISTING clause that can enable the statement to change the structure of an index. Although the ALTER INDEX statement includes a DISABLE option that can be applied to any index, this option is of limited use with clustered indexes. After a clustered index is disabled, no access to the data in the table is then permitted until it is rebuilt.

The FILLFACTOR and PAD_INDEX options are used to provide free space within index pages. FILLFACTOR defaults to 0, which means “fill 100 percent.” Any other value (including 100) is taken as the percentage of how full each page should be. FILLFACTOR only applies to leaf-level pages in an index. PAD_INDEX is an option that, when it is enabled, causes the same free space to be allocated in the nonleaf levels of the index.

ALTER TABLE Person.Contact
ADD CONSTRAINT PK_Contact_ContactID
PRIMARY KEY CLUSTERED (ContactID)
WITH (pad_index=OFF,FILLFACTOR=70)

Characteristics of Good Clustering Keys

Typically, keys should be short, static, increasing (not necessarily monotonically) and unique. They don’t able to be more than 16 columns and 900 bytes.

I hope this blog would be informative for you, stay tuned for more blog posts.

Saeed Jabarpourfard

Saeed is a Microsoft Certified Professional and Microsoft Certified Solutions Associate: SQL Server 2016 Database Developer, with extensive experience in Database Development and Administration. He has been working with SQL Server since year 2010, and he used SQL Server 2008 R2 as his first RDBMS.

More Posts - Website


Leave a Comment

avatar
1000
  Subscribe  
Notify of