New theory to avoid Key Lookup and Scan operations in SQL Server Database Engine.

The estimated reading time for this post is 6 minutes

new-SQL-bannerIntroduction

Index is like many things in life; you may not appreciate it until you need it for the first time. Over the years we have all seen cases where there was no proper index when one was needed; it has resulted in accessing data very slow. Companies often have what seem like valid reasons for not tune their databases. We often hear quotes like: “We have a robust hardware, so we do not need to tune” or “We don’t need a tuning strategy, because we do not have an expert to do it every day.” What we have learned throughout the years is that you always need to tune databases by providing proper index and database structure.

Index Solution Overview

 

The most important most important index type in SQL server is the clustered index which changes the table structure to provide better performance. Clustered index has some limitations such as the index key length which is 900-byte and maximum 16 index key column can be accommodated. Every index in SQL server has “Edge” or “Lead Key” which is the first index key column. Below figure shows the list of index keys of the specific index.

index1

 

The edge of the above index is SalesOrderID column, SQL server optimizer does Clustered Index Seek when the query’s predicate includes the index edge key, and also SQL server optimizer does the Clustered Index Scan operation when developer needs to search based on other columns which are not in the index key list or they are not the edge of the index. For example:

index2

The SQL server does Clustered Index Seek operation when the index edge is included in the query predicate, take a look at below execution plan.

index3

 

The next query is without index edge as predicate.

index4

SQL server uses Clustered Index Scan operation for above query.

index5

SQL server uses Scan operation instead of Seek due to index edge, to resolve this issue SQL server provides NonClustered Index which it has some limitations such as the index key length and also it does not support additional columns other than index key column by default, unless the developer or administrator instruct the SQL server to include additional columns in the NonClustered Index by INCLUDE key word, which this technique can resolve most of the performance issues. For example below NonClustered Index resolves the Clustered Index Scan operation for above query.

index6

As you see in the below execution plan, SQL server uses NonClustered Index Seek instead of Clustered Index Scan for above query.

index7

NonClustered Index has some disadvantages such as duplicating the data, decrease the data modification operation’s performance. NonClustered Index also has same issue as Clustered Index about index edge key. SQL server uses Clustered Index Scan operation or totally ignore the NonClustered Index when the index edge key is not exists in the query predicate. Below statement creates new index on the SalesOrderDetail table to cover other type of queries.

index8

Let’s execute below query.

index9

SQL server does NonClustered Index Seek because the index edge key is in the query predicate as you see in the below execution plan.

index10

We change the above query and remove the SalesOrderDetailID predicate.

index11

SQL Server uses Clustered Index Scan operation.

index12

As bottom line, Clustered Index and NonClustered Index are good enough to cover up to 50% of the total table columns for query predications, due to index cost such as storage, data modification operation’s performance and complexity of each index, they are not recommended to use on all of the table columns.

Proposed Solution

As I observed SQL Server optimizer operation, it is very dependent on the index edge key which in most cases the query does not include edge key in the query predicates. To resolve above issue, I came with new theory to find the specific data regardless of index edge key or number of indexes.

Is it possible to take both clustered and non-clustered indexes advantage at one time? Or in other word, is it possible to search data without worrying about edge key in the index? I have come out with a kind of theory to search data based on every column without using index key. In this theory, SQL Server database engine creates statistics for every column which it keeps unique values of the column and number of the specific value plus PageID and SlotID.

For example, there is a table such below structure:

ObjectID = 101 , PageID = 990

table1

At this point of time, SQL Server database engine creates statistics object of every column in the above table, such below.

table2

 

At the next stage, SQL Server database engine creates below tracking objects for every statistics which it is already created.

table3

Now, SQL Server is able to find a specific data without having Key Lookup operation and Index Edge Key.

Let’s take a look at below scenarios.

Scenario 1: (Select * From dbo.Students Where Student_ID = 3 AND Firstname = ‘John’)

In this scenario, new engine checks the statistics of the Student_ID and Firstname columns first. Then it finds below information.

table4 table9

Then SQL Server needs to find out the location of the data by checking the Tracking object which has relationship with Statistics object. SQL Server does some Join operation to find out below information.

Table10

The result is obvious that record in Page 990 and Slot 03 is the matched record.

Note: SQL Server needs to terminate the query if the Count value in any statistics object is equal 0.

Scenario 2: (Select * From dbo.Students Where Student_ID = 1 OR Firstname = ‘John’)

In this scenario, SQL Server checks the statistics of the Student_ID and Firstname columns first. Then it finds below information.

table8table5

Then SQL Server needs to find out the location of the data by checking the Tracking object which has relationship with Statistics object. SQL Server does some Join operation to find out below information.

Table10

The result is obvious that record in Page 990 and Slot 01, 03 and 05 are the matched record.

Table11

Scenario 3: (Insert Into dbo.Students… )

In this scenario, SQL Server needs to insert new record into the table object. Below is the insert script.

Insert Into dbo.Students (Student_ID,First_Name,Last_Name,Age,IC_No)

Values (6,’Jordan’,’Clancy’,28,’J990899’);

The next step is to insert new record into the HEAP object and keep the PageID and SlotID for reference.

table12

The next step is to update the statistics objects for every column.

Yellow: Inserted.

Green: Updated.

table13 

The next step is to update the Tracking objects for every column.

table14

Scenario 4: (Delete from dbo.Students Where (Student_ID = 3 AND Firstname = ‘John’);)

 

In this scenario, SQL Server checks the statistics of the Student_ID and Firstname columns first. Then it finds below information.

table4  table5

The result is obvious that record in Page 990 and Slot 03 is the matched record to be deleted.

table7

The next step is to delete the record and update the statistics objects for every column.

Red: Deleted.

Green: Updated.

 table17

table18

The next step is to update the Tracking objects for every column.

table19

Download the Source Code from Here.

Download the Prototype from Here.

About Author

In the mid 2000’s, I, Hamid Jabarpour Fard, MCP, MCTS, MCPD, MCITP, MCSA, MCSE, MCT, MCC, MCM SQL Server 2008, MCSM Charter Data Platform, CIW Database Design Specialist and IBM Solution Advisor, learned to program in visual basic 6.0 on a personal computer, later I mastered visual C#.Net on the Microsoft Windows® operating system. I am programming on the Microsoft platform because I was impressed by the Visual Studio 2003, although I had dabbled in other operating systems. I had since focused on Windows® Development in Microsoft Visual C#.Net, ASP.Net with Microsoft Visual C#.Net, and Relational Database Development and Administration in Microsoft SQL Server 2000,2005,2008,2008 R2, 2012, and 2014. I also recently doing Hadoop (Big Data) administration and development.

Hamid J. Fard

I am SQL Server Data Platform Expert with more than 9 years’ of professional experience, I am currently Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP and CIW Database Design Specialist. I also do Pig, Hive and Sqoop development regarding Hadoop Big Data platform. After a few years of being a production database administrator I jumped into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve questions regarding database issues for SQL Server.

More Posts

Follow Me:
FacebookLinkedIn


Leave a Comment

Be the First to Comment!

Notify of
avatar
1000
wpDiscuz