Recently this issue happened once we upgraded a database from SQL Server 2005 to SQL Server 2014 to meet the compliance. Once the upgrading process is done, users tried to do the User Acceptance Test (UAT) to make sure every single functionality in the application and also the database performance still remain intact. A part of the application is doing some BULK INSERT operation to move certain data rows from a table to historical table (I know it is not the best practice to do), this operation took less than a minute to be completed in SQL Server 2005 but it takes few minutes to be completed in SQL Server 2014, Client expected to see some performance gains instead of losing it. After my investigations I found that there is a User Defined Function in Select…Into statement. The UDF is doing some calculation for every single row once the row is moved to the archived/history table.
Based on the findings I tried to simulate the environment in our own server to compare the Transaction activities in the BULK Operation with UDF and without UDF to observe how SQL Server behaves toward this BULK Operation. The following statements are for regenerate the performance issue.
CREATE DATABASE Test; go USE Test; go CREATE TABLE Active (ID BIGINT IDENTITY(1,1), Padding BINARY(10) DEFAULT 0XFFF, LastUpdate DATETIME DEFAULT SysDateTime()); Go INSERT INTO Active DEFAULT VALUES Go 3000 go CREATE FUNCTION GetRecordAge(@LastUpdate AS DATETIME) RETURNS BIGINT AS BEGIN RETURN (SELECT DATEDIFF(MINUTE,@LastUpdate,Sysdatetime())); END go SELECT ID, Padding,LastUpdate FROM Active; --Total Records: 3000 go SELECT * FROM fn_dblog(null,null); --Current LSN: 0000002b:00000037:0002; go SELECT ID,Padding,dbo.GetRecordAge(LastUpdate) AS [MinutesOld] INTO History FROM Active;
As you see on the above statements, there is a UDF that does simple calculation and there are one table called active and also the Select…Into statements that does the data movement operation.
At the next step, I took a deep look into the Transaction Log file by using the following statements:-
SELECT * FROM fn_dblog(null,null);
The fn_dblog() function reads the entire transaction log and dump all the information, as you see at below , it shows that SQL Server does fully logged the entire transactions of the Select…Into with the UDF.
SELECT COUNT(*),MAX([CURRENT LSN]) FROM fn_dblog(null,null) WHERE [CURRENT LSN]>'0000002b:00000037:0002' --Transactions Log Record Counts: 3200 --Current LSN: 0000002c:0000010f:0121
And below shows that SQL Server does minimally logged the entire transactions of the Select…Into without UDF.
SELECT ID,Padding, DATEDIFF(MINUTE,LastUpdate,Sysdatetime()) AS [MinutesOld] INTO History2 FROM Active; SELECT COUNT(*),MAX([CURRENT LSN]) FROM fn_dblog(null,null) WHERE [CURRENT LSN]>'0000002c:0000010f:0121' --Transactions Log Record Counts:142 --Current LSN: 0000002c:0000015b:0071
Then what is the solution to rectify this performance issue? I thought of the following configurations before finding the real solutions:-
- Enable the Instant File Initialization (But it does not work for Transaction Log files)
- Enable MAXDOP (It did not work as well)
- Disable Non-Clustered Indexes on the Archive/History table (It did not improve that much)
- Move the Archive/History table into a filegroup with multiple files. (It did not improve that much)
After reading the entire MSDN regarding User Defined Functions and their behavior and the changes has been done in SQL Server 2014, I found that in SQL Server 2012 and later the Select…Into operation will be fully logged due to data integrity issue as the UDF might read/write data in the same object that cause data corruption, which was not enabled on the earlier versions. To rectify this issue we need to make sure that UserDataAccess option in the UDF is set to 0 by enabling Schemabinding option on the UDF.
[info]If the Select…Into statement still references at least one user-defined function that has UserDataAccess property set to 1, the operation is fully logged.[/info]
As you see in the below statements, after I have enabled the Schemabinding option on the scalar function, the number of transaction log records reduced significantly.
ALTER FUNCTION GetRecordAge(@LastUpdate AS DATETIME) RETURNS BIGINT WITH Schemabinding AS BEGIN RETURN (SELECT DATEDIFF(MINUTE,@LastUpdate,Sysdatetime())); END go SELECT ID,Padding, dbo.GetRecordAge(LastUpdate) AS [MinutesOld] INTO History3 FROM Active; SELECT COUNT(*),MAX([CURRENT LSN]) FROM fn_dblog(null,null) WHERE [CURRENT LSN]>'0000002c:0000015b:0071' --Transaction Log Record Counts: 194