Incorporating Data Files into Databases

The estimated reading time for this post is 7 minutes

Organizations often store and manage data files on the file system of the server operating system.  But organizations are increasingly moving to integrate data files into their relational databases to gain the benefits that this can bring. In this article we have an overview of the options for storing and incorporating data files into databases in Microsoft SQL Server, and the benefits of each storage type.

Considerations for Working with Data Files in SQL Server

Incorporating data files into a database brings new challenges to organizations, such as:

  • Increasing management and storage costs.
  • Ensuring that the data files are secure.
  • Identifying and accessing textual data effectively and reliably.
  • Maintaining referential and transactional integrity between data files and relational data.

Benefits of Integrating Data Files into Databases

Beside the considerations, storing data files alongside relational data in a database can have several benefits, such as:

  • Centralized storage for relational and nonrelational data to reduce the required efforts to manage and maintain the data.
  • Control access to both data file and relational data through database management security.
  • Using the build-in indexing features to perform more efficient searches.
  • Referential and transactional consistency between data files and relational data.
  • Enabling developers to use data files in the applications easily and efficiently.

Options for storing data files in SQL Server

SQL Server provides various mechanisms for integrating data files into databases.

1.      Store Data Files on the File System

this approach stores data files on the file system of the OS or on a dedicated binary large object (BLOB) storage, and adding a varchar() column which contains the path that points to the data in the appropriate SQL Server table.

The advantages of this approach:

  • minimal database configuration
  • better read performance depends on the size and extention of each file compare to storing data in the database by using a data type such as varbinary(max) or image.
  • Typically, better fragmentation handle for frequently modified files.

The disadvantages of this approach:

  • Separate maintenance and backup management as there is no single point of management for both relational data and data in the file system.
  • No transactional and relational integrity
  • Separate security maintenance.
  • Extra layer of complexity for developer due to maintaining two separate stores.
  • No advantage of using integrity services such as full-text search and statistical sematic search.

2.      Store Data Files as Binary Large Objects (BLOB)s in the Database

This approach integrates data files more fully into a SQL Server database by storing files as binary large objects. BLOB data is typically much larger than the maximum row size of 8,060 bytes in SQL Server, so SQL Server stores BLOB data in a binary tree (B-Tree) that consists of a set of dedicated data pages. Earlier versions of SQL Server included the image data type for storing BLOBs. The image data type is still available but because it is now deprecated, you should not use it in any new databases.

To store BLOB data in new SQL Server database, you can use the varbinary(max) data type. The varbinary(max) data type enables you to store binary data up to 2,147,483,647 bytes (approximately 2 GB) in size. When you designate a column as varbinary(max), SQL Server stores data that is up to 8,000 bytes in size in the same data page as the rest of the row. For data that is larger than 8,000 bytes in size, SQL Server stores the data in separate data pages and includes a pointer to the pages in the row.

The advantages of storing data files by using the varbinary(max) data type:

  • Streamlined and single point of management of data
  • Transactional and relational integrity for BLOB data and relational data.
  • A single point of security configuration
  • Reduced complexity for application developers to use the data.
  • Take advantage of using integrity services such as full-text search and statistical sematic search.

The disadvantages of this approach:

  • no direct access to BLOB data through file systems.
  • Degraded read performance for large BLOBs that require many data pages.

3.      Store Data Files by Using FILESTREAM

FILESTREAM, which was introduced in SQL Server 2008, combines the performance advantages of storing data on the file system with the advantages of storing data as BLOBs in the database. FILESTREAM is not a data type in its own right; it is an attribute of the varbinary(max) data type. When you enable the FILESTREAM attribute on a varbinary(max) column, SQL Server stores BLOB data for that column on the NTFS file system (NTFS). You can subsequently access the data by using high-performance Win32 application programming interfaces (APIs), or by using Transact-SQL. The use of Win32 APIs for data access offers better performance and, because FILESTREAM uses the NTFS cache to cache BLOB file data instead of the SQL Server buffer pool, the SQL Server buffer pool remains free for processing other queries.

In addition to delivering better performance, FILESTREAM also offers greater flexibility by enabling you to store BLOBs that are larger than 2 GB. When you create a FILESTREAM column in a table, BLOB size is only limited by the size of the volume on which the data is stored.

The advantages of this approach:

  • Improved performance over storing BLOBs when BLOBs are larger than 1 MB on average.
  • Integrated management and security.
  • transactional and relational integrity for BLOB data.
  • Reduced complexity for application developers.
  • Take advantage of using integrity services such as full-text search and statistical sematic search.
  • The ability to store BLOBs that are larger than 2 GB.

The disadvantages of this approach:

  • no direct access to BLOB data through file systems.
  • No advantage of using remote storage such as shared folder on the network. FILESTREAM BLOBs must be stored on the same server as database files are located or a Storage Area Network (SAN).

4.      Store Data Files by Using a FileTable

FileTable, as a new feature introduced in SQL Server 2012, extends the FILESTREAM feature, makes working with stored files easier and more flexible. A FileTable overcomes the FILESTREAM limitation by enabling users to have nontransactional access to the files, which enables users to access FILESTREAM files in the same way that they access files that are not stored in SQL Server. Furthermore, for FileTables, unlike FILESTREAM, you can use a network share as the location for BLOB storage. By using FileTable, you eliminate the need to store one copy of your files in the database so that you can index and search them, and store another copy of the same files separately in a file share so that users can open and interact with them.

FileTable Structure and Access Options

A FileTable is a SQL Server table that has a predefined schema. The columns in a FileTable include a varbinary(max) column that has the FILESTREAM attribute enabled, and a series of metadata columns that store information including the file size, file creation time, and the last write time. FileTable files are part of a hierarchy that includes a database-level directory and a separate directory for each FileTable in the database. Each row in a FileTable represents either a file or a directory in the FileTable shared directory. Each FileTable has two columns, which are called path_locator and parent_path_locator. These columns use the hierarchyid data type to keep track of the place of each file and folder in the FileTable folder hierarchy.

To use FileTable, you must create or alter a database to set the NON_TRANSACTED_ACCESS option on a database that has a filegroup that is configured for FILESTREAM. Nontransactional access enables access to BLOB files through the file system, but enabling it means that you cannot restore BLOB data to a specific point in time. NON_TRANSACTED_ACCESS can be configured by using one of the following values:

  • FULL. read and write files and folders by using the FILESTREAM shared directory.
  • READONLY. read files in the FILESTREAM shared directory, but no modify or save new files to the directory.
  • OFF. No access BLOB files in the FILESTREAM shared directory. Only access the files programmatically.

5.      Store Data Files on a Dedicated Remote BLOB Store (RBS)

The Remote BLOB Store (RBS) add-on in SQL Server enables the use of dedicated third-party BLOB storage to store BLOB data. Offloading BLOB data storage by using RBS can improve performance and make sure that the server resources remain available. Typically, RBS is great and beneficial for massive databases that contain a high portion of unstructured binary data.

Key Factors for Planning a Solution to Store Data Files

There are various factors that influence your choice for storage solution. In the following table four storage options that are described above, are compared against the factors.

 Store data on the file systemStore data by using varbinary(max)Store data by using FILESTREAMStore data by using a FileTable
Performance Good, uses the file system Good when files are smaller than 1 MB on average Good, uses file system streaming Good, uses file system streaming
Manageability Requires separate management of relational data and data files, including separate backups Enables integrated management of all data, including backups Enables integrated management of all data, including backups Enables integrated management of all data, including backups
Security Requires separate security for relational data and data files Enables integrated security for all data Enables integrated security for all data Enables integrated security for all data
Nontransactional access through file system All access is nontransactional Not available, transactional access only Not available, transactional access only Enables both transactional and nontransactional access
Development effort Greater because of separate storage Lower because of integrated storage Lower because of integrated storage Lower because of integrated storage
Maximum size of filesSize limited by size of volume Maximum of 2 GB Size limited by size of volume Size limited by size of volume
Maintenance of transactional integrity No transactional integrity Transactional integrity maintained Transactional integrity maintained Depends on configuration
Ability to perform integrated searches Cannot perform integrated SQL Server searches Can perform full-text and semantic searches in SQL Server Can perform full-text and semantic searches in SQL Server Can perform full-text and semantic searches in SQL Server
Storage of data on a remote network share Yes NoNoYes

The Table does not include RBS option because factors such as manageability and performance depend on the RBS solution.

 

Hamide B. Abshuri

Hamide B. Abshuri is SQL Server Database Developer and Administrator with extensive of working experience. She started work with SQL Server since year 2015. She is currently Microsoft Certified Professional and Microsoft Certified Solutions Associate: SQL Server 2016 Database Developer.

More Posts - Website


Leave a Comment

avatar
1000
  Subscribe  
Notify of