Data Integrity

The estimated reading time for this post is 11 minutes

Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle, and is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data. SQL Server enforces data integrity to guarantees the quality of data in the database. It is a critical step in maintaining high-quality data.

Data Integrity Within Application Levels

Data integrity can be applied at different levels within the application. Applications are often structured in levels. Common application levels are:

  • User-interface level
  • Middle tier
  • Data tier

Data integrity can be enforced at each of these levels

User-Interface Level

Data integrity in this level response to the end user which is possible to trap minor errors before any calls to other layers. Error messages in this level is often clearer because the code is more aware about action that caused the error.

The main disadvantage of integrity at user-interface level is that more than one application may need to work with the same data and each application might use different rules. If integrity is implemented in this level, each application need to apply all of the rules.

Middle Tier

Business logics require many integrity issues. The middle tier is where the bulk requirements exist in code. In addition, multiple user interfaces often reuse the middle tier. At this level, the logic still quite aware of the actions that cause errors, so the error message still be quite specific.

It is also easy to apply integrity checks in the middle tier. For example, it might seem easy to check that a customer exists before placing an order for the customer. Consider, though, it’s possible that another user removes the customer between the time that you check for customer existence and the time that you place the order

Data Tier

The advantage of integrity at this level is that upper level can’t bypass it. It is common multiple applications or even direct access tools such as SSMS access to the same data. If integrity is implemented in this layer, all application be forced by the same rules. The data tier is often unaware of user actions that cause errors, so the error messages are very detailed to describe the issues, and hard to understand for end users. They usually need to be retranslated by upper layer to be clearer.

Multiple Tiers

In this solution, integrity checks and rules be applied at multiple layers. However, the challenge in this solution is maintaining consistency between all rules and checks at different levels.

Types of Data Integrity

Data integrity falls into four categories:

  • Entity integrity
  • Domain integrity
  • Referential integrity
  • User-defined integrity

Entity Integrity

Entity integrity defines each row within a table must be a unique entity. It enforces the integrity on the columns through UNIQUE indexes, UNIQUE constraint or PRIMARY KEY constraint.

Domain Integrity

Domain integrity specifies a set of valid values for a column. It enforces columns to restrict the type by using data types, restrict the format by CHECK constraints, or restrict the range of possible values by FOREIGN KEY constraints, CHECK constraints, DEFAULT definition, or NOT NULL definition.

Referential Integrity

Referential integrity preserves the relationships among the primary keys (or unique keys) and foreign keys.

When you enforce referential integrity, SQL Server prevent users to do the following:

  • Add or change rows to a related table if there is no correspondent row in the primary table.
  • Change values in primary table that causes orphaned rows in the related table.
  • Delete rows from primary table that have matching rows in the related table.

User-Defined Integrity

User-defined integrity let you define specific business rules that do not fall into other integrities. This includes all column-level and table-level constraints in CREATE TABLE, stored procedures, and triggers.

Data Integrity Options

Data Type

First option that defines the type of data that can be stored in a column.

Nullability

This option determines whether the value must be present in the column. This is referred to as whether the column is mandatory or not.

Constraint

Constraints enable user to restrict the permitted values in a column. You can also apply constraints at the table level and enforce relationships between tables.

Default Value

A default value is used to insert a specific value for a column when no value is supplied in the statement that inserted the row.

Trigger

Triggers are procedures that are executed when specific event such as INSERT or UPDATE occur on the specific object such as a table.

Implementing Domain Integrity

Domain integrity limit the range and values for a column. The following steps enable you to implement domain integrity:

Data Type

One of the most important steps to implement domain integrity is assigning most efficient data types for columns when designing tables. Data types can be assigned to a column by using one of the following methods:

  • System data types which is supplied by SQL Server (e.g. int, varchar, binary). For further information about system data types refer to this link.
  • Alias data types are common for consistency problems among designing tables. For example, you may have several columns for weight value within your database. One column might be defined as decimal(10,2), another column might be defined as decimal(12,2) and etc. Alias data types enable you to create a data type called ProductWeight with definition as decimal(12,2) to use it as a data type for all columns assigned as weight.
  • User-defined data types that are created in managed code to design not existence data types and new behavior of data types

DEFAULT Constraint

As you read before, a DEFAULT constraint presents a value for a column when no value is presented in statement that inserted new row.

Note: for nullable columns without DEFAULT constraint, if no value is presented for the column in statement that inserted row, the column will be left as NULL. If a DEFAULT constraint is presented for the column, the default value would be used instead of NULL. However, consider that if the statement insert NULL explicitly, the default value would not be used.

The following example shows how to set default value for a column when create a new table:

CREATE TABLE Members
(
       ID INT IDENTITY(1,1),
       FullName VARCHAR(50) NOT NULL,
       RegisterDate DATE DEFAULT (SYSDATETIME()),
       Sex BIT
)

And the following example shows how to set default value for a column in an existing table:

ALTER TABLE Members
       ADD CONSTRAINT DF_MEMBERS_SEX
       DEFAULT 0 FOR Sex;

SQL Server does not require you to apply names for constraints that you create. If the name is not applied, SQL Server will generate a name automatically.

CHECK Constraint

A CHECK constraint limits the acceptable values for a column. CHECK constraints work with logical (Boolean) expressions that return TRUE, FALSE, or UNKNOWN. CHECK constraints reject values that evaluate to FALSE. This does not include an UNKNOWN return value because these values will not be rejected. Particular care must be taken with any expression that could have a NULL return value.

You can also supply CHECK constraints at the table level to check the relationship between columns of a table.

The following example add a CHECK constraint to check the age of members to be older than 18 on members table

ALTER TABLE Members
       ADD CONSTRAINT [CK_Age]
       CHECK (YEAR(GETDATE())-YEAR(Birthday)>18)

Implementing Entity Integrity

It is important to be able to uniquely identify rows within tables and to be able to establish relationships across tables.

PRIMARY KEY Constraints

PRIMARY KEY constraint is used to uniquely identify each row in a table that must not be NULL. It might involve multiple columns.

The following example shows how to set the PRIMARY KEY when creating a table:

CREATE TABLE Members
(
       ID INT
       CONSTRAINT PK_Members
       PRIMARY KEY,
       FullName VARCHAR(50) NOT NULL,
       RegisterDate DATE,
       Sex BIT,
       Birthday DATE
)

UNIQUE Constraint

A UNIQUE constraint shows that the column or combination of columns is unique. If the column is nullable one row can be NULL. It ensures that values in each row are different.

The following example shows how to set the UNIQUE constraint when creating a table:

CREATE TABLE Members
(
       ID INT IDENTITY(1,1)
       CONSTRAINT PK_Members
       PRIMARY KEY,
       N_Code INT NOT NULL UNIQUE,
       FullName VARCHAR(50) NOT NULL,
       RegisterDate DATE,
       Sex BIT,
       Birthday DATE
)

FOREIGN KEY Constraint

A FOREIGN KEY constraint establishs a link between the tables to enforce the relationship. A FOREIGN KEY constraint must refer to a PRIMARY KEY or UNIQUE constraint with not NULL values in target table. However, a column (or columns) with FOREIGN KEY constraint can be nullable.

when FOREIGN KEY is defined on the column, it is not possible to change the length of the column.

The target table can be the same table. For example, an employee row might refer to a manager who is another row in the same Employee table.

When you add a FOREIGN KEY constraint, SQL Server will check the data to ensure that the reference to the target table is valid. However, by using WITH NOCHECK, SQL Server does not check the reference and will only check in future insert or update. The WITH NOCHECK option can also be applied to other types of constraints.

The following example add the FOREIGN KEY for member contacts:

CREATE TABLE Members
(
       ID INT IDENTITY(1,1)
       CONSTRAINT PK_Members
       PRIMARY KEY,
       Contact_ID INT FOREIGN KEY REFERENCES Member_Contact(ID),
       N_Code INT NOT NULL UNIQUE,
       FullName VARCHAR(50) NOT NULL,
       RegisterDate DATE,
       Sex BIT,
       Birthday DATE
)

Cascading Referential Integrity

The FOREIGN KEY constraint includes a CASCADE option that enables any change to the value of UNIQUE or PRIMARY KEY to propagate the change to FOREIGN KEY values that reference it.

The CASCADE action is taken separately for UPDATE and DELETE and can have for value:

  1. NO ACTION is the default. It does not allow to DELETE or UPDATE the values in target table that have been For example, if you attempt to delete a contact and there are members that refer to that contact, the deletion will fail.
  2. CASCADE propagates the changes to the referencing tables. If the contact is being deleted, the members that refer to it will be deleted, too. If the contact PRIMARY KEY is being updated, the contact key in the members table will be updated as well.
  3. SET NULL makes the values in the columns in the reference table to be nullified. For example, when you attempt to DELETE or UPDATE a contact in contacts table, the Members which refer to the mentioned contact would still exist, but would not refer to any contact.
  4. SET DEFAULT causes the values in the columns in the reference table to be set to their default value.

however cascading referential integrity is easy to set up, you should take precaution when using it within database designs.

Considerations for Using Constraints

There are few common considerations that you need to mind when working with constraints.

  • As mentioned earlier, it’s unnecessary to specify name for constraints. When you leaving SQL Server to select a name, SQL server provides complicated name for the constraint. Often, you need to refer to constraints by name. Therefore, it’s better to have chosen meaningful names yourself.
  • You can create, alter, or drop constraint without having to drop or recreate the underlaying table. Use ALTER TABLE statement to add, alter, or drop constraints.
  • Perform error checking in your applications and transactions.
  • Disable CHECK and FOREIGN KEY constraints: to improve performance during large batch jobs. to avoid checking existing data when you add new constraints.

IDENTITY Property

The IDENTITY property generates a series of numbers automatically for an integer column, which means that in INSERT statement SQL Server provides a value for the column automatically; instead of specifying the value for the column. It’s typically associated with int or bigint columns.

When you specify IDENTITY property, you specify a seed and an increment. The seed is the starting value and the increment is how much the value goes up by each time. Default seed and increment are both 1 if they are not specified.

Although explicit insert is not typically allowed for the column with IDENTITY property, it’s possible to insert values explicitly. You can use SET IDENTITY_INSERT ON to enable the user to insert the values into the columns with IDENTITY property temporarily.

having IDENTITY property on the column does not ensure the column is unique

To retrieve the inserted identity value, you can use one of this two ways:

  • The system variable @@IDENTITY returns the last identity value within the session, in any scope.
  • The SCOPE_IDENTITY () function provides the last identity value within the current scope only.

the following example shows how to set the IDENTITY property on the column when creating a table:

CREATE TABLE Members
(
       ID INT IDENTITY(1,1)
       CONSTRAINT PK_Members
       PRIMARY KEY,
       FullName VARCHAR(50) NOT NULL,
       RegisterDate DATE,
       Sex BIT,
       Birthday DATE
)

Sequences

You can use sequences as the same as IDENTITY property. However, unlike IDENTITY property, sequences are not tied to any specific table. This means that you can use a single sequence to provide key values for the group of tables.

Sequences can be cyclic. They can return to the first value, when exceed over the maximum value.

Sequences are created by the CREATE SEQUENCE statement, modified by the ALTER SEQUENCE statement, and deleted by the DROP SEQUENCE statement.

In the following example, a sequence called BookingID is created:

CREATE SEQUENCE BookingID AS INT
       START WITH 100001
       INCREMENT BY 10;
 
CREATE TABLE FlightBooking
(
       FlightBookingID INT NOT NULL
       PRIMARY KEY CLUSTERED
       DEFAULT
              (NEXT VALUE FOR
              BookingID),
       FlightNo INT NOT NULL,
       CustomerID INT NOT NULL
)

To retrieve the Values from sequences use the NEXT VALUE FOR clause. In the example on the above, the sequence provides the default value for the FlightBookingID column in the FlightBooking table.

Hope this post was informative for you, please share it with others if you think it worth to read. Stay tuned to learn more about SQL Server.

 

Hamide B. Abshuri

Hamide B. Abshuri is SQL Server Database Developer and Administrator with extensive of working experience. She started work with SQL Server since year 2015. She is currently Microsoft Certified Professional and Microsoft Certified Solutions Associate: SQL Server 2016 Database Developer.

More Posts - Website