The estimated reading time for this post is 11 minutes
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle, and is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data. SQL Server enforces data integrity to guarantees the quality of data in the database. It is a critical step in maintaining high-quality data.
Data integrity can be applied at different levels within the application. Applications are often structured in levels. Common application levels are:
Data integrity can be enforced at each of these levels
Data integrity in this level response to the end user which is possible to trap minor errors before any calls to other layers. Error messages in this level is often clearer because the code is more aware about action that caused the error.
The main disadvantage of integrity at user-interface level is that more than one application may need to work with the same data and each application might use different rules. If integrity is implemented in this level, each application need to apply all of the rules.
Business logics require many integrity issues. The middle tier is where the bulk requirements exist in code. In addition, multiple user interfaces often reuse the middle tier. At this level, the logic still quite aware of the actions that cause errors, so the error message still be quite specific.
It is also easy to apply integrity checks in the middle tier. For example, it might seem easy to check that a customer exists before placing an order for the customer. Consider, though, it’s possible that another user removes the customer between the time that you check for customer existence and the time that you place the order
The advantage of integrity at this level is that upper level can’t bypass it. It is common multiple applications or even direct access tools such as SSMS access to the same data. If integrity is implemented in this layer, all application be forced by the same rules. The data tier is often unaware of user actions that cause errors, so the error messages are very detailed to describe the issues, and hard to understand for end users. They usually need to be retranslated by upper layer to be clearer.
In this solution, integrity checks and rules be applied at multiple layers. However, the challenge in this solution is maintaining consistency between all rules and checks at different levels.
Data integrity falls into four categories:
Entity integrity defines each row within a table must be a unique entity. It enforces the integrity on the columns through UNIQUE indexes, UNIQUE constraint or PRIMARY KEY constraint.
Domain integrity specifies a set of valid values for a column. It enforces columns to restrict the type by using data types, restrict the format by CHECK constraints, or restrict the range of possible values by FOREIGN KEY constraints, CHECK constraints, DEFAULT definition, or NOT NULL definition.
Referential integrity preserves the relationships among the primary keys (or unique keys) and foreign keys.
When you enforce referential integrity, SQL Server prevent users to do the following:
User-defined integrity let you define specific business rules that do not fall into other integrities. This includes all column-level and table-level constraints in CREATE TABLE, stored procedures, and triggers.
First option that defines the type of data that can be stored in a column.
This option determines whether the value must be present in the column. This is referred to as whether the column is mandatory or not.
Constraints enable user to restrict the permitted values in a column. You can also apply constraints at the table level and enforce relationships between tables.
A default value is used to insert a specific value for a column when no value is supplied in the statement that inserted the row.
Triggers are procedures that are executed when specific event such as INSERT or UPDATE occur on the specific object such as a table.
Domain integrity limit the range and values for a column. The following steps enable you to implement domain integrity:
One of the most important steps to implement domain integrity is assigning most efficient data types for columns when designing tables. Data types can be assigned to a column by using one of the following methods:
As you read before, a DEFAULT constraint presents a value for a column when no value is presented in statement that inserted new row.
The following example shows how to set default value for a column when create a new table:
CREATE TABLE Members
(
ID INT IDENTITY(1,1),
FullName VARCHAR(50) NOT NULL,
RegisterDate DATE DEFAULT (SYSDATETIME()),
Sex BIT
)
And the following example shows how to set default value for a column in an existing table:
ALTER TABLE Members
ADD CONSTRAINT DF_MEMBERS_SEX
DEFAULT 0 FOR Sex;
A CHECK constraint limits the acceptable values for a column. CHECK constraints work with logical (Boolean) expressions that return TRUE, FALSE, or UNKNOWN. CHECK constraints reject values that evaluate to FALSE. This does not include an UNKNOWN return value because these values will not be rejected. Particular care must be taken with any expression that could have a NULL return value.
You can also supply CHECK constraints at the table level to check the relationship between columns of a table.
The following example add a CHECK constraint to check the age of members to be older than 18 on members table
ALTER TABLE Members
ADD CONSTRAINT [CK_Age]
CHECK (YEAR(GETDATE())-YEAR(Birthday)>18)
It is important to be able to uniquely identify rows within tables and to be able to establish relationships across tables.
PRIMARY KEY constraint is used to uniquely identify each row in a table that must not be NULL. It might involve multiple columns.
The following example shows how to set the PRIMARY KEY when creating a table:
CREATE TABLE Members
(
ID INT
CONSTRAINT PK_Members
PRIMARY KEY,
FullName VARCHAR(50) NOT NULL,
RegisterDate DATE,
Sex BIT,
Birthday DATE
)
A UNIQUE constraint shows that the column or combination of columns is unique. If the column is nullable one row can be NULL. It ensures that values in each row are different.
The following example shows how to set the UNIQUE constraint when creating a table:
CREATE TABLE Members
(
ID INT IDENTITY(1,1)
CONSTRAINT PK_Members
PRIMARY KEY,
N_Code INT NOT NULL UNIQUE,
FullName VARCHAR(50) NOT NULL,
RegisterDate DATE,
Sex BIT,
Birthday DATE
)
A FOREIGN KEY constraint establishs a link between the tables to enforce the relationship. A FOREIGN KEY constraint must refer to a PRIMARY KEY or UNIQUE constraint with not NULL values in target table. However, a column (or columns) with FOREIGN KEY constraint can be nullable.
The target table can be the same table. For example, an employee row might refer to a manager who is another row in the same Employee table.
When you add a FOREIGN KEY constraint, SQL Server will check the data to ensure that the reference to the target table is valid. However, by using WITH NOCHECK, SQL Server does not check the reference and will only check in future insert or update. The WITH NOCHECK option can also be applied to other types of constraints.
The following example add the FOREIGN KEY for member contacts:
CREATE TABLE Members
(
ID INT IDENTITY(1,1)
CONSTRAINT PK_Members
PRIMARY KEY,
Contact_ID INT FOREIGN KEY REFERENCES Member_Contact(ID),
N_Code INT NOT NULL UNIQUE,
FullName VARCHAR(50) NOT NULL,
RegisterDate DATE,
Sex BIT,
Birthday DATE
)
The FOREIGN KEY constraint includes a CASCADE option that enables any change to the value of UNIQUE or PRIMARY KEY to propagate the change to FOREIGN KEY values that reference it.
The CASCADE action is taken separately for UPDATE and DELETE and can have for value:
There are few common considerations that you need to mind when working with constraints.
The IDENTITY property generates a series of numbers automatically for an integer column, which means that in INSERT statement SQL Server provides a value for the column automatically; instead of specifying the value for the column. It’s typically associated with int or bigint columns.
When you specify IDENTITY property, you specify a seed and an increment. The seed is the starting value and the increment is how much the value goes up by each time. Default seed and increment are both 1 if they are not specified.
Although explicit insert is not typically allowed for the column with IDENTITY property, it’s possible to insert values explicitly. You can use SET IDENTITY_INSERT ON to enable the user to insert the values into the columns with IDENTITY property temporarily.
To retrieve the inserted identity value, you can use one of this two ways:
the following example shows how to set the IDENTITY property on the column when creating a table:
CREATE TABLE Members
(
ID INT IDENTITY(1,1)
CONSTRAINT PK_Members
PRIMARY KEY,
FullName VARCHAR(50) NOT NULL,
RegisterDate DATE,
Sex BIT,
Birthday DATE
)
You can use sequences as the same as IDENTITY property. However, unlike IDENTITY property, sequences are not tied to any specific table. This means that you can use a single sequence to provide key values for the group of tables.
Sequences can be cyclic. They can return to the first value, when exceed over the maximum value.
Sequences are created by the CREATE SEQUENCE statement, modified by the ALTER SEQUENCE statement, and deleted by the DROP SEQUENCE statement.
In the following example, a sequence called BookingID is created:
CREATE SEQUENCE BookingID AS INT
START WITH 100001
INCREMENT BY 10;
CREATE TABLE FlightBooking
(
FlightBookingID INT NOT NULL
PRIMARY KEY CLUSTERED
DEFAULT
(NEXT VALUE FOR
BookingID),
FlightNo INT NOT NULL,
CustomerID INT NOT NULL
)
To retrieve the Values from sequences use the NEXT VALUE FOR clause. In the example on the above, the sequence provides the default value for the FlightBookingID column in the FlightBooking table.
Hope this post was informative for you, please share it with others if you think it worth to read. Stay tuned to learn more about SQL Server.