Sorting and Filtering Data

The estimated reading time for this post is 8 minutes

hi all, hope enjoy the last post about ‘SELECT Queries’, you can also refer to ‘Saeed’s Post’ for further information about Querying Multiple Tables by JOINs. In this post we will discuss how to limit the number of rows in SELECT queries and control the order of results.

Because of relational theory, sets of data don’t include any sort order. therefore, if you need to have certain order in your query result, you should use ORDER BY clause in your SELECT statement. Let’s see how to write a query by using ORDER BY to have sorted results.

Sorting Data by Using ORDER BY Clause

In the logical order of query processing, ORDER BY is the last segment of the SELECT statement that be executed. ORDER BY sorts rows in results and without ORDER BY clause, there is no guaranteed order of rows.

ORDER BY Clause Form

SELECT <select_list>
FROM <table_source>
ORDER BY <order_by_list> [ASC|DESC];

ORDER BY can take several type of elements in its list:

  • columns by name: ORDER BY list can includes any column name of source tables whether they are beyond the select list or not.

ORDER BY with Column name

SELECT orderid, custid, orderdate
FROM Sales.Orders
ORDER BY orderdate;

If the query uses the DISTINCT option, any columns in the ORDER BY list must be included in the SELECT list.

  • Column aliases: according to the logical order of query processing, ORDER BY is processed after SELECT clause. Therefore can access to SELECT list aliases.

ORDER BY with Column Alias:

SELECT orderid, custid, YEAR(orderdate) AS orderyear
FROM Sales.Orders
ORDER BY orderyear;

  • Column by position in the SELECT clause: This one is not recommended because of low readability and extra care to keep ORDER BY list up to date with any changes made on the SELECT list order.

In ORDER BY can also use COLLATE clause to change the collation of order list.

In addition, you can also control the direction of the sort by using ASC for ascending(A-z, 0-9) or DESC for descending (z-A, 9-0). Ascending sort is default and each column can have a separate order, as the following example:

ASC and DESC Sort

USE TSQL; 
GO 
SELECT hiredate, firstname, lastname 
FROM HR.Employees 
ORDER BY hiredate DESC, lastname ASC;

The syntax of the ORDER BY clause appears as follow:

ORDER BY <order_by_list> 
OFFSET <offset_value> ROW|ROWS 
FETCH FIRST|NEXT <fetch_value> ROW|ROWS ONLY

The OFFSET-FETCH will be covered later.

Sometimes you want to retrieve only a subset of all rows in the table(s). To limit which rows are returned, you need to use WHERE clause in the SELECT statement. now, let’s see how to use WHERE clause to filter the result set.

Filtering Data in the WHERE Clause with Predicates

To limit the rows which are returned by query, you need WHERE clause following the FROM clause. WHERE clause is written as a predicate expression. The predicate provide filter which each row must pass and only rows with TRUE state in the predicate will be output.

Consider following, when you are writing a WHERE clause:

  • The evaluation of your predicate must be TRUE or FALSE. (the evaluation may be NULL when working with missing values or NULLs)
  • Only rows with TRUE evaluation will be passed through the filter.
  • FALSE or UNKNOWN values will be filter out.
  • Due to logical order of the SELECT statement, column aliases declared in SELECT clause cannot be used in WHERE clause.
  • Logically, the WHERE clause is the next phase that is processed after the FROM clause.

the following example shows how to use expression in SELECT and WHERE clause correctly:

Filtering Example

SELECT orderid, custid, YEAR(orderdate) AS ordyear 
FROM Sales.Orders 
WHERE YEAR(orderdate) = 2006;

And the following query will be failed, due to use column aliases in the WHERE clause:

Incorrect Column Alias in WHERE Clause

SELECT orderid, custid, YEAR(orderdate) AS ordyear 
FROM Sales.Orders 
WHERE ordyear = 2006;

The resulting error:

image-3065

Efficient WHERE clause can provide a positive impact on query performance. WHERE clause filter data on the server side which can reduce network traffic and memory usage on the client side.

WHERE Clause Syntax

WHERE <search_condition>

The most common form of WHERE clause is as follow:

Typical WHERE Clause

WHERE <column> <operator> <expression>

Some example of WHERE clauses:

Filter Rows for Customers from Spain

SELECT contactname, country 
FROM Sales.Customers 
WHERE country = N'Spain';

Filter Rows for Orders after January 1,2007

SELECT orderid, orderdate 
FROM Sales.Orders 
WHERE orderdate > '20070101';

Filter Orders within a Range of Dates

SELECT orderid, orderdate 
FROM Sales.Orders 
WHERE orderdate >= '20070101' AND orderdate < '20080101';

To using operators in a WHERE clause, you may also use another T-SQL operations in your predicate:

Predicates and OperatorsDescription
INDetermines whether a specified value matches any value in a subquery or a list.
BETWEENSpecifies an inclusive range to test.
LIKEDetermines whether a specific character string matches a specified pattern.
ANDCombines two Boolean expressions and returns TRUE only when both are TRUE.
ORCombines two Boolean expressions and returns TRUE if either is TRUE.
NOTReverses the result of a search condition.

WHERE with OR Example

SELECT custid, companyname, country 
FROM Sales.Customers 
WHERE country = N'UK' OR country = N'Spain';

WHERE with IN Example

SELECT custid, companyname, country 
FROM Sales.Customers 
WHERE country IN (N'UK',N'Spain');

Range Example

SELECT orderid, custid, orderdate 
FROM Sales.Orders 
WHERE orderdate >= '20070101' AND orderdate <= '20080630';

BETWEEN Operator

SELECT orderid, custid, orderdate 
FROM Sales.Orders 
WHERE orderdate BETWEEN '20070101' AND '20080630';

Filtering in the SELECT Clause Using TOP Option

sometimes you may need to limit the number or percentage of rows returned by your query. The TOP option in the SELECT clause allow you to specify the number of rows to return, either as ordinal number or percentage of rows. The simplified syntax of the TOP option is as follow:

TOP Option

SELECT TOP (N) <column_list> 
FROM  <table_source> 
WHERE <search_condition> 
ORDER BY <order list>;

In the following example, we only retrieve five most recent orders:

TOP Example

SELECT TOP (5) orderid, custid, orderdate 
FROM Sales.Orders 
ORDER BY orderdate DESC;

The TOP operator depends on the ORDER BY clause. there is no guarantee which rows will be returned, if there is no ORDER BY clause.

In addition, TOP also accepts the WITH TIES option, which retrieve two or more rows that tie for last place in the limited result set.

TOP … WITH TIES can be used only in SELECT statements, and only if the ORDER BY clause is specified.

Without the WITH TIES Option

SELECT TOP (5) orderid, custid, orderdate 
FROM Sales.Orders 
ORDER BY orderdate DESC;

The result shows five rows in two distinct orderdate value:

image-3066

With the WITH TIES Option

SELECT TOP (5) WITH TIES orderid, custid, orderdate 
FROM Sales.Orders 
ORDER BY orderdate DESC;

The following result shows eight rows in two distinct orderdate value:

image-3067

In addition, to return the percentage of the rows, use the PERCENT keyword with TOP.

Returning a Percentage of Records.

SELECT TOP (10) PERCENT orderid, custid, orderdate 
FROM Sales.Orders 
ORDER BY orderdate DESC;

TOP (N) PERCENT may also be used with the WITH TIES option.

Filtering in the ORDER BY Clause Using OFFSET-FETCH

Like TOP, OFFSET-FETCH allows to return a range of rows. However, it adds the functionality to process a starting point (an offset) and a value for the number of rows to return (a fetch value). The OFFSET-FETCH is a convenient technique for paging through results. This applies to SQL Server 2012 through SQL Server 2017 and azure.

OFFSET-FETCH Clause

OFFSET 
	{ integer_constant | offset_row_count_expression } 
	{ ROW | ROWS }     
[FETCH { FIRST | NEXT } 
	{integer_constant | fetch_row_count_expression } 
	{ ROW | ROWS } ONLY]

To use OFFSET-FETCH, you supply a starting value (zero if no skipping is required) and an optional number of rows to return, consider that fetch clause is optional. the following example will skip the first 10 rows and return 10 rows as determined by orderdate.

OFFSET-FETCH Example

SELECT orderid, custid, orderdate 
FROM Sales.Orders 
ORDER BY orderdate, orderid DESC 
OFFSET 10 ROWS 
FETCH NEXT 10 ROWS ONLY;

Working with UNKNOWN Values

After knowing about predicates and filters, it is the time to consider to missing and UNKNOWN values. In this part we will discuss how SQL Server uses NULL to mark missing values and how to work with NULL in queries.

Understanding NULL Values

Earlier in this post we learned predicates return TRUE or FALSE. However, in SQL Server you may need to act on some data that is missing or unknown. SQL Server uses NULLs to mark missing values. A NULL is neither TRUE nor FALSE but it’s marked as UNKNOWN.

Handling NULL in Queries

To handle NULLs in predicates, keep in mind this guideline:

  • Query filters, Such as ON, WHERE and HAVING clause, treat NULL like a FALSE result.

ORDER BY Query that Includes NULL in Result

SELECT productid, name, color 
FROM Production.Product
ORDER BY color

the result set is as follow:

image-3068

In WHERE clause, a NULL is not equivalent to another value, even another NULL. To write queries to test NULL columns use IS NULL (or IS NOT NULL) operator instead of equals (or not equals).

Incorrectly Testing For NULL

SELECT empid, lastname, region 
FROM HR.Employees 
WHERE region = NULL;
 

Correctly Testing For NULL

SELECT empid, lastname, region 
FROM HR.Employees 
WHERE region IS NULL;

Now, you know how to filter and sort the result sets according to your business logic. In next post we will discuss about DML commands to modify data.

Hamide B. Abshuri

Hamide B. Abshuri is SQL Server Database Developer and Administrator with extensive of working experience. She started work with SQL Server since year 2008. She is currently Microsoft Certified Professional and Microsoft Certified Solutions Associate: SQL Server 2016 Database Developer.

More Posts - Website