Grouping and Aggregating Data

The estimated reading time for this post is 5 minutes

Except of row-by-row queries, you may need to summarize data to analyze it. Microsoft SQL Server provides built-in functions that can aggregate, and summarize information based on multiple rows. In this post, we will discuss about aggregate functions, GROUP BY, and HAVING clauses.

Using Aggregate Functions

Microsoft SQL Server provide aggregate functions such as SUM, MAX, MIN, and AVG to calculate multiple rows and summarize, then return a single result.

When using aggregate functions, you need to consider that the Aggregate Functions:

  • Return a single (scalar) value
  • Can be used in SELECT, HAVING, and ORDER BY clauses within SELECT statements.
  • Ignore NULLs except when using COUNT (*)
  • In SELECT list generate columns with no column name. for this purpose, you may need AS clause to provide column name.
  • In SELECT clause operate to all rows and all rows will be summarized, If there is no GROUP BY in SELECT statement.

Built-in Aggregate Functions

CommonStatisticsOther
SUMSTDEVCHECKSUM_AGG
MINSTDEVPGROUPING
MAXVARGROUPING_ID
AVGVARP
COUNT
COUNT_BIG

In this post we only discuss common aggregate functions. For further information on other built-in aggregate functions, visit MSDN for Aggregate Functions (Transact-SQL)

Common Aggregate Functions

Function NameSyntaxDescription
SUMSUM()Totals all the non-NULL numeric values in a column
AVGAVG()Averages all the non-NULL numeric values in a column (sum/count)
MINMIN()Returns the smallest number, earliest date/time, or first-occurring string (according to collation sort rules)
MAXMAX()Returns the largest number, latest date/time, or last-occurring string (according to collation sort rules)
COUNT or COUNT_BIGCOUNT(*) or COUNT() With (*), returns count of all rows, including those with NULL values; With (), returns count of non-NULL rows for that column. COUNT returns an int; COUNT_BIG returns a big_int

Aggregate Example

SELECT	AVG(unitprice) AS avg_price,
	MIN(OrderQty)AS min_qty,
	MAX(UnitPriceDiscount) AS max_discount
FROM Sales.SalesOrderDetail;

Consider that the above example doesn’t use GROUP BY clause. Therefore, all rows will be summarized and single row will return as result.

The Results:

When writing aggregates in SELECT clause, all SELECT list must be used aggregate functions, or be referenced in a GROUP BY clause.

The following example will return an error:

Partial Aggregate Error

SELECT SalesOrderID,
AVG(unitprice) AS avg_price,
MIN(OrderQty)AS min_qty,
MAX(UnitPriceDiscount) AS max_discount
FROM Sales.SalesOrderDetail;

The Results:

As you see, SalesOrderID column doesn’t appear either in aggregate function or the GROUP BY clause. Therefore, it’s invalid to use it in SELECT list. Removing SalesOrderID from previous example will prevent an error.

Aggregate functions in addition to numeric, can also summarize character and date/time data. The following example returns first and last names of customers:

SELECT MIN(LastName) AS first_customer,
MAX(LastName) AS last_customer
FROM Sales.Customer SC INNER JOIN Person.Person PP
ON SC.PersonID=PP.BusinessEntityID

The Results:

And the following example use aggregate function to returns first and last order date.

SELECT MIN(OrderDate) AS [FIRST ORDER],
MAX(OrderDate) AS [LAST ORDER]
FROM Sales.SalesOrderHeader

The Result:

Using DISTINCT with Aggregate Functions

DISTINCT removes duplicate values from input column before computing the summery value.

The following example returns count of customer who have placed orders. Although in the second column it uses distinct to remove duplicate customer before counting.

SELECT SalesPersonID,
YEAR(orderdate) AS orderyear,
COUNT(CustomerID) AS all_customers,
COUNT(DISTINCT CustomerID) AS unique_customers
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID, YEAR(orderdate);

The Results:

Note that the difference between COUNT in column 3 and DISTINCT COUNT in column 4 is column 3 only returns count of all rows per SalesPersonID and order year except of NULL values whereas column 4 eliminates duplicate values for customers and then returns count of unique customers.

Aggregate Functions and NULLs

T-SQL aggregate functions with the exception of COUNT (*), ignore NULLs. This means, for example, that the SUM function just adds non-NULL values. NULLs do not evaluate to zero. Consider that if you want to count NULLs in your aggregate functions, you need to replace NULLs with zero before aggregation.

For example, given the following table named t1:

The following example compares the AVG functions NULLs-ignored verses replaced with 0.

SELECT AVG(c2) AS AvgWithNULLs,
AVG(COALESCE(c2,0)) AS AvgWithNULLReplace
FROM dbo.t1;

The Results:

Using GROUP BY Clause

You may need to arrange your data into subsets before aggregation and summarizing. the GROUP BY clause let you subdivide the results into groups of rows.

GROUP BY Syntax

GROUP BY <column-name> [, … n] | <column-expression> [, … n]

GROUP BY create groups to placed summarizing rows into each group.

The following example illustrate count of sales per salesman:

SELECT SalesPersonID, COUNT(*) AS CountOfSales FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID

The Result:

When you use GROUP BY clause, you are permitted to use columns without aggregate functions in SELECT clause that appear in GROUP BY clause. Otherwise, you have to use aggregate functions with columns in SELECT clause.

The following example returns an error since OrderDate is not input of GROUP BY clause and is used in SELECT clause without any aggregate function:

SELECT SalesPersonID, YEAR(OrderDate), COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID

The Error Returns:

Msg 8120, Level 16, State 1, Line 15Column ‘Sales.SalesOrderHeader.OrderDate’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

If you want to amend the query to see the result per SalesPersonID and per year of OrderDate add it to the GROUP BY clause, as follow:

SELECT SalesPersonID, YEAR(OrderDate), COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID,YEAR(OrderDate)

The Results:

The following example shows the SELECT statement that lists columns and column expressions and aggregate functions with GROUP BY clause.

SELECT SalesPersonID, YEAR(OrderDate) OrderDate, MAX(SubTotal) HighestOrder,
MIN(SubTotal) LowestOrder, COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID,YEAR(OrderDate)

The Results:

GROUP BY and the Logical Order of Operations

The following table shows the logical order of clauses in SELECT statement:

Logical OrderPhaseComments
5SELECT
1FROM
2WHERE
3GROUP BYCreates Groups
4HAVNIGOperates on Groups
6ORDER BY

Filtering Grouped Data Using the HAVING Clause

As you see in the above table, HAVING clause is placed after WHERE and GROUP BY clause in forth phrase of logical order.

A HAVING clause is used to make a search condition on grouped results which returned by GROUP BY clause.

The following example groups all orders by customer, then returns only those who placed 10 or more orders.

GROUP BY with HAVING Clause

SELECT CustomerID, COUNT(*) AS CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
HAVING COUNT(*) >= 10

The results:

Note: remember that the GROUP BY and HAVING clauses are proceed before the SELECT clause. Therefore any column aliases in SELECT list can’t be used in GROUP BY and HAVING clauses.

HAVING versus WHERE

While both HAVING and WHERE clauses filter the result. It’s important to consider that WHERE operates on results of the FROM clause, while HAVING operates on grouped results of the GROUP BY clause not detail rows.

I hope this post was informative for you, please share it with others if you think it worth to read. Stay tuned to learn more about SQL Server.

 

Hamide B. Abshuri

Hamide B. Abshuri is SQL Server Database Developer and Administrator with extensive of working experience. She started work with SQL Server since year 2015. She is currently Microsoft Certified Professional and Microsoft Certified Solutions Associate: SQL Server 2016 Database Developer.

More Posts - Website