The estimated reading time for this post is 5 minutes
Except of row-by-row queries, you may need to summarize data to analyze it. Microsoft SQL Server provides built-in functions that can aggregate, and summarize information based on multiple rows. In this post, we will discuss about aggregate functions, GROUP BY, and HAVING clauses.
Microsoft SQL Server provide aggregate functions such as SUM, MAX, MIN, and AVG to calculate multiple rows and summarize, then return a single result.
When using aggregate functions, you need to consider that the Aggregate Functions:
Built-in Aggregate Functions
Common Statistics Other
SUM STDEV CHECKSUM_AGG
MIN STDEVP GROUPING
MAX VAR GROUPING_ID
AVG VARP
COUNT
COUNT_BIG
In this post we only discuss common aggregate functions. For further information on other built-in aggregate functions, visit MSDN for Aggregate Functions (Transact-SQL)
Common Aggregate Functions
Function Name Syntax Description
SUM SUM() Totals all the non-NULL numeric values in a column
AVG AVG() Averages all the non-NULL numeric values in a column (sum/count)
MIN MIN() Returns the smallest number, earliest date/time, or first-occurring string (according to collation sort rules)
MAX MAX() Returns the largest number, latest date/time, or last-occurring string (according to collation sort rules)
COUNT or COUNT_BIG COUNT(*) or COUNT() With (*), returns count of all rows, including those with NULL values; With (), returns count of non-NULL rows for that column. COUNT returns an int; COUNT_BIG returns a big_int
Aggregate Example
SELECT AVG(unitprice) AS avg_price,
MIN(OrderQty)AS min_qty,
MAX(UnitPriceDiscount) AS max_discount
FROM Sales.SalesOrderDetail;
Consider that the above example doesn’t use GROUP BY clause. Therefore, all rows will be summarized and single row will return as result.
The Results:
When writing aggregates in SELECT clause, all SELECT list must be used aggregate functions, or be referenced in a GROUP BY clause.
The following example will return an error:
Partial Aggregate Error
SELECT SalesOrderID,
AVG(unitprice) AS avg_price,
MIN(OrderQty)AS min_qty,
MAX(UnitPriceDiscount) AS max_discount
FROM Sales.SalesOrderDetail;
The Results:
As you see, SalesOrderID column doesn’t appear either in aggregate function or the GROUP BY clause. Therefore, it’s invalid to use it in SELECT list. Removing SalesOrderID from previous example will prevent an error.
Aggregate functions in addition to numeric, can also summarize character and date/time data. The following example returns first and last names of customers:
SELECT MIN(LastName) AS first_customer,
MAX(LastName) AS last_customer
FROM Sales.Customer SC INNER JOIN Person.Person PP
ON SC.PersonID=PP.BusinessEntityID
The Results:
And the following example use aggregate function to returns first and last order date.
SELECT MIN(OrderDate) AS [FIRST ORDER],
MAX(OrderDate) AS [LAST ORDER]
FROM Sales.SalesOrderHeader
The Result:
DISTINCT removes duplicate values from input column before computing the summery value.
The following example returns count of customer who have placed orders. Although in the second column it uses distinct to remove duplicate customer before counting.
SELECT SalesPersonID,
YEAR(orderdate) AS orderyear,
COUNT(CustomerID) AS all_customers,
COUNT(DISTINCT CustomerID) AS unique_customers
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID, YEAR(orderdate);
The Results:
Note that the difference between COUNT in column 3 and DISTINCT COUNT in column 4 is column 3 only returns count of all rows per SalesPersonID and order year except of NULL values whereas column 4 eliminates duplicate values for customers and then returns count of unique customers.
T-SQL aggregate functions with the exception of COUNT (*), ignore NULLs. This means, for example, that the SUM function just adds non-NULL values. NULLs do not evaluate to zero. Consider that if you want to count NULLs in your aggregate functions, you need to replace NULLs with zero before aggregation.
For example, given the following table named t1:
The following example compares the AVG functions NULLs-ignored verses replaced with 0.
SELECT AVG(c2) AS AvgWithNULLs,
AVG(COALESCE(c2,0)) AS AvgWithNULLReplace
FROM dbo.t1;
The Results:
You may need to arrange your data into subsets before aggregation and summarizing. the GROUP BY clause let you subdivide the results into groups of rows.
GROUP BY Syntax
GROUP BY <column-name> [, … n] | <column-expression> [, … n]
GROUP BY create groups to placed summarizing rows into each group.
The following example illustrate count of sales per salesman:
SELECT SalesPersonID, COUNT(*) AS CountOfSales FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID
The Result:
The following example returns an error since OrderDate is not input of GROUP BY clause and is used in SELECT clause without any aggregate function:
SELECT SalesPersonID, YEAR(OrderDate), COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID
The Error Returns:
Msg 8120, Level 16, State 1, Line 15Column ‘Sales.SalesOrderHeader.OrderDate’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If you want to amend the query to see the result per SalesPersonID and per year of OrderDate add it to the GROUP BY clause, as follow:
SELECT SalesPersonID, YEAR(OrderDate), COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID,YEAR(OrderDate)
The Results:
The following example shows the SELECT statement that lists columns and column expressions and aggregate functions with GROUP BY clause.
SELECT SalesPersonID, YEAR(OrderDate) OrderDate, MAX(SubTotal) HighestOrder,
MIN(SubTotal) LowestOrder, COUNT(*) AS CountOfSales
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID,YEAR(OrderDate)
The Results:
The following table shows the logical order of clauses in SELECT statement:
Logical Order Phase Comments
5 SELECT
1 FROM
2 WHERE
3 GROUP BY Creates Groups
4 HAVNIG Operates on Groups
6 ORDER BY
As you see in the above table, HAVING clause is placed after WHERE and GROUP BY clause in forth phrase of logical order.
A HAVING clause is used to make a search condition on grouped results which returned by GROUP BY clause.
The following example groups all orders by customer, then returns only those who placed 10 or more orders.
GROUP BY with HAVING Clause
SELECT CustomerID, COUNT(*) AS CountOfOrders
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
HAVING COUNT(*) >= 10
The results:
Note: remember that the GROUP BY and HAVING clauses are proceed before the SELECT clause. Therefore any column aliases in SELECT list can’t be used in GROUP BY and HAVING clauses.
While both HAVING and WHERE clauses filter the result. It’s important to consider that WHERE operates on results of the FROM clause, while HAVING operates on grouped results of the GROUP BY clause not detail rows.
I hope this post was informative for you, please share it with others if you think it worth to read. Stay tuned to learn more about SQL Server.