SQL 101 - Part 4: Aggregating dataMarch 12, 2009 14:17 Comments 
For demos in this article, we will use a table named Customer that contains 7 columns:
Afer adding a few rows to the table, the data looks like this.
|1||David||Giard||123 Oxford Ct||Erlanger||KY||40111||500.00|
|2||Magic||Johnson||456 Hollywood Blvd||Lansing||MI||45222||1500.00|
|3||Bubba||Smith||789 Killbubbakill St||Baltimore||MD||10111||1000.00|
|4||Ron||Mason||501 E Grand River Ave||Lansing||MI||45333||2000.00|
|5||Steve||Smith||900 Belle St||Detroit||MI||48888||5000.00|
|6||Ryan||Miller||1 Shutout Ct||Buffalo||NY||32323||250.00|
|7||Brad||Van Pelt||99 Linebaker Ln||Owosso||MI||47777||4000.00|
In the last article in this series, I expleined the basic functionality of the SELECT statement. I showed how to select columns from a table and sort or filter the results.
In this article, I'll show some more things you can do with the SELECT statemnts.
Sometimes we want our results to aggregate data. When aggregating data, we use functions that consolidate multiple rows and return the result of this aggregate function applied to many rows. The most common aggregate functions I use that SQL Server supports are:
|MAX||The maximum value of a column across all included rows|
|MIN||The minimum value of a column across all included rows|
|SUM||The sum of all values in a column across all included rows|
|AVG||The arithmetic average of all values in a column across all included rows|
|COUNT||The number of included rows|
The MIN, MAX, SUM, and AVG functions accept a parameter - the name of the column on which to calculate these values. The column must hold a numeric data type, such as an INT, FLOAT or MONEY.
You may pass a column name as a parameter to the COUNT function but it doesn't matter which column because - in SQL Server - every column appears exactly once in each row and the COUNT function is used to count rows. By convention, we pass "*" as the parameter to the COUNT aggregate function to represent all rows.
A few examples will help clarify this. Since our table contains only one numeric column - TotalSales - we will use this in most of our aggregate functions.
The following query returns the maximum value of the TotalSales column.
SELECT MAX (TotalSales) AS MaxSales FROM Customer
Here are the results - one row with one column containing the highest numerical value in the TotalSales column.
Similarly, the following query returns the sum of all values in the TotalSales column.
SELECT SUM (TotalSales) AS SumSales FROM Customer
This returns 14,750 which is the sum of 1000 + 1500 + 1000 + 2000 + 5000 + 250 + 4000
We can filter before applying an aggregate function. If we only want to sum of TotalSales for customers in michigan, we simply add a WHERE clause to our query.
SELECT SUM (TotalSales) AS SumSales FROM Customer WHERE State = 'MI'
This returns 12,500, which is 1500 + 2000 + 5000 + 4000, or the sum of the TotalSales column for only those customers in Michigan.
As mentioned before, we do not need to specify a particular column for the COUNT function.
SELECT COUNT (*) AS NumCusts FROM Customer
returns the a row and column containing number 7, which is how many rows are in our table.
So far, each query we have written has returned only one row. Often, however, we want to calculate an aggregation for each distinct value in a column or columns. We can do this by adding a GROUP BY clause to the query. For example, we may want to see the SUM of TotalSales for each state.
SELECT State, SUM (TotalSales) AS SumSales FROM Customer GROUP BY State
The above query returns a row for each distinct value in the State column and calculates the sum of TotalSales of all rows corresponding to that state.
It's important to note that, when using the GROUP BY clause, you cannot return a column that is not part of the grouping. So
SELECT State, City, SUM (TotalSales) AS SumSales FROM Customer GROUP BY State
results in an error because we are trying to return the city column, but we are not grouping on that column. Because a given state can have multiple cities, SQL does not know which one to display for the row returned.
You can group on multiple columns as in the following query
SELECT State, City, SUM (TotalSales) AS SumSales FROM Customer GROUP BY State, City
In this case, we get a row with a sum for each combination of state and city.
By using the GROUP BY clause on a large table, we may end up with so many rows that it becomes difficult to find relevant data. Sometimes, we are only interested in those times when the aggregate value exceeds some threshhold. In these cases, it would be nice to only show aggregate rows that exceed that threshhold. This sounds like a good place to use a filter. Unfortunately, we cannot use the WHERE clause to accomplish this task because the WHERE clause filters data before the aggregation. We have to wait until after calculating the aggregate values becuase it is the aggregate values on which we want to filter. The HAVING clause is used to filter on aggregate values.
We can run our query to get the TotalSales sum for each state, but show only those states that have total sales of more than 2000, using the following query
SELECT State, SUM (TotalSales) AS SumSales FROM Customer GROUP BY State HAVING SUM (TotalSales) >= 1000
In this case, we don't see the row for Kentucky and New York because they had total sales summing less than $1000
In this article, we showed how to use T-SQL's grouping and aggregate functions to return summary data from a database.