Back To Basics

NOTE:

For demos in this article, we will use a table named Customer that contains 7 columns:

Name Data Type
FirstName nvarchar(50)
LastName nvarchar(50)
StreetAddress nvarchar(255)
City nvarchar(255)
State char(2)
ZipCode nvarchar(10)
TotalSales money

Afer adding a few rows to the table, the data looks like this.

CustID FirstName LastName StreetAddress City State ZipCode TotalSales
1 David Giard 123 Oxford Ct Erlanger KY 40111 500.00
2 Magic Johnson 456 Hollywood Blvd Lansing MI 45222 1500.00
3 Bubba Smith 789 Killbubbakill St Baltimore MD 10111 1000.00
4 Ron Mason 501 E Grand River Ave Lansing MI 45333 2000.00
5 Steve Smith 900 Belle St Detroit MI 48888 5000.00
6 Ryan Miller 1 Shutout Ct Buffalo NY 32323 250.00
7 Brad Van Pelt 99 Linebaker Ln Owosso MI 47777 4000.00

In the last article in this series, I expleined the basic functionality of the SELECT statement. I showed how to select columns from a table and sort or filter the results.

In this article, I'll show some more things you can do with the SELECT statemnts.

Sometimes we want our results to aggregate data. When aggregating data, we use functions that consolidate multiple rows and return the result of this aggregate function applied to many rows. The most common aggregate functions I use that SQL Server supports are:

Function Descripiton
MAX The maximum value of a column across all included rows
MIN The minimum value of a column across all included rows
SUM The sum of all values in a column across all included rows
AVG The arithmetic average of all values in a column across all included rows
COUNT The number of included rows

The MIN, MAX, SUM, and AVG functions accept a parameter - the name of the column on which to calculate these values. The column must hold a numeric data type, such as an INT, FLOAT or MONEY.

You may pass a column name as a parameter to the COUNT function but it doesn't matter which column because - in SQL Server - every column appears exactly once in each row and the COUNT function is used to count rows. By convention, we pass "*" as the parameter to the COUNT aggregate function to represent all rows.

A few examples will help clarify this. Since our table contains only one numeric column - TotalSales - we will use this in most of our aggregate functions.

The following query returns the maximum value of the TotalSales column.


SELECT MAX (TotalSales) AS MaxSales
    FROM Customer

Here are the results - one row with one column containing the highest numerical value in the TotalSales column.

MaxSales
5000.00

Similarly, the following query returns the sum of all values in the TotalSales column.

SELECT SUM (TotalSales) AS SumSales
    FROM Customer

This returns 14,750 which is the sum of 1000 + 1500 + 1000 + 2000 + 5000 + 250 + 4000

SumSales
14750.00

We can filter before applying an aggregate function. If we only want to sum of TotalSales for customers in michigan, we simply add a WHERE clause to our query.

SELECT SUM (TotalSales) AS SumSales
    FROM Customer
    WHERE State = 'MI'

This returns 12,500, which is 1500 + 2000 + 5000 + 4000, or the sum of the TotalSales column for only those customers in Michigan.

SumSales
12500.00

As mentioned before, we do not need to specify a particular column for the COUNT function.

SELECT COUNT (*) AS NumCusts
    FROM Customer

returns the a row and column containing number 7, which is how many rows are in our table.

So far, each query we have written has returned only one row. Often, however, we want to calculate an aggregation for each distinct value in a column or columns. We can do this by adding a GROUP BY clause to the query. For example, we may want to see the SUM of TotalSales for each state.

SELECT 
        State, 
        SUM (TotalSales) AS SumSales
    FROM Customer
    GROUP BY State

The above query returns a row for each distinct value in the State column and calculates the sum of TotalSales of all rows corresponding to that state.

State SumSales
 KY 500.00
MD 1000.00
MI 12500.00
NY 500.00

It's important to note that, when using the GROUP BY clause, you cannot return a column that is not part of the grouping. So

SELECT 
        State, 
        City, 
        SUM (TotalSales) AS SumSales
    FROM Customer
    GROUP BY State

results in an error because we are trying to return the city column, but we are not grouping on that column. Because a given state can have multiple cities, SQL does not know which one to display for the row returned.

You can group on multiple columns as in the following query

SELECT 
        State, 
        City, 
        SUM (TotalSales) AS SumSales
    FROM Customer
    GROUP BY State, City

In this case, we get a row with a sum for each combination of state and city.

State City SumSales
 KY Erlanger 500.00
MD Baltimore 1000.00
MI Lansing 1500.00
MI Okemos 2000.00
MI Owosso 4000.00
NY Buffalo 250.00

By using the GROUP BY clause on a large table, we may end up with so many rows that it becomes difficult to find relevant data. Sometimes, we are only interested in those times when the aggregate value exceeds some threshhold. In these cases, it would be nice to only show aggregate rows that exceed that threshhold. This sounds like a good place to use a filter. Unfortunately, we cannot use the WHERE clause to accomplish this task because the WHERE clause filters data before the aggregation. We have to wait until after calculating the aggregate values becuase it is the aggregate values on which we want to filter. The HAVING clause is used to filter on aggregate values.

We can run our query to get the TotalSales sum for each state, but show only those states that have total sales of more than 2000, using the following query

SELECT 
        State, 
        SUM (TotalSales) AS SumSales
    FROM Customer
    GROUP BY State
    HAVING SUM (TotalSales) >= 1000

In this case, we don't see the row for Kentucky and New York because they had total sales summing less than $1000

State SumSales
MD 1000.00
MI 112500.00

In this article, we showed how to use T-SQL's grouping and aggregate functions to return summary data from a database.