# Tuesday, March 10, 2009
Back To Basics

NOTE:

For demos in this article, we will use a table named Customer that contains the followong columns:

Name Data Type
FirstName nvarchar(50)
LastName nvarchar(50)
StreetAddress nvarchar(255)
 City nvarchar(255)
State char(2)
 ZipCode nvarchar(10)
TotalSales money

Afer adding a few rows to the table, the data looks like this.

CustID FirstName LastName StreetAddress City State ZipCode TotalSales
1 David Giard 123 Oxford Ct Erlanger KY 40111 1000.00
2 Magic Johnson 456 Hollywood Blvd Lansing MI 45222 1500.00
3 Bubba Smith 789 Killbubbakill St Baltimore MD 10111 1000.00
4 Ron Mason 501 E Grand River Ave Okemos MI 45333 2000.00
5 Steve Smith 900 Belle St Detroit MI 48888 5000.00

Retrieving data is one of the most common tasks performed on a relational database.

Fortunately, SQL Server includes a language that allows users (and programs) to retrieve the data they want. This language is called Structured Query Language. It is usually abbreviated "SQL" and often pronounced "SEE-kwuhl".

Getting data from a database is known as querying the database. The code to retrieve that data is a query. When this code is written in SQL (as most of my queries are), it is known as a SQL query.

The basic syntax for a SQL query that retrieves data from a single table is

SELECT [List of Columns]
    FROM [Table Name]
    WHERE [Filter Condition]
    ORDER BY [Sort column or columns]

In SQL, line breaks and extra spaces are not important. The language parser is smart enough to figure out when a statement or command ends, so I usually try to format my SQL statements to make them easy to read.

Only the SELECT and FROM parts of the query are required, so let's start with those.

EvenEven though the "SELECT" keyword typically comes first in this type of query, I'll start by explaining the "FROM" keyword. When getting data from only one table, follow the keyword FROM with the name of that table. For example, the clause

    FROM Customer 

indicates that we are getting data from a table named Customer. We'll talk later about how to get data from multiple tables in the same query.

It is possible to provide an alias for a table by following the table name with a space, and the alias. For example

    FROM Customer cust

The above SQL clause says that we will get data from the Customer table, but that we will use the string “cust” to refer to this table elsewhere in our query.

This is useful in the following situations

  • You want to provide a shorter name for the table in order to avoid retyping a long name elsewhere in the query
  • You want to avoid ambiguity when listing the same table name twice in the FROM clause.

SELECT is the first keyword of this type of query and tells SQL Server that we want to retrieve data from the database. The word "SELECT" is followed by a list of columns that the query will return. If the column list contains duplicate column names (as when you are getting data from two different tables and they each have a column with the same name), you should precede the column name with the table name or alias.

You can also use the special character "*" in place of (or in addition to) the list of column names in order to return all columns in the tables.

The following query returns all columns and rows in the Customer table.

SELECT * 
    FROM Customer 
CustID FirstName LastName StreetAddress City State ZipCode TotalSales
1 David Giard 123 Oxford Ct Erlanger KY 40111 1000.00
2 Magic Johnson 456 Hollywood Blvd Lansing MI 45222 1500.00
3 Bubba Smith 789 Killbubbakill St Baltimore MD 10111 1000.00
4 Ron Mason 501 E Grand River Ave Okemos MI 45333 2000.00
5 Steve Smith 900 Belle St Detroit MI 48888 5000.00

We can return onWe can return only the FirstName and LastName columns from the Customer table with the following table

SELECT 
        FirstName, 
        LastName 
    FROM Customer 
FirstName LastName
David Giard
Magic Johnson
Bubba Smith
Ron Mason
Steve Smith

In the queries above, it is possible to qualify the column names, explicitly indicating that they are from the Customer table. Of course, in this case it is unnecessary because the column names are unique within the Customer table. The following examples qualify the column names and returns the same data.

SELECT
        Customer.FirstName,
        Customer.LastName
    FROM Customer

SELECT
        cu.FirstName,
        cu.LastName
    FROM Customer cu

By default, each column returned by the query retains the name of the corresponding column in the source table. you want to change the name of a If you want to change the name of a column in the query, alias that column by appending the keyword " AS " followed by the alias you want. For example

SELECT
        FirstName AS FirstNm,
        LastName AS LastNm
    FROM Customer

This will return the following output
FirstNm LastNm
David Giard
Magic Johnson
Bubba Smith
Ron Mason
Steve Smith

If wIf we want to sort the output we can add the ORDER BY clause to our query. The syntax for this clause is

ORDER BY [List of Columns on which to sort]

For example, we can sort our output on Last Name by changing our query to

SELECT
        FirstName,
        LastName
    FROM Customer
    ORDER BY LastName

FirstName LastName
David Giard
Magic Johnson
Steve Smith
Bubba Smith
Ron Mason

We can add more columns to the list of sort columns if we separate each with a comma. The second column is only appropriate in our sort if two rows have identical values for the first column.

For example

SELECT FirstNameFor example

SELECT
        FirstName,
        LastName
    FROM Customer
    ORDER BY LastName, FirstName

FirstName LastName
David Giard
Magic Johnson
Bubba Smith
Steve Smith
Ron Mason

The above result set contains 2 rows with the last name "Smith".  These two rows were sorted in order of their FirstName column.

In many cases, we may not want to return eIn many cases, we may not want to return every row in a table. We can use the WHERE clause to filter data. The syntax of the WHERE clause is

   WHERE [Filter Condition]

The Filter condition is a Boolean expression, meaning it evaluates to either TRUE or FALSE for every row. The query will return only those rows for which this condition evaluates to TRUE.

For example, if we want to get only those customers in Michigan we can use the query

SELECT
   FROM Customer 
   WHERE State = 'MI'

CustID FirstName LastName StreetAddress City State ZipCode TotalSales
2 Magic Johnson 456 Hollywood Blvd Lansing MI 45222 1500.00
4 Ron Mason 501 E Grand River Ave Okemos MI 45333 2000.00
5 Steve Smith 900 Belle St Detroit MI 48888 5000.00

This query only returned those rows that match our filter condition.

Of course, we can combine several of these clauses as in the following

SELECT
        cu.FirstName,
        cu.LastName,
        cu.State AS ResidencyState
    FROM Customer cu
    WHERE cu.State = 'MI'
    ORDER BY cu.LastName

which returns the following results
FirstName LastName ResidencyState
Magic Johnson MI
Ron Mason MI
Steve Smith MI

As you can see, we can use the SELECT commmand to retrieve data from a table in a database and customize the way that data comes back.  We've just scratched the surface of this command.  In the next article, we'll look at more options of the SELECT command

Tuesday, March 10, 2009 4:36:54 PM (GMT Standard Time, UTC+00:00)
# Monday, March 9, 2009

Episode 14

Jim Holmes takes a break from organizing and coordinating CodeMash to talk about the conference.

2 minutes, 38 seconds

Monday, March 9, 2009 4:11:04 AM (GMT Standard Time, UTC+00:00)
# Saturday, March 7, 2009

Episode 13

Matt Pizzimenti is involved in a2geeks.org - a social network for techies and entrepeneurs.  He discusses it in this interview.

Saturday, March 7, 2009 12:45:47 PM (GMT Standard Time, UTC+00:00)
# Thursday, March 5, 2009

Episode 12

Mark Hindsbo is the General Manager of the Microsoft Evangelism group in the United States. In this interview, Mark talks about what Microsoft can do to help the developer community succeed.

2 minutes, 24 seconds

Thursday, March 5, 2009 11:52:25 AM (GMT Standard Time, UTC+00:00)
# Wednesday, March 4, 2009

I woke up this morning and was surprised and deleted to see the following e-mail in my inbox


Congratulations! We are pleased to present you with the 2009 NVP Award! The NVP Award is our way to say thank you for promoting bad behavior and general revelry in the office. We appreciate your extraordinary efforts at “Video interviewing everyone on the planet” during the past year..

To celebrate this tremendous achievement, please join your other NVPers at the NVP Summit at SRT Solutions Thursday afternoon where we will talk about the future of <__>, drink lots of <__>, play a little <__>, and not do any <__>. As we NVPers like to say – “Anything that happens at NVP Summit, stays at NVP Summit”. NDA restrictions apply. You will also receive a special gift for being a NVP and some useless swag.

--Mike (NVP Regional Director)

I cannot tell you how proud I am to receive this award. I certainly would have dreamed of it all my life had I heard of it before yesterday.

I would be thrilled to share all I learn with you - my readers - were it not for the attached 423-page non-disclosure agreement I was forced to sign. But rest assured that this knowledge and its accompanying free drinks will ultimately help the community and be good for all software developers.

Thank you to the committee, to those who nominated me and seconded my nomination and especially to <__>, without whom I would not have won this award.

Wednesday, March 4, 2009 12:58:00 PM (GMT Standard Time, UTC+00:00)
# Tuesday, March 3, 2009

Episode 11

After writing a distributed application, software architect Phil Japikse needed a way to deploy updates to users across the state.  In this conversation, Phil describes the deployment strategy he implemented using tools provided by the .Net framework. 

Tuesday, March 3, 2009 11:55:01 AM (GMT Standard Time, UTC+00:00)
# Friday, February 27, 2009

Episode 10

Tim Adams of Microsoft was filming a series of "Man On The Street" video interviews at CodeMash when I caught up with him. I interviewed him while he interviewed me and we each filmed the conversation.

Here is the conversation from my side:

Here is the same conversation from Tim's point of view:

Friday, February 27, 2009 5:05:59 AM (GMT Standard Time, UTC+00:00)
# Thursday, February 26, 2009

In my last article, I defined the basic concepts of database, table, column and row. Using these constructs, you can organize data into a rectangular format. This paradigm often works really well, because

  • You can group related information into a single container (a table)
  • Each row represents a single entity (such as a customer, employee, or invoice) and
  • Each column represents an attribute of the entity (such as FirstName, LastName, or TotalSales).

Using this model, we can create a table containing information about a customer's purchases. Each row in this item might represent a single purchase.

When a customer purchases an item, we probably would want to store some information about that purchase. These bits of information about each purchase are attributes of the purchase and are therefore candidates for columns. Below are examples of the information we might want to save about a customer's purchase.

  • Date of Purchase
  • Customer First Name
  • Customer Last Name
  • Customer Street Address
  • Customer City
  • Customer Zip Code
  • Item Purchased
  • Quantity Purchased
  • Price per Item

We can create a table CustomerPurchase with a column for each of the above attributes and begin populating with data each time a customer purchases something. The data would look something like this:

PurchaseDate Customer
FirstName
Customer
LastName
Customer
StreetAddress
Customer
City
Customer
ZipCode
ItemPurchased Quantity PricePerItem
2/26/2009 John Smith 123 Elm Bigg City 48222 Lamp 1 40
2/26/2009 Bill Jones 456 Maple Smallville 48333 Chair 2 100
2/26/2009 Mary Brown 789 Oak Middleton 48444 Table 1 50

This model seems to capture the information we want. Do you see any problems with it?

What happens if a customer orders more than one item? If John Smith purchases a Chair in addition to his Lamp, we can just add another row to the table, like so.

PurchaseDate Customer
FirstName
Customer
LastName
Customer
StreetAddress
Customer
City
Customer
ZipCode
ItemPurchased Quantity PricePerItem
2/26/2009 John Smith 123 Elm Bigg City 48222 Lamp 1 40
2/26/2009 Bill Jones 456 Maple Smallville 48333 Chair 2 100
2/26/2009 Mary Brown 789 Oak Middleton 48444 Table 1 50
2/26/2009 John Smith 123 Elm Bigg City 48222 Chair 1 100
2/27/2009 John Smith 123 Elm Bigg City 48222 Table 1 50

But notice that now we are storing John Smith's name and address multiple times.  Assuming John Smith will never change his name, this is a waste of space.  Granted, this isn't very much wasted space when we have only a few orders, but imagine a system with thousands of customers and millions of orders.  Do you really want all that redundant information cluttering up your database?

Also, imagine that we want to correct an error in the spelling of John's name.  With the current model, we must correct that error three times due to the redundant storage.

To address these issues, we can normalize the data.  Data normalization refers to structuring our data in order to remove redundancy. 

In our example, we accomplish this by creating a table of customers with the following structure

  • FirstName
  • LastName
  • StreetAddress
  • City
  • ZipCode

and and moving the customer data to this table - one row per customer.

FirstName LastName StreetAddress City ZipCode
John Smith 123 Elm Bigg City 48222
Bill Jones 456 Maple Smallville 48333
Mary Brown 789 Oak Middleton 48444
 

Then we add an extra column to the ustomerPurchase tab table.  This new column is special in that the value in it will uniquely identify each row - in other words, no two rows will have the same value.  This unique column goes by many names but we will call it a Primary Key here.  In this case, the Primary Key column will be named "CustomerID" and will hold an integer.

CustomerID FirstName LastName StreetAddress City ZipCode
1 John Smith 123 Elm Bigg City 48222
2 Bill Jones 456 Maple Smallville 48333
3 Mary Brown 789 Oak Middleton 48444
 

Now we can go back to the ustomerPurchase tab table, and replace the columns that describe customer with a column to hold the CustomerID.  This replacement column is known as a "Foreign Key".  It references a Primary Key in another table and is used to point to a single unique record in that other table.

PurchaseDate CustomerID ItemPurchased Quantity PricePerItem
2/26/2009 1 Lamp 1 40
2/26/2009 2 Chair 2 100
2/26/2009 3 Table 1 50
2/26/2009 1 Chair 1 100
2/27/2009 1 Table 1 50
 

This is all we need because, given the CustomerID, we can look in the Customer table, find the record for that customer and get all information about that customer.

This concept of using a key value to point to a row in another table is known as a relationship.  We say that the Customer table is related to the CustomerPurchase tab table. 

This type of relationship is known as a one-to-many relationship, every customer may have many orders.  In this type of relationship the table with one row is known as the parent and the table with (potentially) many rows is known as the child table.  

This relationship is typically represented by a drawing similar to the one below.

Organizing data in this way can make storage of that data far more efficient and flexible.

Thursday, February 26, 2009 1:17:33 PM (GMT Standard Time, UTC+00:00)
# Wednesday, February 25, 2009

Episode 9

Mike Wood is the Lead Director for the Cincinnati .Net User Group.  He and I spoke about how to build a strong community and what makes the Microsoft Heartland Community so special.

Wednesday, February 25, 2009 11:53:42 AM (GMT Standard Time, UTC+00:00)
# Tuesday, February 24, 2009

In this article, we will define a database, a table and the main parts of a table - rows and columns.

A database is an organized (or structured) collection of information.

Most companies spend a lot of time and effort collecting information and storing it somewhere, but not all that information is organized, which makes it difficult to retrieve anything relevant later on.  A database adds structure to the information making it easier to maintain and query it.

Database engines like SQL Server provide a structure to organize data in a way that makes sense to a user.  Specifically, SQL Server uses a relational model* to organize its data.

In a relational database, data is partitioned into tables.  Tables are a way of data storing data in rows and columns, kind of like in an Excel worksheet. 


Figure 1 - Data in an Excel workbook

I've always found this rectangular view of data very intuitive.

Just as in a workbook, each table row represents a discrete record.  All information in that row serves to describe the row.  

Similarly, a table column is a placeholder that describes a single attribute for each row.  The advantage SQL Server has over Excel is that you can easily place rules onto a column, restricting the type of data that can be stored there. 

If a SQL Server column is designed to hold a date, a property of that column can be set to throw an error if a person or program tries to store a string.   We can set up such restrictions for many data types, so that a column can be restricted to allow only integers, only TRUE/FALSE values, or only binary objects.  We can even restrict the maximum length of a string or require users to always enter a value into a column - all simply by setting properties on the column.**

For example, a table named "Customers" might be used to store information about your company's customers.  Each row in this table would represent a single customer.  Each column would hold an attribute of that customer, so you could create columns such as FirstName, LastName and StreetAddress that would hold the appropriate values for each customer. 


Figure 2 - Data in a SQL Server table

Looking at the first row, gives us information about the customer.  It should be obvious that this customer has a first name of "David", a last name of "Giard" and an address of "123 Main St".


*SQL Server does provide some non-relational ways of storing data but those are beyond the scope of this article.
** It is possible to configure Microsoft Excel to restrict data input, but this task is relatively advanced and far more easily accomplished in SQL Server.

Tuesday, February 24, 2009 6:14:48 PM (GMT Standard Time, UTC+00:00)