# Thursday, February 26, 2009

In my last article, I defined the basic concepts of database, table, column and row. Using these constructs, you can organize data into a rectangular format. This paradigm often works really well, because

  • You can group related information into a single container (a table)
  • Each row represents a single entity (such as a customer, employee, or invoice) and
  • Each column represents an attribute of the entity (such as FirstName, LastName, or TotalSales).

Using this model, we can create a table containing information about a customer's purchases. Each row in this item might represent a single purchase.

When a customer purchases an item, we probably would want to store some information about that purchase. These bits of information about each purchase are attributes of the purchase and are therefore candidates for columns. Below are examples of the information we might want to save about a customer's purchase.

  • Date of Purchase
  • Customer First Name
  • Customer Last Name
  • Customer Street Address
  • Customer City
  • Customer Zip Code
  • Item Purchased
  • Quantity Purchased
  • Price per Item

We can create a table CustomerPurchase with a column for each of the above attributes and begin populating with data each time a customer purchases something. The data would look something like this:

PurchaseDate Customer
ItemPurchased Quantity PricePerItem
2/26/2009 John Smith 123 Elm Bigg City 48222 Lamp 1 40
2/26/2009 Bill Jones 456 Maple Smallville 48333 Chair 2 100
2/26/2009 Mary Brown 789 Oak Middleton 48444 Table 1 50

This model seems to capture the information we want. Do you see any problems with it?

What happens if a customer orders more than one item? If John Smith purchases a Chair in addition to his Lamp, we can just add another row to the table, like so.

PurchaseDate Customer
ItemPurchased Quantity PricePerItem
2/26/2009 John Smith 123 Elm Bigg City 48222 Lamp 1 40
2/26/2009 Bill Jones 456 Maple Smallville 48333 Chair 2 100
2/26/2009 Mary Brown 789 Oak Middleton 48444 Table 1 50
2/26/2009 John Smith 123 Elm Bigg City 48222 Chair 1 100
2/27/2009 John Smith 123 Elm Bigg City 48222 Table 1 50

But notice that now we are storing John Smith's name and address multiple times.  Assuming John Smith will never change his name, this is a waste of space.  Granted, this isn't very much wasted space when we have only a few orders, but imagine a system with thousands of customers and millions of orders.  Do you really want all that redundant information cluttering up your database?

Also, imagine that we want to correct an error in the spelling of John's name.  With the current model, we must correct that error three times due to the redundant storage.

To address these issues, we can normalize the data.  Data normalization refers to structuring our data in order to remove redundancy. 

In our example, we accomplish this by creating a table of customers with the following structure

  • FirstName
  • LastName
  • StreetAddress
  • City
  • ZipCode

and and moving the customer data to this table - one row per customer.

FirstName LastName StreetAddress City ZipCode
John Smith 123 Elm Bigg City 48222
Bill Jones 456 Maple Smallville 48333
Mary Brown 789 Oak Middleton 48444

Then we add an extra column to the ustomerPurchase tab table.  This new column is special in that the value in it will uniquely identify each row - in other words, no two rows will have the same value.  This unique column goes by many names but we will call it a Primary Key here.  In this case, the Primary Key column will be named "CustomerID" and will hold an integer.

CustomerID FirstName LastName StreetAddress City ZipCode
1 John Smith 123 Elm Bigg City 48222
2 Bill Jones 456 Maple Smallville 48333
3 Mary Brown 789 Oak Middleton 48444

Now we can go back to the ustomerPurchase tab table, and replace the columns that describe customer with a column to hold the CustomerID.  This replacement column is known as a "Foreign Key".  It references a Primary Key in another table and is used to point to a single unique record in that other table.

PurchaseDate CustomerID ItemPurchased Quantity PricePerItem
2/26/2009 1 Lamp 1 40
2/26/2009 2 Chair 2 100
2/26/2009 3 Table 1 50
2/26/2009 1 Chair 1 100
2/27/2009 1 Table 1 50

This is all we need because, given the CustomerID, we can look in the Customer table, find the record for that customer and get all information about that customer.

This concept of using a key value to point to a row in another table is known as a relationship.  We say that the Customer table is related to the CustomerPurchase tab table. 

This type of relationship is known as a one-to-many relationship, every customer may have many orders.  In this type of relationship the table with one row is known as the parent and the table with (potentially) many rows is known as the child table.  

This relationship is typically represented by a drawing similar to the one below.

Organizing data in this way can make storage of that data far more efficient and flexible.

Thursday, February 26, 2009 1:17:33 PM (GMT Standard Time, UTC+00:00)
# Wednesday, February 25, 2009

Episode 9

Mike Wood is the Lead Director for the Cincinnati .Net User Group.  He and I spoke about how to build a strong community and what makes the Microsoft Heartland Community so special.

Wednesday, February 25, 2009 11:53:42 AM (GMT Standard Time, UTC+00:00)
# Tuesday, February 24, 2009

In this article, we will define a database, a table and the main parts of a table - rows and columns.

A database is an organized (or structured) collection of information.

Most companies spend a lot of time and effort collecting information and storing it somewhere, but not all that information is organized, which makes it difficult to retrieve anything relevant later on.  A database adds structure to the information making it easier to maintain and query it.

Database engines like SQL Server provide a structure to organize data in a way that makes sense to a user.  Specifically, SQL Server uses a relational model* to organize its data.

In a relational database, data is partitioned into tables.  Tables are a way of data storing data in rows and columns, kind of like in an Excel worksheet. 

Figure 1 - Data in an Excel workbook

I've always found this rectangular view of data very intuitive.

Just as in a workbook, each table row represents a discrete record.  All information in that row serves to describe the row.  

Similarly, a table column is a placeholder that describes a single attribute for each row.  The advantage SQL Server has over Excel is that you can easily place rules onto a column, restricting the type of data that can be stored there. 

If a SQL Server column is designed to hold a date, a property of that column can be set to throw an error if a person or program tries to store a string.   We can set up such restrictions for many data types, so that a column can be restricted to allow only integers, only TRUE/FALSE values, or only binary objects.  We can even restrict the maximum length of a string or require users to always enter a value into a column - all simply by setting properties on the column.**

For example, a table named "Customers" might be used to store information about your company's customers.  Each row in this table would represent a single customer.  Each column would hold an attribute of that customer, so you could create columns such as FirstName, LastName and StreetAddress that would hold the appropriate values for each customer. 

Figure 2 - Data in a SQL Server table

Looking at the first row, gives us information about the customer.  It should be obvious that this customer has a first name of "David", a last name of "Giard" and an address of "123 Main St".

*SQL Server does provide some non-relational ways of storing data but those are beyond the scope of this article.
** It is possible to configure Microsoft Excel to restrict data input, but this task is relatively advanced and far more easily accomplished in SQL Server.

Tuesday, February 24, 2009 6:14:48 PM (GMT Standard Time, UTC+00:00)
# Monday, February 23, 2009

Episode 8

Brian Prince joined Microsoft last year as an Architect Evangelist.  In this interview, he talks about the experience and how it is different from any company he has worked for in the past.

Monday, February 23, 2009 11:30:08 AM (GMT Standard Time, UTC+00:00)
# Sunday, February 22, 2009

I deal with a lot of smart, passionate people with years of experience in technology. 

I've noticed that these folks love to talk about and write about the next generation of software and advanced topics in software development.  This is great because it gives me a chance to learn new things from the smart people in my life. 

Unfortunately, not everyone is ready for advanced topics.  Developers who are just starting their careers need to understand the basics of languages, programming constructs and relational databases before diving deeper into these and other areas.  These basic topics are often less interesting to experienced developers but they are vitally important.  And who is better capable of explaining them than an experienced developer or architect?

I'll address this partial vacuum in a new Back To Basics feature on this site.  In this feature, I'll explain some fundamental concepts of software development, assuming little or no experience on the part of the reader.

The feature begins this week with a set of articles entitled SQL Server 101, in which I'll describe the basics of SQL Server and relational databases.  This will tie in with a talk I'll be giving at the West Michigan .Net University April 4 in Grand Rapids, MI.

Sunday, February 22, 2009 4:42:15 AM (GMT Standard Time, UTC+00:00)
# Friday, February 20, 2009

Episode 7

Microsoft Architect Evangelist Darryl Hogan loves his job.  In this interview, Darryl explains what he does why it rocks.

Friday, February 20, 2009 11:09:40 AM (GMT Standard Time, UTC+00:00)
# Wednesday, February 18, 2009

Episode 6

In this interview, Joe Fiorini discusses how he and his team created the meetinbetween.us application that won the 2008 Rails Rumble contest.

Wednesday, February 18, 2009 11:44:20 AM (GMT Standard Time, UTC+00:00)
# Monday, February 16, 2009

Episode 5

BizTalk Server is one of those products that many people have heard of, but few are familiar with. 

In this discussion, Monish Nagisetty briefly and clearly explains the purpose and uses of BizTalk Server messaging

Monday, February 16, 2009 1:35:42 PM (GMT Standard Time, UTC+00:00)
# Thursday, February 12, 2009

Episode 4

I've attended two conferences where Alan Stevens helped to make Open Spaces a success.  In this interview, Alan describes open spaces technology and explains his role in the process.

Thursday, February 12, 2009 3:28:00 PM (GMT Standard Time, UTC+00:00)
# Tuesday, February 10, 2009

Episode 3

In this interview, Jason Follas explains spatial data types, which were introduced in SQL Server 2008

Tuesday, February 10, 2009 3:16:49 PM (GMT Standard Time, UTC+00:00)