# Tuesday, 23 June 2015

The past few years, I've heard a lot about something called NoSQL. Some people really love it. Those who love it, talk about its lack of ceremony and the speed with which you can develop and the speed with which it reads and writes and its scalability. It sounds all sounds so awesome!

But I grew up on relational databases. My first computer language was FoxPro, which included a relational database and supported a powerful version of SQL. From there, I graduated to SQL Server and I've dabbled occasionally in Microsoft Access. I've even worked with Oracle and MySQL and, as a developer, I find them intuitive. Should I abandon the databases with which I am familiar and travel to this brave, new world of NoSQL? Is NoSQL the best solution for every project? For some projects? How do I know?

Let's start with a definition for NoSQL. This is harder than you might think, because NoSQL databases are basically defined by what they are not. The only real definition is that they are not SQL databases. They tend not to have pre-defined schemas; they tend not to enforce relationships; and they tend to be able to store hierarchical data; and, of course, they tend not to support the SQL language (although some support syntaxes similar to SQL, such as LINQ). These are broad definitions and only address things that NoSQL databases don't do. There is no standard language, syntax, storage mechanism, or API for addressing NoSQL databases.

For purposes of this article, I'll define SQL databases as those in which the database engined provides the following features:

  1. Supports SQL
  2. Enforces pre-defined schemas
  3. Enforces referential integrity among related, normalized tables

This includes database engines supported by large companies, such as Microsoft SQL Server and Oracle, as well as Open Source databases, such as MySQL.

I'll lump all other persistent storage technologies as NoSQL databases. This includes MongoDB, RavenDB, Azure table storage, and DocumentDB.

So when should we choose good old SQL databases and when should we use this newfangled NoSQL thing?

Let's start with SQL databases. They have a few advantages:

SQL databases

Advantages of SQL DBs

First, they are relational, so they make it easy to normalize your database into a set of related tables. This almost always saves disc space and often makes your data more consistent (e.g., you can change the name of a product in one table and it changes throughout your entire application). Databases like this also allow you to create persistent relationships between these tables and these relationships enforce referential integrity, ensuring that we are not left with orphaned records (Who wants an order line without a corresponding order?)  You can set up cascading deletes or force users to create and delete records in an order that will never have inconsistent data.

The SQL language itself is very flexible, allowing users to create either pre-defined or ad-hoc queries against a relational database. This makes SQL databases great for reporting.

The schema in a SQL database helps catch errors in almost the same way that a compiler or a unit test does. If you want to capture a customer's last name and you create a “LastName” column, but one time you accidentally misspell it as "LastNmae", the database will catch this and throw an exception which should be early and obvious enough for you to fix the error.

Disadvantages of SQL DBs

But these features come at a price. There is overhead in enforcing database schemas and referential integrity. As a result, saving to a SQL database tends to be slower.

Also, when developers build an application intended for human interaction, they almost never structure normalize the application's objects in the  way that they normalize the data in their relational database. An entire class of Object Relational Mapper (ORM) software exists simply to deal with this mismatch. It requires code, time, and CPU cycles to map between objects in an application and data in a database.

NoSQL databases

Advantages of NoSQL DBs

Because NoSQL databases don't need to enforce schemas or relationships, they tend to perform faster than their SQL cousins.

Database development tends to be faster because developers and DBAs are not required to pre-define the columns in each table.

The lack of database relationship enforcement also makes it easier to move parts of a database to another server, which makes it easier to support very large data sets. Relational databases can move across servers, but it tends to be more difficult because of their need to enforce referential integrity.

The lack of schema also adds flexibility, especially if you are capturing data in which different objects may have different properties. For example, a product catalogue table may contain some items (such as computer monitors) for which diagonal size in inches is an important property, and other items (such as hard drives) for which capacity in GB is an important property. Mapping these disparate needs to a relational database table would be add complexity to your data model.

Finally, it is possible to serialize and de-serialize objects in the same format that they are used in an application's user interface. This eliminates the need for an ORM, which makes applications simpler and faster.

Disadvantages of NoSQL DBs

When reading data, NoSQL databases tend to be very fast, as long as you are looking up rows by an index or key. If you want to look up a row by any other property or filter your data by a property, this often requires a full table scan, which is very slow. Some NoSQL databases allow you to create index on non-key rows, which speeds up such searches but slows down data writes - decreasing one of the advantages of NoSQL.

Other factors

It's worth looking at the cost of any database solution. For example, Azure provides both SQL and NoSQL databases as a service. If we compare the cost of Azure SQL Database with Azure table storage (a NoSQL option), we can see that the price of table storage is far less than the cost of SQL Server. Table storage might not be the answer for your application, but it's worth examining whether some of your data can work with Azure table storage.

Conclusion

As with most questions facing IT developers, architects and managers, there is no clear-cut answer to whether to use SQL or NoSQL databases. SQL databases tend to be better when ad-hoc reporting is required, while NoSQL databases tend to shine when saving and retrieving transactional data from a user application. Many applications take advantage of the features of both database types by creating a NoSQL database with which their application interacts; then transforming and regularly copying this data into a relational database, which can be queried and reported on.

There are many options for your persistent storage needs. Choose the right one for your application.