Constraint Controversy

2008-12-17 - General, tSQLt

It wasn't until recently that I ever thought to do away entirely with database constraints. Aside from appropriate normalization and writing fast SQL code, much of the time database professionals spend is on figuring out exactly what constraints to put on their schema. We're taught by textbooks and by some bad experiences that database constraints are a good idea. But how do they fit in with our new, agile view of the world?

Someone told me that they no longer used database constraints because their code that accessed and modified the database was thoroughly unit-tested and, furthermore, constraints were a form of duplication. I had to hold back on my initial reaction; which was, "Are you nuts?" I really wanted to give this some due thought before making any conclusions. This involves considering why we have database constraints in the first place, and some reasons not to use them. My suspicion is that the right answers exist somewhere in the middle.

Constraints for Preventing Defects

I learned the value of constraints early in my career when I was burned badly. The project was well underway when I started and involved a series of web pages built on top of a database with no foreign key constraints. In fact, the only constraints it had were primary keys. Data errors were common as the page logic inserted, updated and deleted directly against the database tables with statements often written to execute in the wrong order. This manifested in some rather troublesome, and irrecoverable production defects. Only after implementing appropriate database constraints were we able to systematically locate and repair the defective code.

The lesson I learned from this, is that database constraints are one way to help prevent defects in application code. This is a defensive programming technique. Database constraints were, in this case, being implemented because the application code was not trusted. There are several agile techniques for increasing the trustworthiness of our application code, including test-driven development, continuous integration and pair programming. In hindsight, using these agile techniques would have probably prevented these types of defects and also improved the team's understanding of software development.

Constraints Duplicate Business Tier Logic

In software development we know that it is best not to repeat ourselves. Anytime we duplicate logic, we open ourselves for mistakes. If we need to fix something in two places, some day we'll forget and only fix one of those places. For most applications relying on a database, constraints are implemented in some form in the business tier; often as part of the relationships between objects in an object-oriented program. This means that if we decide a constraint is obsolete (or just wrong), we have to fix it in our object code and in our database definition.

This can lead to defects which we would not encounter otherwise. It can also slow down refactoring, making our design more rigid. As an application becomes more complex, this can lead us to forgo making schema changes and being stuck with a design that no longer meets our needs.

Some constraints may be difficult or more costly to implement in the business tier however. For example, it may impose extra design on your object model to implement uniqueness checks and relational integrity checks. The database system will likely implement these constraints with better performance than your object model because it has been optimized to do so and it is closer to the data to be checked.

Checking For Sanity

Even with all the great unit tests we're writing into our object code, we still make mistakes. This means that our automated test suite should not be our only fail safe against defects. In fact, we know it is still wise to test our application as a user would use it, to shake out problems during development. It is also necessary to have qualified testers employ testing techniques such as looking for boundary conditions, input/output class testing, and others. The need for these tells us that we know our unit tests are not going to catch everything.

In fact, a keen observer would note that TDD is much more about design than testing. It may very well be appropriate to use a design-by-contract approach to define the interface between the database and object tier. Part of this contract is proper storage of data, and this is enforced in the database using constraints. On the object side, classes may check the values of the inputs to their constructors and setter methods using assertions. While adding extra maintenance work if the design needs to change, constraints and contract assertions add a layer of defect prevention and isolation.

Tool Support and Query Optimization

Constraints can also be used as a tool for understanding the database schema. They help communicate the intent of the relationships between tables and specify restrictions on columns. Persistence frameworks can request the constraint information from the database to help in automating creation of SQL statements. Constraints can also be used by tools to visualize the structure of the database. For example, consider rendering a schema diagram that did not show any of the relationships between the tables.

Query optimizers implemented in the database engine can also utilize constraints when developing query plans. This is especially true for unique constraints, which provide the optimizer with a hint as to what it can expect from the data.

Constraints Getting in the Way of Creating Test Data

Consider a stored procedure or a class method that relies on data from a single table. We'd like to test-drive the design of this procedure or class and need to put some test data into the table1. Suppose this table has several foreign key constraints. We would need to put data into these other tables. And if these other tables have foreign key constraints, we might have a lot of setup work for a method that does not need to know this much about the schema. Constraints simply get in the way of test driven-development.

This, for a long time, was a compelling argument against database constraints. However, using the fake table technique, as described here, we can test our database procedures in isolation.

Central Repositories and Multiple Applications

A well tested business tier implementation of constraints can often substitute for database level constraints. However, we do not always have control over the business tier. Many databases are repositories used by multiple applications. It is possible that our own development teams may not be implementing all of these applications. These databases may even be shared by multiple organizations with their own development techniques. In these situations, it seems prudent for the database developers to enforce constraints at the database level.

If we own the data or the database implementation, then we should also own the responsibility of data integrity. It is not always reasonable to assume that all development teams are using adequate testing. In situations like these, it seems best to view the database implementation as a product. The product must care for its own internal data integrity.

In the Middle

It is easy to see that the decision to use constraints is like any other design decision: it depends highly on the context in which you're developing. For small, lightweight applications that utilize well tested data access frameworks, agile techniques may replace the need for database constraints. At the other extreme, enterprise databases updated by multiple applications and possibly utilized by several organizations may require database level constraints.

The rest of the world falls somewhere in the middle. Decisions must be made on an project-by-project basis, and perhaps, on a constraint-by-constraint basis. The above discussion should provide some guidance on how to make these decisions.


1. Most of the time when testing a method in a class, we would mock out the database layer. But there are times when we would prefer to put data into a table, for example as an integrated test.

 

Categories: General, tSQLt

Leave a Reply