This post is my contribution to T-SQL Tuesday #20, hosted by Amit Banerjee (blog | twitter). This month’s topic is “T-SQL Best Practices”.
In OOP it is a common best practice to use interfaces. Interfaces provide a layer of separation between the parts of the system. It also provides a contract layer. There are several advantages in using interfaces, most importantly the ability to hide implementation details of a single module from other modules that are using it. For a good introduction into this topic, check out Interface Oriented Design by Ken Pugh.
SQL Server does not provide a language construct that allows for the use of interfaces in the sense of an OOP language. However, in some areas it allows us to get close.
One of the best practices that I strongly recommend for every project to follow is the use of an interface layer between the data and the application. In particular, application code should never directly read from or write to a table. Instead, all access should go through views or stored procedures.
What do I gain by doing this? The biggest advantage is that in an implementation like this, I can change the table structure in a live system while hiding the changes from the application.
There is only one thing certain in software development, and that is that there will be change, so it is a good idea to be ready to implement changes.
While you can change out the application code in a matter of seconds, there are changes in the database that will take a substantial amount of time. Think about a simple change like moving the phone number from the Person table into a newly created PersonPhone table because the system now allows for multiple phones per person. The old version of the application expects the phone number field to be in the Person table. The new version expects it to not be there. So at some point you have to take the database offline, copy all the phones over to the new table, drop the old column and then bring the system back online. At the same time, the application upgrade needs to happen.
If the application has an SLA that does not allow for such a downtime, you are stuck.
However, if the access goes through a view, you can implement the necessary changes to the table structure behind the scenes while the application still sees the old picture.
I am not saying that implementing a change like this becomes trivial if you use views. Instead, I am saying that an interface layer like a view makes it a lot simpler.
In addition, if you use a version specific schema or schema prefix you could even have both versions live while you go through the app servers and upgrade them one by one.
The rest of this post goes through an example to show what an actual implementation of this pattern would look like.
The following code listing creates a tbl.Pers table and a view v001.Pers that selects from it. There is also a statement to fill the table with some random data using Itzik’s GetNum function, to have some data in there for testing.
CREATE SCHEMA tbl ; GO CREATE SCHEMA v001 ; GO CREATE TABLE tbl.Pers ( PersNo INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED, FirstName NVARCHAR(60), LastName NVARCHAR(60), PhoneNo VARCHAR(20) ) ; GO CREATE VIEW v001.Pers AS SELECT * FROM tbl.Pers ; GO INSERT INTO tbl.Pers ( FirstName, LastName, PhoneNo ) SELECT CAST(NEWID() AS NVARCHAR(60)), CAST(NEWID() AS NVARCHAR(60)), CAST(ABS(CHECKSUM(NEWID())) AS VARCHAR(20)) FROM dbo.GetNums(1000)
If you now run a select against the table and one against the view you will see the same execution plan:
The same is true for an update:
The view gets eliminated by the optimizer and therefore there is no impact on the execution time of your queries.
However there will be a small impact on the compilation time.
The view is actually doing a SELECT * from the table and there is a reason for that. You are probably aware of the best practice to always specify a column list. I am not going into details here as to why you should follow that best practice.
Also, the purpose of the view in this example is to build an interface. One of the things an interface should do is to specify how it can be used, and a * does not help with that. So there seem to be a lot of reasons to specify the column list in the view.
However, you are not going to change all the tables all the time, so most of the time the views are just going to reflect the table columns. By not specifying the list in this context you make the life of the optimizer (specifically the binding stage) a little easier. Every time a query using this view gets compiled the view text needs to be parsed and included into the text of the query so that the execution plan can be built. The binding stage is responsible to link the query to the objects and check object and column names. The less text it has to read through, the quicker it can do its job, so the * in this case can be seen as a performance optimization.
The preceding paragraph also points out that compiling a query using such a view is potentially slower than a comparable query not using the view. That means you need to make sure not to use too many non-cacheable 1 ad-hoc queries to avoid having to compile your statements too often. But that is another well-known best practice anyway. Even if you can’t get rid of all of them, the impact of the views will be comparatively small, as the output of the binding phase, the algebrizer tree, can be cached for views.
One last thing you need to be aware of: Before you actually implement any change to any of the tables, make sure to change their views to now mention the column names, so that a change to the table structure does not become immediately visible to the users of the view.
You must be logged in to post a comment.
Pingback: T-SQL Tuesday #20 Wrap-up and a few smiles « TroubleshootingSQL