Thursday, August 7, 2014

SQL Server - Coding guidelines

(Note - this is the actual article from the last post, rather than a link to my LinkedIn article)

Let's talk about best practices around writing SQL Server code, specifically in a project (which means - not just an ad-hoc t-sql script)

This post goes through various SQL (Server) code design guidelines and best-practices.

Following these guidelines, we are able to achieve:
- Cleaner, more readable code
- Getting the most out of Database Projects (SSDT)
- Keeping a standard, which makes our SQL Code clear and easily readable across different teams/organizations (by following a similar pattern)

Background:

Existing resources on databases design are usually focusing on relational tables, keys, relationships, indexes, performance and other topics which share one thing in common: Schema & Data.
While this is probably the most important aspect of a relational database, most of today’s SQL databases contain code, which in many cases, lack a decent design best-practices rules, unlike its data-related objects.
Over the years, in various SQL-Related roles such as Developer, Dev Manager, Architect and consultant, I have come across numerous SQL Projects and was often surprised by the lack of common interface and by the different variations each company/team used to write its SQL Code.
These are the main reasons for forming this document.

Using:

We are using Visual Studio SQL Database Project (a.k.a. SSDT).
Although many of the guidelines can be used on a different environment, SSDT (at the point of writing this document), is the most complete SQL Project container, compiler and publisher.

Samples:

Many of the examples given in this article are taken from our sample SQL Server Database Project: UserManagementSampleDatabase
You can download the sample project HERE
Project structure:
There are two different methods you can use for your project;
- Schema First
- Object Type First
When creating a new empty database project and choosing to import from an existing database, Visual Studio will use the Schema-First structure (see screenshot)
Visual Studio Project Structure
Our sample project is using the Object-Type first, as we believe it is easier and more readable to have all tables under the same branch (for example)
Having that said - this is not a strong recommendation.

Our Project Structure

Coding guidelines:

General concept:
Your SQL Server stored procedures code is not an ad-hoc scripting workarounds, it is a code, part of your enterprise applications and should be treated as such.
This is rather than an ad-hoc scripting language.

Covered objects:

All coding guidelines can be applied to SQL Stored Procedures, Functions, Triggers or any other SQL language script.
This excludes .Net CLR objects which are written in different languages (c# / vb.net) to which a same language coding guidelines should be applied.

Headers

The reason for a descriptive header is to allow some generic information which can be easily viewed on the server itself.
While the true source of record for all changes is usually your source control, it is important to understand that the database code is often being accessed on premise, by users who do not have quick access to the source control.
Seeing a brief explanation and change log may help these users quickly understand the purpose of the procedure.
A typical example is when DBA’s connect to your Production SQL Server in order to troubleshoot a recent issue. They typically do not have access to the developers’ source control, however while going through the procedures which are involved on the incident and seeing a quick change-log, they are usually able to easily pinpoint the problem (a classic: “what’s different” investigation approach)
A suggested header structure is:
  1. Object Drop/Recreate, which cannot run within SSDT, but good to keep whenever this procedure is being executed manually,
    followed by a simple one line procedure overview.
  2. Change-log which contains the “when”, the “who” and the “what” of the change
  3. A sample, which may be the same test-script you later use for the Unit-Testing project
Here is a sample header for a stored procedure:
Note: More samples can be found in the sample database project.

(Over) Sizing

As a generic guideline, a stored procedure should not exceed ~50 lines (excluding header).
Long procedures are typically harder to read & understand and in most cases can, and should be refactored to multiple smaller procedures. This is also relevant from a usability point of view.
Try to keep your code a small and atomic as possible. When a procedure exceeds the recommended size, consider refactoring, splitting logic to child procedures.

Naming Conventions - Do’s

Consider using schemas as a limited (yet better than none) solution for namespaces.
Although SQL Server can’t handle multiple namespaces such as App.UserManagement.Users.Save(), we can still create a UserManagement schema under which we place all the user-management-related object.
This is both more readable and in addition - easier to use for IntelliSense interpretation

Naming Conventions - Don’ts

Following the above guidelines, do not name your procedures with the same prefix ‘indicating’ that they are indeed - procedures (or functions)
Being more specific, do not use [sp_*] and/or [fn_*] which, in addition for being an invalid naming convention will also affect performance (reserved prefix which causes SQL Server to look for an instance in master database on every execution call)

User-Defined table types (UDTT)

We recommend using udtt as input parameters in the following cases:
- Input may contain multiple rows
  • (use instead of either xml or a delimited string)
- Procedure with large number of input variables
- When a set of the input variables is passed to another procedure
  • On which case - it is better to pass the entire @udtt rather than calling a child procedure while passing multiple input variables (again, mostly from readability point of view)

#Temp Tables

Temp tables can be used to pass data between stored procedures executed within the same session.
While this can be a rather simple way to pass data, the general recommendation would be to pass the data as UDTT rather than *assume* a certain temporary table exists and call it in a stored procedure which was not responsible of creating it.
Note: this guideline does not reference the usage of udtt vs. #temp in queries, where in many cases leans towards temp tables which may carry better execution plans.

Private/Public (Accessibility Levels)

The concept is to let the user (i.e. the procedure executor) know which stored procedures should be called, vs. which stored procedure are considered ‘child’ or ‘internal’ procedures that are being executed through the ‘public’ procedures.
In common programming languages we user the Accessibility Levels in a class, which is designed for that matter.
However, the closest thing to Accessibility Levels in SQL Server would be roles/permissions, which is somewhat problematic to use since when a procedure is executed by a user, the permissions for any ‘child’ execution is inherited.
With that in mind, there are two recommended approaches:
- Begin ‘public’ procedures/functions with a capital letter (PascalCasting), and ‘private’ with a lowercase letter (camelCasting).
(ex: UserManagement.UserDetailsSave for public & UserManagement .userPhoneNumberSave for private)
(This example appears in our sample database)
- Create two schemas for public/private. (ex: UserManagement.UserDetailsSave for public & userManagementPrivate.userPhoneNumberSave for private)
It is important to note that both options do not serve as restrictions (i.e. prevent the user from executing anything) but rather to display a clear picture of what should be executed.
To physically prevent a user from executing a procedure refer to Roles/Permissions on SQL Server.

Backward Compatibility / Versioning

Stored procedure versioning (i.e. creating multiple instances of the same procedure whenever the signature changes to ensure backward compatibility) is not recommended.
When we need to change a method we do not typically clone it, unless absolutely required, and the same logic should apply here.
In many cases, writing a correct code which works with our database prevents backward compatibility issues. This may include:
- Calling a procedure using @variable=value instead of relying on variable position within the procedure
- Allowing a more flexible returned DataTable - meaning that if a new/addition column is returned, the code doesn’t break
- In case of @UDTT input variables, consider using a function that returns the UDTT structure. (Example is provided within the sample project)
And from the procedure side
- Avoiding “SELECT *” as a return value (which is being affected whenever one of the participating tables changes)
- Avoiding OUTPUT @variables as return values

Keywords

Schema & object names using reserved keywords should be avoided.
When absolutely necessary, reserved names must be wrapped by [square brackets], even when no error occurs otherwise.
(Bad) Example: CREATE PROCEDURE [Version].[Procedure]

Inline Comments

In addition to the object header, here are some guidelines around inline comments:
- Use in order to briefly describe a part of the code that may not be easily understood
-- In case @UserId is still null, try to search on Inactive Users table
IF @UserId IS NULL...
- Use “--“ for a single line comment and /* … */ for a block content
- Don’t “over-comment” the obvious
i.e.
-- We're going to select UserId and UserName from our Users table
SELECT UserId, UserName FROM Users

Logging

At the present point of writing this document, many of the “logging” and/or “debugging” inside stored procedures are done using “PRINT”
This is an incorrect usage and is basically missing both purposes.
It is highly recommend to create a simple logging table for logical messages.
There are pre-existing logging platforms such as Log-Table Viewer (link). A logging table/sproc is also provided within the sample project.
If you do need an immediate console return - use RAISERROR (with nowait) instead of PRINT

Returning Results

There are quite a few approaches for getting a result back from an executed stored procedure. These include:
- Using the RETURN exit code
- Using OUTPUT @variables
- Returning a table/s
The high-level recommendation is to return a table. (DataTable or DataSet on the caller side)
Using OUTPUT variables may also be a valid option but may be a bit tricky from a backward compatibility point of view.
The reason is that output variables always need to be the last, which means that if the procedure needs new variables is a newer version - the output variable/s are being pushed.
As long as all code against this procedure does not rely on the order (i.e. populated by @variable=value) you are safe. Otherwise - consider either returning a table and/or refactor your client code.
Note that not all stored procedures need to return anything (for example, insisting on having the last line as “RETURN 0”). In many cases - executing without getting an exception thrown means a successful execution.

Error handling

Following both output & logging paragraphs above, we recommend using the following guidelines for handling errors:
- To handle errors, use the “BEGIN TRY”/”BEGIN CATCH”/”THROW” (2012 and up)
- Only use error handling when an additional action is made in the error handling section, otherwise have the caller handle the exception
- Some errors may be reported using the logging mechanism described above. Keep in mind that if your error handling includes rolling back a transaction, you should log the error AFTER it is rolled back.
- Remember: returning the entire exception (either by not handling or by using “THROW;” at the handling section) is usually more helpful for the calling app rather than formatting your own textual message.
- Using RAISERROR is not recommended

References (project/.dacpacs)

SSDT provides an easy (database) reference usage. A database reference can either be another project on the same solution, or any compiled .dacpac file.
A reference can be embedded as:
- Different Server
- Same Server, different database
- Same Server/Database
The choice whether to use project or .dacpac reference is very similar to choosing between a .dll vs. hosting a class in your .net code.
In general, the recommendation is that if the reference is not directly and exclusively exists under the same project (in a logical concept) - use a .dacpac
For example - many developers create their “Common” SQL Library for repeating tasks. Such class should be referred as .dacpac from any used project.
You can find an example in our sample project where LoggerSupport.dacpac is being referred to (configured as ‘same database’)
Note: if you are using a .dacpac reference and actively work with Debug/Release environments, you may need to change your project file manually to automatically point at the correct referred .dacpac according to the build configuration.
In our sample project, this means changing UserManagementSampleDatabase.sqlproj to the following:
- Current:
  • <HintPath>..\LoggerSupport\bin\Debug\LoggerSupport.dacpac</HintPath>
- Modified:
  • <HintPath>..\LoggerSupport\bin\ $(Configuration)\LoggerSupport.dacpac</HintPath>

Unit-Testing

Visual Studio offers an integrated database unit-testing environment.
The sample project contains a unit-test sample. Some recommendations include:
- Recommended:
  • Create a separate class for each procedure (rather than containing all tests in a single class)
  • Name the class as the procedure name unless
  • Multiple tests for a specific procedure should be created under the same class
  • Test name should be declarative
- Alternative:
  • Create a class per a logic group of testing, for example (test names colored) :
    • User
      • UserDetailsSave
        • Add User
        • Update User
        • UserDetailsDelete
          • Delete existing user
          • Delete non-existing user
References: