Wednesday, November 19, 2014

How to use Excel and tSQL to help you Generate Code

Introduction

It's amazing to me the reactions I get when I show people seemingly simple tricks on how to automate or generate code and/or script generation. Whenever I see a pattern, that means there is a way to simplify or automate it. Here are some very simple tips and tricks to help you get some of that mundane and repetitive work out of the way quickly.

Using Excel's Concatenate Function

The CONCATENATE function is a nice hidden gem that many people apparently don't know about, this has helped me get many things done quickly and because Excel is tabular, it is very easy to work with and hone in on specifics using formulas. The best way to understand how to use this is by example of course.

Simple Example

If you were not aware, everything in Excel uses VB, you can even write your own VB Modules (Static Classes) and use them in your sheets, therefore some of the syntax may be a little strange or familiar to you depending on how much legacy coding you have done. This example is showing three very simple ways to use the concatenate function.


The formulas used for Columns C through E are as follows:

' Column C: Concatenate the names into "First, Last" format
=CONCATENATE($A2, ", ", $B2)

' Column D: Concatenate the names into "Last, First" format
=CONCATENATE($B2, ", ", $A2)

' Column E: Concatenate the names into "First Last" format, but only use the first letter of each to produce initials
=CONCATENATE(LEFT($A2, 1), LEFT($B2, 1))

This is a very simple example of how to use the concatenate function, as you can see it simply allows you to append (aka concatenate) cell contents. If you have never seen this before, let your imagination run wild with what you can do with this. Imagine being able to export data from SQL Server into an excel sheet and playing with the data like this. Go nuts.

SQL Scripting Examples

Now that you know how to concatenate cells together, think about how much easier it is to manipulate data in a database when you are trying to clean up your data or insert data for a first time.

Inserting New Data


Let's assume that there is a Person table that you are creating, it has 4 columns in it (ID, First, Last, Age) and you got your data from someone in an Excel sheet or managed to just put it in an Excel sheet yourself. You can generate the insert statements when you are happy with your column names and data.

Column E is hidden on purpose for this example
The formula used for Column F is:

' Column F: Creating an Insert Statement for sheet data
=CONCATENATE("INSERT INTO Person (First, Last, Age) VALUES ('",B2, "', '", C2, "', ", D2, ");")

As you can see that formula is a little long and hard to read, but the pattern is simple - you are stringing the columns together and formatting them according to data type; just like you would if you were writing it by hand in SSMS. Please note that the ID column (Column A) is assumed to be sequential integers and that the Person table has an auto-incrementing identity column named "ID".

Updating Existing Data

Let's say you made a mistake with some of the data upon insert and now you need to update that data. In the image below you will see that Column D is the old data and Column E is the corrected data. I kept column D in the shot so you can see the differences.

Column F is hidden on purpose for this example
The formula used for Column G is:

' Column G: Creating an Update Statement for sheet data
=CONCATENATE("UPDATE Person SET Age = ", E2, " WHERE ID = ", A2,";")

As you can see, you can easily fix your data now by mass producing these update statements.

General Precautions

Be very careful when producing these kinds of statements and make sure to triple check your statements before executing them. Work with a single statement to make sure it works first before applying the formula to an entire column or group of cells. There have been times where I don't realize I have either referenced the wrong cell or downshifted a cell by accident which leads to trouble.

Keep in mind these are single hits, this is not intended for large volumes as your DBA would probably kill you if he found out that you were executing several hundred statements in this format. This is more for smaller sets of data, where writing complex SQL Statements just isn't worth the hassle. This methodology allows you to bypass syntax restrictions and annoying table joins most of all.

Excel Knowledge Milestone

If you have not seen this before, then I hope your mind is blown and you can go nuts with it. I showed SQL generation examples, but this can be used for literally anything. I have generated classes and HTML markup using this methodology. Make sure to explore the other functions in Excel that will help you along with your coding.

On an Aside: Soft Quotes in Formulas

This is a rather annoying thing to find out the hard way, if you need to include a soft quote ["] in your formula as text, then you must escape it. 


In order to escape soft quotes you need to double up on the soft quotes. In other words, for every soft quote you must add a soft quote if you want it to be a literal. Here are the formulas for the above:

' Row 2
=CONCATENATE("I am totally """, A2, """ to your complaints.")

' Row 3
=CONCATENATE("I do not """, A3, """ about your cat photos.")

' Row 4
=CONCATENATE("I need these Jennifer Lawrence photos for """, A4, """.")

Using tSQL's Concatenation Operator

This methodology is very useful when you have all of the data you want to manipulate right in SQL Server and instead of using Excel, you can just work directly in SSMS using tSQL to get your work done. The difference here is instead of using a function or method - we will be using a tSQL operator which is the plus sign "+"

This operator of course will do different things depending on the context. Therefore tSQL is not forgiving (by design) in this respect, where as in Excel it didn't matter what your data was. In Excel your data will be converted to a string automatically and appended together into text. Therefore keep in mind that you must convert your non-string values into strings in order to concatenate them together. Ok well let's dive right into this shall we?

Simple Example

For the sake of this example let's assume the Person table exists in a Dummy database. This Person table is an exact match from the previous section. In fact, I used those insert statements to insert the data. Here is the SQL for the Person table:

CREATE TABLE [dbo].[Person](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [First] [varchar](50) NOT NULL,
 [Last] [varchar](50) NOT NULL,
 [Age] [varchar](50) NOT NULL
) ON [PRIMARY]

Since the table has data in it, I will go ahead and re-perform the simple example from the previous section, but using tSQL this time:

Execution results of the query below
This virtual table was achieved by executing the following SQL:

SELECT
 FirstLast = First + ', ' + Last,
 LastFirst = Last + ', ' + First,
 Initials = LEFT(First, 1) + LEFT(Last, 1)
FROM Person

Notice that this example is basically exactly the same as the simple example in the previous section.

SQL Scripting Example

Since the data is already inserted I am going to skip that as an example and go right into the update scenario. Let's assume that you want to age everyone by one year, then you can do the following:

Execution results of the query below
This virtual table was achieved by executing the following SQL:

SELECT
 sqlUpdate = 'UPDATE Person SET Age = ' + cast((Age + 1) as varchar(10)) + ' WHERE ID = ' + cast(ID as varchar(10))
FROM Person

Notice how the integers have to be casted as varchar in order to be concatenated properly. Now this isn't exactly the best example because realistically you can just write that update statement directly in SSMS and execute it in one shot, but this is just a demonstration to open up the topic. If you have not seen this before, then this should open up several doors for you. Some people just don't think to do this is all.

Generating an Enumeration from a Lookup Table Example

This is a stronger example of how to use tSQL to generate something more useful for you. Let's assume that you have a PersonType lookup table in your Dummy database as follows:

CREATE TABLE [dbo].[PersonType](
 [PersonTypeId] [int] NOT NULL,
 [Description] [varchar](50) NOT NULL
) ON [PRIMARY]

This table describes different types of people, but you want to code against it as an enumeration. In that case then why bother writing it by hand if you have a lookup table already. The following screenshot shows the data and enumerations written out already:

Execution results of the query below
This is accomplished by running the following SQL:

SELECT
 PersonTypeId,
 Description,
 Enum = Description + ' = ' + cast(PersonTypeId as varchar(10)) + ', ' 
FROM PersonType

The only thing you have to do now is copy and paste the output into Visual Studio and surround it with enumeration syntax. Don't forget to get rid of that last comma.

Bonus Tip (If you didn't know already that is)

You can display a lot of information about a table by just highlighting its name and pressing "Alt + F1". This is by far one of the most useful things you will ever use in SSMS. This is unique to SSMS in that it is a keyboard combination trick. You can quickly remind yourself of a table's structure or column size without having to use the object explorer and it keeps your hands on the keyboard which is good for productivity.

Example using the Person table:


tSQL Knowledge Milestone

Once again, if you did not know about any of this, I hope that this helps you get your mundane portions of your work done more quickly. Honestly there is no point in writing out really boring repetitive stuff when you don't have to. Explore more ways to work with tSQL Concatenation - there are a wonderful number of functions provided by Microsoft for dealing with string manipulation in tSQL.

Conclusion

Work smart not hard. These are all tricks I have picked up over the years from other people and things I have discovered on my own. Many of these things are what have compelled me to work on the Simple Class Creator that I produced, because once again, "If there is a pattern - then you can automate it!"

Tuesday, November 18, 2014

TSQL: Passing objects to stored procedures via XML

Introduction

This is a subject I have heard about repeatedly, but I never bothered looking into it until recently. I was told that writing hard coded insert and update statements is bad practice. To which I responded with "Okay, I am not using an ORM here, then how should I be doing this then?" I was then informed about what I knew already - "Use XML." sigh... It has finally caught up with me, I will just learn how to do this then. I am not a big fan of XML because it is bloated and not very friendly to work with regardless of context. This is why I have avoided using it for so long. Traversing XML is always a big to do as well - no one does it exactly the same way, so why should it follow any basic methodology in tSQL? It doesn't... Looking on the bright side, now I know how to do it - however I will not claim to be an expert on the subject, but this article will show you some basics to at least get started.

I don't have time to read all of this crap!

Trust me I understand that feeling and I feel it all too often when trying to solve a problem under the gun, so I will say that if you are trying to pass an object to a stored procedure, then view these sections:
  1. Passing an Object from Code to a Stored Procedure
  2. Consuming the XML

Using XML with tSQL

When ever you look this subject up, everyone shows you the equivalent of a hello world example - which is all fine and dandy until you need to do some real work. I will show you the Micky Mouse example then move on to a REAL and USEFUL example with some additional stuff no one else has covered properly. All of the articles I have read have hinted at passing your objects as XML to a stored procedure - but no one has provided the exact mechanisms to get it done. Well I am changing that here and now.

Reading in a list of elements (The Micky Mouse Example)

This example is working with a list of scalars.


DECLARE @XMLlist XML = '<list><i>2</i><i>4</i><i>6</i><i>8</i><i>10</i><i>15</i><i>17</i><i>21</i></list>'

SELECT @XMLList

SELECT r.c.value('.','int') AS IDs
FROM @XMLList.nodes('/list/i') AS r ( c )

As you can see there is a list of integers being set into an XML parameter. Using the nodes method with it's XQuery string parameter; the XML is translated into a list of integers by reading each row and the column. Each column's content is converted to an integer using the value method. This is as basic as it gets, it is useful to play around with, but not useful if you want to pass a multidimensional object (object with multiple properties) aka a matrix. This example is only showing how to convert a list of scalar values - which is definitely not the case most of the time.

Output in SSMS



Passing in an object (More realistic and less "hello worldy")

This example is working with a single matrix (object).


As I have pointed out in the previous example, that is just working with scalars. When dealing with objects you are dealing with a matrix or an object with multiple properties. Therefore this example shows you how to read in XML that represents a simple object.

-- Simple object example
DECLARE @xml XML = '<obj><a>John</a><b>Smith</b><c>10</c></obj>'

SELECT @xml

-- Method 1: Full path to each value
SELECT 
 @xml.value('(/obj/a)[1]','varchar(10)'),
 @xml.value('(/obj/b)[1]','varchar(10)'),
 @xml.value('(/obj/c)[1]','int')

-- Method 2: Selecting the base node
SELECT 
 r.c.value('(./a)[1]','varchar(10)'),
 r.c.value('(./b)[1]','varchar(10)'),
 r.c.value('(./c)[1]','int')
FROM @xml.nodes('/obj') AS r(c)

-- Method 3: Individual hits
SELECT r.c.value('.','varchar(10)') FROM @xml.nodes('/obj/a') AS r(c)
SELECT r.c.value('.','varchar(10)') FROM @xml.nodes('/obj/b') AS r(c)
SELECT r.c.value('.','int') FROM @xml.nodes('/obj/c') AS r(c)

The above is demonstrating three different ways to accomplish the same end result (for the most part). Let's explore each of these methods (the SSMS output is below this list):

  1. Full path to each value - Notice there is no FROM clause, this is because the XQuery in the value method is pointing directly to a node using a full path. Notice that there is a "[1]" after value - this isn't a mistake, it is for selecting the first element found in "node a".
  2. Selecting the base node - Notice that there is a FROM clause this time. The reason for this is to declare the path being queried using the nodes method a single time instead of repeatedly declaring it in the value method. This can be more useful for longer paths. The syntax "r(c)" denotes row [r] and column [c] - you can use any letters you want, some people use x and y - I just think r and c makes more sense.
  3. Individual hits - this is the method being used for the Mickey Mouse example above, notice how inefficient this is with respect to traversing an object - this method requires three queries instead of one and it returns three virtual tables which isn't terribly useful for querying and I will further show why down below. This is why showing someone how to select a list of integers isn't an optimal way of explaining to them how to use XML with tSQL.
Output in SSMS



Passing an Object from Code to a Stored Procedure

Taking the examples shown above into consideration, you can easily perform inserts and updates in a stored procedure instead of having to declare every single property/parameter in the procedure's signature (which is tedious, annoying and pointless work which I never bothered).

Writing your XQueries

However before I move on to passing objects from code into stored procedures I am going to give you a query that will help you reduce the amount of time you are spending writing out XQuery strings:

select
 TABLE_NAME,
 xQueryPath = 'r.c.value(''(./' + column_name + ')[1]'',''' + data_type + '(' + CAST(ISNULL(Character_maximum_length, 0) as varchar(10)) + ')''),'
from YourDatabaseNameHere.information_schema.columns

This query will produced XQueries for each column in all of the tables in your database. The output is just starter code - you still need to manipulate it to make it work properly. This is better than writing it by hand.

The Hard Coding Dilemma and Trade Off at the Cost of Speed

Whenever people would argue with me about hard coding my insert and update queries in code as in-line SQL (assuming the absence of an ORM or Generated Service Layer), everyone always falls back to saying - "hey just use XML", but I would always throw into their face "Okay, so you want me to hard code XML instead? How is that better?" to which I always just get a broken and distant look. The point is you are going to be hard coding something UNLESS you use XML serialization or some other form of custom serialization - all of which inevitably use reflection.

Reflection will ALWAYS make things slower - however it is far more convenient, almost maintenance free and safer than hard coding XML. This convenience comes at an inconvenience though - the cost of a small performance decrease, but if the point here was to stop the taboo of hard coding SQL in code, then you cannot be a hypocrite and hard code XML instead - that's just pointless and stupid.

Code for Executing a Non-Query by passing a Target Object as XML

This is code that I wrote for passing a target object as XML to a target stored procedure regardless of what it does. As you have read ad nauseam by this point, the target object is serialized into XML using Magic unless you read the next section to understand how this is done.

//Using blocks required
using System.IO;
using System.Xml;
using System.Xml.Serialization;

protected int ExecuteNonQuery<T>(T anyObject, string storedProcedureName)
{
 SQLText = ConstructXmlArguments(anyObject, storedProcedureName);

 //This method is not provided to you in this example - use a basic Execute Non Query method here
 return ExecuteNonQuery();
}

protected int ExecuteInsert<T>(T anyObject, string storedProcedureName)
{
 SQLText = ConstructXmlArguments(anyObject, storedProcedureName);

 //This method is not provided to you in this example - use a basic Execute Insert or Execute Scalar method here
 return ExecuteScalar().ConvertToInt32();
}

//This method is fully functional on its own and does not need to be modified
public static string SerializeToXmlString<T>(T anyObject)
{
 string strXML = string.Empty;

 XmlSerializer ser = new XmlSerializer(typeof(T));

 //Setting the namespace to nothing on purpose
 XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
 ns.Add(string.Empty, string.Empty);

 //Setting the rest of the xml writer settings to remove inconvenient parts for tSQL
 XmlWriterSettings settings = new XmlWriterSettings();
 settings.Indent = false;
 settings.NewLineHandling = NewLineHandling.None;
 settings.OmitXmlDeclaration = true;

 //Write everything into memory
 using (MemoryStream ms = new MemoryStream())
 {
  //Write the XML to memory
  using (XmlWriter xmlWriter = XmlWriter.Create(ms, settings))
  {
   ser.Serialize(xmlWriter, anyObject, ns);
  }

  //Rewind
  ms.Position = 0;

  //Read the stream into a string
  using (StreamReader sr = new StreamReader(ms))
  {
   strXML = sr.ReadToEnd();
  }
 }

 return strXML;
}

Please note that this is the partial implementation and that this code was taken from a DAL that I wrote. You need to retrofit it to work with your DAL that you intend on using it with, the above code is for basic idea and reference only.

SerializeToXmlString Method

This method is geared specifically to serializing an object into XML that is acceptable for use with tSQL. What I mean by this is that it does not include (List of Gotchas!):
  • The default XML Namespace - it is blank on purpose
  • Removed indents
  • Removed new lines
  • Removed the XML Document tag
If any of the above things were present in the XML that you were passing over to your stored procedure, it would break it during run time. Those things are not allowed period. Therefore when someone tells you, "Just use XML" - look at them and quiz them on what they are talking about - none of this stuff is obvious.

Precautions
  • As usual though, if you try to serialize an object that is not serializable, you will end up having a run time error. This is not out of the norm.
  • If someone removes or renames a class property, they have to be conscious enough to know that they must now go alter the accompanying Stored Procedures. This is a definite con when relying on a Serializer. Had the XML been hard coded, this would not matter at all. If the XML is hard coded then:
    • In the case of removing a property - you would not be able to compile the code unless you modified that XML as well - in which case you know you need to update the Stored Procedure(s)
    • In the case of renaming a property - you could rename the property and forget to update the XML without worrying about causing any run time errors. I am not saying this is proper, I am just saying it wouldn't cause a problem.
These are examples of invalid and valid XML for tSQL usage/consumption. Make sure to compare and contrast the two examples to see how they relate to the bullet points listed above. Each one of these details matters when trying to pass or utilize XML in tSQL.

Invalid XML for tSQL Usage (Gotcha!)
This is what the default output would be for a "Bob" object if you did not configure the serialization manually. None of this is okay for usage because the xml document tag shows up, the namspaces (xmlns:{xxx}) are listed and there are indentations (tab characters) and new lines (carriage return and line feed characters).

<?xml version="1.0" encoding="utf-16"?>
<Bob xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Name>Bob the 5th</Name>
  <MaryInstance>
    <IsMary>true</IsMary>
  </MaryInstance>
  <MaryInstance2>
    <IsMary>false</IsMary>
  </MaryInstance2>
  <Age>62</Age>
</Bob>

Valid XML for tSQL Usage
The main difference to note here is how this example adheres to every bullet point in the list above. The XML is all on one line and it is very simplistic.
<Bob><Name>Bob the 5th</Name><MaryInstance><IsMary>true</IsMary></MaryInstance><MaryInstance2><IsMary>false</IsMary></MaryInstance2><Age>62</Age></Bob>

Consuming the XML

Using a similar example from the previous section, we are going to serialize a "Bob" object and insert it into the Bob table. The Bob class was modified to be flat like a rudimentary DTO with some basic mundane properties. Pay close attention to the DateTime property and how it has to be dealt with, more explanation to follow.

The Bob class (Bob.cs if you will)
This is a very flat class purely for demonstration purposes. An instance of this class is serialized into XML.

public class Bob
{
 public string Name { get; set; }
 
 public string Occupation { get; set; }
 
 public int Age { get; set; }
 
 public string Mood { get; set; }
 
 public DateTime CreatedOn { get; set; }
}

Bob Object Serialized
I am showing the Bob object formatted here for readability, no one wants to read this crap on a straight line - just keep in mind, during run time this will be on a straight line for all of the reason listed above. Note the format of the CreatedOn node value. This is passed to the stored procedure.

<Bob>
 <Name>Bob the 5th</Name>
 <Occupation>Fast Food Engineer</Occupation>
 <Age>40</Age>
 <Sex>None Yet</Sex>
 <Mood>Resentful of others</Mood>
 <CreatedOn>2014-11-18T22:30:42.4817743-04:00</CreatedOn>
</Bob>

Bob Insert Stored Procedure
As you can see in the the tSQL below, the Bob object is being inserted directly into the Bob table, there are no other intermediary transfer parameters. This would be different with an Update Statement I would imagine. Note how the CreatedOn node is being handled.

CREATE PROCEDURE [dbo].[Bob_NewBob]
 @parameters XML
AS
BEGIN
 SET NOCOUNT ON;

    INSERT INTO [dbo].[Bob]
           ([name]
           ,[occupation]
           ,[age]
           ,[sex]
           ,[mood]
           ,[createdOn])
     SELECT 
  r.c.value('(./Name)[1]','varchar(100)'),
  r.c.value('(./Occupation)[1]','varchar(100)'),
  r.c.value('(./Age)[1]','int'),
  r.c.value('(./Sex)[1]','varchar(20)'),
  r.c.value('(./Mood)[1]','varchar(255)'),
  CAST(r.c.value('(./CreatedOn)[1]','varchar(35)') as datetime2)
 FROM @parameters.nodes('/Bob') AS r(c)
END

XML to tSQL Gotchas!

There are two very important things to note when working with XML and tSQL (that I have found to be very obscure so far):
  • Casing matters! When you are writing your XQueries - the node names are case sensitive. Therefore if your class Properties are beginning with an upper case letter, then your node names are going to use an upper case letter etc... Therefore if you run into a weird parsing error where a column cannot be found, it is because of the casing more than likely.
  • There is no simple way to cast an XML serialized DateTime node value into a datetime2 sql parameter - you must do as I have shown above which is a double cast basically. First convert to a varchar(35), then cast that string to a datetime2. I have looked into changing how the XML serializes and formats the DateTime objects, but it is far more trouble than it is worth - this is a much easier approach.

Conclusion

Yeah... so "Just use XML" huh? So simple... I appreciate the journey that I went through learning about all of the trouble of just using XML, but it does feel like a bit much just to avoid using in-line SQL. The good news is once you are finished writing the code, it isn't a big deal anymore, but this took a few hours for me to understand (hence the article as documentation). I cannot say this was simple. Lastly there is definitely the trade off of using an XML Serializer.

I honestly don't think in-line SQL is too bad and I think it get's a bad wrap because people just don't implement it properly. It just depends on it's usage scenario. For example I think SELECT statements should always be stored procedures regardless of their simplicity, but if this is a SELECT statement that is being constructed on the fly using Dynamic SQL in a stored procedure - I would rather do that in code depending on the complexity because it is much easier to work with in code, than it is to do in a stored procedure - not to mention using business logic in stored procedures is a worse taboo than in-line SQL is. The reason for this is simple, now you have to maintain logic in two places instead of one - this is NEVER okay and leads to bad programming practices and spaghetti code.

Dynamic SQL in Stored Procedures is abused horribly by a lot of developers. I have seen some very bad practices in Stored Procedures which mostly involve a lot of copying and pasting of queries with minor differences in the where clause or joining on the dogs table if the cat bit is set to zero or null. My point is Dynamic SQL in Stored Procedures is more of a determent than a help if you have the wrong people working with it. Anytime someone introduces duplication, they are writing bad code aka Spaghetti code. If you are having a very hard time representing something in SQL you should probably consider NOT writing it in SQL and just do it in code instead... you know where the business logic lives.

People like to argue that you should use parameterized queries in code instead of in-line SQL and I strongly disagree. The only attractiveness that parameterized queries have is protection against SQL Injection - other than that - the amount of overhead of writing the code for this is ridiculous and I would rather not. This is not me being lazy, this is me saying I would rather just use a stored procedure if I am going to go through the trouble of writing one in code, which is basically what parameterized queries are. Other than protection against SQL Injection, the only usefulness I have found in parameterized queries is the insert and update operations for blobs (byte arrays). That's it. Frankly that is all I will use it for - other than that it is inflexible and overkill to use.

If your code is not written in a way where you are not protecting yourself from SQL Injection attacks - then you deserve to be hacked and lose your data. Stop being lazy and do it right.


Monday, November 17, 2014

Dyslexic Apps Updated

I have updated the Dyslexic Apps page with all of the software that I have available on google code. Compiled version of the executables and/or installers are available via drop box.

Check it out here