T-SQL Tuesday #22 – Data Presentation – XML Concatenation

2011-09-13 - General, T-SQL Tuesday, XML

T-SQL Tuesday

It is T-SQL Tuesday again, the 22nd incarnation and this time the topic is the presentation of data, hosted by Robert Pearl (Blog|Twitter).

Every so often you run into the problem of having to display a list of items to the user. This could be a list of item numbers or a list of names.

Assume for example if you are trying to write a view that displays information about all the indexes in the database.
You want this view to return one row per index. You also want to include the columns of the index. To make this happen you need to concatenate the information for the columns into one string.

This post is going to look at how to implement string concatenation in T-SQL.

In many other database management systems you have access to a function GROUP_CONCAT() that provides just this functionality. In T-SQL it is not that easy yet. While Denali offers a new CONCAT() function, a GROUP_CONCAT() is not available in SQL Server up to version Denali CTP3.

XML to the rescue

There are several ways to go about this problem. For example you could write a UDF or you could rely on undocumented behavior1.
Most of those solutions seem complicated and also do not really address a real concatenation in a GROUP BY context.

There is one solution that I have found to be the easiest to use: FOR XML.

Using the FOR XML clause, SQL Server allows the output of any select to be a single XML Document:

[sql] SELECT name
FROM master.sys.tables
FOR XML PATH('row');
[/sql]

returns something like

[xml] <row>
<name>spt_fallback_db</name>
</row>
<row>
<name>spt_fallback_dev</name>
</row>
<row>
<name>spt_fallback_usg</name>
</row>
<row>
<name>spt_monitor</name>
</row>
<row>
<name>spt_values</name>
</row>
<row>
<name>MSreplication_options</name>
</row>
[/xml]

Every row gets wrapped into a <row> tag and every column in a tag that matches the columns name.

To eliminate the row tag we can pass in an empty string to the PATH() function. The column tags are eliminated by naming the column [text()].

So our example would look like this:

[sql] SELECT name AS [[text()]] FROM master.sys.tables
FOR XML PATH('');
[/sql]

It's output looks like this:

[xml] spt_fallback_dbspt_fallback_devspt_fallback_usgspt_monitorspt_valuesMSreplication_options
[/xml]

That looks already very promising. You probably would like to add separators in between the strings. How you can do that we will get to a little later.
There are a few things that I would like to address first.

Special Characters

Let's look at characters that have a special meaning in XML like ‘<’:

[sql] SELECT '<&>' AS [[text()]] FOR XML PATH('');
[/sql]

Output:

[xml] &lt;&amp;&gt;
[/xml]

That is not what we wanted. To get those characters unescaped, we need to use one of the XML datatype functions:

[sql] SELECT (SELECT '<&>'
FOR XML PATH(''), TYPE
).value('.','NVARCHAR(MAX)');
[/sql]

This has the expected output: ‘<&>’. You might have noticed the missing column name. If the column name is not specified at all (explicitly or otherwise), it has the same effect as if the [text()] name is given. The additional TYPE keyword causes the XML to be returned as an XML datatype result instead of as an NVARCHAR(MAX) datatype result. The .value() function is defined on the XML datatype and takes two parameters. The first one is using the XPath syntax and describes what we want to get back. The '.' here means everything. The second one specifies the datatype to which we want the result to be converted to.

Illegal Characters

The next problem is not that easy to solve. The SQL Server XML datatype cannot handle specific characters at all:

[sql] SELECT CHAR(1)
FOR XML PATH('');
[/sql] This is handled and escaped correctly to &#x01; however, it falls apart once we add the necessary TYPE keyword back in:

[sql] SELECT CHAR(1)
FOR XML PATH(''), TYPE;
[/sql]
Msg 6841, Level 16, State 1, Line 1
FOR XML could not serialize the data for node 'NoName' because it contains a character (0x0001) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive.

There is a total of 2079 single character values that will cause this error. How to figure out which ones exactly will have to wait for another blog post. For now, if you think you might run ito any of them, you can use the following code to replace the most common of them with a question mark:

[sql] SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
N'the text that is to be cleaned here'
,NCHAR(1),N'?'),NCHAR(2),N'?'),NCHAR(3),N'?'),NCHAR(4),N'?'),NCHAR(5),N'?'),NCHAR(6),N'?'),NCHAR(7),N'?'),NCHAR(8),N'?'),NCHAR(11),N'?'),NCHAR(12),N'?'),NCHAR(14),N'?'),NCHAR(15),N'?'),NCHAR(16),N'?'),NCHAR(17),N'?'),NCHAR(18),N'?'),NCHAR(19),N'?'),NCHAR(20),N'?'),NCHAR(21),N'?'),NCHAR(22),N'?'),NCHAR(23),N'?'),NCHAR(24),N'?'),NCHAR(25),N'?'),NCHAR(26),N'?'),NCHAR(27),N'?'),NCHAR(28),N'?'),NCHAR(29),N'?'),NCHAR(30),N'?'),NCHAR(31),N'?');
[/sql]

This replacement has to happen before the text is converted to XML. The following examples are going to skip this step for readability.

Separators

Now we know how to concatenate string values together. To make our initial example of a select statement (or view) that returns one row per index and includes a comma separated column list work, two more steps are missing. First, the separator needs to be inserted between the values but not in front of the first one. This is not possible. But it is fairly simple to remove the additional separator after adding one in front of every item:

[sql] SELECT STUFF((SELECT ', ' + name
FROM master.sys.tables
FOR XML PATH('') ,TYPE
).value('.', 'NVARCHAR(MAX)'),
1, 2, ''
);
[/sql]

The STUFF function helps us with this problem. The first three parameters work just like the ones in the SUBSTRING function. But instead of returning that substring, the STUFF function replaces it with the value passed in as fourth parameter and then returns the complete string.

Putting it all together

The last step is to do this all inside a GROUP BY query. For that we have to use the CROSS APPLY functionality to create the column list for each group value in a correlated sub-query. Putting this all together looks like this:

[sql] SELECT t.name AS TblName ,
i.name AS IdxName ,
Columns.List AS Columns
FROM sys.tables t
JOIN sys.indexes i ON i.object_id = t.object_id
CROSS APPLY ( SELECT STUFF((SELECT ', ' + c.name
+ CASE WHEN ic.is_descending_key = 1
THEN ' DESC'
ELSE ''
END
FROM sys.index_columns ic
JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
ORDER BY ic.index_column_id
FOR XML PATH('') ,
TYPE
).value('.', 'NVARCHAR(MAX)')
, 1, 2, ''
)
) Columns ( List ) ;
[/sql]

This query returns the table name, the index name and a comma separated list of all columns for each index in the database. It also marks descending columns with the DESC keyword. What it does not do is separate out included columns. I'll leave that as an exercise for the reader.

1) One method that relies on undocumented behavior would be to continually add to a single variable within a single multirow select.

Categories: General, T-SQL Tuesday, XML

One Response to T-SQL Tuesday #22 – Data Presentation – XML Concatenation

  1. Pingback: T-SQL Tuesday #22 Round-Up - Data Presentation - Robert Pearl's Blog

Leave a Reply