Selecting the entire Database as XML String – 2

2012-02-01 - General, XML

About two weeks ago I wrote about how to get the content of the entire database into an XML document. Today Daniel commented, that he would like the XML to be in a compacter format. In this post I am going to explain how to get there.

While WordPress™ conveniently ate the example XML Daniel tried to post, I will assume for this article that he was going for this format:

[xml] <tables>
<table name="[dbo].[T1366]">
<row id="1" c1="1366" c2="2732" c3="4098" c4="5464" c5="6830" c6="8196" />
<row id="2" c1="1366" c2="2732" c3="4098" c4="5464" c5="6830" c6="8196" />
</table>
<table name="[dbo].[T127]">
<row id="1" c1="127" c2="254" c3="381" c4="508" c5="635" c6="762" c7="889" />
<row id="2" c1="127" c2="254" c3="381" c4="508" c5="635" c6="762" c7="889" />
</table>
</tables>
[/xml]

Instead of all column values being sub-nodes of their <row> nodes they are now attributes of an empty <row> node. Also the <data> node between <table> and <row> is now missing, as it was kind of superfluous. And while I was changing the code anyway, I went ahead and added the previously missing root node as <tables>.

So how did I get there?

First let us look at the missing root node, which turns out to be the simplest of all the changes. When you select rows for a table specifying the FOR XML PATH('NodeName') directive, every row is going to be represented as a node with the passed in name. Those nodes are just hanging together without an enclosing root node. To make the output a valid XML document you can just specify the ROOT clause next to the PATH clause like this:

[sql] SELECT *
FROM dbo.Table
FOR XML PATH('rowNodeName'),ROOT('rootNodeName'),TYPE
[/sql]

The TYPE directive tells SQL Server to return the XML as a value with the XML type. If you leave it out the XML gets returned as a string, which can create problems if you want to continue to work with it in T-SQL.

The next step was, to get rid of the <data> node. When you specify an XML value in a select statement that is in turn using FOR XML PATH, that value is wrapped in a node with the value's column name. The generated SQL in my original solution looked like this:

[sql] SELECT * FROM(
SELECT '[dbo].[t1]' [@name],(SELECT * FROM dbo.t1 FOR XML PATH('row'),TYPE) data
UNION ALL
SELECT '[dbo].[t2]' [@name],(SELECT * FROM dbo.t2 FOR XML PATH('row'),TYPE) data
)X FOR XML PATH('table'),TYPE;
[/sql]

The data column in the UNION ALL query contains XML values, so the outer SELECT * ... FOR XML PATH('table') is going to wrap each one into a <data> node like this:

[xml] <table name="[dbo].[t1]">
<data>
<row>
<id>1</id>
<c1>1</c1>
</row>
<row>
<id>2</id>
<c1>1</c1>
</row>
</data>
</table>
<table name="[dbo].[t2]">
<data>
<row>
<id>1</id>
<c1>2</c1>
<c2>4</c2>
</row>
<row>
<id>2</id>
<c1>2</c1>
<c2>4</c2>
</row>
</data>
</table>
[/xml]

To remove the additional node we can use the fact that SQL Sever will inline the value of a column that does not have a name. So we need to select the data column without specifying its name. To do that we will use a trick:

[sql] SELECT [@name],(SELECT data) FROM(
SELECT '[dbo].[t1]' [@name],(SELECT * FROM dbo.t1 FOR XML PATH('row'),TYPE) data
UNION ALL
SELECT '[dbo].[t2]' [@name],(SELECT * FROM dbo.t2 FOR XML PATH('row'),TYPE) data
)X FOR XML PATH('table'),TYPE;
[/sql]

By wrapping the data column in its own SELECT it loses its name property for the outer SELECT statement. The output it generates now looks like this:

[sql] <table name="[dbo].[t1]">
<row>
<id>1</id>
<c1>1</c1>
</row>
<row>
<id>2</id>
<c1>1</c1>
</row>
</table>
<table name="[dbo].[t2]">
<row>
<id>1</id>
<c1>2</c1>
<c2>4</c2>
</row>
<row>
<id>2</id>
<c1>2</c1>
<c2>4</c2>
</row>
</table>
[/sql]

The last thing we need to accomplish is to make the column values attributes of each <row> node instead of sub-nodes. If you specify a column name with a leading "@" sign, FOR XML PATH will make that value an attribute with that name (without the "@"):

[sql] SELECT id AS [@id] FROM dbo.t1 FOR XML PATH('row'),TYPE;
[/sql]

The output of above statement looks like this:

[xml] <row id="1" />
<row id="2" />
[/xml]

So instead of a SELECT * FROM for each table, we need to list all columns in the format shown above. To get this specially formatted column list we are going to use XML concatenation one more time:

[sql] SELECT STUFF((SELECT ','+QUOTENAME(name)+' AS '+QUOTENAME('@'+name)
FROM sys.columns c
WHERE c.object_id = OBJECT_ID('dbo.t2')
ORDER BY column_id
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'');
[/sql]

This produces the desired list:

[id] AS [@id],[c1] AS [@c1],[c2] AS [@c2]

With that we have all the parts collected to create the new "compact" solution. The last step is to put it all together:

[sql] DECLARE @cmd NVARCHAR(MAX) ;
SET @cmd = 'SELECT [@name],(SELECT data) FROM('
+ STUFF(
(SELECT ' UNION ALL SELECT ''' + QUOTENAME(OBJECT_SCHEMA_NAME(object_id))
+ '.' + QUOTENAME(t.name) + ''' [@name],' + '(SELECT '+c.ColList+' FROM '
+ QUOTENAME(OBJECT_SCHEMA_NAME(object_id)) + '.' + QUOTENAME(t.name)
+ ' FOR XML PATH(''row''),TYPE) data'
FROM sys.tables t
CROSS APPLY (SELECT STUFF((SELECT ','+QUOTENAME(name)+' AS '+QUOTENAME('@'+name)
FROM sys.columns c
WHERE c.object_id = t.object_id
ORDER BY column_id
FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)'),1,1,'')
)c(ColList)
FOR XML PATH('') ,
TYPE).value('.', 'NVARCHAR(MAX)'), 1, 11, '')
+ ')X FOR XML PATH(''table''),ROOT(''tables''),TYPE;' ;

EXEC(@cmd) ;
[/sql]

This produces the desired output that was shown all the way at the beginning of this article. The size of the resulting XML document for my example database went from about 1.4 MB for the original solution to now about 0.5 MB – a significant improvement.

Categories: General, XML

4 Responses to Selecting the entire Database as XML String – 2

Leave a Reply