About two weeks ago I wrote about how to get the content of the entire database into an XML document. Today Daniel commented, that he would like the XML to be in a compacter format. In this post I am going to explain how to get there.
While WordPress™ conveniently ate the example XML Daniel tried to post, I will assume for this article that he was going for this format:
Instead of all column values being sub-nodes of their <row> nodes they are now attributes of an empty <row> node. Also the <data> node between <table> and <row> is now missing, as it was kind of superfluous. And while I was changing the code anyway, I went ahead and added the previously missing root node as <tables>.
So how did I get there?
First let us look at the missing root node, which turns out to be the simplest of all the changes. When you select rows for a table specifying the FOR XML PATH('NodeName') directive, every row is going to be represented as a node with the passed in name. Those nodes are just hanging together without an enclosing root node. To make the output a valid XML document you can just specify the ROOT clause next to the PATH clause like this:
The TYPE directive tells SQL Server to return the XML as a value with the XML type. If you leave it out the XML gets returned as a string, which can create problems if you want to continue to work with it in T-SQL.
The next step was, to get rid of the <data> node. When you specify an XML value in a select statement that is in turn using FOR XML PATH, that value is wrapped in a node with the value's column name. The generated SQL in my original solution looked like this:
The data column in the UNION ALL query contains XML values, so the outer SELECT * ... FOR XML PATH('table') is going to wrap each one into a <data> node like this:
To remove the additional node we can use the fact that SQL Sever will inline the value of a column that does not have a name. So we need to select the data column without specifying its name. To do that we will use a trick:
By wrapping the data column in its own SELECT it loses its name property for the outer SELECT statement. The output it generates now looks like this:
The last thing we need to accomplish is to make the column values attributes of each <row> node instead of sub-nodes. If you specify a column name with a leading "@" sign, FOR XML PATH will make that value an attribute with that name (without the "@"):
The output of above statement looks like this:
So instead of a SELECT * FROM for each table, we need to list all columns in the format shown above. To get this specially formatted column list we are going to use XML concatenation one more time:
This produces the desired list:
[id] AS [@id],[c1] AS [@c1],[c2] AS [@c2]
With that we have all the parts collected to create the new "compact" solution. The last step is to put it all together:
EXEC(@cmd) ;
[/sql]
This produces the desired output that was shown all the way at the beginning of this article. The size of the resulting XML document for my example database went from about 1.4 MB for the original solution to now about 0.5 MB – a significant improvement.
4 Responses to Selecting the entire Database as XML String – 2