Checking your DXA/DD4T JSON in the SDL Web broker database.

Recently, I’ve spent some time looking at both the Dynamic Delivery for Tridion (DD4T) framework and SDL’s own Digital Experience Accelerator (DXA), which started life as a reference implementation for a modern “SDL Web”-based web application and makes use of DD4T.

In general, DD4T-based web applications get all their content-managed data from the broker database. The DD4T framework makes use of SDL Web’s content delivery APIs to access this data and present it in a form that’s directly usable from an MVC web application. In many ways, it’s great that the data isn’t deployed to the file system any more, but it also introduces some challenges. When publishing data, we often transform it more than once, and having access to the partially transformed data is very useful when you’re trying to figure out why something behaves the way it does. Whether it’s data or code, there will be moments when checking it half-way is a big help.

So the data isn’t on the file system any more. Fine, you may think – it’s in the database, so we can just look there. Sometimes it’s that simple, but sometimes you still have a couple of extra steps to go through, and that’s what this blog post is about. In DXA, configuration settings and other resources are often published as JSON. In the content manager, there are schemas and components that allow you to enter your settings data in text fields, and a mechanism whereby you can easily publish everything to the broker.

In the broker database, the binary content is stored in an NVARCHAR column, but of course it’s just a buffer full of bytes. Along the way to getting stored in the database, the text has been transformed from SDL Web’s “native” representation (UTF-16, which Windows tooling usually just calls “Unicode”) into UTF-8. An NVARCHAR field expects UTF-16, so if instead you treat it as a raw buffer of bytes and throw UTF-8 in there, it won’t be readable using the standard tooling, such as SQL Server Management Studio. It looks like Chinese to me, although to be honest, I probably can’t distinguish between Chinese characters and various other kinds of logograms.

What’s happening is that for characters in the ASCII range, which most of the configuration JSON will be, UTF-8 stores each character as a single byte. When we read the content out of the NVARCHAR field, we’re expecting UTF-16, which uses two bytes per character, so the tooling reads through the buffer in pairs, and for each pair it displays a character. It just so happens that these pairs of ASCII bytes mostly end up in the part of Unicode that holds the CJK ideographs, so Chinese-looking characters are what we see.
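
You can see the effect in miniature without going anywhere near the database. The following PowerShell snippet (the sample text is just a compact version of the version.json content you’ll see further down the page) deliberately misreads a UTF-8 buffer as UTF-16, and then reverses the mistake:

$json = '{"version":"v1.4"}'    # 18 ASCII characters, so 18 UTF-8 bytes (an even number, which keeps the pairing below exact)

# Publishing stored the text as UTF-8: one byte per character for ASCII-range JSON like this.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($json)

# Reading it back as NVARCHAR treats those bytes as UTF-16LE, pairing them up into 16-bit
# code units. Pairs of ASCII letters mostly land among the CJK ideographs.
$garbled = [System.Text.Encoding]::Unicode.GetString($utf8Bytes)
Write-Host $garbled

# Reversing the mistake (back to the raw bytes, then decode those bytes as UTF-8) recovers the text.
$recovered = [System.Text.Encoding]::UTF8.GetString([System.Text.Encoding]::Unicode.GetBytes($garbled))
Write-Host $recovered    # {"version":"v1.4"}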

So – what’s to be done about it? It’s utterly unreadable in that form, so we need to transform the data back to characters via the UTF-8 encoding. My own preference for this kind of thing is to use Windows PowerShell, as you can see in the example below, but I’m just using standard .NET libraries, so you could easily translate this to work with the tooling of your choice.

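What follows is a cleaned-up sketch rather than my working script verbatim (which, as I admit at the end, does its byte-wrangling via an ASCII stream reader). Apart from the BINARY_VARIANT table and its TYPE column, the table and column names are my reading of the broker schema, and the connection string and publication id are placeholders, so check both against your own environment before relying on it.

$publicationId = 20
$connectionString = 'Server=MY_DB_SERVER;Database=Tridion_Broker;User Id=TridionBrokerUser;Password=xxx'

$connection = New-Object System.Data.SqlClient.SqlConnection $connectionString
$connection.Open()
try {
    $command = $connection.CreateCommand()
    # Assumed schema: BINARY_ID, VARIANT_ID, PATH and CONTENT alongside the TYPE column.
    $command.CommandText = @"
SELECT BINARY_ID, VARIANT_ID, PATH, CONTENT
FROM BINARY_VARIANT
WHERE PUBLICATION_ID = @pubId AND TYPE = 'application/json'
"@
    [void]$command.Parameters.AddWithValue('@pubId', $publicationId)

    Write-Host "JSON binary data for publication $publicationId"
    Write-Host "==================================="

    $reader = $command.ExecuteReader()
    while ($reader.Read()) {
        Write-Host ("Dumping JSON content for Binary: ""{0}"", Variant ""{1}""" -f $reader['BINARY_ID'], $reader['VARIANT_ID'])
        Write-Host ("`t Path: `t{0}" -f $reader['PATH'])

        # The NVARCHAR column hands us UTF-8 bytes dressed up as UTF-16 characters,
        # so reverse the mistake: back to the raw bytes, then decode those as UTF-8.
        $garbled = [string]$reader['CONTENT']
        $json = [System.Text.Encoding]::UTF8.GetString([System.Text.Encoding]::Unicode.GetBytes($garbled))

        # Round-trip through the JSON cmdlets purely to get nicely indented output.
        $json | ConvertFrom-Json | ConvertTo-Json -Depth 10
    }
    $reader.Close()
}
finally {
    $connection.Close()
}
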
As you can see, I’m just querying the broker directly, so you can use this script simply by modifying the publication id variable and the various connection string settings. It’s intended as much as an example as anything else – the same technique should also work for other kinds of data that are stored the same way. You can see that I’m checking the TYPE field of the BINARY_VARIANT table to filter for items whose type is ‘application/json’. You could remove this restriction and do more processing in the script, if that made more sense for you. As it is, I’m relying on the output being JSON, which allows me to round-trip it through the ConvertFrom-Json and ConvertTo-Json cmdlets in PowerShell to get formatted output, like this…

JSON binary data for publication 20
===================================
Dumping JSON content for Binary: "277", Variant "version"
	 Path: 	/version.json
{
    "version":  "v1.4"
}
Dumping JSON content for Binary: "281", Variant "config"
	 Path: 	/system/config/core.json
{
    "name":  "environment",
    "settings":  "Tridion.ContentManager.ContentManagement.Fields.EmbeddedSchemaField",
    "cmsurl":  "http://WEB8:81",
    "mapsApiKey":  "xxx",
    "dataFormats":  "json, rss, atom",
    "culture":  "en-US",
    "language":  "English"
}
Dumping JSON content for Binary: "281", Variant "config-bootstrap"
	 Path: 	/system/config/_all.json
{
    "defaultLocalization":  true,
    "staging":  true,
    "mediaRoot":  "/media/",
    "siteLocalizations":  [
                              {
                                  "Id":  "20",
                                  "Path":  "/",
                                  "Language":  "English",
                                  "IsMaster":  true
                              }
                          ],
    "files":  [
                  "/system/config/core.json",
                  "/system/config/core.schemas.json",
                  "/system/config/core.taxonomies.json"
              ]
}
Dumping JSON content for Binary: "281", Variant "includes"
	 Path: 	/system/mappings/includes.json
{
    "335":  [
                "system/include/header",
                "system/include/footer",
                "system/include/content-tools",
                "system/include/left-navigation"
            ],
    "289":  [
                "system/include/header",
                "system/include/footer",
                "system/include/content-tools"
            ],
    "334":  [
                "system/include/header",
                "system/include/footer"
            ],
    "246":  [

            ],
    "348":  [

            ],
    "329":  [
                "system/include/header",
                "system/include/footer"
            ]
}

etc, etc.

And now, with a bit of luck, I’ll be able to figure out what’s going on. I hope you find the script useful too. I’m not even going to explain why I’ve used an ASCII stream reader, even though it is rather odd.