Publishing binary assets from SDL Tridion
In this blog post, I’m going to talk about publishing binary files from SDL Tridion. In particular, I’m going to go over some strange behaviour that you might see if you’re not careful with your templating, and why this happens, and what you can do to prevent it.
When I say binary files, of course, that can cover a variety of things. Obviously, image files such as JPGs, GIFs and PNGs, but also Word documents and PDFs and the various other things that are needed to make a web site, such as videos, fonts, etc, etc. There’s such a variety, and they are used for different purposes and processed in different ways, but from a Tridion point of view, they are all the same. Tridion knows how to output two kinds of files to your web application server. It’s either text or it’s binary. (And of course, there’s metadata, but that’s a different story.) If it’s text, then a stream of bytes is interpreted via some encoding standard to represent characters in some alphabet or other. In this case, your templates just need to take care of outputting characters one after the other and it’ll work just fine. Tridion will transform your text into whatever encoding is wanted on your website.
With binaries, on the other hand, Tridion’s only job is to allow a file to be uploaded into the Content Manager, and then to ensure that the exact same sequence of bytes is deployed as a file on the file system of your target server. How hard can it be, then – eh? Well obviously, there’s more to it than that. On the Content Delivery side, Tridion manages not just deployment but also undeployment, and for a binary neither of these actions is ever explicit. Publishing and unpublishing binaries always happens as a side-effect of publishing or unpublishing something else. Say for example that you have a web page with an image in it. So your final output in the web page that’s sent to the browser might have something like:
<img src="/images/banana.png" alt="Picture of a banana" />
or one of your css files might refer to an image in much the same way, for instance as a background image.
In either of these cases, the item you are actually publishing is a text resource such as a web page (or a css file). Obviously, for these to work correctly, the binaries also have to be in place, so Tridion needs to deploy them along with your page. I’ll get to the templating aspects of this in a bit, but for now, it’s enough to know that each of your binaries is represented by a multimedia component on the Content Manager, and that on the Content Delivery side, the deployed files are managed to ensure that their relationships with the original component are correct. The most basic of these relationships is that every binary file on the Content Delivery system is based on one specific component on the Content Manager. When text and binary resources are deployed, Tridion also publishes metadata to allow the Content Delivery system to do its work. So Content Delivery “knows” which pages use a given binary, and that binary only needs to be present once, not for each page. The converse of this is that when pages are undeployed, Content Delivery maintains a reference count, and if a binary is no longer needed, it too will be undeployed.
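To make the reference counting idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the Content Delivery API: the class name `BinaryReferences`, its methods and the page ids are all made up for illustration, and the real system keeps this bookkeeping in its own metadata store.

```python
from collections import defaultdict


class BinaryReferences:
    """Toy model of reference-counted binaries (hypothetical, not a Tridion API)."""

    def __init__(self):
        # Maps a binary's path to the set of page ids that currently use it.
        self._pages_by_binary = defaultdict(set)

    def deploy_page(self, page_id, binary_paths):
        """Record that a page uses the given binaries; each binary is stored once."""
        for path in binary_paths:
            self._pages_by_binary[path].add(page_id)

    def undeploy_page(self, page_id):
        """Remove a page and return the binaries that no page references any more."""
        orphaned = []
        for path in list(self._pages_by_binary):
            pages = self._pages_by_binary[path]
            pages.discard(page_id)
            if not pages:
                # Nobody needs this binary now, so it would be undeployed too.
                del self._pages_by_binary[path]
                orphaned.append(path)
        return sorted(orphaned)

    def deployed_binaries(self):
        """List the binaries that are still in place."""
        return sorted(self._pages_by_binary)
```

In this model, a logo shared by two pages stays in place when the first page is undeployed, and only goes away along with the last page that uses it.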
What can go wrong?
All this is good. You want this to happen, and Tridion R&D have gone to a great deal of trouble to make it just so. “How hard can it be?” I asked. Well here’s the thing. You need to understand the rules that are enforced by Content Delivery when you do your templating. If you allow two distinct multimedia components to publish to the same location, then Tridion will refuse to deploy the second item. Instead, you’ll get some cryptic message like:
Committing Deployment Failed Phase: Deployment Prepare Commit Phase failed. Unable to prepare transaction: tcm:0-4445-66560
If you dig into the deployer log, you will then find something like this:
2015-01-07 15:02:25,088 WARN PreCommitPhase - Failed to Prepare: tcm:0-4445-66560 error: Attempting to deploy a binary 8029 to a location where a different binary is already stored Existing binary: 8030:
(Note: this output comes from an SDL Tridion 2013 SP1 system. Earlier versions gave you the more detailed message directly without having to dig in to the logs. I’ve been told that Tridion R&D see this as a bug, so presumably future versions will also give the full information.)
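The rule behind that message can be sketched in a few lines of plain Python. This is only a model of the behaviour described above, not the deployer's actual code: `DeploymentTarget`, `BinaryCollisionError` and the path are hypothetical, and the ids 8029 and 8030 are borrowed from the log line.

```python
class BinaryCollisionError(Exception):
    """Raised when two different binaries claim the same location."""


class DeploymentTarget:
    """Toy model of the 'one location, one component' rule (hypothetical)."""

    def __init__(self):
        # Maps a deployed path to the id of the component that owns it.
        self._owner_by_path = {}

    def deploy_binary(self, component_id, path):
        existing = self._owner_by_path.get(path)
        if existing is not None and existing != component_id:
            # A different component already lives here, so refuse the deployment.
            raise BinaryCollisionError(
                f"Attempting to deploy binary {component_id} to {path}, "
                f"where a different binary is already stored: {existing}"
            )
        # Redeploying the same component to its own location is fine.
        self._owner_by_path[path] = component_id
```

Republishing component 8030 to its own path succeeds as often as you like; it is only when component 8029 turns up at the same path that the deployment is rejected.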
“But shouldn’t Tridion also enforce the rules on the Content Manager?”, I hear you ask. And that’s a very good question, so let’s just take a sidetrack through history to find out why things are as they are.
How did it all begin?
Are there any old people here? So for example, do any of you know who Paul McCartney is? 🙂 Or more to the point – can you remember the beginning of the World Wide Web? Truth be told, most of us can’t, even if we were around at the time, because in those days, we weren’t online all the time. I don’t think I even saw a web browser until about 1994, and then only because I knew someone in the business. The browser was Netscape, and even in those early days, you could put pictures in your web page. But that was as far as binary support went. I’m fairly sure that Netscape didn’t let you use your own fonts and I don’t recall it having document downloads. In those days, it was normal for your site to have an /images directory where all your images went. Nobody published anything out of a web CMS, because web CMSes didn’t exist. You just had a copy of the files on disk, and you FTP’d them to the server, so a complicated directory structure wasn’t helpful. A few years later (let’s say 1999/2000), when web CMSes came along, they followed this familiar model, and everyone was happy. All the early implementations of Tridion were based on the idea that you had an images directory configured in your publication properties, and all your binaries went there. Tridion took care of managing the relationship with the component quite simply, by appending the TCM URI of the component to the file name. This worked fine. Your images would all have names like /Images/banana_tcm127-486619.jpg and when you used the PublishBinary function in the old TOM API, it would make sure the images ended up there, and hand you back a location to refer to them in your template.
And everyone was still happy… sort of. The trouble was that people started using multimedia components for other things in their web sites, such as documents. The basic mechanism of an images directory wasn’t really great for this. It meant your documents would have to have a URL that pointed to your images directory and had an ugly TCM ID in it. Yet worse, (if you didn’t know how to set the content-disposition header on your document downloads) when people downloaded your documents to disk, the default file name would also have the TCM ID, which was pretty ugly. (Of course, we didn’t have HTML5.) So… over quite a long time, various customers sent feedback to Tridion that they wanted more control over this behaviour. When Tridion introduced compound templating in R5.3 (late 2007) they had to re-implement the binary publishing mechanism anyway, and the new version came with features that were flexible enough to meet everyone’s needs.
So what was new? Well firstly – they stopped automatically appending the TCM ID to the names of binary files at publish time. At the same time, they also introduced the possibility of specifying which structure group you wanted to publish to. If you didn’t specify a structure group, then the “Images Path” of the publication would be used as before. But with this increased power, there came increased responsibility. Tridion no longer took care of ensuring that the paths of deployed files were unique; that was the responsibility of the implementer writing the templates. This meant you had more or less two approaches. Either you could continue to use one directory, and ensure that the file names were unique, or you could make use of several directories, and then you only had to keep the names unique within each directory. Either way was fine, as long as there were never two components that would be deployed to the same location. Many implementations use both approaches. For example, the images used in the web design (icons and logos and sprites and what have you) are often kept in one folder on the CMS, and published all together to the same structure group using the component name for the file name. That’s handy for assets that are controlled by the development team, but for images and documents that are part of the content, we usually fall back to the tried and tested approach of adding the component id to the file name. In the next section I’m going to go over the techniques for doing this.
Although I just mentioned an exception to the rule, mostly what you want is for the content authors to be able to add multimedia components and use them everywhere without having to worry about collisions. You could imagine techniques where every structure group has its own location for images, but you run out of options pretty fast if you are designing a system that has to survive real life. So we do what Tridion used to do automatically and add the component id to the file name. As I said, using the whole TCM URI for this is pretty ugly, and fortunately it’s not necessary. Just the item id of the component is enough – that’s just a number, which doesn’t look too bad.
In a Tridion template, if you want a binary to be deployed, you call the AddBinary method. When you do this, the binary is added to the deployment package with sufficient metadata to enable the deployer to place the file where it’s supposed to go. The following code snippet shows how in a template building block you can read the name of the original file from the multimedia component and use it to construct a new file name to pass to AddBinary. This kind of technique is typically used when you’re processing a Multimedia Link field in your component or perhaps in metadata.
You can see here how we’re building up a new file name with the component ID added for uniqueness. If you have several templates, you’ll want to ensure that they do this consistently.
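The naming scheme itself is independent of the Tridion API, so it can be sketched in plain Python. The helper names `item_id` and `unique_file_name` are made up for illustration, and the sketch assumes the usual TCM URI layout of publication id followed by item id, with optional item type and version parts. In a real template building block you would read the original file name and the URI from the multimedia component and pass the resulting name to AddBinary.

```python
import re
from pathlib import PurePosixPath

# Assumed layout: tcm:<publication id>-<item id>[-<item type>][-v<version>]
_TCM_URI = re.compile(r"^tcm:(\d+)-(\d+)(?:-(\d+))?(?:-v(\d+))?$")


def item_id(tcm_uri):
    """Return just the item id from a TCM URI, e.g. 486619 for 'tcm:127-486619'."""
    match = _TCM_URI.match(tcm_uri)
    if match is None:
        raise ValueError(f"Not a TCM URI: {tcm_uri!r}")
    return int(match.group(2))


def unique_file_name(original_file_name, tcm_uri):
    """Insert the component's item id between the base name and the extension."""
    name = PurePosixPath(original_file_name)
    return f"{name.stem}_{item_id(tcm_uri)}{name.suffix}"
```

With this, `unique_file_name("banana.jpg", "tcm:127-486619")` gives `banana_486619.jpg`, which keeps the name readable while still being unique within the publication. Putting the logic in one shared helper is one way to keep several templates consistent.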
Images in Rich Text Format areas
It’s all very well knowing how to call AddBinary with a customised file name, but in one of the commonest scenarios for images, we aren’t calling AddBinary ourselves. Typically, if you are using a Rich Text format area, you’ll be using PublishBinariesInPackage for this, so it might not be obvious how to modify the file name. If you look in the default Tridion template building blocks you will find one called ExtractBinariesFromHtml. Although you can use this directly yourself in your templating pipeline, often you won’t need to, as it is invoked by the Dreamweaver mediator when you are processing a rich text field. (If you use the Razor mediator, this also works out-of-the-box, as it relies on the same Tridion code.) ExtractBinariesFromHtml processes your content looking for binaries. When it finds them, it adds them to the package. At the same time it adds the “Filename” property to the item. (Package Item Properties are part of the documented API, but unfortunately not visible via the Template Builder. If you want to investigate further, check out the LogPropertiesOfItems recipe in the Tridion Cookbook. You may also wish to vote up this idea.)
The package item properties are used by PublishBinariesInPackage to construct the filename, in a similar way to the code sample above. One of the properties it uses is “ItemPropertyFileNameSuffix”, so what we need to do is put a suitable value in this property before PublishBinariesInPackage gets invoked. Here’s some code that does just that:
Obviously, this needs to be in your pipeline somewhere after your Output template, and before PublishBinariesInPackage. As you can see, if you want to match other kinds of binaries than images, you can pass in a parameter to specify what you want. With this building block in place, the images published from an RTF will also have the component id in their filename.
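The logic of such a building block can be modelled in plain Python, under some loudly stated assumptions: package items are represented here as simple dictionaries with hypothetical `tcm_uri`, `content_type` and `properties` keys, and the property is set through a string key that merely stands in for the “ItemPropertyFileNameSuffix” property mentioned above. None of this is the Tridion templating API; it only shows the shape of the loop.

```python
import re

# Stand-in key for the package item property named in the text. In a real
# template this would come from the Tridion templating API, not a string.
FILE_NAME_SUFFIX_PROPERTY = "ItemPropertyFileNameSuffix"


def add_file_name_suffixes(package_items, content_type_pattern=r"^image/"):
    """Set a '_<item id>' suffix on every item whose content type matches."""
    matcher = re.compile(content_type_pattern)
    for item in package_items:
        if not matcher.search(item["content_type"]):
            # Not the kind of binary we were asked to handle; leave it alone.
            continue
        # "tcm:5-8030" -> "8030": the item id is the second number in the URI.
        component_id = item["tcm_uri"].split(":", 1)[1].split("-")[1]
        item["properties"][FILE_NAME_SUFFIX_PROPERTY] = f"_{component_id}"
    return package_items
```

By default only images are touched; passing a different pattern, such as one that matches `application/pdf`, plays the role of the parameter described above.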
Remember when I said that for binaries, all Tridion has to do is “ensure that the exact same sequence of bytes is deployed as a file on the file system of your target server”? That’s not always true. Sometimes you want to have several versions of a binary, where the versions are generated at publish time in your templating. The canonical example is the one where you resize images. The master might be a high-resolution version, for use when your visitor pops it out in a lightbox feature on your web page. Then you might have a normal size version and a thumbnail. But it could be anything: black and white version, sepia wash… whatever. Obviously, these different renderings are related to each other, so Tridion allows you to specify a “variant ID” parameter when you call AddBinary. So you could call AddBinary several times in the same template, once for each variant, or you might have different templates responsible for different variants. On the Content Delivery side, you can retrieve the location of the relevant variant file by passing the variant ID as a parameter to your BinaryLink. Of course, you also have to take care of ensuring that two variants of the same binary don’t end up at the same location. It can be a useful feature, although in recent times it’s also become common to generate different versions of binaries on demand using an image processing library in your web application.
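One simple way to keep variants apart is to put the variant ID into the file name alongside the component id. The following plain Python sketch shows the idea; `variant_file_name` is a hypothetical helper, the variant names are only examples, and the actual call to AddBinary with its variant ID parameter is left out.

```python
from pathlib import PurePosixPath


def variant_file_name(original_file_name, component_id, variant_id=""):
    """Build a file name that is unique per component and per variant."""
    name = PurePosixPath(original_file_name)
    # The master (no variant id) keeps the shorter name.
    variant_part = f"_{variant_id}" if variant_id else ""
    return f"{name.stem}_{component_id}{variant_part}{name.suffix}"


def variant_file_names(original_file_name, component_id, variant_ids):
    """Return one file name per variant, refusing duplicates up front."""
    names = {v: variant_file_name(original_file_name, component_id, v)
             for v in variant_ids}
    if len(set(names.values())) != len(names):
        raise ValueError("Two variants would be deployed to the same location")
    return names
```

For example, the master, normal and thumbnail renderings of component 8030 would come out as `banana_8030.png`, `banana_8030_normal.png` and `banana_8030_thumbnail.png`.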
As part of Tridion’s modular templating framework, these APIs have been with us since Release 5.3, and this is not the first blog post to cover the subject. The techniques are not difficult to grasp, yet it remains a common area for mistakes. I’m sure I’ve made a few myself in this area. The problem lies in awareness and planning. First and foremost, programmers working with modular templating need to be aware of the issues, and the possible technical approaches. Once you have that, there’s a good chance that you’ll take a moment early in your project to decide how you’re going to organise your binaries. Any good technical design for a Tridion system will specify the chosen approach. It might only take a couple of lines but you’ll thank yourself later.