Content with Style

Web Technique

Processing the output buffer with XSLT

by Pascal Opitz on July 24 2005, 18:07

The output buffer

Most programmers dealing with PHP will have come across various PHP errors when trying to do a redirect after an echo or something similar.

The error usually looks like this:

Warning: Cannot add header information - headers already sent by
(output started at /directory/to/starting_file.php:XXX) 
    in /directory/to/calling_file.php on line XX

For many of you, fixing exactly these errors will have been the only use for ob_start. But ob_start is just one of a whole set of functions referred to as “Output Control Functions”, which provide a sophisticated toolkit for controlling and manipulating the output generated by PHP.
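
As a small illustration of that wider toolkit, here is a minimal sketch that uses ob_start together with ob_get_clean to capture output in a string instead of sending it. The helper name captureOutput is made up for this example.

```php
<?php
// Hypothetical helper: run a callable and return everything it echoes.
function captureOutput(callable $render): string
{
    ob_start();                         // open a buffer; nothing reaches the client yet
    try {
        $render();
        return (string) ob_get_clean(); // fetch the buffer contents and close the buffer
    } catch (Throwable $e) {
        ob_end_clean();                 // discard partial output before rethrowing
        throw $e;
    }
}

$html = captureOutput(function () {
    echo "<p>Hello</p>";
});

// Nothing has been sent so far, so header() calls would still work here.
echo strtoupper($html);
```

Because the echo inside the closure never reaches the client directly, a redirect issued after it no longer triggers the "headers already sent" warning.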

The callback function

The most powerful part of this set of functions is definitely ob_start and its optional parameter, the callback function. The callback is called when the buffer is flushed or cleaned, or when the script ends. It receives the buffered output as its argument, and whatever it returns is sent in its place. This makes it easy to generate output and then, for example, clean it with HTML Tidy, escape it, replace parts of it or replace all of it.

To show what I mean I'll provide a little class-based script as an example:

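<?php
class examplePage
{
  // __construct() replaces the PHP 4-style constructor, which PHP 8 no longer calls
  function __construct()
  {
    // Register parseOutput() as the callback for the output buffer
    ob_start(array($this,'parseOutput'));
    echo $this->getExampleXML();
  }

  // PHP passes the buffered output to the callback as its first argument
  function parseOutput($buffer)
  {
    $str = "<pre>" . htmlentities($buffer) . 
        "</pre> is the XML string we get from getExampleXML()";
    return $str;
  }

  function getExampleXML()
  {
    $str = "<root><test>Teststring</test></root>";
    return $str;
  }
}

$example = new examplePage();

?>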

As you can see, the content produced by the echo is processed afterwards by the parseOutput method, which escapes it and adds text in one go.
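
The callback does not have to be a method. As a minimal sketch of the "replace parts of it" case, a plain function works too. The function name and the {{year}} placeholder are assumptions made up for this example.

```php
<?php
// Hypothetical callback: PHP hands the buffered output in as the first argument.
function replacePlaceholders(string $buffer): string
{
    // Swap an assumed {{year}} placeholder for the current year.
    return str_replace('{{year}}', date('Y'), $buffer);
}

ob_start('replacePlaceholders');
echo "<p>Copyright {{year}}</p>";
ob_end_flush(); // runs the callback and sends whatever it returns
```

Whatever the callback returns is what the client receives, so the placeholder never leaves the server.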

Layered applications

This alone is a very powerful tool that can be used in pretty much every application that generates output with PHP, but we can push it one step further.

We'll use XML as an intermediate application layer. The callback function will then process the whole output and render it through an XSL transformation.

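<?php
class examplePage
{
  // PHP 4-style constructor, kept to match the legacy xslt_* extension used below
  function examplePage()
  {
    ob_start(array($this,'parseOutput'));
    echo $this->getExampleXML();
  }

  // PHP passes the buffered output to the callback as its first argument
  function parseOutput($buffer)
  {
    $this->xslt = xslt_create();

    // Pass the buffered XML to the processor as a named argument
    $this->arguments['/_xml'] = $buffer;
    $this->xmlDoc = 'arg:/_xml';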


    $this->arguments['/_xsl'] = $this->getExampleXSL();
    $this->xslDoc = 'arg:/_xsl';

    return xslt_process($this->xslt, $this->xmlDoc, 
                    $this->xslDoc, NULL, $this->arguments);
  }  
  

  function getExampleXML()
  {
    $str = "<root><test>Teststring</test></root>";
    return $str;
  }


  function getExampleXSL()
  {
    $str = '<?xml version="1.0" encoding="utf-8"?>
      <xsl:stylesheet version="1.0" 
	  	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
      
      <xsl:template match="/">
        Test: <xsl:value-of select="//test" />
      </xsl:template> 
       
      </xsl:stylesheet>
    ';

    return $str;
  }
}

$example = new examplePage();

?>

And here we go - dynamic processing of the XML-based application output. This is obviously a rough example, and it needs integrating into whatever framework you use, but hopefully you can see the power and flexibility of this technique.
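
The xslt_create and xslt_process functions belong to the older Sablotron-based XSLT extension. On PHP 5 and later the same idea could be sketched with the DOMDocument and XSLTProcessor classes, assuming the DOM and XSL extensions are installed. The function name, the constant and the slightly adapted stylesheet (text output) are assumptions for this example, not part of the script above.

```php
<?php
// Hypothetical stylesheet; method="text" keeps an XML declaration out of the result.
const EXAMPLE_XSL = <<<'XSL'
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>Test: </xsl:text><xsl:value-of select="//test"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Output callback: transform the buffered XML, or pass it through unchanged
// when the XSL extension is missing or the buffer is not well-formed XML.
function renderWithXslt(string $buffer): string
{
    if ($buffer === '' || !class_exists('XSLTProcessor')) {
        return $buffer;
    }
    $previous = libxml_use_internal_errors(true); // keep parser warnings out of the output

    $xml = new DOMDocument();
    $xsl = new DOMDocument();
    $result = false;
    if ($xml->loadXML($buffer) && $xsl->loadXML(EXAMPLE_XSL)) {
        $processor = new XSLTProcessor();
        if ($processor->importStylesheet($xsl)) {
            $result = $processor->transformToXml($xml);
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return is_string($result) ? $result : $buffer;
}

ob_start('renderWithXslt');
echo "<root><test>Teststring</test></root>";
ob_end_flush();
```

Falling back to the untransformed buffer means a failed transformation still leaves the client with the application's XML rather than an empty page.</root>";
ob_end_flush();
$out = ob_get_clean();
if (class_exists('XSLTProcessor')) {
    assert(strpos($out, 'Test: abc') !== false);
} else {
    assert($out === "<root><test>abc</test></root>");
}
assert(renderWithXslt('not xml') === 'not xml');
assert(renderWithXslt('') === '');
</test>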

Outlook

So what could this be useful for?

In my opinion this could give some web applications a whole new twist. One possibility is to move presentation-related rendering into a step that runs after the output has been generated. While your application is built to render XML and write it to the PHP output, a separate method, maybe even a separate class, could handle this output and transform it into the right format.

The advantages are immediately obvious. Output rendering would become a reusable module and without it the application would still output W3C-compliant XML code (if you did everything right, that is).

And again, this is just one possible use of the callback function. Together with regular expressions or applications like Tidy you could ensure that the output of dynamic data is valid. This could be useful for anyone who uses variables to pass HTML content into XSL templates.
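
As a rough sketch of the Tidy idea, the callback below repairs the buffered markup with tidy_repair_string when the tidy extension is available and passes it through untouched otherwise. The function name and the chosen configuration options are assumptions for this example.

```php
<?php
// Hypothetical callback: repair the buffered HTML with Tidy when available.
function tidyOutput(string $buffer): string
{
    if ($buffer === '' || !function_exists('tidy_repair_string')) {
        return $buffer; // pass through unchanged without the tidy extension
    }
    // Assumed settings: emit XHTML, indent the markup, disable line wrapping.
    $config = ['output-xhtml' => true, 'indent' => true, 'wrap' => 0];
    $clean = tidy_repair_string($buffer, $config, 'utf8');
    return is_string($clean) ? $clean : $buffer;
}

ob_start('tidyOutput');
echo "<p>Unclosed paragraph<br><b>bold text";
ob_end_flush();
```

Since the repair happens once on the complete buffer, the code that generates the markup does not need to know about Tidy at all.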

Comments

  • Wow, that was quick! I’ll be seriously considering using this next time I have to tangle with template engines. Any idea what the processing overhead is for this?

    by Mike Stenhouse on July 24 2005, 18:08 - #

  • There are some claims that ob_start and ob_flush increase performance. Also, since you only transform the output buffer, that is one single transformation, so the XSLT step does not add much overhead.

    by Pascal Opitz on July 25 2005, 01:28 - #

  • Oh, and if anyone wants to see a real-world use for this technique, go have a read through Mike Davidson’s great article on Making Your Site Mobile Friendly.

    by Mike Stenhouse on July 25 2005, 17:02 - #

  • Setting –

    ob_implicit_flush(true);

    Will automatically flush any echo you do in that script.

    Be aware there are lots of well documented bugs with browsers and flush – You sometimes have to pad the first echo with some blank data so that it forces the browser to display.

    http://uk2.php.net/flush

    by sermad on September 16 2005, 05:29 - #

  • I sincerely thank the author for this article. I was trying to find a way to get output while running a “cruise” function for this class: http://www.contentwithstyle.co.uk/

    The goal is to provide an HTML interface while using a class to talk via the Jabber protocol…

    by Toucouleur on February 12 2006, 14:25 - #

  • Just wanted to fill you in on how I’ve been using the output buffer. After building a massive MVC app in PHP5, this technique made me realise that the Controller part of MVC can rightfully be left to Apache. I have a simple abstract class that contains the XSL functionality you describe above, and, depending on the content of the Accept-Type header, selects an XSL to transform the content of the output buffer into a format acceptable to the user.

    That is, the method called by the output buffer is the View layer, Model is model (as normal), but the controller part is simply Apache doing what it normally does best. None of this silly single point of access nonsense!

    So, for example, a request to ben.com/res/products.php with an Accept-Type of “text/html” renders the output as HTML! I have even taken this a step further and used it to include images (using the Accept-Type to select the most appropriate output type), convert model data to XML, which is then XSL to SVG, which is then run through ImageMagick Convert.

    Apache should be the Controller.

    P.S. Love this article! More please!

    by Ben Davies on June 19 2006, 11:46 - #

  • @ Ben: Great idea … like that you always present exactly what is needed on the client, right?
    Thanks for the props btw, but right now my head is spinning with other things and I have no time at all to write anything. But I’ll be back on track soon if I can come up with anything smart enough :)

    by Pascal Opitz on June 19 2006, 18:21 - #

  • Exactly. The Action is Type Agnostic, and frees the details of the Action from the View, always ensuring that the client receives the response in a manner that they can accept and view.

    Great for things like tables of data: request as HTML, get a HTML file with a table of data, or request a PNG and get a nice PNG graph of the same data. The exact same code is called to collect and process the data for both calls.

    Seriously, separating the View layer using the output buffer is really really inspired! Dude, I’m surprised no one else has thought of this, it really does give you a nice clean separation.

    Looking forward to more :)

    by Ben Davies on June 20 2006, 06:33 - #
