Content with Style

XSLT and HTML 5 problems

by Pascal Opitz on December 9 2008, 17:47

Sometimes I get really annoyed by how little control XSLT gives you over which target formats are supported and what output it generates.

I'm trying to use a canvas tag together with excanvas. The problem is that excanvas hooks into onreadystatechange, and therefore runs before the ondomready event that jQuery offers.

That means I either have to use inline JS and generate the canvas tags via JS in order to produce valid HTML 4, or I have to use the HTML 5 doctype, in which case I can write the canvas tag straight into the markup.

The problem is that XSLT 1.0 cannot generate the HTML 5 doctype, and the output encoding meta tag that it insists on adding is not valid in HTML 5 either. Any ideas, anyone?

UPDATE

Quite a fruitful discussion in the comments.
So, for anyone else reading this: the bottom line is that it is possible to create HTML 5 even with existing XSLT technology.

The first issue we discussed was the doctype. The current HTML 5 draft caters for generation with XSLT by providing a fallback doctype:


<!DOCTYPE html PUBLIC "XSLT-compat">

The other issue was the meta tag with the charset attribute, which HTML 5 introduces to declare the character set:


<meta charset="..." /> 

It is simply not possible to generate exactly that with libxslt, because libxslt forcibly replaces it with an HTML 4 style meta tag:


<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

This is not a problem, though: the old meta tag, in its Encoding declaration state, is a valid declaration of the character set too.
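
The doctype part can be sketched as a minimal stylesheet. Treat this as an illustration rather than a definitive setup: it assumes libxslt with the html output method, the page content inside the template is made up for the example, and the doctype-public value is the one Matthias suggests in the comments below.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Only doctype-public is set, so the serializer writes
       <!DOCTYPE html PUBLIC "XSLT-compat"> with no system identifier -->
  <xsl:output method="html"
              encoding="UTF-8"
              doctype-public="XSLT-compat" />

  <!-- Example content only: matches the document root of any input XML -->
  <xsl:template match="/">
    <html>
      <head>
        <!-- No meta tag written here; libxslt adds the
             http-equiv Content-Type meta tag to the head by itself -->
        <title>HTML 5 test</title>
      </head>
      <body>
        <h1>HTML 5 doctype and canvas test</h1>
        <div class="canvascontainer"><canvas></canvas></div>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Run through xsltproc against any well-formed XML input, this should give a document that starts with the XSLT-compat doctype and carries the HTML 4 style meta tag in the head.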

Comments

  • You just bumped into the idiotic decision of making HTML5 *not* XML. If it were XML, no doctype would be needed, as the XML namespace would take care of identifying the actual standard the markup is on. My own home-baked solution to the problem is to post-process the XSLT result and insert the doctype "manually". It's not pretty.

    by Sérgio Carvalho on December 9 2008, 20:59 - #

  • Yep, pretty upsetting. These were the options I had in mind that could solve my problem:

    1. Post process the HTML 4 output, replace Doctype and meta tags with HTML 5 equivalents
    2. Use XHTML 1 and extend it somehow, i.e. namespace for canvas, or extend the DTD. Not sure on that one.
    3. Use inline JS to write the canvas tag, stay with HTML 4
    4. Use conditional comments and JS otherwise insert the canvas tags when not IE
    5. Hack excanvas to expose the method that makes it work onreadystatechange, call that onDomReady from some other JS
    6. Not care about the validation errors in either HTML 5 or HTML 4

    Now I gotta say, none of these options makes me happy, but I guess inline JS is the most reasonable, and probably the most transparent and least hacky thing to do.

    It's a bit weird, though, that XSLT processors don't give you control over the meta tag that defines the output encoding.

    I can 'hack' the doctype thing by using CDATA and xsl:text, and the doctype only comes up when it is defined in the output element. But forcibly inserting the meta tag, even though the output node hasn't got an encoding specified? That's just plain weird, and leaves me wishing for some flag in the processor to turn that off.

    I had a look at the libXSL mailing lists though, and couldn't find anything that would let me do that. XSL pros: Is there a way to fine tune libXSLT?

    Finally, as to whether or not it was stupid to make HTML 5 not an XML derivative: there is a serialization of HTML 5 that is XML, but it has to be served with the content type application/xhtml+xml. As with XHTML 1, some people consider it harmful if it's served with content type text/html.

    by Pascal Opitz on December 9 2008, 21:17 - #

  • Pascal, I *think* the auto-meta thing is just a PHP XSLT thing? I don't recall seeing it in other XSLT processors I have tried (but it has also been a while since I used output=html which I assume you are doing?). Can the doctype be output using just the public doctype part of the xsl output element? XSLT 2 specs came out a while ago so aren't aware of html5, else xsl:output method="html5" would solve everything. XSLT 2.1 anyone?

    by Anup on December 10 2008, 11:09 - #

  • Anup, the forceful generation of the meta tag, that's something libXSL does. PHP merely acts as a proxy.

    
    bash$ xsltproc xslt/client/main.xsl my_data.xml | grep UTF
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

    Here is the output tag I used in XSL:

    <xsl:output omit-xml-declaration="yes" method="html" />

    by Pascal Opitz on December 10 2008, 12:55 - #

  • Ah, yes, that's right. Thanks for the clarification. Maybe open a ticket with the maintainers of libxsl...?

    by Anup on December 10 2008, 17:12 - #

  • Here's something that should help:
    First of all, I couldn't believe that the doctype actively breaks XML and SGML without an alternative, and, tadaaa, there is a way for exactly this issue:
    The doctype legacy string, only to be used in conjunction with xslt.

    With that one down, I tried the slightly amended xslt demo from php.net (I used <xsl:output doctype-public="XSLT-compat" method="html" /> as output tag), saved the output to a file and validated it, and, apparently, everyone's happy! 2 warnings (html 5 validation being experimental and the legal legacy doctype being automatically replaced), but definitely no deal-breaker.

    I can't quite see where you had an issue with the content-type. I've seen some blurb about only accepting the first content-type that comes up (which might be your server's HTTP header, which of course I didn't have); but I don't know if the W3C validator has any beef with that.

    by Matthias Willerich on December 12 2008, 11:08 - #

  • Good research, Matthias. Might use that, but I can't help that I still feel annoyed with the lack of control over that bloody meta tag.

    by Pascal Opitz on December 12 2008, 11:14 - #

  • A little bit more on this: It seems almost as if this is down to a misinterpretation on xsltproc's side.
    I'm quoting mainly from this mailing list thread (ideally read the whole thing) from January 2007, where someone had exactly the same problem as you.

    The w3c recommendation says:
    "If there is a HEAD element, then the html output method should add a META element immediately after the start-tag of the HEAD element specifying the character encoding actually used."

    The xsltproc you're using interprets the "should" as "must". I can't find any information that would hint that this has been changed since; only several discussions about it and one suggested fix I don't quite understand. I guess all that's left is to bring this up with the makers of this library, or patch your library yourself (er, maybe not).

    by Matthias Willerich on December 12 2008, 16:18 - #

  • Interesting. I hardly ever validate by URL or file upload, but usually use the copy paste thing of the W3C validator, and rely on it sniffing the doctype.

    <!DOCTYPE html PUBLIC "XSLT-compat">
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>HTML 5 test</title>
    </head>
    <body>
    <h1>HTML 5 doctype and canvas test</h1>
    <div class="canvascontainer"><canvas></canvas></div>
    </body>
    </html>

    The above input, validated via source copy and paste, fails when I don't select the HTML 5 doctype from the dropdown in the advanced options. D'oh!

    by Pascal Opitz on December 12 2008, 17:44 - #

  • Pascal: The "XSLT-compat" string was added for just this purpose. :-) The name might change in the future (maybe to just "legacy-compat" or something). Sérgio: HTML5 does support XML. You can use HTML5 either in text/html form or in full-on XML form.

    by Ian Hickson on December 12 2008, 21:06 - #

  • "You just bumped into the idiotic decision of making HTML5 *not* XML."

    In this area I don't believe a final decision has been made... there's an open issue on this exact problem in the HTML WG: http://www.w3.org/html/wg/tracker/issues/54 If you'd like to discuss your issue and how it interacts with the HTML 5 specification, your best bet is to send an email to one of these two lists:

    by Shawn Medero on December 12 2008, 21:20 - #

  • Thanks Ian and Shawn. I updated the article copy to reflect the discussion.

    Btw: I am still wondering whether, according to a correct reading of HTML 5, the charset attribute is the only correct way to signify the charset in the markup, or whether the old HTML 4 style declaration will be a valid fallback.

    by Pascal Opitz on December 12 2008, 23:28 - #

  • I guess the current draft answers my question:

    "The Encoding declaration state's user agent requirements are all handled by the parsing section of the specification. The state is just an alternative form of setting the charset attribute: it is a character encoding declaration."

    by Pascal Opitz on December 12 2008, 23:39 - #

  • May I suggest that you template for HTML, not XHTML (as in XML), for your webpages? Browsers just don't get XML in webpages when you send it as 'text/html', see Hixie's argument. As an XSL template writer, that means you must at all times be aware that you *won't* generate empty tags for elements such as a DIV, e.g. <div/>, since that will break your webpage layout. Which means you'll write less legible templates just for the sake of having XML. And, invariably, you or a colleague will forget to add bogus xsl:text or xsl:comment tags at some place. HTML 5 should solve some of the HTML vs XHTML problems by specifying on the DOM level, instead of the wire format.

    by Jeroen Pulles on December 18 2008, 12:01 - #

  • Jeroen: Thanks for your input, but I think you slightly misunderstood what we were discussing. We were merely discussing the issues that one faces when trying to generate HTML5 from XSLT. Nothing else.

    I think by now most people are aware of the "XHTML as text/html is harmful" opinion, and I am sure most people will have their own point of view about this. I definitely have read it with care, and decided on the roadmap for this blog and other sites.

    What kind of issues I have to be aware of when I want to generate XHTML from XSLT is a different story.
    I personally find the <xsl:preserve-space /> element is a great help to achieve what you want. Also there's the xml:space attribute, which can be set to preserve.

    by Pascal Opitz on December 18 2008, 14:59 - #

  • I know I'm a little late with this comment, but I'm using the following:

    <xsl:output
      method="html"
      encoding="UTF-8"
      omit-xml-declaration="yes"
      indent="yes"
      media-type="text/html"
      doctype-public=""
      doctype-system=""
      />

    Which for me produces the html5 <!DOCTYPE html>. I'm processing this through libxslt (via both Python and PHP).

    by Phillip Oldham on January 6 2011, 08:25 - #

