Content with Style

UTF-8 based transformation output in .Net

by Pascal Opitz on August 17 2006, 11:38

Using XSL transformations in .Net I came across the weird behaviour that my transformation output would be UTF-16 encoded even though I specified UTF-8 in the <xsl:output /> tag.

This left me a bit speechless, and I assumed that this could only be a .Net bug. After a bit of research, however, I found it to be a result of .Net being very specific about character encodings.

In the following example the Encoding property of the StringWriter always returns UTF-16 (System.Text.Encoding.Unicode), hence the output charset will be UTF-16 as well, no matter what character set I specify in the XSL.


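// xml is the source document (e.g. an XPathDocument), args an XsltArgumentList
XslTransform xslt = new XslTransform();
// the stylesheet has to be loaded with xslt.Load(...) before transforming
StringWriter output = new StringWriter(); // Encoding is always UTF-16
xslt.Transform(xml, args, output);
String code_transformed = output.ToString();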
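
The behaviour can be checked without any stylesheet. The following is a minimal sketch, assuming a small console program; the class name EncodingCheck and the helper DeclarationFor are made up for illustration. It prints the encoding a plain StringWriter reports, then the XML an XmlWriter produces when writing to it, so the encoding named in the XML declaration can be inspected.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper class, not part of the original post.
public static class EncodingCheck
{
    // Writes a tiny document to the given TextWriter and returns the text,
    // including the XML declaration that names the writer's encoding.
    public static string DeclarationFor(TextWriter target)
    {
        using (XmlWriter writer = XmlWriter.Create(target))
        {
            writer.WriteStartElement("root");
            writer.WriteEndElement();
        }
        return target.ToString();
    }

    public static void Main()
    {
        StringWriter output = new StringWriter();

        // StringWriter.Encoding has no setter; it reports UTF-16.
        Console.WriteLine(output.Encoding.WebName);

        // The declaration follows the writer's Encoding property.
        Console.WriteLine(DeclarationFor(output));
    }
}
```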

Steven Livingstone pointed out that the Encoding property of System.IO.StringWriter is read-only. If the transformation output is to be encoded in UTF-8, one therefore has to provide a different target, such as a Stream object, to receive it:


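XslTransform xslt = new XslTransform();
MemoryStream ms = new MemoryStream();
// writing to a Stream lets the encoding from <xsl:output /> take effect
xslt.Transform(xml, args, ms);
// rewind the stream before reading the bytes back as UTF-8 text
ms.Position = 0;
StreamReader sr = new StreamReader(ms, Encoding.UTF8);
String code_transformed = sr.ReadToEnd();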

Another possibility is to extend the StringWriter class so that it reports a different encoding, as suggested on Robert McLaws' FunWithCoding.Net. This would read as follows:


using System;
using System.IO;
using System.Text;

namespace MyAwesomeNamespace
{
  public class StringWriterWithEncoding : StringWriter
  {
    private Encoding _enc;

    public StringWriterWithEncoding(Encoding NewEncoding) : base()
    {
      _enc = NewEncoding;
    }

    public override System.Text.Encoding Encoding
    {
      get
      {
         return _enc;
      }
    }
  }
}
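
How the subclass might be used is sketched below. Note the assumptions: this sketch swaps in XslCompiledTransform, the newer replacement for XslTransform, and the inline XML, the stylesheet, and the TransformDemo class are invented for illustration. The StringWriterWithEncoding class is repeated (outside its namespace) only so the example is self-contained.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Same idea as the class above, repeated so this sketch compiles on its own.
public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding _enc;

    public StringWriterWithEncoding(Encoding encoding)
    {
        _enc = encoding;
    }

    public override Encoding Encoding
    {
        get { return _enc; }
    }
}

// Hypothetical demo class with made-up input data.
public static class TransformDemo
{
    private const string Xml = "<greeting>hello</greeting>";

    private const string Xsl =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:output method='xml' encoding='UTF-8'/>" +
        "<xsl:template match='/'><out><xsl:value-of select='greeting'/></out></xsl:template>" +
        "</xsl:stylesheet>";

    // Runs the transformation into the given writer and returns the result text.
    public static string Transform(TextWriter output)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(Xsl)))
        {
            xslt.Load(stylesheet);
        }

        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        xslt.Transform(doc, null, output);
        return output.ToString();
    }

    public static void Main()
    {
        // Plain StringWriter: the declaration follows the writer, not <xsl:output />.
        Console.WriteLine(Transform(new StringWriter()));

        // Subclass reporting UTF-8: the declaration should now name UTF-8.
        Console.WriteLine(Transform(new StringWriterWithEncoding(Encoding.UTF8)));
    }
}
```

Keep in mind that the returned .Net string is still held as UTF-16 in memory; the subclass only changes the encoding the writer reports, and with it the encoding named in the XML declaration. The actual bytes are determined when the string is written out, so it should be saved or sent using UTF-8 as well.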
