Converting a UTF-16 XML file to UTF-8 with an XmlWriter

Yesterday a colleague asked me how to convert a UTF-16 XML file to UTF-8. He was using XSLT for the conversion. I took a look at his code and everything looked fine: he was using a StringBuilder, a StringWriter and an XmlWriter, and he had set the encoding in the XmlWriter settings to UTF-8. That should work, shouldn't it? It doesn't. A .NET string is always UTF-16 internally, and that cannot be changed. When an XmlWriter writes to a TextWriter, it ignores XmlWriterSettings.Encoding and uses the encoding reported by the TextWriter instead, and a StringWriter reports UTF-16. So even though my colleague set the encoding, it was never used.
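The behaviour can be reproduced without any XSLT. Below is a minimal sketch, assuming C# 9 or later (top-level statements); the element name `catalog` is made up for the demo. It asks for UTF-8 through XmlWriterSettings but writes to a StringWriter, and the XML declaration still comes out as `encoding="utf-16"` because the StringWriter's own Encoding property takes precedence.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Ask for UTF-8 explicitly, as the colleague did.
var settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };

var sb = new StringBuilder();
var sw = new StringWriter(sb);
using (XmlWriter xw = XmlWriter.Create(sw, settings))
{
    xw.WriteStartDocument();
    // "catalog" is a hypothetical element name used only for this demo.
    xw.WriteElementString("catalog", "empty");
}

// For a TextWriter target the writer's encoding wins; StringWriter reports UTF-16.
Console.WriteLine(sw.Encoding.WebName);
Console.WriteLine(sb.ToString());
```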

Another approach was needed, and I found one. An XmlWriter can also write to a stream, and a MemoryStream works for this: it is just a buffer of bytes and does nothing with the content. Wrapped in a StreamWriter, which encodes text as UTF-8 by default, it receives the XML as UTF-8 bytes. Here is some example code. The first example uses a StringBuilder and does not work; the second uses a MemoryStream.

Example 1 (does not work):

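using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// UTF-8 is requested, but with a TextWriter target this setting is ignored.
XmlWriterSettings settings = new XmlWriterSettings { Encoding = Encoding.UTF8 };
StringBuilder sb = new StringBuilder();
using (XmlWriter xw = XmlWriter.Create(new StringWriter(sb), settings))
{
    XslCompiledTransform xct = new XslCompiledTransform();
    xct.Load(@"cdcatalog.xsl");
    xct.Transform(@"cdcatalog.xml", xw);
    xw.Flush();
}
// StringWriter reports UTF-16, so the output is UTF-16, not UTF-8.
Console.WriteLine(sb.ToString());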

 

Example 2 (works):

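using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

MemoryStream ms = new MemoryStream();
// StreamWriter encodes as UTF-8 (without a byte order mark) by default.
using (XmlWriter xw = XmlWriter.Create(new StreamWriter(ms)))
{
    XslCompiledTransform xct = new XslCompiledTransform();
    xct.Load(@"cdcatalog.xsl");
    xct.Transform(@"cdcatalog.xml", xw);
    xw.Flush();
}
// ToArray() returns only the bytes written; GetBuffer() would also return
// the unused capacity at the end of the internal buffer.
string xmlOutput = Encoding.UTF8.GetString(ms.ToArray());
Console.WriteLine(xmlOutput);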
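Example 2 relies on StreamWriter's default encoding. If you would rather state the encoding explicitly, one variation is to pass the MemoryStream straight to XmlWriter.Create together with XmlWriterSettings, because for a Stream target the Encoding setting is honored. The sketch below assumes C# 9 or later and writes a small hypothetical `title` element instead of running the XSLT, so it does not need cdcatalog.xsl.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// UTF8Encoding(false) is UTF-8 without a byte order mark.
// (Encoding.UTF8 would make the writer prepend the BOM bytes EF BB BF.)
var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };

var ms = new MemoryStream();
using (XmlWriter xw = XmlWriter.Create(ms, settings))
{
    xw.WriteStartDocument();
    // Hypothetical content; "\u00e9" (é) takes two bytes in UTF-8.
    xw.WriteElementString("title", "Caf\u00e9");
}

// The stream now holds UTF-8 bytes, and the declaration says utf-8.
byte[] bytes = ms.ToArray();
string xml = Encoding.UTF8.GetString(bytes);
Console.WriteLine(xml);
```

Keep in mind that the UTF-8 form lives in the byte array. Decoding it back into a string, as Example 2 does for printing, gives a UTF-16 .NET string again; only its declaration still says utf-8. To get a UTF-8 file, write the bytes out unchanged (for example with File.WriteAllBytes), or pass a file path instead of a stream to XmlWriter.Create(string, XmlWriterSettings).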

 

You can download an example project here.
