I ran into a problem recently that seemed very familiar, because I am sure I have encountered it before. So, once I had re-solved it, I decided to write a blog entry in the hope that I will find this post when I next run into the issue in a few years' time.
The problem manifested itself in the simple scenario below: I serialised a simple object into UTF-8 XML and then tried to parse the result using XDocument.Parse.
public class SimpleClass
{
    public string SomeProperty { get; set; }
    public int AnotherProperty { get; set; }
}

[TestMethod]
public void Given_a_simple_object_When_serialised_to_XML_and_deserialised_Then_it_should_not_throw_an_exception()
{
    var someObject = new SimpleClass { SomeProperty = "Abc", AnotherProperty = 42 };
    using (var memoryStream = new MemoryStream())
    using (var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8))
    {
        var serialiser = new XmlSerializer(someObject.GetType());
        serialiser.Serialize(xmlTextWriter, someObject);
        var utf8Xml = Encoding.UTF8.GetString(memoryStream.ToArray());
        // Fails at this point with the exception:
        // System.Xml.XmlException: Data at the root level is invalid. Line 1, position 1
        XDocument.Parse(utf8Xml);
    }
}
The document in utf8Xml seems fine:
<?xml version="1.0" encoding="utf-8"?><SimpleClass xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SomeProperty>Abc</SomeProperty><AnotherProperty>42</AnotherProperty></SimpleClass>
However, the test fails when trying to parse the XML.
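I found the cause of the problem to be the byte order mark (BOM) added to the start of the UTF-8 string. Because the static Encoding.UTF8 emits a BOM, XmlTextWriter writes the bytes EF BB BF to the stream first, and Encoding.UTF8.GetString then decodes them as the invisible character U+FEFF. That is why the string looks fine when printed, yet XDocument.Parse rejects it at line 1, position 1. The solution was to construct a new instance of UTF8Encoding rather than using the static Encoding.UTF8. One of the UTF8Encoding constructor overloads takes a parameter encoderShouldEmitUTF8Identifier. Set this to false, and everything works!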
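For my future self, here is a small sketch that makes the invisible BOM visible. The class name BomDemo and the hand-built byte array are made up for illustration; the calls themselves (GetPreamble, GetString, the UTF8Encoding constructor) are the standard System.Text ones.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    public static void Main()
    {
        // The static Encoding.UTF8 is configured to emit the three-byte UTF-8 identifier.
        byte[] withBom = Encoding.UTF8.GetPreamble();
        Console.WriteLine(BitConverter.ToString(withBom)); // EF-BB-BF

        // Passing false for encoderShouldEmitUTF8Identifier yields an empty preamble.
        byte[] withoutBom = new UTF8Encoding(false).GetPreamble();
        Console.WriteLine(withoutBom.Length); // 0

        // Illustrative bytes: a BOM followed by the four characters "<a/>".
        byte[] bytes = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' };

        // GetString does not strip the BOM; it becomes U+FEFF at index 0.
        string decoded = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(decoded.Length);         // 5
        Console.WriteLine(decoded[0] == '\uFEFF'); // True
    }
}
```

The last two lines are the whole story: the string is one character longer than it looks, and that extra character sits exactly where the parser expects the XML declaration to start.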
This is the passing test:
public class SimpleClass
{
    public string SomeProperty { get; set; }
    public int AnotherProperty { get; set; }
}

[TestMethod]
public void Given_a_simple_object_When_serialised_to_XML_and_deserialised_Then_it_should_not_throw_an_exception()
{
    var someObject = new SimpleClass { SomeProperty = "Abc", AnotherProperty = 42 };
    using (var memoryStream = new MemoryStream())
    // First argument (encoderShouldEmitUTF8Identifier) = false: do not write a BOM.
    // Second argument (throwOnInvalidBytes) = true: throw on invalid byte sequences.
    using (var xmlTextWriter = new XmlTextWriter(memoryStream, new UTF8Encoding(false, true)))
    {
        var serialiser = new XmlSerializer(someObject.GetType());
        serialiser.Serialize(xmlTextWriter, someObject);
        var utf8Xml = Encoding.UTF8.GetString(memoryStream.ToArray());
        XDocument.Parse(utf8Xml); // No BOM at the start of the string, so this now succeeds
    }
}
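A possible alternative, if the intermediate string is not actually needed, is to skip the round trip through Encoding.UTF8.GetString and let the XML reader consume the stream directly, since it detects and skips a BOM by itself. This is only a sketch of that idea, not what the test above does; the class StreamLoadSketch and its RoundTrip method are hypothetical names.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

public class SimpleClass
{
    public string SomeProperty { get; set; }
    public int AnotherProperty { get; set; }
}

public static class StreamLoadSketch
{
    // Hypothetical helper: serialises with the BOM-emitting Encoding.UTF8
    // and parses the bytes straight from the stream.
    public static XDocument RoundTrip(SimpleClass value)
    {
        using (var memoryStream = new MemoryStream())
        using (var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8))
        {
            var serialiser = new XmlSerializer(typeof(SimpleClass));
            serialiser.Serialize(xmlTextWriter, value);
            xmlTextWriter.Flush();

            // Rewind so the reader starts at the BOM rather than at the end of the stream.
            memoryStream.Position = 0;
            return XDocument.Load(memoryStream);
        }
    }
}
```

Calling StreamLoadSketch.RoundTrip(new SimpleClass { SomeProperty = "Abc", AnotherProperty = 42 }) should return a document whose root element is SimpleClass, even though the stream still begins with EF BB BF.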