I ran into a problem recently that seemed very familiar; I am sure I have encountered it before. So, once I had re-solved it, I decided to write a blog entry in the hope that I will find this post the next time I run into the issue, in a few years' time.
The problem manifested itself in the simple scenario below: I serialised a simple object into UTF-8 XML and then tried to parse the result using XDocument.Parse.
```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public class SimpleClass
{
    public string SomeProperty { get; set; }
    public int AnotherProperty { get; set; }
}

[TestClass]
public class SerialisationTests
{
    [TestMethod]
    public void Given_a_simple_object_When_serialised_to_XML_and_deserialised_Then_it_should_not_throw_an_exception()
    {
        var someObject = new SimpleClass { SomeProperty = "Abc", AnotherProperty = 42 };

        using (var memoryStream = new MemoryStream())
        using (var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8))
        {
            var serialiser = new XmlSerializer(someObject.GetType());
            serialiser.Serialize(xmlTextWriter, someObject);

            var utf8Xml = Encoding.UTF8.GetString(memoryStream.ToArray());

            // Fails here with System.Xml.XmlException:
            // Data at the root level is invalid. Line 1, position 1.
            XDocument.Parse(utf8Xml);
        }
    }
}
```
The document in utf8Xml seems fine:
```xml
<?xml version="1.0" encoding="utf-8"?><SimpleClass xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SomeProperty>Abc</SomeProperty><AnotherProperty>42</AnotherProperty></SimpleClass>
```

However, the test fails when XDocument.Parse tries to parse it.
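The cause turned out to be the byte order mark (BOM). Encoding.UTF8 is configured to emit a UTF-8 BOM (the bytes EF BB BF), so the writer puts those bytes at the start of the stream. Encoding.UTF8.GetString does not strip them; it decodes them into an invisible U+FEFF character at the start of utf8Xml. That is why the XML looks fine when printed, yet XDocument.Parse rejects it at line 1, position 1. The solution was to construct a new instance of UTF8Encoding rather than using the static Encoding.UTF8. One of the UTF8Encoding constructor overloads takes a parameter, encoderShouldEmitUTF8Identifier; set this to false, and everything works!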
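To see the BOM directly, the sketch below compares the preamble each encoding reports through GetPreamble(), and shows that Encoding.UTF8.GetString keeps the BOM as a U+FEFF character instead of stripping it. The BomDemo class and its methods are hypothetical helpers, not part of the original tests. A StreamWriter stands in for XmlTextWriter, on the assumption that both write the encoding's preamble when they start writing to an empty stream.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class (not from the original post) showing where the BOM comes from.
public static class BomDemo
{
    // Number of preamble (BOM) bytes the encoding emits at the start of a stream.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }

    // Writes "<a/>" through a StreamWriter using the given encoding (a StreamWriter
    // writes the encoding's preamble to an empty stream), then decodes the bytes
    // with Encoding.UTF8.GetString, as the test in this post does.
    public static string WriteAndDecode(Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write("<a/>");
            } // Disposing the writer flushes it (and closes the stream).

            // MemoryStream.ToArray still works after the stream has been closed.
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

With Encoding.UTF8 the decoded string is "\uFEFF<a/>" (five characters); with new UTF8Encoding(false) it is just "<a/>".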
This is the passing test:
```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public class SimpleClass
{
    public string SomeProperty { get; set; }
    public int AnotherProperty { get; set; }
}

[TestClass]
public class SerialisationTests
{
    [TestMethod]
    public void Given_a_simple_object_When_serialised_to_XML_and_deserialised_Then_it_should_not_throw_an_exception()
    {
        var someObject = new SimpleClass { SomeProperty = "Abc", AnotherProperty = 42 };

        // encoderShouldEmitUTF8Identifier: false (no BOM); throwOnInvalidBytes: true
        using (var memoryStream = new MemoryStream())
        using (var xmlTextWriter = new XmlTextWriter(memoryStream, new UTF8Encoding(false, true)))
        {
            var serialiser = new XmlSerializer(someObject.GetType());
            serialiser.Serialize(xmlTextWriter, someObject);

            var utf8Xml = Encoding.UTF8.GetString(memoryStream.ToArray());
            XDocument.Parse(utf8Xml);
        }
    }
}
```
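For reference, here are two other ways around the same problem; they are alternatives I have not used in the tests above, sketched under the assumption of a standard .NET XmlSerializer setup. Option 1 uses XmlWriter.Create with an XmlWriterSettings whose Encoding is a BOM-free UTF8Encoding (Microsoft recommends XmlWriter.Create over constructing XmlTextWriter directly). Option 2 leaves the BOM in the bytes but decodes them with a StreamReader, which detects and skips a byte order mark by default. The BomFreeXml class and its method names are hypothetical; SimpleClass is redeclared so the block compiles on its own.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

public class SimpleClass
{
    public string SomeProperty { get; set; }
    public int AnotherProperty { get; set; }
}

// Hypothetical helpers illustrating two alternatives to the fix above.
public static class BomFreeXml
{
    // Option 1: serialise through XmlWriter.Create with a BOM-free UTF-8 encoding.
    public static byte[] SerialiseWithoutBom(object value)
    {
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using (var memoryStream = new MemoryStream())
        {
            using (var xmlWriter = XmlWriter.Create(memoryStream, settings))
            {
                new XmlSerializer(value.GetType()).Serialize(xmlWriter, value);
            } // Disposing the writer flushes everything into memoryStream.

            return memoryStream.ToArray();
        }
    }

    // Option 2: accept bytes that may start with a BOM and decode them with a
    // StreamReader, which skips the BOM instead of turning it into U+FEFF.
    public static XDocument ParseSkippingBom(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return XDocument.Parse(reader.ReadToEnd());
        }
    }
}
```

Option 1 fixes the output at the source, which helps if the XML is consumed elsewhere; option 2 is handy when you do not control the code that produced the bytes.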