Wednesday, November 20, 2013

System.Xml.XmlException: Data at the root level is invalid. Line 1, position 1

I ran into a problem recently that seemed very familiar; I am sure I have encountered it before. So, once I had solved it again, I decided to write a blog entry in the hope that I will find this post when I next run into the issue in a few years' time.

The problem manifested itself in the simple scenario below: I serialised a simple object to UTF-8 XML in a MemoryStream, decoded the bytes back into a string, and then tried to parse that string using XDocument.Parse.

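    using System.IO;
    using System.Text;
    using System.Xml;
    using System.Xml.Linq;
    using System.Xml.Serialization;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    public class SimpleClass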
    {
        public string SomeProperty { get; set; }
        public int AnotherProperty { get; set; }
    }

    [TestMethod]
    public void Given_a_simple_object_When_serialised_to_XML_and_deserialised_Then_it_should_not_throw_an_exception()
    {
        var someObject = new SimpleClass  { SomeProperty = "Abc", AnotherProperty = 42 };

        using (var memoryStream = new MemoryStream())
        using (var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8))
        {
            var serialiser = new XmlSerializer(someObject.GetType());
            serialiser.Serialize(xmlTextWriter, someObject);

            var utf8Xml = Encoding.UTF8.GetString(memoryStream.ToArray());
            XDocument.Parse(utf8Xml);   //fails at this point with exception System.Xml.XmlException: Data at the root level is invalid. Line 1, position 1
        }
    }

The document in utf8Xml seems fine:

    <?xml version="1.0" encoding="utf-8"?><SimpleClass xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SomeProperty>Abc</SomeProperty><AnotherProperty>42</AnotherProperty></SimpleClass>

However, the test fails when it tries to parse the XML.

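I found the cause of the problem to be the byte order mark (BOM). Encoding.UTF8 is configured to emit a BOM, so the writer puts the three bytes EF BB BF at the start of the stream. Encoding.UTF8.GetString does not strip them; it decodes them into an invisible U+FEFF character at the start of the string, and XDocument.Parse rejects that character as invalid data at the root level. The solution was to construct a new instance of UTF8Encoding rather than using the static Encoding.UTF8. One of the UTF8Encoding constructor overloads takes a parameter encoderShouldEmitUTF8Identifier. Set this to false, and everything works!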
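
A quick way to see the hidden character is to serialise the same object twice, once with Encoding.UTF8 and once with new UTF8Encoding(false), and compare the decoded strings. The sketch below is a standalone check written for this purpose, not part of the original tests; the helper BomCheck.SerialiseAndDecode is hypothetical, and SimpleClass is redeclared so the block compiles on its own.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Same shape as the class in the post, redeclared so this block compiles on its own.
public class SimpleClass
{
    public string SomeProperty { get; set; }
    public int AnotherProperty { get; set; }
}

// Hypothetical helper written for this check; not part of the original tests.
public static class BomCheck
{
    // Serialises a SimpleClass through XmlTextWriter using writerEncoding, then decodes
    // the raw bytes with Encoding.UTF8, the same way the tests in this post do.
    public static string SerialiseAndDecode(Encoding writerEncoding)
    {
        var someObject = new SimpleClass { SomeProperty = "Abc", AnotherProperty = 42 };

        using (var memoryStream = new MemoryStream())
        using (var xmlTextWriter = new XmlTextWriter(memoryStream, writerEncoding))
        {
            new XmlSerializer(typeof(SimpleClass)).Serialize(xmlTextWriter, someObject);
            xmlTextWriter.Flush(); // push any buffered output into the MemoryStream

            // Encoding.GetString does not strip a BOM: EF BB BF becomes the char U+FEFF.
            return Encoding.UTF8.GetString(memoryStream.ToArray());
        }
    }
}
```

SerialiseAndDecode(Encoding.UTF8) returns a string whose first character is U+FEFF (65279), while SerialiseAndDecode(new UTF8Encoding(false)) returns one that starts with '<'. Apart from that leading character the two strings are identical, which is why the broken one looks fine when printed.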

This is the passing test:

    public class SimpleClass
    {
        public string SomeProperty { get; set; }
        public int AnotherProperty { get; set; }
    }

    [TestMethod]
    public void Given_a_simple_object_When_serialised_to_XML_and_deserialised_Then_it_should_not_throw_an_exception()
    {
        var someObject = new SimpleClass  { SomeProperty = "Abc", AnotherProperty = 42 };

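        // UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true)
        // writes UTF-8 without a byte order mark
        using (var memoryStream = new MemoryStream())
        using (var xmlTextWriter = new XmlTextWriter(memoryStream, new UTF8Encoding(false, true)))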
        {
            var serialiser = new XmlSerializer(someObject.GetType());
            serialiser.Serialize(xmlTextWriter, someObject);

            var utf8Xml = Encoding.UTF8.GetString(memoryStream.ToArray());
            XDocument.Parse(utf8Xml);
        }
    }
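
If you need to keep Encoding.UTF8 (for example because the BOM-prefixed bytes come from somewhere else), another option, not the one used in this post, is to skip the byte-to-string step and parse the stream directly. XDocument.Load(Stream) reads through an XmlReader, which uses the BOM to detect the encoding rather than passing it on as content. This is a minimal sketch under that assumption; StreamParse.SerialiseAndLoad is a hypothetical helper and SimpleClass is redeclared so the block stands alone.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

// Same shape as the class in the post, redeclared so this block compiles on its own.
public class SimpleClass
{
    public string SomeProperty { get; set; }
    public int AnotherProperty { get; set; }
}

// Hypothetical helper illustrating an alternative to the fix in the post.
public static class StreamParse
{
    public static XDocument SerialiseAndLoad()
    {
        var someObject = new SimpleClass { SomeProperty = "Abc", AnotherProperty = 42 };

        // Deliberately keeps the BOM-emitting Encoding.UTF8.
        using (var memoryStream = new MemoryStream())
        using (var xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8))
        {
            new XmlSerializer(typeof(SimpleClass)).Serialize(xmlTextWriter, someObject);
            xmlTextWriter.Flush();

            // Rewind and parse the bytes directly. The XmlReader behind XDocument.Load
            // uses the BOM to detect the encoding instead of treating it as content.
            memoryStream.Position = 0;
            return XDocument.Load(memoryStream);
        }
    }
}
```

The trade-off is that the BOM stays in the serialised bytes; this only helps when the consumer reads bytes rather than a decoded string.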
