Monday, 27 September 2010

XML Character Support

One of the projects I'm working on makes heavy use of a REST API and performs a lot of XML processing. Recently we've had some serialisation errors when processing some of the XML, with .NET throwing the following exception:

System.InvalidOperationException: There is an error in XML document (1, XXX). ---> System.Xml.XmlException: XXX, hexadecimal value XXX, is an invalid character.

So I examined the XML payload and could see it contained a non-printable control character. I raised the issue with the API developers, but I was curious to find a definitive list of the characters that are valid in XML documents. I soon found the Extensible Markup Language (XML) 1.0 specification, which contains a section on the characters XML supports.

You can encode a massive range of characters in XML documents, but most non-printable control characters are not supported. Below 0x20, the only legal characters are tab (0x09), carriage return (0x0D) and line feed (0x0A). Beyond those, the legal characters are those of Unicode and ISO/IEC 10646 in the ranges 0x20 to 0xD7FF, 0xE000 to 0xFFFD and 0x10000 to 0x10FFFF, which excludes the surrogate blocks as well as 0xFFFE and 0xFFFF.
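
To make the rule concrete, here is a minimal sketch in Python (not the .NET code from the project) that checks a string against the character ranges allowed by XML 1.0. The function names find_illegal_xml_chars and strip_illegal_xml_chars are my own, invented for illustration, and silently stripping characters is only one possible way to handle a bad payload.

```python
import re

# Anything outside the XML 1.0 "Char" production is illegal:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (index, hex code) pairs for each character XML 1.0 forbids."""
    return [
        (match.start(), f"0x{ord(match.group()):02X}")
        for match in _ILLEGAL_XML_CHARS.finditer(text)
    ]


def strip_illegal_xml_chars(text):
    """Return a copy of text with all XML 1.0 illegal characters removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


if __name__ == "__main__":
    payload = "<name>Bad\x0bValue</name>"  # contains a vertical tab (0x0B)
    print(find_illegal_xml_chars(payload))
    print(strip_illegal_xml_chars(payload))
```

Running a check like find_illegal_xml_chars over a payload before handing it to the parser gives you the position and code of the offending character, which is handy evidence when raising the problem with whoever produces the XML.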

