Fluent XML Parsing Using C#'s Dynamic Type Part 1

by Kevin Hazzard

Parsing XML documents using a fluent interface is a very compelling idea. Given some simple XML that contains information about books and their authors (see the end of this article for a sample), I'd like to be able to parse it with something like this:

// for brevity, not all of the XML is shown here - use your imagination
string xml = "......";
 
// create a dynamic XML parser that enables a fluent interface
dynamic dx = new DynamicXml( xml );
 
// show the last name of the 1st author of the 3rd book
Console.WriteLine( dx.book[2].authors.author[0].name.last.Value );

Being able to use the dot and [] operators to traverse an XML document makes a lot of sense. But how could we write a class that makes this possible for any XML document, no matter what schema it has? The dynamic type in C# opens up some great possibilities here. All you need is a class that handles member access (the dot operator) and index access (the [] operator) at runtime. With dynamic typing in C#, these operations are late-bound, so a class like DynamicXml in this example, which does the actual XML parsing behind the scenes, can make XML handling much simpler and more intuitive. For reference, the full XML sample that we'll be working with is shown at the end of this article.
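For comparison, here is roughly what the same lookup looks like with plain, statically typed LINQ to XML. This is a sketch only: the class and method names are hypothetical, and the trimmed sample string (including the books root element name) is an assumption made for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: a statically typed LINQ to XML equivalent of
// dx.book[2].authors.author[0].name.last.Value
public static class StaticXmlLookup
{
    // Assumed, heavily trimmed sample with a <books> root and three <book> elements.
    public const string SampleXml =
        "<books>" +
        "<book><authors><author><name><last>Snerdly</last></name></author></authors></book>" +
        "<book><authors><author><name><last>Freefall</last></name></author></authors></book>" +
        "<book><authors><author><name><last>Vox</last></name></author></authors></book>" +
        "</books>";

    public static string LastNameOfFirstAuthorOfThirdBook(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        return root.Elements("book").ElementAt(2)   // 3rd book (zero-based index 2)
                   .Element("authors")              // exactly one child expected
                   .Elements("author").First()      // 1st author
                   .Element("name")
                   .Element("last")
                   .Value;
    }
}
```

Each step has to choose between one element (Element) and a sequence (Elements), and a missing element surfaces as a NullReferenceException further down the chain. That's the kind of bookkeeping the dynamic version aims to hide.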

The source code for my DynamicXml class follows. Now, be warned, I've deliberately left all of the error handling code out of this version. In a subsequent article, I'll add some robustness to the DynamicXml class along with some other dynamic goodies. Scroll down to get a complete analysis of what this class does.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Dynamic;
using System.Xml.Linq;
using System.Collections;
 
public class DynamicXml : DynamicObject, IEnumerable
{
  private readonly List<XElement> _elements;

  public DynamicXml( string text )
  {
    var doc = XDocument.Parse( text );
    _elements = new List<XElement> { doc.Root };
  }

  protected DynamicXml( XElement element )
  {
    _elements = new List<XElement> { element };
  }

  protected DynamicXml( IEnumerable<XElement> elements )
  {
    _elements = new List<XElement>( elements );
  }
 
  public override bool TryGetMember( GetMemberBinder binder, out object result )
  {
    result = null;
    if (binder.Name == "Value")
      result = _elements[0].Value;
    else if (binder.Name == "Count")
      result = _elements.Count;
    else
    {
      var attr = _elements[0].Attribute( XName.Get(binder.Name ) );
      if (attr != null)
        result = attr;
      else
      {
        var items = _elements.Descendants( XName.Get(binder.Name ) );
        if (items == null || items.Count() == 0) return false;
        result = new DynamicXml( items );
      }
    }
    return true;
  }
 
  public override bool TryGetIndex( GetIndexBinder binder, object[] indexes, out object result )
  {
    int ndx = (int)indexes[0];
    result = new DynamicXml( _elements[ndx] );
    return true;
  }
 
  public IEnumerator GetEnumerator()
  {
    foreach (var element in _elements)
      yield return new DynamicXml( element );
  }
}

OK, let's analyze the DynamicXml class section by section. First of all, notice the inclusion of the new System.Dynamic namespace. This is where the base class called DynamicObject comes from. The DynamicObject class wraps up a lot of the messy details that you would have to take care of if you were writing an implementation of the IDynamicMetaObjectProvider interface. This is an interface that the runtime binder looks for when applying special treatment to dynamically typed objects. Since the DynamicObject class provides an implementation of that interface, using it as a base class for our DynamicXml class does most of the work for us.
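To see how little code DynamicObject requires, here is a minimal sketch (EchoObject and EchoDemo are hypothetical names, not part of DynamicXml). Deriving from DynamicObject and overriding a single method is enough for any property read on a dynamic variable to be routed to TryGetMember.

```csharp
using System.Dynamic;

// Hypothetical minimal DynamicObject subclass: every property read is answered
// by TryGetMember, which simply echoes the requested member name.
public class EchoObject : DynamicObject
{
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // binder.Name is the identifier written after the dot at the call site.
        result = "You asked for " + binder.Name;
        // true means "handled"; returning false tells the C# runtime binder the
        // member was not found, which typically surfaces as a RuntimeBinderException.
        return true;
    }
}

public static class EchoDemo
{
    public static string Run()
    {
        dynamic echo = new EchoObject();
        // Compiles even though EchoObject declares no Anything property:
        // the lookup is deferred to runtime and routed to TryGetMember.
        return echo.Anything;
    }
}
```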

Now scan down into the class and notice the TryGetMember and TryGetIndex overrides. The DynamicObject base class provides a virtual method for each of the Dynamic Language Runtime (DLR) "verbs" that we might want to implement. These verbs are the common operations that every DLR language agrees on, so they form the contract at the border between languages. For example, here are some of the operations that the DLR defines:

  • InvokeMember - call a method
  • GetMember - call the accessor for a property
  • SetMember - call the mutator for a property
  • CreateInstance - this one's pretty self-explanatory; think C#'s new operator
  • GetIndex - when treating the object like a collection, get the element at a specific index
  • SetIndex - set the element at a specific index
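As a sketch of how those verbs map onto DynamicObject overrides, here is a hypothetical property bag (not part of DynamicXml) that handles GetMember, SetMember, GetIndex and SetIndex with a single dictionary, so bag.Color and bag["Color"] refer to the same entry.

```csharp
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical property bag mapping four DLR verbs onto one dictionary:
// GetMember/SetMember (bag.Name) and GetIndex/SetIndex (bag["Name"]).
public class PropertyBag : DynamicObject
{
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();

    // SetMember: bag.Color = "red";
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        _values[binder.Name] = value;
        return true;
    }

    // GetMember: var c = bag.Color;  (unknown names report "not handled")
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return _values.TryGetValue(binder.Name, out result);
    }

    // SetIndex: bag["Color"] = "red";  (assumes a single string index)
    public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
    {
        _values[(string)indexes[0]] = value;
        return true;
    }

    // GetIndex: var c = bag["Color"];
    public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
    {
        return _values.TryGetValue((string)indexes[0], out result);
    }
}
```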

Remember that I said earlier that if you wanted a fluent interface for parsing an XML document, you would need to handle member access (using the dot operator) and index access (using C#'s [ and ] subscripting syntax). Well, these two overrides, TryGetMember and TryGetIndex, provide exactly the implementations we're interested in. When the runtime binder performs a member access or an index access on a DynamicXml object, it routes the operation to these methods. Let's start with the TryGetMember method.

public override bool TryGetMember( GetMemberBinder binder, out object result )
{
  result = null;
  if (binder.Name == "Value")
    result = _elements[0].Value;
  else if (binder.Name == "Count")
    result = _elements.Count;
  else
  {
    var attr = _elements[0].Attribute( XName.Get( binder.Name ));
    if (attr != null)
      result = attr;
    else
    {
      var items = _elements.Descendants( XName.Get( binder.Name ) );
      if (items == null || items.Count() == 0) return false;
      result = new DynamicXml( items );
    }
  }
  return true;
}

In a sense, the DynamicXml class is just a facade for a List<XElement>, as you can see in the class's declaration. That list is held in a field called _elements. The TryGetMember override resolves a requested member name against the XElement objects in _elements in a fixed order. First, it looks for requests for the Value and Count members. Count comes straight off the _elements list, but Value is taken to be the Value of the first XElement in the list. That's because sometimes we want the fluent interface to deal with a single XML element, and at other times we want it to express array-like access using the [ and ] symbols. Next, TryGetMember gives preference to attributes of the first element in the list. This is just a design choice; we could have chosen to process child elements before the attributes of the current one instead. If an attribute with the name that the runtime binder is seeking exists, the XAttribute object itself is returned.
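To make that lookup order concrete, here is a statically typed restatement of the rules in TryGetMember, including the final descendant search described next. It's a sketch for reasoning only: MemberLookup and Resolve are hypothetical names, and returning null stands in for TryGetMember returning false.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, statically typed restatement of the order in which
// DynamicXml.TryGetMember resolves a member name.
public static class MemberLookup
{
    // Returns the resolved value, or null where TryGetMember would return false.
    public static object Resolve(List<XElement> elements, string name)
    {
        if (name == "Value") return elements[0].Value;   // 1. text of the FIRST element only
        if (name == "Count") return elements.Count;      // 2. size of the wrapped list

        XAttribute attr = elements[0].Attribute(name);   // 3. attribute of the first element
        if (attr != null) return attr;

        // 4. matching elements at any depth below ANY wrapped element
        List<XElement> items = elements.Descendants(name).ToList();
        return items.Count > 0 ? items : null;
    }
}
```

For example, given a made-up element <book title='from attribute'><title>from element</title><Count>7</Count><authors><author/><author/></authors></book>, resolving "title" returns the XAttribute because attributes are checked before child elements, resolving "author" finds both grandchildren because Descendants searches at any depth, and resolving "Count" returns 1, the size of the list. That last case shows a side effect of checking Value and Count first: an element or attribute with either of those names can't be reached through the fluent syntax.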

The fact that TryGetMember can return a plain XAttribute raises an interesting point that you should stop to reflect on. When a dynamic object returns a non-dynamic object, the result is still treated as dynamic: the compile-time type of any dynamically bound expression is dynamic, so every operation chained onto it is also handed to the runtime binder. One way to think of the runtime binder is as late compilation. Even though a statically typed object like an XAttribute is returned, accesses into that object are handled by the runtime binder. Since XAttribute is not a dynamic type, the C# runtime binder inspects the object's actual type, using reflection, to find and invoke its Value property. So, it's OK to return non-dynamic objects in a dynamic "pipeline". You can even switch back and forth if you desire. The first object in the dynamic chain could return a non-dynamic object, which could return a dynamic object that, through the next invocation, returns a non-dynamic object, and so on. The point is that once you start using the dynamic runtime binder, you'll keep using it until the call chain is complete. You should think long and hard about the performance impact that might have on your applications. Runtime binding is slower than the static, compile-time binding that you are accustomed to. There are things we can do to make runtime binding faster, but that's a story for another day.
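Here is a small sketch of that behavior using an XAttribute on its own (LateBindingDemo is a hypothetical name). Because the variable is dynamic, attr.Value is bound at runtime even though XAttribute is an ordinary class, and a misspelled member compiles without complaint and only fails, with a RuntimeBinderException, when the line executes.

```csharp
using System.Xml.Linq;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical demo: once an expression is dynamic, everything chained onto it
// is bound at runtime, even when the object underneath is a plain XAttribute.
public static class LateBindingDemo
{
    public static string ReadAttribute()
    {
        dynamic attr = new XAttribute("price", "45.99"); // statically typed object, dynamic variable
        return attr.Value;                                // bound at runtime to XAttribute.Value
    }

    public static bool MisspellingFailsAtRuntime()
    {
        dynamic attr = new XAttribute("price", "45.99");
        try
        {
            object unused = attr.Valeu; // compiles: no compile-time member check on dynamic
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;                // the typo only surfaces when this line runs
        }
    }
}
```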

To finish describing how TryGetMember works: when no attribute matches, the XML descendants of the elements in _elements are queried by the name that the runtime binder presents in its Name property. If nothing matches, TryGetMember returns false, which the C# runtime binder reports as a RuntimeBinderException. Otherwise, the resulting sequence is used to create a new DynamicXml instance, which is returned from the member access. The reason we must always treat the result as a list is that we can't tell from the source XML whether the member being sought is supposed to be a single element or a collection. If we had an XML Schema Definition for the source document, perhaps we could optimize the single-instance cases. But this code works OK. Now on to the much simpler TryGetIndex implementation.

public override bool TryGetIndex( GetIndexBinder binder, object[] indexes, out object result )
{
  int ndx = (int)indexes[0];
  result = new DynamicXml( _elements[ndx] );
  return true;
}

Now, let's look at the TryGetIndex override. It's very simple compared to TryGetMember. It expects a single integer index, so it casts the first entry of the indexes array to int, then uses that value to pick an element from the _elements list and create (you guessed it) another DynamicXml object. In this case, the protected constructor that takes a single XElement does the work of wrapping that element in a list, for the reasons described above.
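To see exactly what TryGetIndex receives, here is a hypothetical probe object (not part of DynamicXml) that reports the number of index values and their runtime types. It also shows how much the (int)indexes[0] cast assumes: that cast is an unboxing conversion, so it succeeds only when the caller passed an int. An index such as 2L arrives as a boxed Int64, and unboxing it to int would throw an InvalidCastException.

```csharp
using System.Dynamic;
using System.Linq;

// Hypothetical probe that reports what the runtime binder hands to TryGetIndex:
// the number of index arguments and the runtime type of each one.
public class IndexProbe : DynamicObject
{
    public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
    {
        // Each value written between [ and ] arrives as one boxed entry in indexes.
        result = indexes.Length + ":" + string.Join(",", indexes.Select(i => i.GetType().Name));
        return true;
    }
}
```

With dynamic probe = new IndexProbe(), probe[2] yields "1:Int32", probe[2, "x"] yields "2:Int32,String", and probe[2L] yields "1:Int64".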

public IEnumerator GetEnumerator()
{
  foreach (var element in _elements)
    yield return new DynamicXml( element );
}

The last important thing to understand is the implementation of IEnumerable. When a DynamicXml instance is acting like a collection of elements, we would like to be able to enumerate over them. However, internally, DynamicXml manages XElement objects, which are ordinary statically typed objects, not dynamic ones. While iterating over the XML, we want the interface to stay fluent. To do that, the iterator has to return DynamicXml objects rather than XElement objects. So, the enumerator wraps each XElement in a new DynamicXml instance before yielding it.
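Here is a cut-down sketch of why the wrapping matters. AttrWrapper and WrapperDemo are hypothetical names, and to keep it short, this wrapper treats every member name as an attribute lookup rather than following DynamicXml's full rules. Iterating the wrapper lets each loop variable keep using member syntax such as item.title, while iterating the raw XElement objects with a dynamic loop variable fails at runtime, because XElement itself has no title member.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Dynamic;
using System.Xml.Linq;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical, cut-down wrapper in the style of DynamicXml: any member name is
// read as an attribute of the first element, and enumeration yields wrappers.
public class AttrWrapper : DynamicObject, IEnumerable
{
    private readonly List<XElement> _elements;

    public AttrWrapper(IEnumerable<XElement> elements)
    {
        _elements = new List<XElement>(elements);
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        // XAttribute's explicit string conversion yields null when the attribute is missing.
        result = (string)_elements[0].Attribute(binder.Name);
        return result != null;
    }

    public IEnumerator GetEnumerator()
    {
        foreach (XElement element in _elements)
            yield return new AttrWrapper(new[] { element }); // wrap so each item supports the same lookups
    }
}

public static class WrapperDemo
{
    private const string Xml = "<books><book title='A'/><book title='B'/></books>";

    // Iterating the wrapper: item.title is routed to AttrWrapper.TryGetMember.
    public static string TitlesViaWrapper()
    {
        dynamic books = new AttrWrapper(XDocument.Parse(Xml).Root.Elements("book"));
        string titles = "";
        foreach (dynamic item in books)
            titles += (string)item.title;
        return titles;
    }

    // Iterating raw XElements: XElement has no 'title' member, so binding fails at runtime.
    public static bool RawElementsLoseFluency()
    {
        foreach (dynamic item in XDocument.Parse(Xml).Root.Elements("book"))
        {
            try
            {
                string title = item.title;
            }
            catch (RuntimeBinderException)
            {
                return true;
            }
        }
        return false;
    }
}
```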

You may be asking why I didn't implement strongly-typed enumeration using IEnumerable<DynamicXml> instead. The reason for that is simple. In a dynamic processing context, types get thrown out the window, so to speak. We could implement strong typing, but it usually wouldn't matter, because code that iterates over DynamicXml objects is probably already going through the dynamic runtime binder anyhow. For the XML document below, here's a snippet of C# code that parses the entire document using our dynamic, fluent interface.

// the variable xml contains the string text of an XML document
dynamic dx = new DynamicXml(xml);
Console.WriteLine("PublicationDate='{0}'", dx.pubdate.Value);
Console.WriteLine("BookCount='{0}'", dx.book.Count);
foreach (dynamic b in dx.book)
{
  Console.WriteLine("----- Begin Book -----");
  Console.WriteLine("Price='{0}'", b.price.Value);
  Console.WriteLine("Title='{0}'", b.title.Value);
  Console.WriteLine("AuthorCount='{0}'", b.authors.author.Count);
  foreach (dynamic a in b.authors.author)
  {
    Console.WriteLine("  ---- Begin Author ----");
    Console.WriteLine("  EmailAddress='{0}'", a.email.address.Value);
    Console.WriteLine("  FirstName='{0}'", a.name.first.Value);
    Console.WriteLine("  MiddleName='{0}'", a.name.middle.Value);
    Console.WriteLine("  LastName='{0}'", a.name.last.Value);
    Console.WriteLine("  ----- End Author -----");
  }
  Console.WriteLine("------ End Book ------");
}

This code parses the XML fluently and produces the following output from the sample XML below:

PublicationDate='2009-05-20'
BookCount='3'
----- Begin Book -----
Price='45.99'
Title='Open Heart Surgery for Dummies'
AuthorCount='1'
  ---- Begin Author ----
  EmailAddress='mort@surgery.com'
  FirstName='Mortimer'
  MiddleName='Q.'
  LastName='Snerdly'
  ----- End Author -----
------ End Book ------
----- Begin Book -----
Price='32.75'
Title='Skydiving on a Budget'
AuthorCount='2'
  ---- Begin Author ----
  EmailAddress='tfreefall@jump.com'
  FirstName='Trudy'
  MiddleName='L.'
  LastName='Freefall'
  ----- End Author -----
  ---- Begin Author ----
  EmailAddress='bernie@airborne.com'
  FirstName='Bernard'
  MiddleName='M.'
  LastName='Fallson'
  ----- End Author -----
------ End Book ------
----- Begin Book -----
Price='22.40'
Title='How to Dismantle a Bomb'
AuthorCount='1'
  ---- Begin Author ----
  EmailAddress='bono@u2.com'
  FirstName='Bono'
  MiddleName=''
  LastName='Vox'
  ----- End Author -----
------ End Book ------
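
Here is the sample XML document used throughout this article:

<books pubdate="2009-05-20">
  <book price="45.99" title="Open Heart Surgery for Dummies">
    <id isbn10="4389880339"/>
    <authors>
      <author>
        <name>
          <first>Mortimer</first>
          <middle>Q.</middle>
          <last>Snerdly</last>
        </name>
        <email address="mort@surgery.com"/>
      </author>
    </authors>
  </book>
  <book price="32.75" title="Skydiving on a Budget">
    <id isbn="2129034454"/>
    <authors>
      <author>
        <name>
          <first>Trudy</first>
          <middle>L.</middle>
          <last>Freefall</last>
        </name>
        <email address="tfreefall@jump.com"/>
      </author>
      <author>
        <name>
          <first>Bernard</first>
          <middle>M.</middle>
          <last>Fallson</last>
        </name>
        <email address="bernie@airborne.com"/>
      </author>
    </authors>
  </book>
  <book price="22.40" title="How to Dismantle a Bomb">
    <authors>
      <author>
        <name>
          <first>Bono</first>
          <middle/>
          <last>Vox</last>
        </name>
        <email address="bono@u2.com"/>
      </author>
    </authors>
  </book>
</books>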
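On the question of strongly-typed enumeration raised earlier, one related fact supports the choice of the non-generic IEnumerable: the C# runtime binder does not consider extension methods, so LINQ operators such as First() can't be called on a dynamic receiver even when the underlying object implements IEnumerable<T>. Here is a small sketch of that behavior using a plain List<string>; ExtensionMethodDemo and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical demo of why IEnumerable<T> buys little in a dynamic pipeline:
// the C# runtime binder does not look for extension methods such as Enumerable.First.
public static class ExtensionMethodDemo
{
    public static string FirstStatically()
    {
        IEnumerable<string> titles = new List<string> { "A", "B" };
        return titles.First();             // bound at compile time to Enumerable.First
    }

    public static bool FirstDynamicallyFails()
    {
        dynamic titles = new List<string> { "A", "B" };
        try
        {
            object first = titles.First(); // compiles, but is bound at runtime
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;                   // List<string> has no instance method named First
        }
    }
}
```

For DynamicXml, one way to get LINQ back would be to cast a result such as dx.book to IEnumerable and call Cast<object>() on it, which returns to static typing before any LINQ operator is used.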