How to Parse XML Using Regular Expressions and C#

by JasonRShaver 28. February 2008 16:41
I regularly find myself needing a quick way to parse a smallish block of XML and extract the contents of a number of known elements. Now, at this point I must confess that I am not good with regular expressions, but I am trying to learn, so here is one that makes my life easier:
(?<=<Identifier>)[\S\s]*?(?=</Identifier>)
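What this pattern does is match everything inside the element while NOT matching the element's tags: the lookbehind (?<=...) and the lookahead (?=...) require the tags to be present without including them in the match, and the lazy *? stops at the first closing tag. So if you take the following block of XML:
<Identifier>Jason R. Shaver</Identifier>
<Gender>Male</Gender>
<GivenAge>30-39</GivenAge>
<Notes>Quite handsome really...</Notes>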
And pass it into the following ProcessParameters method:
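// Required at the top of the file:
using System;
using System.Text.RegularExpressions;

private void ProcessParameters(String xml)
{
    _Identifier = ExtractContentFromXml(xml, "Identifier", String.Empty);
    _Notes = ExtractContentFromXml(xml, "Notes", String.Empty);
    _Gender = ExtractContentFromXml(xml, "Gender", String.Empty);
    _GivenAge = ExtractContentFromXml(xml, "GivenAge", String.Empty);
}

// Returns the text between <element> and </element>, or defaultValue if the element is not found.
private String ExtractContentFromXml(String xml, String element, String defaultValue)
{
    // Escape the name in case it contains regex metacharacters such as '.',
    // and include the '<' and '>' so that "Age" does not also match inside "<GivenAge>".
    String name = Regex.Escape(element);
    String pattern = String.Format(@"(?<=<{0}>)[\S\s]*?(?=</{0}>)", name);
    Match match = Regex.Match(xml, pattern);
    return match.Success ? match.Value : defaultValue;
}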
You get the contents of each element back (for example, "Jason R. Shaver" in _Identifier), and this SHOULD be easier to read (for the programmer, not the program) than using the XML classes. I have not compared the performance of this method against the System.Xml namespace classes, but a program that looped through 100 XML files and extracted the values above took less than a second.
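For comparison, here is a minimal sketch of the same lookup done with the LINQ to XML classes (System.Xml.Linq) instead of a regular expression. The class and method names (XmlLinqExtractor, ExtractContent) are hypothetical, and the sketch assumes the elements have no XML namespace. Unlike the regex, XDocument.Parse requires well-formed XML with a single root element, so the sample above would first need to be wrapped in one (for instance a hypothetical <Person> element).

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: the same lookup as ExtractContentFromXml, done with LINQ to XML.
public static class XmlLinqExtractor
{
    // Returns the text of the first element named `element` anywhere in the
    // document (names without a namespace only), or defaultValue if there is none.
    public static string ExtractContent(XDocument doc, string element, string defaultValue)
    {
        XElement match = doc.Descendants(element).FirstOrDefault();
        return match != null ? match.Value : defaultValue;
    }

    // Convenience overload that parses the string first; XDocument.Parse throws
    // an XmlException if the input is not well-formed XML with a single root.
    public static string ExtractContent(string xml, string element, string defaultValue)
    {
        return ExtractContent(XDocument.Parse(xml), element, defaultValue);
    }
}
```

Parsing once with XDocument.Parse and then calling XmlLinqExtractor.ExtractContent(doc, "Identifier", String.Empty) and so on for each field mirrors the ProcessParameters method above. One practical difference: XElement.Value resolves entities such as &amp; to &, while the regex returns the raw text between the tags.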


About the author

I am a software developer working for Microsoft in Redmond, WA. In addition, my wife and I own TTXOnline, which is likely the third-largest table tennis store in the US.
