Let’s all say it together: XML::Parser Sucks! There, that was cleansing.

After much prodding of my XML buddies (hi jmac!) and an evil notion of using goto (thankfully Perl doesn’t let you jump into the middle of a function), I came across a seemingly little-used XML::Parser method, parse_start, which returns a new XML::Parser::ExpatNB object (with oh so little documentation) that does EXACTLY WHAT I NEED! I need a parser that parses a stream in increments. Consider how useful this is for dealing with XML messages coming across the network that might be f’ing HUGE! This parsing method will at least give me an opportunity to chunk the data into smaller bits (save for the pathological 45TB between a single pair of tags [even then, there may be options]). Anyway, this is a BEAUTIFUL, LOVERLY THING!!!!

So, here’s a very goofy example of how to work with this bad boy. I’ll be looking to shove this into Frontier::RPC2 in a most Eee-VEIL way. ;-)

use strict;
use warnings;
use XML::Parser;

my $p = XML::Parser->new(
             Style => 'My::Pkg',
            );

print "Reading from __DATA__\n";
my $data; # A place for my text data

# Don't be fooled: it's an 
# object constructor
my $nb_p = $p->parse_start(data => \$data);

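# Feed the parser one line at a time; parse_more() accepts partial XML
while(my $l = <DATA>){
  chomp($l);
  $nb_p->parse_more($l);
  my $s = ${$nb_p->{data}};   # same scalar as $data, set by the End handler
  if(defined $s && length $s){
    print "Back at the range, I got $s\n";
  }
}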
$nb_p->parse_done; 

package My::Pkg;

sub Init {
  my($expat) = @_;

  print "Hello!\n";
}

sub Start {
  my($expat, $tag, %attrs) = @_;
  ${$expat->{data}} = undef;
  print "Start: $tag\n";
}

sub Char {
  my($expat, $text) = @_;
  ${$expat->{data}} = undef;
  return if $text =~ /^\s*$/;

  # Expat may hand over one text node in several pieces, so append
  $expat->{char_bag} .= $text;
}

sub End {
  my($expat, $tag) = @_;
  print "End: $tag\n";
  ${$expat->{data}} = 
     $expat->{char_bag};

  # clean up
  $expat->{char_bag} = '';
  return;
}


__END__
<?xml version="1.0" ?>
<a>
  <b>
    <c>
         <d>fiddlesticks</d>
    </c>
  </b>
</a>
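
For the network case mentioned up top, the line-by-line loop is not really the point; fixed-size chunks are. Here is a rough sketch of that idea, not anything Frontier::RPC2 actually does. The in-memory filehandle standing in for a socket, the tiny $chunk_size, and the @seen and $text variables are all made up for illustration. It uses plain Handlers closures instead of a Style package.

```perl
use strict;
use warnings;
use XML::Parser;

# Stand-in for a network stream: an in-memory filehandle (an assumption
# for this sketch; a real socket would be read the same way).
my $xml = '<?xml version="1.0"?><a><b><c><d>fiddlesticks</d></c></b></a>';
open my $fh, '<', \$xml or die "Cannot open in-memory handle: $!";

my @seen;        # [tag, text] pairs, collected as elements close
my $text = '';   # character data gathered since the last start or end tag

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { $text = '' },
        # One text node may arrive in several calls, so append.
        Char  => sub { my ($expat, $piece) = @_; $text .= $piece },
        End   => sub {
            my ($expat, $tag) = @_;
            push @seen, [ $tag, $text ] if $text =~ /\S/;
            $text = '';
        },
    },
);

# parse_start returns the XML::Parser::ExpatNB object
my $nb = $parser->parse_start;

# Hypothetical, deliberately tiny chunk size so that tags get split
# across reads; the parser has to cope with partial markup.
my $chunk_size = 8;
while ( read( $fh, my $buf, $chunk_size ) ) {
    $nb->parse_more($buf);
}
$nb->parse_done;

print "$_->[0]: $_->[1]\n" for @seen;
```

The chunk boundaries land in the middle of tags and text, and the parser still sorts it out, which is the whole appeal of parse_more.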

[Original use.perl.org post and comments.]