I need to merge span tags that have the same style in the following XML document:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<book>
<p><span style="font-size:10pt;">t</span><span style="font-size:10pt;">h</span><span style="font-size:10pt;">e</span></p>
<p><span style="font-style:italic;">o</span><span style="font-style:italic;">f</span><span style="font-size:10pt;">e</span></p>
</book>
My desired output is:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<book>
<p><span style="font-size:10pt;">the</span></p>
<p><span style="font-style:italic;">of</span><span style="font-size:10pt;">e</span></p>
</book>
This is what I have tried so far:
use strict;
use XML::Twig;

my $document = XML::Twig->new(
    keep_encoding => 1,
    twig_handlers => {},    # no handlers yet
    pretty_print  => 'indented',
);
$document->parsefile("book.xml");
$document->print();
I am having difficulty understanding the concepts of this module. Is what I am trying to do possible?
Well, you're not removing XML tags there - as far as XML is concerned, each span is an independent entity.

However, you can use the XML::Twig::Elt method prev_sibling, because it looks at nodes at the same level. So if the previous node is of the right type and has the same style, concatenate the current text onto it and delete the current node. I'm not sure this will work for all use cases, but it does what you ask.
use strict;
use warnings;
use XML::Twig;

# Called for each <span> once it has been fully parsed.
sub merge_span {
    my ( $twig, $span ) = @_;
    my $prev = $span->prev_sibling;

    # Merge only when the previous sibling is also a <span> with the same
    # style and both spans hold nothing but text.
    if (    $prev
        and $prev->tag eq $span->tag
        and ( $prev->att('style') // '' ) eq ( $span->att('style') // '' )
        and $prev->contains_only_text
        and $span->contains_only_text )
    {
        # Append this span's text to the previous span, then drop this span.
        $prev->set_text( $prev->text . $span->text );
        $span->delete;
    }
}

my $xml = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => { span => \&merge_span },
);
$xml->parse( \*DATA );
$xml->print;

__DATA__
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<book>
<p><span style="font-size:10pt;">t</span><span style="font-size:10pt;">h</span><span style="font-size:10pt;">e</span></p>
<p><span style="font-style:italic;">o</span><span style="font-style:italic;">f</span><span style="font-size:10pt;">e</span></p>
</book>