Perl - How to remove duplicate span tags with XML::Twig?


I need to merge adjacent span tags that have the same style in the following XML document:

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <book>
    <p><span style="font-size:10pt;">t</span><span style="font-size:10pt;">h</span><span style="font-size:10pt;">e</span></p>
    <p><span style="font-style:italic;">o</span><span style="font-style:italic;">f</span><span style="font-size:10pt;">e</span></p>
    </book>

My desired output is:

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <book>
    <p><span style="font-size:10pt;">the</span></p>
    <p><span style="font-style:italic;">of</span><span style="font-size:10pt;">e</span></p>
    </book>

This is what I have tried so far:

    use strict;
    use XML::Twig;

    my $document = XML::Twig->new(
        keep_encoding => 1,
        twig_handlers => {
        },
        pretty_print  => 'indented',
    );
    $document->parsefile("book.xml");
    $document->print();

I am having difficulty understanding the concepts of this module. Is what I am trying to do possible?

Well, you're not really removing duplicate XML tags there - as far as XML is concerned, each span is an independent entity.

However, you can use the XML::Twig::Elt method prev_sibling, because it looks at nodes at the same level. If the previous node is of the right type and has the same style, concatenate the current text onto it and delete the current node. I'm not sure this will work for all your use cases, but it does what you ask.

    use strict;
    use warnings;
    use XML::Twig;

    # Called for every <span>: fold it into the preceding sibling
    # when that sibling is a span with the same style.
    sub merge_span {
        my ( $twig, $span ) = @_;
        my $prev = $span->prev_sibling;
        if (    $prev
            and $prev->tag eq $span->tag
            # '// ""' avoids warnings when a span has no style attribute
            and ( $prev->att('style') // '' ) eq ( $span->att('style') // '' )
            # only merge spans that hold plain text, with no nested elements
            and $prev->contains_only_text
            and $span->contains_only_text
            )
        {
            $prev->set_text( $prev->text . $span->text );
            $span->delete;
        }
    }

    my $xml = XML::Twig->new(
        'pretty_print'  => 'indented',
        'twig_handlers' => { 'span' => \&merge_span, },
    );
    $xml->parse( \*DATA );
    $xml->print;

    __DATA__
    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <book>
    <p><span style="font-size:10pt;">t</span><span style="font-size:10pt;">h</span><span style="font-size:10pt;">e</span></p>
    <p><span style="font-style:italic;">o</span><span style="font-style:italic;">f</span><span style="font-size:10pt;">e</span></p>
    </book>
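If the merge rule itself is the confusing part, it may help to look at it without any XML parsing. The following is only a minimal sketch in plain Perl: the `merge_adjacent` helper and the `[ style, text ]` pair layout are made up for illustration and are not part of XML::Twig. It applies the same idea of comparing each item with the one before it, appending the text when the styles match, and otherwise starting a new run.

```perl
use strict;
use warnings;

# Hypothetical helper (not an XML::Twig API): collapse runs of adjacent
# [ style, text ] pairs that share the same style into a single pair.
sub merge_adjacent {
    my @spans = @_;
    my @merged;
    for my $span (@spans) {
        my ( $style, $text ) = @$span;
        if ( @merged and $merged[-1][0] eq $style ) {
            # Same style as the previous entry: append the text to it.
            $merged[-1][1] .= $text;
        }
        else {
            # First entry, or a different style: start a new run.
            push @merged, [ $style, $text ];
        }
    }
    return @merged;
}

# The spans of the second paragraph in the sample document.
my @result = merge_adjacent(
    [ 'font-style:italic;', 'o' ],
    [ 'font-style:italic;', 'f' ],
    [ 'font-size:10pt;',    'e' ],
);

# Prints:
#   font-style:italic; => of
#   font-size:10pt; => e
print "$_->[0] => $_->[1]\n" for @result;
```

Only neighbouring items are merged, so two spans with the same style that are separated by a differently styled span stay separate, which matches what the prev_sibling check does.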
