Quote:
Originally Posted by Hadrien
We don't do this, Wired does. Cleaning up RSS feeds is incredibly annoying, believe me: messy XHTML, wrong character encodings, entities encoded twice, etc.
It's OK. I try to cope with RSS feeds that are badly coded at the source by RegEx'ing a workable solution, and also with the quirks and limitations of the eBook Publisher software I rely on within Mobi2IMP.
I'm currently updating Mobi2IMP to properly convert your Feedbooks.com feeds (stored in Mobipocket format), and I think I can say I'm winning the battle.
Most of the time, the resulting conversion works as it's supposed to!
BTW, Hadrien, there's one quirk you might want to fix. In some exploded .mobi RSS feeds (I can't remember offhand which ones), I noticed that the malformed HTML tag <br \> was used, with a backslash instead of a slash. I had to substitute <br /> for it.
Here's my solution, written as Perl RegExes, which I use to convert your RSS feeds properly:
Code:
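#fix up feedbooks.com news feeds quirks
# 1) malformed <br \> (backslash) -> well-formed <br />
$html =~ s/<br\s*\\>/<br \/>/gi;
# 2) move an empty named anchor out of the link it follows:
#    <a href...><a name...></a> -> <a name...></a><a href...>
$html =~ s/<a href([^>]*)><a name([^>]*)><\/a>/<a name$2><\/a><a href$1>/gi;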
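If you want to try the two substitutions in isolation, here's a minimal self-contained Perl sketch. The wrapper sub fix_feedbooks_quirks and the sample string are hypothetical, made up just for illustration; the regexes are the ones above, with the whitespace match written as \s*, which matches exactly the same input as (\s)*.

```perl
use strict;
use warnings;

# Hypothetical helper name: applies the two feedbooks.com quirk fixes to an HTML string.
sub fix_feedbooks_quirks {
    my ($html) = @_;
    # Quirk 1: <br \> (backslash) becomes <br />; case-insensitive, optional spaces.
    $html =~ s/<br\s*\\>/<br \/>/gi;
    # Quirk 2: <a href...><a name...></a> becomes <a name...></a><a href...>,
    # so the named anchor is no longer nested inside the link.
    $html =~ s/<a href([^>]*)><a name([^>]*)><\/a>/<a name$2><\/a><a href$1>/gi;
    return $html;
}

# Made-up sample input containing both quirks.
my $sample = 'Line one<br \>Line two <a href="#c1"><a name="top"></a>Chapter 1</a>';
print fix_feedbooks_quirks($sample), "\n";
# prints: Line one<br />Line two <a name="top"></a><a href="#c1">Chapter 1</a>
```

Already well-formed markup such as <br /> passes through untouched, since the first pattern only fires on a literal backslash before the closing >.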