MobileRead Forums - View Single Post

nrapallo · 03-17-2009, 08:38 AM

Quote:

Originally Posted by Hadrien

We don't do this, Wired does. Cleaning up RSS feeds is incredibly annoying believe me: messy XHTML, wrong character encoding, entities encoded twice etc...

It's OK. I cope with RSS feeds that are badly coded at the source, and with the quirks and limitations of the eBook Publisher software I rely on within Mobi2IMP, by RegEx'ing a workable solution.

I'm currently updating Mobi2IMP to properly convert your Feedbooks.com feeds (stored in Mobipocket format) and I think I can say I'm winning the battle.

Most of the time, the resulting conversion works as it's supposed to!

BTW, Hadrien, there's one quirk that you may want to fix. I noticed (though I can't remember offhand where I saw this) that some exploded .mobi RSS feeds use the HTML tag <br \>. I needed to substitute <br /> instead.

Here's my solution, utilized to properly convert your RSS feeds, written as Perl RegEx:

Code:

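#fix up feedbooks.com news feeds quirks
# replace the malformed <br \> tag with a valid <br />
$html =~ s/<br\s*\\>/<br \/>/gi;
# move an empty <a name> anchor nested right after an <a href> opener out in front of it
$html =~ s/<a href([^>]*)><a name([^>]*)><\/a>/<a name$2><\/a><a href$1>/gi;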
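To show what those two substitutions actually do, here is a minimal standalone sketch. The helper name fix_feed_quirks and the sample string are made up for illustration; they are not taken from Mobi2IMP or from a real Feedbooks feed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mobi2IMP): applies both fixes to one HTML string.
sub fix_feed_quirks {
    my ($html) = @_;

    # <br \> (backslash instead of slash) becomes <br />
    $html =~ s/<br\s*\\>/<br \/>/gi;

    # An empty <a name> anchor nested directly after an <a href> opener
    # is moved in front of that opener, so the two tags no longer nest.
    $html =~ s/<a href([^>]*)><a name([^>]*)><\/a>/<a name$2><\/a><a href$1>/gi;

    return $html;
}

# Made-up sample input containing both quirks.
my $sample = 'Line one<br \>Line two <a href="#n1"><a name="r1"></a>note</a>';
print fix_feed_quirks($sample), "\n";
```

With that sample, the nested anchor should end up ahead of the link, and the <br \> should come out as <br />. Note that the second pattern only catches anchors that follow the opening <a href ...> immediately, with no whitespace or text in between.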