Thread: html problem
03-17-2009, 08:38 AM   #13
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by Hadrien
We don't do this, Wired does. Cleaning up RSS feeds is incredibly annoying believe me: messy XHTML, wrong character encoding, entities encoded twice etc...
It's OK. Within Mobi2IMP, I try to cope with both badly coded (at the source) RSS feeds and the quirks and limitations of the eBook Publisher software I rely on by RegEx'ing a workable solution.

I'm currently updating Mobi2IMP to properly convert your Feedbooks.com feeds (stored in Mobipocket format), and I think I can say I'm winning the battle.

Most of the time, the resulting conversion works as it's supposed to!

BTW, Hadrien, there's one quirk that you may want to fix. I noticed (though I can't remember offhand where I saw it) that some exploded .mobi RSS feeds use the HTML tag <br \>. I needed to substitute <br /> instead.

Here's my solution, written as Perl RegEx, which I use to properly convert your RSS feeds:
Code:
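# Fix up feedbooks.com news feed quirks.
# Replace the malformed <br \> tag with a proper <br />.
$html =~ s/<br\s*\\>/<br \/>/gi;
# Move an empty <a name> anchor nested inside an <a href> out in front of it.
$html =~ s/<a href([^>]*)><a name([^>]*)><\/a>/<a name$2><\/a><a href$1>/gi;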
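
To see what these two substitutions do, here is a minimal standalone sketch that applies them to a made-up string. The helper name fix_feed_html and the sample input are assumptions for illustration only; they are not taken from Mobi2IMP or from a real Feedbooks feed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the two feed fix-ups from the post to an HTML string.
# fix_feed_html is a hypothetical helper name, not part of Mobi2IMP.
sub fix_feed_html {
    my ($html) = @_;
    # A malformed <br \> (backslash) becomes a proper <br />.
    $html =~ s/<br\s*\\>/<br \/>/gi;
    # An empty <a name> nested inside an <a href> is moved in front of it.
    $html =~ s/<a href([^>]*)><a name([^>]*)><\/a>/<a name$2><\/a><a href$1>/gi;
    return $html;
}

# Made-up sample input, not taken from a real feed.
my $sample = 'Line one<br \>Line two <a href="#n1"><a name="r1"></a>note</a>';
print fix_feed_html($sample), "\n";
```

For this sample the script should print: Line one<br />Line two <a name="r1"></a><a href="#n1">note</a>. The second substitution only handles an empty <a name> that immediately follows the opening <a href>; anything else is left untouched.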

Last edited by nrapallo; 03-17-2009 at 10:24 AM. Reason: typo