02-19-2008, 10:07 PM   #299
nrapallo
GuteBook/Mobi2IMP Creator
tompe:

I recently processed a .pdb (TEXt/REAd) file and got one long run of words with no line breaks.

In 'mobi2html' I tried '--rawhtml' and saw that the raw text does contain <CR><LF> line endings, but they seem to disappear once the text is processed.

I couldn't find where the line endings were being stripped and replaced with spaces. Since the text fed to HTML::TreeBuilder has no HTML tags, could that be the culprit?

I tried using substitutions on the raw text to produce basic HTML, but it didn't work.
Code:
my $book = $text;
$book =~ s/\cM//g;                   # Strip CRs, leaving Unix line endings
$book =~ s/\n/\x01/g;                # Collapse lines (mark each LF with \x01)
$book =~ s/\x01\x01/<\/p>\n\n<p>/g;  # Separate paragraphs at blank lines
$book =~ s/\x01/ /g;                 # Remaining single line breaks become spaces

$text = "<html><body><p>" . $book . "</p></body></html>";
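
For comparison, here is a rough standalone sketch of the same idea as a subroutine. The name text_to_html is made up for illustration and is not part of mobi2html. It assumes blank lines separate paragraphs, and it also escapes &, < and >, which the substitutions above do not do.

```perl
use strict;
use warnings;

# Hypothetical helper (not from mobi2html): wrap plain text in minimal HTML.
# Assumption: paragraphs are separated by one or more blank lines.
sub text_to_html {
    my ($text) = @_;
    my $book = $text;

    $book =~ s/\r\n?/\n/g;      # Normalize CRLF and bare CR to LF
    $book =~ s/&/&amp;/g;       # Escape characters that are special in HTML
    $book =~ s/</&lt;/g;
    $book =~ s/>/&gt;/g;
    $book =~ s/^\s+|\s+$//g;    # Trim leading and trailing whitespace

    # Split on blank lines, then join the wrapped lines inside each paragraph
    my @paragraphs = split /\n\s*\n/, $book;
    s/\n/ /g for @paragraphs;

    return "<html><body>\n"
         . join("\n\n", map { "<p>$_</p>" } @paragraphs)
         . "\n</body></html>\n";
}

print text_to_html("First line\r\nsecond line\r\n\r\nNext paragraph\r\n");
```

Splitting into a list of paragraphs avoids the \x01 placeholder and copes with runs of more than two line breaks, but it is only a sketch and has not been tried inside mobi2html itself.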
-Nick

Last edited by nrapallo; 02-19-2008 at 10:10 PM.