02-20-2008, 12:09 AM   #300
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by nrapallo
tompe:

I recently processed a .pdb (TEXt/REAd) and got a long series of words with no line breaks.

In 'mobi2html' I tried using '--rawhtml' and saw that there were <CR><LF> line endings in the text, but they seem to disappear when processed.

I couldn't find where the line endings were being stripped and replaced with spaces. Since the text fed to HTML::TreeBuilder had no HTML tags, would that be the culprit?

I tried using substitutions on the raw text to produce basic HTML code, but it didn't work.
Code:
my $book = $text;
$book =~ s/\cM//g;                   # Strip CRs to get Unix line endings
$book =~ s/\n/\x01/g;                # Collapse lines
$book =~ s/\x01\x01/<\/p>\n\n<p>/g;  # Separate paragraphs
$book =~ s/\x01/ /g;                 # Insert whitespace

$text = "<html><body><p>" . $book . "</p></body></html>";
-Nick
Generally, PalmDOC files (the kind you are processing) are expected to be word-wrapped by the reader and to contain returns only at paragraph boundaries, so there are no line endings inside a paragraph. Why would you want line endings?
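If the returns really are paragraph boundaries, something along these lines might be closer to what you need. This is only a rough sketch and not anything taken from mobi2html: palmdoc_to_html is a made-up name, and it assumes every non-blank line is one complete paragraph and that the text contains no HTML markup of its own.
Code:
```perl
use strict;
use warnings;

# Hypothetical helper (not part of mobi2html): treat every return in
# PalmDOC text as a paragraph boundary and wrap each paragraph in <p>.
sub palmdoc_to_html {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;                        # normalize CR LF and bare CR to LF
    my @paras = grep { /\S/ } split /\n/, $text;  # one paragraph per non-blank line
    for (@paras) {
        s/&/&amp;/g;                              # escape characters HTML treats specially
        s/</&lt;/g;
        s/>/&gt;/g;
    }
    return "<html><body>\n"
         . join("\n", map { "<p>$_</p>" } @paras)
         . "\n</body></html>\n";
}

print palmdoc_to_html("First paragraph.\r\nSecond paragraph.\r\n");
```
Giving HTML::TreeBuilder explicit <p> tags this way means it no longer matters how whitespace between words gets collapsed later on.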