Quote:
Originally Posted by nrapallo
tompe:
I recently processed a .pdb (TEXt/REAd) and got a long series of words with no line breaks.
In 'mobi2html' I tried using '--rawhtml' and saw that there were <CR><LF> line endings in the text, but they seem to disappear when processed.
I couldn't find where the line endings were being stripped and replaced with spaces. Since the text fed to HTML::TreeBuilder had no HTML tags, would that be the culprit?
I tried using substitutions on the raw text to produce basic HTML code, but it didn't work.
Code:
my $book = $text;
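$book =~ s/\cM//g; # Strip CRs, leaving Unix line endings
$book =~ s/\n/\x01/g; # Collapse lines (mark each newline with \x01)
$book =~ s/\x01\x01/<\/p>\n\n<p>/g; # Separate paragraphs (blank line = two markers)
$book =~ s/\x01/ /g; # Replace remaining single line breaks with spaces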
$text = "<html><body><p>" . $book . "</p></body></html>";
-Nick
Generally, PalmDOC files (the kind you are processing) are expected to be wrapped by the reader and contain returns only at paragraph boundaries, so there are no line endings within a paragraph. Why would you want line endings?
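If the goal is only to get paragraph breaks into the HTML output, something like the following might be enough. It is a minimal sketch that assumes every return in the PalmDOC text marks a paragraph boundary; palmdoc_text_to_html is a hypothetical helper name, not something that exists in mobi2html.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mobi2html): wrap each paragraph of
# plain PalmDOC text in <p> tags. Assumes one return per paragraph.
sub palmdoc_text_to_html {
    my ($text) = @_;

    $text =~ s/\r\n?/\n/g;    # normalise CR LF and bare CR to LF

    # Split on runs of newlines and drop whitespace-only pieces.
    my @paragraphs = grep { /\S/ } split /\n+/, $text;

    # Escape characters that are special in HTML.
    for (@paragraphs) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
    }

    return "<html><body>\n"
         . join("\n", map { "<p>$_</p>" } @paragraphs)
         . "\n</body></html>\n";
}

print palmdoc_text_to_html("First paragraph.\r\nSecond paragraph.\r\n");
```

Note that the binding operator is written =~ with no space between the two characters; "$book = ~s/.../" parses as an assignment of the bitwise negation of a substitution on $_, which leaves the text itself untouched.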