tompe:
I recently processed a .pdb (TEXt/REAd) and got a long series of words with no line breaks.
In 'mobi2html' I tried using '--rawhtml' and saw that there were <CR><LF> line endings in the text, but they seem to disappear when processed.
I couldn't find where the line endings were being stripped and replaced with spaces. Since the text fed to HTML::TreeBuilder had no HTML tags, would that be the culprit?
I tried using substitutions on the raw text to produce basic HTML code, but it didn't work.
Code:
my $book = $text;
# The binding operator is =~ (no space); "= ~s/.../" would act on $_ instead of $book.
$book =~ s/\cM//g;                   # Unix line endings
$book =~ s/\n/\x01/g;                # Collapse lines
$book =~ s/\x01\x01/<\/p>\n\n<p>/g;  # Separate paragraphs
$book =~ s/\x01/ /g;                 # Insert whitespace
$text = "<html><body><p>" . $book . "</p></body></html>";
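For comparison, here is a minimal standalone sketch of the same idea that can be run outside of mobi2html. The subroutine name text_to_html and the entity escaping are my own assumptions for illustration, not anything taken from mobi2html, and it assumes the input really is plain text with blank lines between paragraphs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mobi2html): wrap plain text in basic HTML.
sub text_to_html {
    my ($text) = @_;

    $text =~ s/\r\n?/\n/g;    # Normalize CR LF and bare CR to LF

    # Escape characters that are special in HTML (assumes plain-text input).
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # Split on blank lines and drop chunks that contain only whitespace.
    my @paras = grep { /\S/ } split /\n\s*\n/, $text;

    for my $p (@paras) {
        $p =~ s/^\s+|\s+$//g;     # Trim leading and trailing whitespace
        $p =~ s/\s*\n\s*/ /g;     # Join the lines of a paragraph with spaces
    }

    return "<html><body>\n"
         . join("\n\n", map { "<p>$_</p>" } @paras)
         . "\n</body></html>\n";
}

print text_to_html("First line\r\nsecond line\r\n\r\nNext paragraph\r\n");
```

With the sample string above, the two lines before the blank line end up in one <p> element and "Next paragraph" in a second one. Splitting on blank lines avoids the \x01 placeholder, at the cost of building a list of paragraphs in memory.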
-Nick