Quote:
Originally Posted by nrapallo
tompe:
I recently processed a .pdb (TEXt/REAd) and got a long series of words with no line breaks.
In 'mobi2html' I tried using '--rawhtml' and saw that there were <CR><LF> line endings in the text, but they seem to disappear when processed.
I couldn't find where the line endings were being stripped and replaced with spaces. Since the text fed to HTML::TreeBuilder had no HTML tags, would that be the culprit?
I tried using substitutions on the raw text to produce basic HTML code, but it didn't work.
Code:
my $book = $text;
$book =~ s/\cM//g; # Unix line endings (strip carriage returns)
$book =~ s/\n/\x01/g; # Collapse lines (mark each newline with \x01)
$book =~ s/\x01\x01/<\/p>\n\n<p>/g; # Separate paragraphs (blank line = paragraph break)
$book =~ s/\x01/ /g; # Insert whitespace (remaining line breaks become spaces)
$text = "<html><body><p>" . $book . "</p></body></html>";
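For what it's worth, the same idea can be sketched as a standalone sub. This is only a rough sketch, not what mobi2html actually does: the name text_to_html is made up for illustration, and it assumes that a blank line marks a paragraph break and that any single line break inside a paragraph should become a space. It also escapes &, < and > so that stray characters in the text are not read as tags.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of mobi2html): wrap plain text in minimal HTML.
# Assumption: blank lines separate paragraphs; single line breaks are soft wraps.
sub text_to_html {
    my ($text) = @_;
    my $book = $text;

    $book =~ s/\r\n?/\n/g;      # normalize CR LF (and bare CR) to LF
    $book =~ s/&/&amp;/g;       # escape HTML-special characters
    $book =~ s/</&lt;/g;
    $book =~ s/>/&gt;/g;
    $book =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace

    # One or more blank lines mark a paragraph boundary.
    my @paras = split /\n[ \t]*\n\s*/, $book;

    # Inside a paragraph, join wrapped lines with a single space.
    s/\s*\n\s*/ /g for @paras;

    return "<html><body>\n"
         . join("\n\n", map { "<p>$_</p>" } @paras)
         . "\n</body></html>\n";
}

print text_to_html("First line\r\nsecond line.\r\n\r\nNew paragraph.\r\n");
```

Note the binding operator is written =~ with no space; with a space, "$book = ~s/.../.../" runs the substitution on $_ and assigns the bitwise complement of its result to $book, so the text is lost.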
-Nick
Here is a test file to see what can be done to convert a .pdb properly (converting the text to HTML code internally). All I get in the resulting .html is one long line of words with no line breaks or paragraph boundaries.
-Nick