07-07-2011, 05:40 AM   #2
RobbieRobot
Junior Member
Posts: 5
Karma: 10
Join Date: May 2011
Location: Queensland, Australia
Device: Kindle 3
HTML

You could use some very simple HTML tags to mark the beginning of each paragraph. The lines within a paragraph will then flow together, because HTML treats line breaks in the source as ordinary whitespace.

Here is a little filter program written in Perl. It reads plain text from STDIN and prints simple HTML to STDOUT:

#!/usr/bin/perl
#
# Convert plain text with a blank line between paragraphs into HTML
#
use strict;
use warnings;

my ($rope, @html);

while (<STDIN>) {
    s/\r//g;       # strip carriage returns so all text looks like Unix text
    s/\x0c//g;     # strip form feeds (page breaks)
    s/\n/\xff/;    # mark the end of each line with \xff
    push(@html, $_);
}

# Make one huge string. Each line already ends in \xff and join adds another,
# so an ordinary line break becomes \xff\xff and a blank line between two
# lines leaves \xff\xff\xff\xff.
$rope = join("\xff", @html);

$rope =~ s/\xff\xff\xff/\n<p>/g; # Convert a blank line (three of its four markers) into a paragraph
#print $rope; exit;
$rope =~ s/\xff\s+/\n<p>/g;      # Convert a line break followed by whitespace into a paragraph
$rope =~ s/\xff/ /g;             # Convert remaining line-break markers into spaces

$rope =~ s/\[\d+\]//g;           # Remove [32] etc. tags left by a .PDF saved as .txt

print "<HTML><HEAD><TITLE>From text2HTML</TITLE></HEAD><BODY>\n\n";
print $rope;
print "\n\n</BODY></HTML>\n";
#EOF
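
If it helps, here is a rough sketch of a variation on the same idea. It is not a replacement for the script above, just one way it might be extended. It assumes paragraphs are separated by blank lines only (the indented-paragraph rule is left out), it splits the text on blank lines instead of using the \xff marker, it wraps each paragraph in <p>...</p>, and it escapes &, < and > so stray symbols in the text do not get read as HTML. The helper name text_to_html and the sample text are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the script above): plain text in, HTML paragraphs out.
sub text_to_html {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;     # normalise DOS and old Mac line endings to \n
    $text =~ s/\x0c//g;        # drop form feeds (page breaks)
    $text =~ s/\[\d+\]//g;     # drop [32]-style markers, as the script above does

    my @paragraphs;
    # A blank (or space/tab-only) line is treated as a paragraph break.
    for my $para (split /\n[ \t]*\n/, $text) {
        $para =~ s/&/&amp;/g;  # escape & first so the entities added next are not re-escaped
        $para =~ s/</&lt;/g;
        $para =~ s/>/&gt;/g;
        $para =~ s/\s+/ /g;    # join the lines of the paragraph with single spaces
        $para =~ s/^ //;       # trim a leading space
        $para =~ s/ $//;       # trim a trailing space
        next if $para eq '';   # skip empty pieces left by runs of blank lines
        push @paragraphs, "<p>$para</p>";
    }
    return join("\n", @paragraphs);
}

# Demonstration on an in-memory sample (assumed text) rather than STDIN.
my $sample = "Tom & Jerry ran\r\nacross the lawn.[12]\r\n\r\nIf a < b then\nstop.\n";
print "<HTML><HEAD><TITLE>From text2HTML</TITLE></HEAD><BODY>\n\n";
print text_to_html($sample), "\n";
print "\n</BODY></HTML>\n";
```

To run it as a filter like the original, the sample line could be swapped for `my $sample = do { local $/; <STDIN> };`, which slurps all of STDIN into one string.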