Hello All,
I just came across this old thread by following a link...
I use the following Ruby script (conv.rb) to convert Gutenberg txt files to simple HTML:
Code:
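#!/usr/bin/ruby
# conv.rb: convert a Gutenberg plain-text file to simple HTML on stdout.
abort "usage: ruby conv.rb book.txt >book.html" if ARGV.empty?
txt = IO.read(ARGV[0])
txt.gsub!(/\r/, '')                      # remove carriage returns (DOS line endings)
parts = txt.split(/\n\n\n\n/)            # chapters are separated by three blank lines
parts.shift                              # drop everything before the first chapter break
parts.pop                                # drop everything after the last chapter break
$stderr.print "%s: bytes=%d, parts=%d\n" % [ARGV[0], txt.size, parts.size]
print "<html>\n<head><title>#{ARGV[0]}</title></head>\n<body>\n"
parts.each do |part|
  pars = part.split(/\n\n+/)             # paragraphs are separated by blank lines
  head = pars.shift                      # the first paragraph becomes the chapter heading
  print "<h1>#{head}</h1>\n"
  pars.each do |par|
    par.gsub!(/\[\d+\].+/, '')           # strip a [n] marker and the rest of its line
    par.gsub!(/_(.*?)_/m, '<i>\1</i>')   # _text_ becomes italics
    print "<p>#{par}</p>\n"
  end
end
print "</body>\n</html>\n"               # close the tags opened above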
After converting, you can turn the HTML into a nice custom PDF (with TOC) using htmldoc:
Code:
ruby conv.rb xxx.txt >xxx.html
htmldoc -f xxx.pdf --header "" --footer "" --top 3mm --bottom 1mm --left 1mm --right 1mm --size 12x15cm xxx.html
You might have to experiment a bit with the script for optimal results (depending on the exact text layout)...
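To make that experimenting a little easier, here is a rough sketch of the same splitting idea wrapped in a function, so the chapter separator can be changed and tried out on a small sample before running a whole book. This is not the script above: the names gutenberg_to_html, title: and chapter_sep: are made up for illustration, the footnote stripping is left out, and the default separator simply mirrors the three blank lines that conv.rb assumes.

```ruby
# Hypothetical helper (not part of conv.rb): same splitting logic,
# but the chapter separator is a parameter so it can be tuned per book.
def gutenberg_to_html(txt, title: 'untitled', chapter_sep: /\n{4}/)
  parts = txt.delete("\r").split(chapter_sep)
  parts.shift                      # drop the text before the first separator
  parts.pop                        # drop the text after the last separator
  out = ["<html>\n<head><title>#{title}</title></head>\n<body>\n"]
  parts.each do |part|
    pars = part.strip.split(/\n\n+/)
    head = pars.shift
    next if head.nil?              # skip parts that contain nothing but whitespace
    out << "<h1>#{head}</h1>\n"
    pars.each do |par|
      out << "<p>#{par.gsub(/_(.*?)_/m, '<i>\1</i>')}</p>\n"
    end
  end
  out << "</body>\n</html>\n"
  out.join
end

# Tiny made-up sample: header, two chapters, trailer, three blank lines between them.
sample = "HEADER\n\n\n\nChapter 1\n\nSome _italic_ text.\n\n\n\nChapter 2\n\nMore text.\n\n\n\nTRAILER\n"
puts gutenberg_to_html(sample, title: 'demo')
```

If a particular book separates chapters with only two blank lines, passing something like chapter_sep: /\n{3}/ would be the first thing to try.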
Hope this helps somebody.