Old 01-14-2009, 09:32 AM   #6
hansel
Device: iliad
Hello All,
I just came across this old thread by following a link...

I use the following Ruby script (conv.rb) to convert Gutenberg txt files to simple HTML:
Code:
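#!/usr/bin/ruby
# Convert a Gutenberg txt file to simple HTML (written to stdout).
txt = IO.read(ARGV[0])
txt.gsub!(/\r/,'')              # strip carriage returns
parts = txt.split(/\n\n\n\n/)   # parts are separated by three blank lines
parts.shift                     # drop everything before the first separator
parts.pop                       # drop everything after the last separator

# Statistics go to stderr so they stay out of the HTML.
$stderr.print "%s: bytes=%d, parts=%d\n" % [ARGV[0], txt.size, parts.size]
print "<html>\n<head><title>#{ARGV[0]}</title></head>\n<body>\n"
parts.each do |part|
  pars = part.split(/\n\n+/)    # paragraphs are separated by blank lines
  head = pars.shift             # the first paragraph of a part is its heading
  print "<h1>#{head}</h1>\n"
  pars.each do |par|
    par.gsub!(/\[\d+\].+/, '')           # remove [number] plus the rest of that line
    par.gsub!(/_(.*?)_/m, '<i>\1</i>')   # _text_ becomes italic
    print "<p>#{par}</p>\n"
  end
end
print "</body>\n</html>\n"      # close the tags opened above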
After converting, you can turn the HTML into a nice custom PDF (with TOC) using htmldoc:

Code:
ruby conv.rb xxx.txt >xxx.html
htmldoc -f xxx.pdf --header "" --footer "" --top 3mm --bottom 1mm --left 1mm --right 1mm --size 12x15cm xxx.html
You might have to experiment a bit with the script (mainly the two split patterns and the two gsub! substitutions) for optimal results, depending on the exact layout of the text...
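
To make that kind of experimenting a bit easier, here is a minimal sketch that pulls the layout-dependent bits into two small functions. The names split_parts and format_par and the min_blank_lines parameter are made up for this example and are not part of conv.rb. Note that, unlike the script, this splits on "at least" the given number of blank lines, which is an assumption about what you may want for texts with irregular spacing.

```ruby
# Hypothetical helpers, not part of conv.rb.

# Split the text wherever at least `min_blank_lines` blank lines occur.
# The default of 3 corresponds to the four newlines conv.rb splits on.
def split_parts(txt, min_blank_lines = 3)
  sep = /\n{#{min_blank_lines + 1},}/
  txt.gsub(/\r/, '').split(sep)
end

# Turn one paragraph into a <p> element using the same two substitutions
# as conv.rb: drop a [number] plus the rest of that line, and turn
# _text_ into italics.
def format_par(par)
  par = par.gsub(/\[\d+\].+/, '')
  par = par.gsub(/_(.*?)_/m, '<i>\1</i>')
  "<p>#{par}</p>"
end

# Tiny made-up sample in the layout the script expects.
sample = "HEADER\n\n\n\nCHAPTER I\n\nIt was a _dark_ night.[1] see note\n\n\n\nFOOTER"
parts = split_parts(sample)
body = parts[1].split(/\n\n+/)
puts parts.size
puts format_par(body[1])
```

If a book separates its chapters with only two blank lines, calling split_parts(txt, 2) would be the thing to try first.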

Hope this helps somebody