Get from PDF to HTML as fast as you can. Some PDF books are hell to convert. Books with large blocks of text, like philosophical texts, are easy. Computer books are hell: every little change in spacing gets a style="...." in the HTML. Stripping some of it may cut the file size to about 50%.
I use UltraEdit to edit the HTML.
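If you would rather script the stripping of style="...." than do it by hand, here is a rough sketch in Python. It is only an illustration, not what I actually use: the function name strip_inline_styles and the sample line are made up, and the regular expression assumes the attribute values are properly quoted. It does not really parse the HTML, so it would also hit a style="..." that happens to appear in the visible text.

```python
import re

# Matches a style attribute with a double- or single-quoted value,
# together with the whitespace in front of it.
STYLE_ATTR = re.compile(r"""\s+style\s*=\s*("[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_inline_styles(html_text: str) -> str:
    """Remove every quoted style attribute from the given HTML text."""
    return STYLE_ATTR.sub("", html_text)


# Hypothetical line of the kind a PDF converter produces for a code listing.
sample = '<p style="margin-left: 3.2pt; letter-spacing: 0.1pt">for (i = 0; i &lt; n; i++)</p>'
cleaned = strip_inline_styles(sample)
print(cleaned)
print(len(cleaned), "of", len(sample), "characters kept")
```

Try it on a copy of the file first and compare the result in a browser, because some of those styles (indentation in code listings, for example) may be worth keeping.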
I suspect it may sometimes be faster to start with an empty HTML file and cut and paste in the blocks of text from the PDF.
A 500-page philosophical text took me nearly two weeks (it being my first try)...
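The empty-file approach can be sketched in a few lines of Python as well. Again, this is just an assumption of how one might do it: text_to_html is a made-up name, and it assumes the pasted text separates paragraphs with blank lines, which is not always true for text copied out of a PDF.

```python
import html
import re


def text_to_html(text: str, title: str = "Untitled") -> str:
    """Wrap blank-line-separated blocks of plain text in <p> tags."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Rejoin lines that were hard-wrapped inside one paragraph.
        joined = " ".join(line.strip() for line in block.splitlines())
        if joined:
            # Escape &, < and > so the text cannot break the markup.
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    body = "\n".join(paragraphs)
    return (
        "<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n"
        "</html>\n"
    )


page = text_to_html("First line\nwrapped here.\n\nSecond & last block.", title="Sample")
print(page)
```

This gives clean markup with no style attributes at all. Headings, footnotes and the like still have to be marked up by hand in the editor.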
ABBYY FineReader is a good OCR program, but there is still a lot of editing work afterwards.
It can OCR PDF files to a text file.