Sure, it's just 3 scripts that are not overly impressive, but they do the job for me.
Make sure you've installed Tesseract and have it in your path.
xpdf2txt_byPage.pl : Extracts one page at a time and converts it to text. The final product is a text file and a jpg.
xpng2txt_byPage: Converts any PNG file in the same directory to text.
xTxt2HTML: Creates one HTML file from the generated text file (NOTE: You might have to run dos2unix first; on Cygwin you do.)
(NOTE: Some of the executables have their paths hard coded. If the scripts do not work, remove the paths.)
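If you'd rather not deal with the hard-coded paths, here is a rough Python sketch of the PNG-to-text step that looks up tesseract on your path instead. It is not one of my three scripts, just an illustration: the function names (find_tesseract, build_commands, ocr_directory) are made up for this example, and it assumes the usual "tesseract input.png outputbase" call, which writes outputbase.txt.

```python
import shutil
import subprocess
from pathlib import Path


def find_tesseract(name="tesseract"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


def build_commands(directory, exe="tesseract"):
    """Build one tesseract command per PNG in `directory`, sorted by name."""
    commands = []
    for png in sorted(Path(directory).glob("*.png")):
        # `tesseract page.png page` writes its result to page.txt
        commands.append([exe, str(png), str(png.with_suffix(""))])
    return commands


def ocr_directory(directory="."):
    """Run tesseract on every PNG in `directory` (hypothetical helper)."""
    exe = find_tesseract()
    if exe is None:
        raise RuntimeError("tesseract was not found on PATH")
    for cmd in build_commands(directory, exe):
        # check=True stops at the first page tesseract fails on
        subprocess.run(cmd, check=True)
```

Calling ocr_directory(".") from the folder that holds the PNGs should leave one .txt file next to each image. Because the executable is looked up on the path, there is nothing to edit if tesseract lives somewhere else on your machine.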
=X=