Old 04-15-2009, 03:49 PM   #5
=X=
Sure. It's just 3 scripts that are not overly impressive, but they do the job for me.
Make sure you've installed Tesseract and that it is on your PATH.
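
If you are not sure whether Tesseract is on your PATH, a quick check along these lines may help. This is only a sketch and not part of the attached scripts; the helper names `find_tool` and `missing_tools` are made up for illustration.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def missing_tools(names):
    """Return the names from the list that cannot be found on PATH."""
    return [n for n in names if find_tool(n) is None]


if __name__ == "__main__":
    missing = missing_tools(["tesseract"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("tesseract found at", find_tool("tesseract"))
```

If it reports that tesseract is missing, fix your PATH before running the scripts.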


xpdf2txt_byPage.pl: Extracts one page at a time and converts it to text. The final product is a text file and a jpg.
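
To give a rough idea of what extracting one page at a time can look like, here is a sketch that builds a command line for rendering a single PDF page to an image. It assumes Poppler's `pdftoppm` tool, which may not be what the attached script calls; the function name `pdftoppm_page_command` and the 300 dpi default are hypothetical choices made for this example.

```python
def pdftoppm_page_command(pdf_path, page, out_prefix, dpi=300):
    """Build a pdftoppm command that renders exactly one page as PNG.

    Assumption: Poppler's pdftoppm is installed. -f and -l set the first
    and last page, so passing the same number renders just that page.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return [
        "pdftoppm",
        "-f", str(page),
        "-l", str(page),
        "-r", str(dpi),
        "-png",
        str(pdf_path),
        str(out_prefix),
    ]
```

The returned list can be handed to `subprocess.run(..., check=True)`, once per page.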

xpng2txt_byPage: Converts any PNG file in the same directory to text.
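
The PNG-to-text step boils down to calling Tesseract once per image. The sketch below shows the idea, not the attached script itself; `tesseract_command` and `ocr_directory` are names invented here. It relies on the standard `tesseract <image> <outputbase>` form, where Tesseract writes its result to `<outputbase>.txt`.

```python
import subprocess
from pathlib import Path


def tesseract_command(png_path):
    """Build the command for one image; output goes next to it as .txt."""
    png = Path(png_path)
    # The output base is the image path without its extension.
    return ["tesseract", str(png), str(png.with_suffix(""))]


def ocr_directory(directory="."):
    """Run Tesseract on every .png file in the given directory."""
    for png in sorted(Path(directory).glob("*.png")):
        subprocess.run(tesseract_command(png), check=True)
```

Calling `ocr_directory(".")` processes the current directory, matching the "same directory" behaviour described above.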

xTxt2HTML: Creates one HTML file from the generated text file. (NOTE: You might have to run dos2unix on the text file first; on Cygwin you do.)
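
As a rough illustration of the text-to-HTML step and of why line endings matter, here is a sketch that normalizes CRLF line endings (the job dos2unix does) and then wraps each blank-line-separated paragraph in a `<p>` tag. The function name `text_to_html` and the paragraph rule are assumptions made for this example; the attached script may lay out its HTML differently.

```python
import html


def text_to_html(text, title="OCR output"):
    """Turn plain text into a minimal HTML page, one <p> per paragraph."""
    # Normalize CRLF (and any stray CR) to LF; the CRLF part is what
    # dos2unix does.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(
        # Join wrapped lines with single spaces and escape <, > and &.
        "<p>{}</p>".format(html.escape(" ".join(p.split())))
        for p in paragraphs
    )
    return (
        "<html>\n<head><title>{}</title></head>\n<body>\n{}\n</body>\n</html>\n"
    ).format(html.escape(title), body)
```

Without the normalization step, the `\r\n\r\n` between paragraphs would not match the blank-line split, and the whole file would end up as one paragraph.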


(NOTE: Some of the executables have their paths hard-coded. If the scripts do not work, remove the paths.)

=X=
Attached Files
File Type: zip PDF2TXT.zip (2.3 KB, 312 views)