Quote:
Originally Posted by willus
You can do something like this for batch text extraction:
k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100
For every file, e.g. myfile.pdf, this will create myfile_text.txt which will have the extracted text layer.
Ho Ho.
Impressive. It's even faster if I add -p 10-20 (for example), so that only some pages are extracted and I can check whether they contain real text or garbage.
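For the "text or garbage" check, something like the rough Python sketch below could be run over the resulting *_text.txt files. It is only a heuristic of my own, not anything k2pdfopt provides: the function names (text_score, looks_like_text) and the 0.5 threshold are assumptions, and it simply counts how many whitespace-separated tokens look like ordinary words.

```python
import re
import sys
from pathlib import Path

# Hypothetical heuristic: a token counts as a "word" if it is two or more
# letters, optionally followed by common punctuation.
WORD_RE = re.compile(r"[^\W\d_]{2,}[.,;:!?)]*")


def text_score(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0
    word_like = sum(1 for t in tokens if WORD_RE.fullmatch(t))
    return word_like / len(tokens)


def looks_like_text(text: str, threshold: float = 0.5) -> bool:
    """Assumed cut-off: at least half of the tokens must look like words."""
    return text_score(text) >= threshold


if __name__ == "__main__":
    # Usage: python check_text.py myfile_text.txt other_text.txt ...
    for name in sys.argv[1:]:
        content = Path(name).read_text(encoding="utf-8", errors="replace")
        verdict = "text" if looks_like_text(content) else "garbage or empty"
        print(f"{name}: {text_score(content):.2f} -> {verdict}")
```

The threshold will probably need tuning for your own files; documents full of numbers or tables will score lower than plain prose even when the text layer is fine.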


So many tools, and so little time...