MobileRead Forums - View Single Post

rfog · 07-19-2020, 04:42 AM

Quote:

Originally Posted by willus

You can do something like this for batch text extraction:

k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100

For every file, e.g. myfile.pdf, this will create myfile_text.txt which will have the extracted text layer.

Ho Ho.

Impressive. It's even faster if I add -p 10-20 (for example) to extract only some pages and see whether they contain text or garbage.
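
If anyone wants to automate the "text or garbage" check, here is a rough sketch in Python. It does not call k2pdfopt at all; it only reads the *_text.txt files that the command above writes. The function names and the 0.8 threshold are my own assumptions for illustration, and the heuristic is crude: it catches symbol soup and empty files, but not a text layer whose letters are merely mapped to the wrong letters.

```python
import sys
from pathlib import Path


def readable_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters or digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalnum() for c in chars) / len(chars)


def looks_like_text(text: str, threshold: float = 0.8) -> bool:
    """Heuristic only: True when most characters are letters or digits.

    The 0.8 threshold is an arbitrary assumption; tune it on your own files.
    """
    return readable_ratio(text) >= threshold


if __name__ == "__main__":
    # Folder to scan is the first argument, or the current folder by default.
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in sorted(folder.glob("*_text.txt")):
        # errors="replace" keeps undecodable bytes from aborting the scan;
        # they become U+FFFD, which counts as non-readable.
        text = path.read_text(encoding="utf-8", errors="replace")
        verdict = "text" if looks_like_text(text) else "garbage or empty"
        print(f"{path.name}: {readable_ratio(text):.2f} -> {verdict}")
```

Run it in the folder with the extracted files (or pass the folder as an argument) and it prints one line per file with the ratio and a verdict.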

So many tools, and so little time...