07-19-2020, 03:42 AM   #10
rfog
Guru
Posts: 696
Karma: 2383012
Join Date: Aug 2007
Location: Schiedam (The Netherlands)
Device: Lots of eInk devices and iOS stuff
Quote:
Originally Posted by willus
You can do something like this for batch text extraction:

k2pdfopt -ocrout %s_text.txt -o dummy.pdf "*.pdf" -mode copy -n -dpi 100

For every file, e.g. myfile.pdf, this will create myfile_text.txt which will have the extracted text layer.
Ho Ho.

Impressive. It is even faster if I add -p 10-20 (for example) to extract the text of only some pages and check whether they contain text or garbage.
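
The "text or garbage" check could probably be automated too. Below is a rough Python sketch, assuming the *_text.txt files produced by the command above sit in the current folder. The helper names (readable_ratio, looks_like_text) and the 0.7 threshold are my own guesses for illustration and are not part of k2pdfopt, so adjust them to your files.

```python
import glob
import sys


def readable_ratio(text):
    """Fraction of non-whitespace characters that are letters or digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalnum() for c in chars) / len(chars)


def looks_like_text(text, threshold=0.7):
    """Hypothetical heuristic: mostly alphanumeric content counts as real text."""
    return readable_ratio(text) >= threshold


def main(pattern="*_text.txt"):
    # Report a verdict for every extracted text file matching the pattern.
    for path in sorted(glob.glob(pattern)):
        # Undecodable bytes become U+FFFD, which is not alphanumeric,
        # so badly encoded text layers lower the ratio.
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        verdict = "text" if looks_like_text(text) else "garbage or empty"
        print(f"{path}: {verdict} ({readable_ratio(text):.2f})")


if __name__ == "__main__":
    # Optional first argument overrides the default file pattern.
    main(*sys.argv[1:2])
```

A ratio of letters and digits is a crude measure, so a file full of punctuation-heavy tables may be flagged wrongly. It is only meant as a quick first pass before opening the files by hand.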



So many tools, and so little time...

Last edited by rfog; 07-19-2020 at 03:47 AM.