Quote:
Originally Posted by willus
Actually, that PDF is very good for processing. It's perfectly straight, consistent from page to page, and very clean. First, I recommend downloading the Greek Tesseract OCR data set and installing it per these instructions. Then you can run one of the following commands, depending on how large you want the text.
1. Separate each page into two but don't do any text re-flow.
k2pdfopt -grid 2x1 -n- -ocr t -lang grc source.pdf
2. Same but with text re-flow
k2pdfopt -grid 2x1 -fc- -n- -f2p 0 -wrap -ocr t -lang grc source.pdf
3. Even larger text (50% larger with -mag 1.5)
k2pdfopt -grid 2x1 -fc- -n- -f2p 0 -wrap -ocr t -lang grc -mag 1.5 source.pdf
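If you want to try all three variants on the same file, a small Python wrapper can keep the command lines in one place. This is a minimal sketch, not part of k2pdfopt: the names VARIANTS, build_command and run_variant are hypothetical, and it assumes k2pdfopt is installed and on your PATH. The flags are copied verbatim from the three commands above; nothing else is added.

```python
from __future__ import annotations

import shlex
import shutil
import subprocess

# The three k2pdfopt flag sets from the post, keyed by variant number.
# Hypothetical helper data: only the flags quoted in the post are used.
VARIANTS = {
    1: "-grid 2x1 -n- -ocr t -lang grc",                              # split, no re-flow
    2: "-grid 2x1 -fc- -n- -f2p 0 -wrap -ocr t -lang grc",            # split + re-flow
    3: "-grid 2x1 -fc- -n- -f2p 0 -wrap -ocr t -lang grc -mag 1.5",   # re-flow, 50% larger
}


def build_command(variant: int, source: str) -> list[str]:
    """Return the argv list for one variant, with the PDF path appended last."""
    return ["k2pdfopt", *shlex.split(VARIANTS[variant]), source]


def run_variant(variant: int, source: str) -> int:
    """Run k2pdfopt if it is on PATH; return its exit code, or -1 if it is missing."""
    if shutil.which("k2pdfopt") is None:
        return -1
    return subprocess.run(build_command(variant, source)).returncode


if __name__ == "__main__":
    # Dry run: print each command line instead of executing it.
    for v in sorted(VARIANTS):
        print(shlex.join(build_command(v, "source.pdf")))
```

Printing the commands first lets you check them before calling run_variant on a real file.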
I've attached the results of these three methods for just page 5 of your PDF. You'll notice the text is selectable and searchable, unlike your original.
Thank you so much, Willus. If I want to crop the borders, what do I have to do? In this PDF file the borders are not in exactly the same position on every page.