09-07-2018, 08:14 PM   #74
sealbeater
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
Quote:
Originally Posted by DuckieTigger
The mistake is the assumption that the PDF you want to convert comes with text inside, only because most do. The correct approach would be to always start with full-page images, run them through OCR, and afterwards extract the text from the PDF to improve the OCR results. That gives a universal script with the best overall results: as soon as a step fails, you are done with the best automatic result.

No assumptions are being made: PDFs are either one or the other. I don't disagree, though; you would have to do a two-stage run on the PDF to get the best automatic result. However, I've never seen a PDF that had both.
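
For what it's worth, the two-stage run could be sketched roughly like this in Python. This is only a sketch under assumptions: `best_page_text`, `ocr_page`, `extract_embedded` and `min_chars` are hypothetical names, and the two callables stand in for whatever OCR engine and PDF text extractor you actually use. It also simplifies the idea to "prefer the embedded text if the page has any, otherwise keep the OCR result".

```python
from typing import Callable, Optional


def best_page_text(
    ocr_page: Callable[[], Optional[str]],
    extract_embedded: Callable[[], Optional[str]],
    min_chars: int = 1,
) -> Optional[str]:
    """Return the best text available for one page, or None if both stages fail.

    ocr_page and extract_embedded are hypothetical hooks supplied by the
    caller; they are placeholders, not part of any real library.
    """
    # Stage 1: OCR the rendered page image. A failure here is not fatal.
    try:
        ocr_text = ocr_page()
    except Exception:
        ocr_text = None

    # Stage 2: try to pull the embedded text layer out of the PDF page.
    try:
        embedded = extract_embedded()
    except Exception:
        embedded = None

    # Prefer the embedded text when the page really has some.
    if embedded and len(embedded.strip()) >= min_chars:
        return embedded.strip()

    # Otherwise fall back to whatever OCR produced.
    if ocr_text and ocr_text.strip():
        return ocr_text.strip()

    return None
```

For an image-only PDF the second stage simply comes back empty and the OCR text is kept; for a text PDF the embedded text wins. So the same script covers both kinds without assuming which one it was given.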