Old 09-06-2018, 11:37 PM   #67
sealbeater
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
Quote:
Originally Posted by HarryT
Generally a lot better than those which attempt to extract text from the PDF itself. Of course no OCR is perfect, and a proofing/editing run through the converted file is essential.
Have you much experience extracting text from PDFs? I have found OCR to not be as good as extracting text. As I already stated, most PDFs come in two flavors: images of text, and the actual text itself. Extracted text is only as good as the PDF source. Going further, extracting to XML has so far yielded the best results when it comes to preserving layout, but I haven't played much with converting to PostScript yet.
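For anyone who wants to guess which flavor a given file is before choosing between OCR and extraction, here is a rough sketch. It is only a heuristic under my own assumptions, not how any particular converter works: the function name `classify_pdf` is made up for illustration, and it simply scans the raw bytes for font and image markers. A PDF that keeps its dictionaries in compressed object streams can hide those markers, so treat "unknown" as "check by hand".

```python
import re


def classify_pdf(data: bytes) -> str:
    """Crudely guess a PDF's flavor from its raw bytes.

    Returns "text" if any font marker is visible (a text layer is likely),
    "image" if only image XObjects are visible (probably a scan, needs OCR),
    and "unknown" otherwise. Hypothetical helper, heuristic only.
    """
    # Any name starting with /Font (e.g. /Font, /FontDescriptor) hints at
    # real text being drawn somewhere in the file.
    has_font = b"/Font" in data
    # Image XObjects are declared with /Subtype /Image (whitespace optional).
    has_image = re.search(rb"/Subtype\s*/Image", data) is not None

    if has_font:
        # Scans that already carry an OCR text layer also land here.
        return "text"
    if has_image:
        return "image"
    return "unknown"
```

To try it on a file, read it in binary mode and pass the bytes, for example `classify_pdf(open("book.pdf", "rb").read())`, where `book.pdf` is just a placeholder name.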