05-27-2015, 03:27 PM   #1
crazybrit
Member
Posts: 23
Karma: 208
Join Date: Oct 2014
Device: Nexus5, Nexus9
Tool to rewrite a PDF as new text after OCR

I have some old scanned CS papers in PDF format (I don't have the original printed copies), and the scan quality isn't great. I can upload them to Google Docs (with the option to convert uploaded files to Google Docs format turned on) and it will run OCR on them, but it obviously adds the recognized text as a separate layer, because the visible text quality is unchanged.

Is there any open-source (or freeware, Windows or Linux) tool that can take this OCR layer and generate a new PDF (or, better, a Word/OpenOffice/LaTeX document) with the same general layout, which I can then edit by hand to clean up the conversion errors?

EDIT: I tried this online site (http://www.free-online-ocr.com/) and the results were horrible, though it was clearly trying to do what I described above. http://www.onlineocr.net/ was better but is limited to one page at a time unless I register. I'd prefer something I can run locally rather than uploading to the web.

Last edited by crazybrit; 05-27-2015 at 03:55 PM.
06-10-2015, 02:22 AM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
In this particular matter you do get what you pay for. ABBYY FineReader (commercial software) is recommended by all the members here who work with OCR.

You can try Google's Tesseract engine, which has various frontends. It is regarded as the best FOSS OCR engine, but the prevailing opinion is that FineReader is worth the cost.
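If you want to try Tesseract locally without a frontend, here is a minimal sketch of one possible workflow. It assumes Poppler's pdftoppm and Tesseract 4 or later are installed and on the PATH (pdftoppm is not mentioned in the thread), and the function names are hypothetical helpers, not part of either tool. It renders each page to a 300 dpi PNG, then asks Tesseract for plain text plus hOCR, which records each word's position on the page and so gives you something to rebuild the layout from when hand-correcting:

```python
import shutil
import subprocess
from pathlib import Path


def pdf_to_images_cmd(pdf, prefix, dpi=300):
    """Hypothetical helper: build a pdftoppm command rendering each page to prefix-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def tesseract_cmd(image, outbase, lang="eng", outputs=("txt", "hocr")):
    """Hypothetical helper: build a Tesseract command; each output config writes outbase.<ext>."""
    return ["tesseract", str(image), str(outbase), "-l", lang, *outputs]


def ocr_scanned_pdf(pdf, workdir, lang="eng", outputs=("txt", "hocr")):
    """OCR a scanned PDF page by page and return the per-page output base paths."""
    # Fail early with a clear message if either external tool is missing.
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdf_to_images_cmd(pdf, workdir / "page"), check=True)

    # pdftoppm zero-pads page numbers to a common width, so a lexical sort
    # keeps the pages in reading order.
    outbases = []
    for image in sorted(workdir.glob("page-*.png")):
        outbase = image.with_suffix("")  # e.g. page-03.png -> page-03
        subprocess.run(tesseract_cmd(image, outbase, lang, outputs), check=True)
        outbases.append(outbase)
    return outbases
```

For example, ocr_scanned_pdf("paper.pdf", "paper_ocr") would leave a page-N.txt and page-N.hocr next to each rendered page image; passing outputs=("pdf",) instead asks Tesseract for a searchable PDF per page. The text files are what you would hand-edit, while the hOCR files keep the word positions.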


All times are GMT -4.