#1
Member
Posts: 23
Karma: 208
Join Date: Oct 2014
Device: Nexus5, Nexus9
Tool to rewrite a PDF as new text after OCR
I have some old scanned CS papers in PDF format (I don't have the original printed copies) and the scan quality isn't great. I can upload them to Google Docs (after turning on the option to convert uploads into native Google Docs format) and it will perform OCR, but it is evidently adding the recognized text as a separate layer in the PDF, because the visible text quality remains unchanged.
Is there any open source (or freeware; Windows or Linux) tool that can take this OCR layer and generate a new PDF (or, better, a Word/OpenOffice/LaTeX document) with the same general layout, which I can then edit by hand to clean up the conversion errors?
EDIT: I tried this online site (http://www.free-online-ocr.com/) and the results were horrible, though it was clearly trying to do what I asked above. http://www.onlineocr.net/ was better but is limited to one page at a time unless I register. I'd prefer something that I can run locally rather than uploading to the web.
Last edited by crazybrit; 05-27-2015 at 03:55 PM.
#2
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
In this particular matter you do get what you pay for. ABBYY FineReader (commercial, paid software) comes recommended by all the members here who work with OCR.
You can try Google's Tesseract engine with one of its various frontends. It is regarded as the best FOSS OCR engine, but the prevailing opinion is that FineReader is worth the cost.
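If you want to try Tesseract locally without a frontend, here is a rough sketch of one way to drive it from Python. It assumes the `pdftoppm` (from Poppler) and `tesseract` command-line tools are installed and on your PATH; the function names, the `page` file prefix, and the 300 DPI default are just illustrative choices, not anything either tool requires. Tesseract does not read PDFs directly, so the pages are rendered to PNG first. The `hocr` output is HTML with word bounding boxes, which preserves some layout information for later hand editing; `pdf` produces a searchable PDF instead.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, image_prefix, dpi=300):
    """Build the pdftoppm call that renders each PDF page to a PNG image."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def tesseract_command(image_path, output_base, lang="eng", output_format="hocr"):
    """Build the tesseract call for one page image.

    output_format is passed as a Tesseract config name: "hocr" or "pdf".
    """
    if output_format not in {"hocr", "pdf"}:
        raise ValueError(f"unsupported output format: {output_format}")
    return ["tesseract", str(image_path), str(output_base), "-l", lang, output_format]


def ocr_pdf(pdf_path, work_dir, lang="eng", output_format="hocr"):
    """Render a scanned PDF to page images, then OCR each page.

    Returns the list of per-page output files that Tesseract should have written.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # pdftoppm writes page-1.png, page-2.png, ... (zero-padded for longer documents).
    subprocess.run(pdftoppm_command(pdf_path, work_dir / "page"), check=True)

    outputs = []
    for image in sorted(work_dir.glob("page-*.png")):
        base = image.with_suffix("")  # Tesseract appends the extension itself
        subprocess.run(
            tesseract_command(image, base, lang, output_format), check=True
        )
        outputs.append(base.with_suffix("." + output_format))
    return outputs


if __name__ == "__main__":
    # "paper.pdf" and "ocr_out" are placeholder names for this example.
    for path in ocr_pdf("paper.pdf", "ocr_out"):
        print(path)
```

Expect to do a fair amount of cleanup afterwards on poor scans; this only automates the page-by-page OCR step and does not rebuild a Word or LaTeX document for you.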