06-24-2013, 01:59 AM   #1
noisy
Member
Posts: 22
Karma: 12
Join Date: Oct 2011
Device: kindle 3
PDF with OCR to MOBI

Many PDF files consist of scans of very old documents. Some of them also contain an OCR text layer, like this document: http://polona.pl/archive_prod?uid=1095122&cid=1095117

I tried converting it to MOBI in Calibre, but the resulting MOBI file contains only the scanned images, without any of the text that is available in the OCR layer.

Is there any way to extract the OCR text from this PDF and convert only that text to MOBI?
06-24-2013, 10:45 AM   #2
Sabardeyn
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
Someone more knowledgeable than myself needs to comment on this issue as I don't really use calibre for conversions.

However, I don't think you're going to find a simple solution, or one that can be handled just in calibre with the right settings. I think you're going to have to strip out the text, feed it into ebook creation software and, with extra mark-up and effort, create a new ebook based on the old PDF material. This will be a complex project, assuming you want to do it correctly rather than quickly, particularly if it is meant for long-term use and data retention.

But I would keep the PDFs and the resulting ebook as well. Just in case.
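
The "strip out the text, then build a new ebook from it" workflow described above might be sketched roughly as follows. This is only an illustration, not a tested recipe for the linked document: it assumes the Poppler `pdftotext` tool and calibre's `ebook-convert` command-line tool are both installed and on the PATH, and the function names and file names are made up for the example. The result would still need the manual clean-up and mark-up mentioned above.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path, out_dir="."):
    """Return two command lines: PDF -> plain text, then text -> MOBI.

    Hypothetical helper; output names are derived from the PDF's file name.
    """
    pdf = Path(pdf_path)
    txt = Path(out_dir) / (pdf.stem + ".txt")
    mobi = Path(out_dir) / (pdf.stem + ".mobi")
    # pdftotext reads the embedded (OCR) text layer and ignores the page images.
    extract = ["pdftotext", "-enc", "UTF-8", str(pdf), str(txt)]
    # ebook-convert picks input/output formats from the file extensions.
    convert = ["ebook-convert", str(txt), str(mobi), "--input-encoding", "utf-8"]
    return extract, convert


def ocr_layer_to_mobi(pdf_path, out_dir="."):
    """Run both steps and return the path of the MOBI file."""
    for tool in ("pdftotext", "ebook-convert"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    extract, convert = build_commands(pdf_path, out_dir)
    subprocess.run(extract, check=True)
    subprocess.run(convert, check=True)
    return Path(convert[2])
```

For example, `ocr_layer_to_mobi("scan.pdf", "out")` would write `out/scan.txt` and then `out/scan.mobi`. Keeping the intermediate text file makes it possible to fix OCR errors by hand before converting.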
06-24-2013, 06:14 PM   #3
BetterRed
null operator (he/him)
Posts: 20,457
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Have a look at this thread https://www.mobileread.com/forums/sho...d.php?t=212056

BR
All times are GMT -4.