Old 04-21-2023, 09:42 PM   #1
manotroll
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2020
Device: Kindle Paperwhite 2021
converter for OCR from images

I have some books that the publisher in my country does not sell in digital format, and I would like to convert them for use on my Kindle.
That would give me the advantage of being able to increase the text size, and I could read at night with the device's own light. The book is 98% text.
However, I did not find a way to do this in calibre. Does it support this feature?
Old 04-21-2023, 10:03 PM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, calibre does not have any OCR capabilities.
Old 04-22-2023, 09:13 AM   #3
manotroll
And is there no possibility of having this support in the future?
Old 04-22-2023, 10:01 AM   #4
kovidgoyal
calibre is open source; anyone can contribute code to it. I will say this is not on my horizon.
Old 04-28-2023, 02:01 AM   #5
feuille
Connoisseur
Posts: 52
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet
Quote:
Originally Posted by manotroll
and there is no possibility of having this support in the future?
I'm currently working on a GUI plugin that adds a text layer to the PDF format for a selected book. I will publish it in the next few days.
Old 04-28-2023, 02:20 PM   #6
Quoth
the rook, bossing Never.
Posts: 11,158
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
However, I think extracting text from PDF images via OCR is a workflow best done before the final version is added to the library. It's not conversion in the sense that converting MOBI, AZW3, EPUB, etc. to each other is. It needs human proofreading and editing.
Old 04-29-2023, 04:07 AM   #7
feuille
Quote:
Originally Posted by Quoth
However I think text from PDF images via OCR is a workflow best done before the final version is added to the Library.
Agreed, when it comes to creating a new format from a scan. I do it that way, too.
In my use case, however, it is about adding a text layer to a PDF whose layout should not be changed, for example to enable full-text search (FTS) and text extraction.
In my experience, with a good scan and the correct configuration of the OCR software (Tesseract), the recognition errors are relatively few and hardly affect the FTS. If you copy a piece of text for a quote, you can easily check it against the original layout. Also, before writing the text layer, I plan to offer the OCR result in a text editor, with the possibility of proofreading.
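For readers who want to try the text-layer idea outside calibre, here is a minimal sketch. It is not the plugin described above. It assumes the scanned pages are already exported as image files and that the Tesseract command-line program is installed; the function names and file names are hypothetical. It relies on Tesseract's `pdf` output config, which writes a PDF containing the page image plus an invisible text layer.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image, output_base, lang="eng"):
    """Build the Tesseract call that writes <output_base>.pdf with a text layer.

    `lang` is a Tesseract language code such as "eng" or "deu"; the matching
    language data must be installed for the call to succeed.
    """
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


def add_text_layer(image, output_base, lang="eng"):
    """Run Tesseract on one page image and return the searchable PDF's path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_pdf_command(image, output_base, lang), check=True)
    return Path(f"{output_base}.pdf")
```

Calling `add_text_layer("page-001.png", "page-001", lang="deu")` would produce `page-001.pdf` for a German scan. The OCR text is not proofread in this sketch, so the caveats about recognition errors still apply.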
Old 04-29-2023, 05:49 AM   #8
Quoth
My experience of an unproofed OCR text layer is the Internet Archive. Indeed, it's only of any use for text search, and not great for that.
Old 05-09-2023, 08:08 AM   #9
mvivar
Junior Member
Posts: 2
Karma: 10
Join Date: May 2023
Device: kindle
I recently converted an image-only PDF (no text layer) into an EPUB. It took me a couple of hours to convert it into text.
Some OCR software is very accurate, but the one that gave me the fewest errors was Google Drive's.
Make sure your PDF does not exceed 2 MB. You can split it.
Upload the files to Google Drive.
Right-click a file and choose "Open with", then Google Docs.
And done.
Then you can download it as TXT, DOCX, etc. and convert it into EPUB.
Good luck!
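The splitting and final conversion steps can be planned with a small script. This is only a sketch under stated assumptions: the per-page sizes in bytes are supplied by you (for example from the exported page images), summing them only approximates the size of the resulting PDF pieces, and the function names are hypothetical. The 2 MB limit is the figure given in the post above. The second function builds a call to calibre's `ebook-convert` command-line tool, which infers the formats from the file extensions.

```python
def plan_chunks(page_sizes, limit=2 * 1024 * 1024):
    """Greedily group consecutive pages so each group's summed size fits the limit.

    Returns (first_page, last_page) tuples, 1-based and inclusive.
    A single page larger than the limit ends up in a chunk of its own.
    """
    chunks = []
    start = 0  # 0-based index of the first page in the current chunk
    total = 0  # summed size of the pages in the current chunk
    for index, size in enumerate(page_sizes):
        if index > start and total + size > limit:
            chunks.append((start + 1, index))
            start = index
            total = 0
        total += size
    if page_sizes:
        chunks.append((start + 1, len(page_sizes)))
    return chunks


def ebook_convert_command(source, target):
    """Build the calibre command that converts the downloaded text to an e-book."""
    return ["ebook-convert", str(source), str(target)]
```

For example, `plan_chunks(sizes)` tells you which page ranges to export as separate PDFs before uploading, and `ebook_convert_command("book.docx", "book.epub")` gives the command to run once the recognized text has been downloaded and proofread.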