MobileRead Forums > E-Book Software > Calibre > Conversion
10-29-2023, 11:33 AM   #1
Slash
Junior Member
Slash began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Feb 2015
Device: Kobo Aura HD
PDF to EPUB doesn't keep OCR

Hello,
I am using the latest version of Calibre, and I have a PDF with an OCR text layer, so I can search the text in it.

When I convert it to EPUB, the OCR text disappears.

How can I keep the PDF's OCR text in the EPUB?

Thank you
10-29-2023, 12:26 PM   #2
Quoth
the rook, bossing Never.
Quoth ought to be getting tired of karma fortunes by now.
Posts: 11,164
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Extract the OCR text instead. There are separate tools for this, depending on your OS.

Also, don't convert PDFs at all, except by OCRing them or by scraping the OCR text into a word processor.

PDF is an end-use format, meant for printing or print preview.
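As a rough illustration of extracting the OCR text layer instead of converting the PDF, here is a minimal Python sketch that calls pdftotext from the Poppler utilities, which have builds for Windows, macOS and Linux. It assumes pdftotext is installed and on PATH; the function names are hypothetical. If the PDF is an image-only scan with no text layer, the output will be empty or nearly so.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def build_pdftotext_cmd(pdf_path: str, txt_path: str = "-", keep_layout: bool = False) -> list[str]:
    """Build a pdftotext (Poppler) command that dumps a PDF's text layer.

    A txt_path of "-" makes pdftotext write to stdout instead of a file.
    """
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout tries to keep the physical page layout (columns, spacing)
        cmd.append("-layout")
    cmd += [pdf_path, txt_path]
    return cmd


def extract_ocr_text(pdf_path: str) -> str:
    """Return the text layer of pdf_path; raises if pdftotext is missing."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH (it ships with Poppler)")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_ocr.py book.pdf  -> writes book.txt next to it
    source = Path(sys.argv[1])
    source.with_suffix(".txt").write_text(extract_ocr_text(str(source)), encoding="utf-8")
```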
11-03-2023, 03:35 AM   #3
Slash
Junior Member
Slash began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Feb 2015
Device: Kobo Aura HD
Hello, thank you for your message. The thing is that I have scanned a book into multiple JPEG files, and I would like to convert them to both EPUB and PDF.

Should I convert to EPUB first? The OCR tool I'm using works on PDFs only, and I couldn't find a way to OCR my original JPEG files in Calibre.
11-03-2023, 10:44 AM   #4
retiredbiker
Addict
retiredbiker ought to be getting tired of karma fortunes by now.
Posts: 387
Karma: 1638210
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
Quote:
Originally Posted by Slash
The OCR tool I'm using works on PDFs only, and I couldn't find a way to OCR my original JPEG files in Calibre
If you are using Linux, a cool way to OCR JPEG images is OCRFeeder as a front end for Tesseract. It gives you fine control, handles paragraphing and end-of-line hyphens very well, and lets you deal with double-column layouts and other ugly pages.
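OCRFeeder is a GUI, so the following is not its workflow; it is a minimal Python sketch of driving Tesseract directly on a folder of scanned JPEGs. It sorts the pages in natural order (so page2.jpg comes before page10.jpg), writes them to a list file with one image path per line, which Tesseract accepts as input, and runs a single Tesseract pass that produces one text file. The folder layout, the pages.txt name and the helper names are assumptions.

```python
import re
import shutil
import subprocess
from pathlib import Path


def natural_key(name: str):
    """Split 'page10.jpg' into ['page', 10, '.jpg'] so numbers sort numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def sorted_pages(folder: str) -> list[Path]:
    """Return the JPEG scans in folder in reading order."""
    pages = [p for p in Path(folder).iterdir()
             if p.suffix.lower() in {".jpg", ".jpeg"}]
    return sorted(pages, key=lambda p: natural_key(p.name))


def write_image_list(pages: list[Path], list_file: Path) -> Path:
    """Write one image path per line; Tesseract can take this file as input."""
    list_file.write_text("\n".join(str(p) for p in pages) + "\n", encoding="utf-8")
    return list_file


def ocr_pages_to_text(folder: str, out_base: str, lang: str = "eng") -> None:
    """OCR every JPEG in folder into out_base.txt with one Tesseract run."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract not found on PATH")
    pages = sorted_pages(folder)
    if not pages:
        raise FileNotFoundError(f"no JPEG files in {folder}")
    list_file = write_image_list(pages, Path(folder) / "pages.txt")
    # tesseract <input> <outputbase> -l <lang>  -> writes <outputbase>.txt
    subprocess.run(["tesseract", str(list_file), out_base, "-l", lang], check=True)
```

For example, `ocr_pages_to_text("scans", "book")` would write book.txt. Sorting matters because a plain string sort puts page10.jpg before page2.jpg and scrambles the book.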

If you have a PDF with OCR text in it, Calibre uses the pdftohtml tool to extract the text. Sometimes this does not work, for whatever reason, so try running the pdftotext tool outside Calibre. That gives you a plain text file, but you are on your own for paragraphing and formatting... as always with PDF.
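Since pdftotext leaves the paragraphing to you, here is a minimal Python sketch of a first reflow pass over its output. It assumes blank lines mark paragraph breaks, which is often but not always true of OCR'd text, and it drops an end-of-line hyphen when the next line starts in lowercase. pdftotext writes a form feed at each page break; the sketch simply deletes it. The reflow name is hypothetical.

```python
import re


def reflow(raw: str) -> list[str]:
    """Rejoin hard-wrapped OCR text into one string per paragraph.

    Assumes blank lines separate paragraphs. Form feeds (U+000C, page
    breaks in pdftotext output) are removed, so a paragraph continuing
    onto the next page is rejoined; one that really ends at a page
    break gets merged too, which proofing has to catch.
    """
    text = raw.replace("\f", "")
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            if joined.endswith("-") and line[:1].islower():
                # "exam-" + "ple" -> "example" (also merges real compounds like "well-known")
                joined = joined[:-1] + line
            else:
                joined += " " + line
        paragraphs.append(joined)
    return paragraphs
```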

Anything OCR'd needs proofing and editing, and that is usually the hardest part of the project.
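Proofing stays manual work, but a crude filter can point you at likely misreads first. This minimal Python sketch flags tokens that mix letters and digits (such as l0ve or th1s) and isolated stray symbols. The heuristics are assumptions, not any standard, and they will also flag legitimate tokens such as 3rd.

```python
import re

# Hypothetical heuristics for likely OCR misreads:
# a word containing both a letter and a digit ("l0ve", "th1s") ...
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")
# ... or a lone symbol surrounded by whitespace that is not common punctuation.
STRAY = re.compile(r"(?<!\S)[^\w\s\"'(),.;:!?-](?!\S)")


def flag_for_proofing(paragraphs: list[str]) -> list[tuple[int, str]]:
    """Return (paragraph_index, token) pairs that deserve a second look."""
    hits = []
    for i, para in enumerate(paragraphs):
        for m in MIXED.finditer(para):
            hits.append((i, m.group()))
        for m in STRAY.finditer(para):
            hits.append((i, m.group()))
    return hits
```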
11-04-2023, 11:37 AM   #5
Slash
Junior Member
Slash began at the beginning.
 
Posts: 6
Karma: 10
Join Date: Feb 2015
Device: Kobo Aura HD
Unfortunately, I am using Windows.
11-04-2023, 02:37 PM   #6
Sarmat89
Evangelist
Sarmat89 ought to be getting tired of karma fortunes by now.
 
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
https://tesseract-ocr.github.io/tess...-3rdParty.html
11-04-2023, 02:59 PM   #7
Karellen
Wizard
Karellen ought to be getting tired of karma fortunes by now.
Posts: 1,103
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Quote:
Originally Posted by Slash
Hello, thank you for your message. The thing is that I have scanned a book into multiple JPEG files, and I would like to convert them to both EPUB and PDF.

Should I convert to EPUB first? The OCR tool I'm using works on PDFs only, and I couldn't find a way to OCR my original JPEG files in Calibre.
There is a thread discussing OCR conversions that is probably worth a read.
Here is my post in that thread, which details my workflow and links to the software you need:
https://www.mobileread.com/forums/sh...93#post4341993
11-04-2023, 03:44 PM   #8
Quoth
the rook, bossing Never.
Quoth ought to be getting tired of karma fortunes by now.
Posts: 11,164
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
Quote:
Originally Posted by Slash
Unfortunately, I am using Windows.
Well, Tesseract can be run on Windows (I'm sure I did it years ago). Also, Linux is free and can even be run from a USB stick.
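For Windows, here is a minimal Python sketch of the whole pipeline from the original question: Tesseract turns a page list into both a searchable PDF and a text file in one run (via its built-in pdf and txt configs), and Calibre's ebook-convert turns the text into EPUB. The fallback path C:\Program Files\Tesseract-OCR\tesseract.exe is an assumption about where the installer put Tesseract, the helper names are hypothetical, and pages.txt is assumed to be a list file with one image path per line.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the usual location used by the Windows installer;
# change this if Tesseract was installed elsewhere.
DEFAULT_WIN_TESSERACT = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")


def find_tesseract() -> str:
    """Prefer tesseract on PATH, then fall back to the assumed install path."""
    found = shutil.which("tesseract")
    if found:
        return found
    if DEFAULT_WIN_TESSERACT.exists():
        return str(DEFAULT_WIN_TESSERACT)
    raise FileNotFoundError("Tesseract not found; install it or add it to PATH")


def build_commands(tesseract: str, image_list: str, out_base: str,
                   lang: str = "eng") -> list[list[str]]:
    """Return the commands for: JPEG list -> searchable PDF + TXT -> EPUB."""
    return [
        # The 'pdf' and 'txt' configs make one run write out_base.pdf
        # (page images with an invisible text layer) and out_base.txt.
        [tesseract, image_list, out_base, "-l", lang, "pdf", "txt"],
        # Calibre's command-line converter turns the proofed text into EPUB.
        ["ebook-convert", out_base + ".txt", out_base + ".epub"],
    ]


def run(cmd: list[str]) -> None:
    subprocess.run(cmd, check=True)
```

In practice you would call `ocr_cmd, epub_cmd = build_commands(find_tesseract(), "pages.txt", "book")`, run `ocr_cmd`, proof book.txt, and only then run `epub_cmd`, so the OCR errors don't end up in the EPUB.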

Tags
epub, ocr



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
OCR'd PDF to EPUB/TXT/etc. not copying text over (text under image). | Tenome | Conversion | 1 | 10-24-2022 10:17 AM
epub 2 PDF conversion with OCR in PDF possible? | hobi2000 | Conversion | 2 | 03-25-2019 03:20 AM
PDF (with OCR) to ePub, is it possible to make a real ePub? | foice | Conversion | 9 | 05-01-2018 06:34 AM
Best practice to OCR and convert PDF to text or html or epub | crankypants | ePub | 15 | 12-14-2015 08:00 PM
Free (ADE-DRM ePub) Don't Look, Don't Touch, Don't Eat [Biology & Evo Psych & Anthro] | ATDrake | Deals and Resources (No Self-Promotion or Affiliate Links) | 1 | 05-31-2015 06:41 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.