10-29-2023, 11:33 AM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2015
Device: Kobo Aura HD
|
PDF to EPUB conversion doesn't keep OCR
Hello,
I am using the latest version of Calibre and I have a PDF with an OCR text layer, so I can search text in it. When I converted it to EPUB, the OCR text disappeared. How can I keep my PDF's OCR text in the EPUB? Thank you |
10-29-2023, 12:26 PM | #2 |
the rook, bossing Never.
Posts: 11,703
Karma: 87663461
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Extract the OCR text. There are separate tools depending on your OS.
Also, don't convert PDFs at all, except by running OCR or by scraping the OCR text into a word processor. PDF is an end-use format, meant for printing or print preview. |
11-03-2023, 03:35 AM | #3 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2015
Device: Kobo Aura HD
|
Hello, thank you for your message. The thing is that I have scanned a book into multiple JPEG files and I would like to convert them to EPUB and PDF.
Should I convert to EPUB first? The tool I'm using for OCR works on PDFs only, and I couldn't find a way to run OCR on my original JPEG files in Calibre. |
11-03-2023, 10:44 AM | #4 | |
Addict
Posts: 390
Karma: 1638210
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Jutoh, Kobo Forma
|
Quote:
If you have a PDF with OCR text in it, Calibre will use the pdftohtml tool to extract the text. Sometimes this does not work, for some reason, so try using the pdftotext tool outside Calibre. That will give you a text file, but you are on your own for paragraphing and formatting... as always with PDF. Anything OCR'd needs proofing and editing, and that is usually the hardest part of the project. |
|
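For anyone who wants to script the pdftotext suggestion above, here is a minimal Python sketch. It is not anything Calibre does internally. It assumes pdftotext (from Poppler or Xpdf) is installed and on your PATH. The function names and the file names book.pdf / book.txt are made up for illustration. The paragraph reflow is only a rough heuristic and does not replace proofreading.

```python
import re
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=False):
    """Return the argument list for a pdftotext call (nothing is run here)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout tries to preserve the physical page layout (columns, indents).
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def extract_text(pdf_path, txt_path, keep_layout=False):
    """Run pdftotext and return the extracted text (assumes it is on PATH)."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout), check=True
    )
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")


def reflow_paragraphs(text):
    """Join hard-wrapped lines into paragraphs (heuristic, expect mistakes).

    Blank lines and page breaks are treated as paragraph breaks, so a
    paragraph that runs across a page will be split in two. A trailing
    hyphen is treated as a word split across lines, which also wrongly
    merges genuine compounds such as "well-" / "known".
    """
    text = text.replace("\f", "\n\n")  # pdftotext separates pages with form feeds
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        joined = lines[0]
        for line in lines[1:]:
            if joined.endswith("-"):
                joined = joined[:-1] + line
            else:
                joined += " " + line
        paragraphs.append(joined)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    raw = extract_text("book.pdf", "book.txt")
    Path("book_reflowed.txt").write_text(reflow_paragraphs(raw), encoding="utf-8")
```

The reflowed text file can then be opened in a word processor or editor for proofing before you build the EPUB from it.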
11-04-2023, 11:37 AM | #5 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2015
Device: Kobo Aura HD
|
Unfortunately, I am using Windows.
|
11-04-2023, 02:37 PM | #6 |
Evangelist
Posts: 484
Karma: 2267928
Join Date: Nov 2015
Device: none
|
|
11-04-2023, 02:59 PM | #7 | |
Wizard
Posts: 1,171
Karma: 4949904
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
|
Quote:
Here is my post in that thread that details my workflow and links to software you need... https://www.mobileread.com/forums/sh...93#post4341993 |
|
11-04-2023, 03:44 PM | #8 |
the rook, bossing Never.
Posts: 11,703
Karma: 87663461
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
|
Tags |
epub, ocr |
|