Old 05-28-2026, 04:40 AM   #1
ehsangh70
Junior Member
Posts: 2
Karma: 10
Join Date: May 2026
Device: PC
How can I convert a scanned PDF into a searchable/text-based PDF in Calibre?

Hello everyone,

I have a scanned PDF file where the pages are basically images, and I want to convert it into a searchable/text-based PDF with selectable text.

Is there a way to do this using Calibre, or do I need an OCR plugin/tool alongside it?
If possible, could someone explain the best method or workflow?
Old 05-28-2026, 05:13 AM   #2
Karellen
Wizard
Posts: 1,856
Karma: 9600930
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
You can't. You need to OCR the document. You end up with text and then it is up to you what you do with it next - create an epub or a pdf.

Like all OCR, it is not perfect, so proofreading and formatting will be required.

The method I use is detailed here...
https://www.mobileread.com/forums/sh...3&postcount=23
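
For the "what you do with it next" step, here is a minimal sketch that assumes the OCR output has already been saved and proofread as a plain-text file. It is not the linked method, just one way to hand the text to Calibre's `ebook-convert` command-line tool, which chooses the output format from the file extension. The file names and the helper function names are placeholders.

```python
import shutil
import subprocess


def build_convert_command(text_file, output_file, title=None):
    """Return the argument list for a calibre ebook-convert run."""
    cmd = ["ebook-convert", str(text_file), str(output_file)]
    if title:
        cmd += ["--title", title]  # set the book title in the metadata
    return cmd


def convert_text(text_file, output_file, title=None):
    """Run ebook-convert if calibre's CLI tools are on PATH."""
    if shutil.which("ebook-convert") is None:
        print("ebook-convert not found on PATH; is calibre installed?")
        return False
    result = subprocess.run(build_convert_command(text_file, output_file, title))
    return result.returncode == 0


if __name__ == "__main__":
    # "book.txt" and "book.epub" are placeholder names; use "book.pdf" for PDF output.
    print(" ".join(build_convert_command("book.txt", "book.epub", title="My Book")))
```

Calling `convert_text("book.txt", "book.epub")` would run the conversion itself, provided calibre is installed.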
Old 05-29-2026, 02:04 AM   #3
Jaws
JCL Punch-Card Collector
Posts: 109
Karma: 606560
Join Date: Jun 2014
Location: Antarctica
Device: Aggressively Device Independent
As an alternative to Karellen's method, I've had good success using PDF24, a free alternative for Windows systems at

https://tools.pdf24.org/en/ocr-pdf

There is both an online version (whose speed depends heavily on your connection) and an installable program. On the plus side, it's pretty decent with math symbols, superscripts, and subscripts (fairly important in chemistry, for example). However, it's also quite font-dependent for non-Roman/Greek characters: it doesn't handle sans serif well, and results are a lot better if the exact font used in the source is already installed (and active) in the \Windows\Fonts folder. Unfortunately, that means "exact" as in "same vendor, same character set": "Minion Pro" is not the same as "Minion", even though they're from the same vendor.

That said, if you're trying to get decent, reflowable, selectable-text output (whether as PDF or in another format), it's really essential to use a specialist program. Calibre is many things, but not everything...
Old 05-29-2026, 03:00 AM   #4
feuille
Connoisseur
Posts: 70
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
OCRthisPDF (not yet published)

I have been using a plugin for this purpose for a while now, one that I haven't released yet because I still want to add a proofreading step. It is based on OCRmyPDF, which, in turn, relies on Tesseract for the OCR component. If anyone wants to try this out, I could release it even without the proofreading feature (which utilizes the hOCR files generated by Tesseract). For high-quality scans, the results are already excellent out of the box.
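
For anyone who wants to try OCRmyPDF directly in the meantime, here is a rough sketch of calling its command-line tool from Python. This is not the plugin's code, only an illustration under the assumption that OCRmyPDF and Tesseract are installed and on the PATH. The function names and file names are made up for the example; `deu` is Tesseract's language code for German.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(src, dst, language="eng", deskew=True):
    """Return the argument list for an OCRmyPDF command-line run."""
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        cmd.append("--deskew")  # straighten slightly rotated scans
    cmd += [str(src), str(dst)]
    return cmd


def ocr_pdf(src, dst, language="eng"):
    """Run OCRmyPDF if it is installed; return True on success."""
    if shutil.which("ocrmypdf") is None:
        print("ocrmypdf not found on PATH; install OCRmyPDF and Tesseract first")
        return False
    result = subprocess.run(build_ocrmypdf_command(src, dst, language))
    return result.returncode == 0


if __name__ == "__main__":
    # "scan.pdf" and "scan_ocr.pdf" are placeholder file names.
    print(" ".join(build_ocrmypdf_command(Path("scan.pdf"), Path("scan_ocr.pdf"), "deu")))
```

The output PDF keeps the page images and adds a hidden text layer, which is what makes the text searchable and selectable.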
Old 05-29-2026, 03:15 AM   #5
rantanplan
Weirdo
Posts: 1,124
Karma: 12503116
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Kobo Sage, Kobo Libra 2, reMarkable PaperPro
I'd love to have that, as one of my projects is to scan the original German translation of the A Song of Ice and Fire books.
Old 06-09-2026, 07:48 AM   #6
feuille
Connoisseur
Posts: 70
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
OCRthisPDF

The OCR plugin has been released: https://www.mobileread.com/forums/sh...76#post4590976