#1
Junior Member
Posts: 2
Karma: 10
Join Date: May 2026
Device: PC
How can I convert a scanned PDF into a searchable/text-based PDF in Calibre?
Hello everyone,
I have a scanned PDF file where the pages are basically images, and I want to convert it into a searchable, text-based PDF with selectable text. Is there a way to do this using Calibre, or do I need an OCR plugin or tool alongside it? If possible, could someone explain the best method or workflow?
#2
Wizard
Posts: 1,856
Karma: 9600930
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
You can't do this with Calibre alone. You need to OCR the document first. You end up with text, and then it is up to you what you do with it next: create an EPUB or a PDF.
Like all OCR, it is not perfect, so proofreading and formatting will be required. The method I use is detailed here: https://www.mobileread.com/forums/sh...3&postcount=23
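For the OCR step itself, one free route is OCRmyPDF (the tool the plugin mentioned later in this thread is built on), which adds a Tesseract text layer behind the page images. Below is a minimal Python sketch, assuming `ocrmypdf`, Tesseract and the language packs you need (e.g. `deu`) are installed and on PATH. It is not necessarily the method in the linked post, and the helper names `build_ocrmypdf_command` and `run_ocr` are hypothetical:

```python
import shlex
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, languages=("eng",), deskew=True, force=False):
    """Build an ocrmypdf command line (hypothetical helper; check `ocrmypdf --help`)."""
    # Tesseract language codes are joined with "+", e.g. "deu+eng".
    cmd = ["ocrmypdf", "-l", "+".join(languages)]
    if deskew:
        # Straighten slightly rotated scans before OCR.
        cmd.append("--deskew")
    # --skip-text leaves pages that already have a text layer alone;
    # --force-ocr rasterizes and re-OCRs every page instead.
    cmd.append("--force-ocr" if force else "--skip-text")
    cmd += [src, dst]
    return cmd


def run_ocr(src, dst, **kwargs):
    """Run ocrmypdf, failing early if it is not installed."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, **kwargs), check=True)


# Show the command that would be run for a German/English scan.
print(shlex.join(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf", languages=("deu", "eng"))))
```

The output is still a PDF of page images, but with selectable, searchable text behind them; turning that text into a clean, reflowable EPUB still needs the proofreading and formatting pass mentioned above.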
#3
JCL Punch-Card Collector
Posts: 109
Karma: 606560
Join Date: Jun 2014
Location: Antarctica
Device: Aggressively Device Independent
As an alternative to Karellen's method, I've had good success with PDF24, a free tool for Windows systems: https://tools.pdf24.org/en/ocr-pdf

There is both an online version (its speed depends heavily on how fast your connection is) and an installable program. On the plus side, it's pretty decent with math symbols, superscripts and subscripts (fairly important in chemistry, for example). However, it's also quite font-dependent for non-Roman/Greek characters: it doesn't handle sans-serif well, and results are much better if the exact font used in the source is already installed (and active) in the \Windows\Fonts folder. Unfortunately, "exact" means "same vendor, same character set": "Minion Pro" is not the same as "Minion", even though they're from the same vendor.

That said, if you're trying to get decent, reflowable, selectable-text output (whether as PDF or in another format), it's really essential to use a specialist program. Calibre is many things, but not everything...
#4
Connoisseur
Posts: 70
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
OCRthisPDF (not yet published)
I have been using a plugin for this purpose for a while now, one that I haven't released yet because I still want to add a proofreading step. It is based on OCRmyPDF, which in turn relies on Tesseract for the OCR itself. If anyone wants to try it out, I could release it even without the proofreading feature (which uses the hOCR files generated by Tesseract). For high-quality scans, the results are already excellent out of the box.
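On the proofreading idea: Tesseract's hOCR output stores a per-word confidence as `x_wconf` in the `title` attribute of each `ocrx_word` span, so low-confidence words can be listed for manual review. This is a hedged sketch using only Python's standard library, not the plugin's actual code; the class name `LowConfidenceWords`, the threshold of 70 and the sample markup are illustrative assumptions:

```python
import re
from html.parser import HTMLParser

# Tesseract writes e.g. title='bbox 10 10 60 30; x_wconf 96' on each word.
WCONF = re.compile(r"x_wconf\s+(\d+)")


class LowConfidenceWords(HTMLParser):
    """Collect hOCR words whose x_wconf is below a threshold (hypothetical helper)."""

    def __init__(self, threshold=70):
        super().__init__()
        self.threshold = threshold
        self.flagged = []      # list of (word_id, text, confidence)
        self._current = None   # (word_id, confidence) while inside an ocrx_word span
        self._text = []
        self._depth = 0        # nested tags inside the current word, e.g. <strong>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is not None:
            self._depth += 1
            return
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = WCONF.search(attrs.get("title") or "")
            conf = int(match.group(1)) if match else None
            self._current = (attrs.get("id"), conf)
            self._text = []
            self._depth = 0

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if self._depth > 0:    # closing a nested tag, the word is not finished yet
            self._depth -= 1
            return
        word_id, conf = self._current
        text = "".join(self._text).strip()
        if conf is not None and conf < self.threshold:
            self.flagged.append((word_id, text, conf))
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)


# Illustrative hOCR fragment (made-up words and confidences).
SAMPLE = """<html><body>
<span class='ocr_line' title='bbox 10 10 200 30'>
<span class='ocrx_word' id='word_1_1' title='bbox 10 10 60 30; x_wconf 96'>Hello</span>
<span class='ocrx_word' id='word_1_2' title='bbox 70 10 120 30; x_wconf 41'>scanned</span>
<span class='ocrx_word' id='word_1_3' title='bbox 130 10 200 30; x_wconf 88'><strong>world</strong></span>
</span></body></html>"""

parser = LowConfidenceWords(threshold=70)
parser.feed(SAMPLE)
print(parser.flagged)  # [('word_1_2', 'scanned', 41)]
```

The word IDs could then be used to jump to the matching spot in the page image; a higher threshold flags more words for review at the cost of more proofreading work.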
#5
Weirdo
Posts: 1,124
Karma: 12503116
Join Date: Nov 2019
Location: Wuppertal, Germany
Device: Kobo Sage, Kobo Libra 2, reMarkable PaperPro
I’d love to have that, as one of my projects is to scan the original German translation of the A Song of Ice and Fire books.
#6
Connoisseur
Posts: 70
Karma: 666
Join Date: May 2020
Location: Germany
Device: android smartphone + tablet with Moon Reader and ReadEra Apps.
OCRthisPDF
The OCR plugin is released: https://www.mobileread.com/forums/sh...76#post4590976
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Convert from Text(Markdown) to PDF: Can not copy text in the PDF | hellomichibye | Conversion | 2 | 08-26-2019 12:29 PM |
| Non-searchable text and Scroll-bar inside the converted PDF output, unusual? Why? | Amortization | Calibre | 6 | 05-07-2017 01:53 AM |
| Convert epub to pdf, with notes with main text in the pdf? | 8140david | ePub | 1 | 06-18-2015 01:13 PM |
| Urgent: How To Convert Wikibook PDF Into a Searchable Index? | deerayolia | Kindle Formats | 4 | 05-28-2012 06:52 AM |