12-10-2022, 03:10 PM | #1 |
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
Does fulltext indexing include DjVus?
Does fulltext indexing include the OCR layers of DjVu files? If not, is there a plugin to allow this support?
|
12-10-2022, 03:16 PM | #2 |
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
It appears to. Searching for
Code:
formats:#=1 and (format:djvu or format:djv)
So, Calibre seems to have some DjVu support. It seems the DJVU Input plugin (DjVu→HTML converter) was used during indexing. |
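For readers unfamiliar with the search expression above, here is a rough sketch in plain Python of what it is meant to select: books that have exactly one format, and that format is DjVu. The `library` dictionary and the `matches` helper are hypothetical stand-ins, not calibre APIs, and the sketch uses exact matching on the format name, which is a simplification of how calibre evaluates search terms.

```python
# Hypothetical in-memory library: book title -> set of format names.
# This is illustration data only; it is not read from a calibre database.
library = {
    "Scanned Atlas": {"DJVU"},
    "Old Grammar": {"DJV"},
    "Mixed Book": {"DJVU", "PDF"},
    "Plain Novel": {"EPUB"},
}


def matches(formats):
    """Approximate `formats:#=1 and (format:djvu or format:djv)`."""
    lowered = {f.lower() for f in formats}
    # formats:#=1 -> the book has exactly one format;
    # (format:djvu or format:djv) -> that format is djvu or djv.
    return len(lowered) == 1 and bool(lowered & {"djvu", "djv"})


djvu_only = sorted(title for title, fmts in library.items() if matches(fmts))
print(djvu_only)
```

Restricting the search to single-format books means any fulltext hit in that result set can only have come from the DjVu file itself.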
12-10-2022, 09:51 PM | #3 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If the DjVu file has a text layer, then it is used. Any format that has an input plugin will be indexed via that input plugin.
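As a toy model of that rule (not calibre's actual indexer code), the idea can be sketched as a lookup from file extension to a text extractor: if some input plugin claims the extension, its text gets indexed; otherwise there is nothing to index. The names `INPUT_PLUGINS`, `text_from_txt`, and `extract_text_for_index` are hypothetical and exist only for this illustration.

```python
def text_from_txt(data: bytes) -> str:
    """Hypothetical 'input plugin' for plain text: just decode the bytes."""
    return data.decode("utf-8")


# Hypothetical registry: extension -> callable that turns file bytes into text.
INPUT_PLUGINS = {"txt": text_from_txt}


def extract_text_for_index(filename: str, data: bytes):
    """Return text to index, or None when no plugin claims the extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    plugin = INPUT_PLUGINS.get(ext)
    return plugin(data) if plugin else None


print(extract_text_for_index("notes.txt", b"hello"))
print(extract_text_for_index("scan.pdf_original", b"%PDF"))
```

In this model, an extension with no registered plugin yields no text, so the file is simply skipped.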
|
04-17-2023, 05:14 PM | #4 |
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
Get Calibre to index PDFs with non-standard suffixes (e.g., pdf_original, etc.)?
How do I get it to index PDFs with extensions besides PDF (e.g., PDF_ORIGINAL etc.)?
|
04-17-2023, 10:34 PM | #5 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
IIRC it indexes *_ORIGINAL automatically, although since those are created by doing format-to-same-format conversions, the text content should usually be identical.
|
04-18-2023, 12:09 AM | #6 |
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
"Set the cover for the book from the selected format" doesn't work on *.pdf_original files, though.
How do I make it index PDF files with custom suffixes?
Last edited by Geremia; 04-18-2023 at 12:29 AM. |
04-18-2023, 12:47 AM | #7 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
04-21-2023, 10:52 PM | #8 | |
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
ChatGPT answer to kovidgoyal's instructions:
Quote:
Last edited by Geremia; 04-21-2023 at 11:41 PM. |
|
04-21-2023, 11:54 PM | #9 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You would need to subclass conversion input plugins, not filetype plugins.
|
04-22-2023, 12:42 AM | #10 | |
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
How's this?
Quote:
|
|
04-22-2023, 12:43 AM | #11 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It's wrong in several ways. I'm afraid GPT isn't going to be able to do this for you.
|
04-22-2023, 02:56 PM | #12 |
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
|
04-23-2023, 12:33 AM | #13 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The indexer works on text, so there has to be some way to get text from an unknown format. Ergo conversion *input* plugins.
|
04-23-2023, 12:55 AM | #14 |
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
It seems something like this would work
Code:
from calibre.ebooks.conversion.plugins.pdf_input import PDFInput

class MyPDFInput(PDFInput):
    name = "MyPDFInput"
    description = "Adds support for additional file extensions to PDF input"
    supported_extensions = [
        'pdf_original', 'pdf_answers', 'pdf_booklet', 'pdf_oa',
        'pdf_ocr', 'pdf_old', 'pdf_tesseract',
    ]
|
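One caveat on the snippet above, offered as an assumption rather than something confirmed in this thread: as far as I can tell, calibre's conversion input plugins declare the extensions they handle in a `file_types` set, not in `supported_extensions`, so it is worth checking the calibre source before relying on either name. The sketch below shows the same subclassing idea with `file_types`. To keep it runnable without calibre installed, `PDFInputStandIn` is a hypothetical stand-in for the real `PDFInput` class.

```python
class PDFInputStandIn:
    """Hypothetical stand-in for calibre's PDFInput, so this runs anywhere.

    Assumption: the real class exposes its handled extensions as a
    `file_types` set containing 'pdf'.
    """
    name = "PDF Input"
    file_types = {"pdf"}


class MyPDFInput(PDFInputStandIn):
    name = "MyPDFInput"
    description = "Adds support for additional file extensions to PDF input"
    # Extend the parent's extensions instead of replacing them,
    # so plain .pdf files are still handled.
    file_types = PDFInputStandIn.file_types | {
        "pdf_original", "pdf_answers", "pdf_booklet", "pdf_oa",
        "pdf_ocr", "pdf_old", "pdf_tesseract",
    }


print(sorted(MyPDFInput.file_types))
```

In a real plugin the base class would be imported from `calibre.ebooks.conversion.plugins.pdf_input`, as in the post above, and the result would still need to be packaged and installed as a calibre plugin.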
04-23-2023, 02:04 AM | #15 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Convert a file with that extension.
|
Tags |
calibre, djvu, fulltext, indexing |