|
|
#1 |
|
Addict
Posts: 263
Karma: 100000
Join Date: Oct 2012
Device: Calibre
|
Does fulltext indexing include the OCR layers of DjVu files? If not, is there a plugin to allow this support?
|
|
|
|
|
|
#2 |
|
Addict
|
It appears to. Searching for
Code:
formats:#=1 and (format:djvu or format:djv)
So, Calibre seems to have some DjVu support. It seems the DJVU Input plugin (DjVu→HTML converter) was used during indexing.
|
|
|
|
|
|
|
#3 |
|
creator of calibre
Posts: 45,616
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If the DjVu file has a text layer, then it is used. Any format that has an input plugin will be indexed via that input plugin.
|
|
|
|
|
|
#4 |
|
Addict
|
How do I get it to index PDFs with extensions besides PDF (e.g., PDF_ORIGINAL etc.)?
|
|
|
|
|
|
#5 |
|
creator of calibre
|
IIRC it indexes *_ORIGINAL automatically, although since those are created by format-to-same-format conversions, the text content should usually be identical.
|
|
|
|
|
|
|
|
#6 |
|
Addict
|
"Set the cover for the book from the selected format" doesn't work on *.pdf_original files, though.
How do I make it index PDF files with custom suffixes?
Last edited by Geremia; 04-18-2023 at 01:29 AM.
|
|
|
|
|
#7 |
|
creator of calibre
|
|
|
|
|
|
|
#8 | |
|
Addict
|
ChatGPT answer to kovidgoyal's instructions:
Quote:
Last edited by Geremia; 04-22-2023 at 12:41 AM. |
|
|
|
|
|
|
#9 |
|
creator of calibre
|
You would need to subclass conversion input plugins, not filetype plugins.
|
|
|
|
|
|
#10 | |
|
Addict
|
How's this?
Quote:
|
|
|
|
|
|
|
#11 |
|
creator of calibre
|
It's wrong in several ways. I'm afraid GPT isn't going to be able to do this for you.
|
|
|
|
|
|
#12 |
|
Addict
|
|
|
|
|
|
|
#13 |
|
creator of calibre
|
The indexer works on text, so there has to be some way to get text from an unknown format. Ergo, conversion *input* plugins.
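A toy illustration of that reasoning, for readers following along. This is a sketch only and not calibre's actual code: `EXTRACTORS`, `register`, `text_for_indexing` and `fake_pdf_extractor` are all hypothetical names. It shows why a file whose extension no extractor claims yields nothing to index until something is registered for that extension.

```python
# Hypothetical sketch: none of these names are calibre APIs.
from typing import Callable, Dict, Iterable, Optional

# Maps a lowercase file extension to a function that extracts plain text.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {}


def register(extensions: Iterable[str], extractor: Callable[[bytes], str]) -> None:
    """Register one text extractor for several file extensions."""
    for ext in extensions:
        EXTRACTORS[ext.lower()] = extractor


def text_for_indexing(filename: str, data: bytes) -> Optional[str]:
    """Return text to index, or None when no extractor knows the extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    extractor = EXTRACTORS.get(ext)
    return extractor(data) if extractor else None


def fake_pdf_extractor(data: bytes) -> str:
    # Stand-in for a real PDF text extractor; it just decodes the bytes.
    return data.decode("utf-8", errors="replace")


register(["pdf"], fake_pdf_extractor)
print(text_for_indexing("book.pdf", b"hello"))      # text is extracted
print(text_for_indexing("book.pdf_ocr", b"hello"))  # None: unknown extension

# Registering the custom suffix makes the same file indexable.
register(["pdf_ocr"], fake_pdf_extractor)
print(text_for_indexing("book.pdf_ocr", b"hello"))
```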
|
|
|
|
|
|
#14 |
|
Addict
|
It seems something like this would work
Code:
from calibre.ebooks.conversion.plugins.pdf_input import PDFInput

class MyPDFInput(PDFInput):
    name = "MyPDFInput"
    description = "Adds support for additional file extensions to PDF input"
    # calibre input plugins declare the extensions they handle in the `file_types` set
    file_types = {'pdf_original', 'pdf_answers', 'pdf_booklet', 'pdf_oa', 'pdf_ocr', 'pdf_old', 'pdf_tesseract'}
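To avoid typing the suffix list by hand, a small standalone helper could collect the custom suffixes actually present in a folder. This is only a sketch using the standard library; `custom_pdf_suffixes` is a hypothetical name, and it assumes the custom extensions all start with `pdf_`.

```python
from pathlib import Path
from typing import List


def custom_pdf_suffixes(root: str) -> List[str]:
    """Return the sorted, lowercase 'pdf_*' extensions found under root."""
    found = set()
    for path in Path(root).rglob("*"):
        # Path.suffix is the last extension, e.g. ".pdf_ocr" for "book.pdf_ocr".
        ext = path.suffix.lstrip(".").lower()
        if path.is_file() and ext.startswith("pdf_"):
            found.add(ext)
    return sorted(found)
```

Calling `custom_pdf_suffixes("/path/to/library")` (a placeholder path) returns a list that could be pasted into the plugin class.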
|
|
|
|
|
|
#15 |
|
creator of calibre
|
Convert a file with that extension.
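One way to try that from a script is calibre's `ebook-convert` command-line tool, which takes an input file and an output file. The sketch below only builds the command and runs it if the tool and the file are present. `build_convert_command` and the file name `sample.pdf_original` are hypothetical, and whether the conversion succeeds depends on an input plugin being installed for that extension.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(src: str, out_ext: str = "txt") -> List[str]:
    """Build an ebook-convert command that writes next to the source file."""
    src_path = Path(src)
    out_path = src_path.with_name(src_path.name + "." + out_ext)
    return ["ebook-convert", str(src_path), str(out_path)]


cmd = build_convert_command("sample.pdf_original")  # hypothetical file name
print(cmd)

# Only run when calibre's CLI is on PATH and the sample file exists.
if shutil.which("ebook-convert") and Path(cmd[1]).exists():
    subprocess.run(cmd, check=False)
```

If the conversion produces a text file with the book's contents, the extension is being handled by an input plugin.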
|
|
|
|
| Tags |
| calibre, djvu, fulltext, indexing |
|