MobileRead Forums > E-Book Software > Calibre
12-10-2022, 03:10 PM   #1
Geremia
Addict
Posts: 235
Karma: 100000
Join Date: Oct 2012
Device: Calibre
Does fulltext indexing include DjVus?

Does fulltext indexing include the OCR layers of DjVu files? If not, is there a plugin to allow this support?
12-10-2022, 03:16 PM   #2
Geremia
Addict
It appears to. Searching for
Code:
formats:#=1 and (format:djvu or format:djv)
and then running a full-text search with the "Restrict searched books" option checked returned results.

So Calibre seems to have some DjVu support; apparently the DJVU Input plugin (DjVu→HTML converter) was used during indexing.
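A rough plain-Python model of what that search selects: books with exactly one format, where that format is DJVU or DJV. The `Book` class and sample titles are hypothetical, and this is not calibre's search engine. Note also that calibre's default matching for a term like `format:djv` is a substring ("contains") match, whereas this sketch compares format names exactly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    # Upper-case format names, the way calibre lists them (e.g. "DJVU").
    formats: set[str] = field(default_factory=set)


def matches_query(book: Book) -> bool:
    """Rough model of: formats:#=1 and (format:djvu or format:djv)."""
    exactly_one_format = len(book.formats) == 1     # formats:#=1
    is_djvu = bool(book.formats & {"DJVU", "DJV"})  # format:djvu or format:djv
    return exactly_one_format and is_djvu


library = [
    Book("Scanned treatise", {"DJVU"}),
    Book("Dual copy", {"DJVU", "PDF"}),
    Book("Old scan", {"DJV"}),
    Book("Novel", {"EPUB"}),
]

restricted = [book.title for book in library if matches_query(book)]
print(restricted)  # ['Scanned treatise', 'Old scan']
```

Restricting the full-text search to this set means any hit must come from a DjVu file, which is what makes the test conclusive.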
12-10-2022, 09:51 PM   #3
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
If the DjVu file has a text layer, it is used. Any format that has an input plugin will be indexed via that input plugin.
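A toy model of that rule in plain Python: a hypothetical registry maps format names to text extractors, and a format with no entry yields nothing to index. `REGISTRY` and `text_for_index` are invented for this sketch and are not calibre's API; the lambdas merely stand in for real DjVu/PDF text extraction.

```python
from __future__ import annotations

from typing import Callable, Optional

# Stand-in for an input plugin: takes the file's bytes, returns plain text.
Extractor = Callable[[bytes], str]

# Hypothetical registry keyed by upper-case format name.
REGISTRY: dict[str, Extractor] = {
    "DJVU": lambda data: data.decode("utf-8"),  # pretend: read the DjVu text layer
    "PDF": lambda data: data.decode("utf-8"),   # pretend: extract PDF text
}


def text_for_index(fmt: str, data: bytes) -> Optional[str]:
    """Return the text to index, or None if no plugin handles this format."""
    extractor = REGISTRY.get(fmt.upper())
    if extractor is None:
        return None  # no input plugin, so nothing for the indexer to work on
    return extractor(data)


print(text_for_index("pdf", b"summa theologiae"))           # summa theologiae
print(text_for_index("pdf_original", b"summa theologiae"))  # None
print(repr(text_for_index("djvu", b"")))                    # ''
```

In this model a scanned DjVu without a text layer simply produces empty text, matching the caveat that the text layer is used only if it exists.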
04-17-2023, 05:14 PM   #4
Geremia
Addict
Get Calibre to index PDFs with non-standard suffixes (e.g., pdf_original)?

How do I get it to index PDFs with extensions other than PDF (e.g., PDF_ORIGINAL)?
04-17-2023, 10:34 PM   #5
kovidgoyal
creator of calibre
IIRC it indexes *_ORIGINAL automatically, although since those are created by same-format-to-same-format conversions, their text content should usually be identical to that of the main format.
04-18-2023, 12:09 AM   #6
Geremia
Addict

Quote:
Originally Posted by kovidgoyal
IIRC it indexes *_ORIGINAL automatically
"Set the cover for the book from the selected format" doesn't work on *.pdf_original files, though.

How do I make it index PDF files with custom suffixes?

Last edited by Geremia; 04-18-2023 at 12:29 AM.
04-18-2023, 12:47 AM   #7
kovidgoyal
creator of calibre
Quote:
Originally Posted by Geremia
How do I make it index PDF files with custom suffixes?
Write your own input plugin for such files that calls the built-in PDF input plugin under the hood.
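The "wrapper that reuses the built-in plugin" idea can be modeled with ordinary subclassing. In this sketch `BuiltinPDFInput` is a hypothetical stand-in, not calibre's class; the `file_types` attribute is named after the set of extensions calibre's input plugins declare, but everything else here is illustrative only.

```python
class BuiltinPDFInput:
    """Hypothetical stand-in for a built-in PDF input plugin."""

    name = "PDF Input"
    file_types = {"pdf"}

    def convert(self, data: bytes) -> str:
        # Placeholder for the real PDF conversion; here it just decodes bytes.
        return data.decode("utf-8")


class ExtraSuffixPDFInput(BuiltinPDFInput):
    """Inherits convert() unchanged and only declares extra suffixes."""

    name = "PDF Input (extra suffixes)"
    # Hypothetical suffixes; plain "pdf" is left to the built-in plugin.
    file_types = {"pdf_original", "pdf_ocr"}


plugin = ExtraSuffixPDFInput()
print("pdf_original" in plugin.file_types)  # True
print(plugin.convert(b"same text"))         # same text
```

Leaving `pdf` out of the subclass's `file_types` is a deliberate choice in this sketch: it avoids two plugins competing for ordinary `.pdf` files, while the inherited `convert()` means the extra suffixes are processed exactly like PDFs.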
04-21-2023, 10:52 PM   #8
Geremia
Addict
ChatGPT answer to kovidgoyal's instructions:
Quote:
To create an input plugin for Calibre that calls the built-in PDF input plugin under the hood, you can follow these steps:
  1. Create a new folder in your Calibre plugins directory (usually located in the Calibre configuration folder) and name it something like “mypdfplugin”.
  2. Inside the “mypdfplugin” folder, create a new file named “__init__.py” (note the double underscores before and after “init”). This file will contain the code for your plugin.
  3. In the “__init__.py” file, you can import the built-in PDF input plugin like this:
    Code:
    from calibre_plugins.pdf_input import PDFInput
  4. You can then define a new class that inherits from the PDFInput class and overrides any methods that you need to modify. For example, if you want to change the way the plugin handles the “description” metadata field, you could define a new “get_metadata” method like this:
    Code:
    class MyPDFInput(PDFInput):
        def get_metadata(self, stream, filename=None, title=None, author=None, description=None):
            # Call the original get_metadata method from the PDFInput class
            metadata = super().get_metadata(stream, filename, title, author, description)
            
            # Modify the description field
            metadata['description'] = "My custom description"
            
            # Return the modified metadata
            return metadata
  5. Finally, you can register your plugin by creating a new file named “plugin.py” (again, in the “mypdfplugin” folder) and adding the following code:
    Code:
    from calibre.customize import FileTypePlugin
    
    class MyPDFPlugin(FileTypePlugin):
        name = 'My PDF Plugin'
        description = 'My custom PDF plugin'
        file_types = set(['pdf'])
        can_split = False
        can_convert = False
        input_plugin = 'mypdfplugin.MyPDFInput'
    
    # Register the plugin
    FileTypePlugin.register_plugin('pdf', MyPDFPlugin)
    This code defines a new FileTypePlugin that registers your PDF input plugin under the name “My PDF Plugin”. The “file_types” set specifies which file types your plugin can handle (in this case, just PDF files), and the “input_plugin” parameter specifies the name of your custom PDFInput subclass.
And that’s it! After restarting Calibre, your plugin should be available to use for any PDF files you add to your library.

Last edited by Geremia; 04-21-2023 at 11:41 PM.
04-21-2023, 11:54 PM   #9
kovidgoyal
creator of calibre
You would need to subclass conversion input plugins, not filetype plugins.
04-22-2023, 12:42 AM   #10
Geremia
Addict
How's this?
Quote:
If you need to subclass a conversion input plugin instead of a file type plugin, the process is similar, but there are a few key differences.
  1. Create a new folder in your Calibre plugins directory (usually located in the Calibre configuration folder) and name it something like “myinputplugin”.
  2. Inside the “myinputplugin” folder, create a new file named “__init__.py” (note the double underscores before and after “init”). This file will contain the code for your plugin.
  3. In the “__init__.py” file, you can import the conversion input plugin you want to subclass like this:
    Code:
    from calibre.customize.conversion import InputFormatPlugin
    from calibre_plugins.pdf_input import PDFInput
    In this example, we’re importing the “InputFormatPlugin” class from the “calibre.customize.conversion” module, as well as the built-in PDF input plugin from the “calibre_plugins.pdf_input” module.
  4. You can then define a new class that inherits from the input plugin you want to subclass, and override any methods you need to modify. For example, to modify the way the PDF input plugin handles metadata, you could define a new “get_metadata” method like this:
    Code:
    class MyPDFInput(PDFInput):
        def get_metadata(self, stream, filename=None, title=None, author=None, description=None):
            # Call the original get_metadata method from the PDFInput class
            metadata = super().get_metadata(stream, filename, title, author, description)
            
            # Modify the description field
            metadata['description'] = "My custom description"
            
            # Return the modified metadata
            return metadata
    Note that in this example, we’re creating a new class called “MyPDFInput” that inherits from the “PDFInput” class.
  5. Finally, you can register your plugin by creating a new file named “plugin.py” (again, in the “myinputplugin” folder) and adding the following code:
    Code:
    from calibre.customize import InputFormatPlugin
    
    class MyPDFPlugin(InputFormatPlugin):
        name = 'My PDF Plugin'
        description = 'My custom PDF plugin'
        supported_extensions = set(['pdf'])
        input_plugin = 'myinputplugin.MyPDFInput'
    
    # Register the plugin
    InputFormatPlugin.register_plugin('pdf', MyPDFPlugin)
    This code defines a new InputFormatPlugin that registers your custom PDFInput subclass under the name “My PDF Plugin”. The “supported_extensions” set specifies which file extensions your plugin can handle (in this case, just PDF files), and the “input_plugin” parameter specifies the name of your custom PDFInput subclass.
And that’s it! After restarting Calibre, your plugin should be available to use for any PDF files you convert in the conversion dialog.
04-22-2023, 12:43 AM   #11
kovidgoyal
creator of calibre
It's wrong in several ways. I'm afraid GPT isn't gonna be able to do this for you.
04-22-2023, 02:56 PM   #12
Geremia
Addict
Quote:
Originally Posted by kovidgoyal
You would need to subclass conversion input plugins, not filetype plugins.
Why conversion? I want the indexer to index already-imported PDFs with non-standard filename suffixes.
04-23-2023, 12:33 AM   #13
kovidgoyal
creator of calibre
The indexer works on text, so there has to be some way to get text from an unknown format. Ergo, conversion *input* plugins.
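Because the indexer only ever sees text, a minimal picture is an inverted index built from whatever text the input plugins return. This is a toy in-memory index for illustration, not how calibre stores its full-text data; `tokenize`, `build_index` and the sample texts are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lower-cased word tokens; real full-text engines tokenize far more carefully.
    return re.findall(r"\w+", text.lower())


def build_index(texts: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the ids of the books whose extracted text contains it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for book_id, text in texts.items():
        for word in tokenize(text):
            index[word].add(book_id)
    return dict(index)


# Text as returned by input plugins. Book 3's format had no input plugin,
# so it has no entry here and can never match a full-text search.
extracted = {1: "Summa theologiae, prima pars", 2: "De veritate"}
index = build_index(extracted)
print(sorted(index.get("veritate", set())))  # [2]
print(sorted(index.get("prima", set())))     # [1]
```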
04-23-2023, 12:55 AM   #14
Geremia
Addict
It seems something like this would work:
Code:
from calibre.ebooks.conversion.plugins.pdf_input import PDFInput

class MyPDFInput(PDFInput):
    name = "MyPDFInput"
    description = "Adds support for additional file extensions to PDF input"
    # Input plugins declare the extensions they handle in the file_types set
    file_types = {'pdf_original', 'pdf_answers', 'pdf_booklet', 'pdf_oa', 'pdf_ocr', 'pdf_old', 'pdf_tesseract'}
But I don't know how to test it.
04-23-2023, 02:04 AM   #15
kovidgoyal
creator of calibre
Convert a file with that extension.

Tags
calibre, djvu, fulltext, indexing



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Best Choice for PDF/Djvus under 120$ | ehesvagyokmar | Which one should I buy? | 1 | 05-17-2017 01:47 AM
Fulltext Search on M92 | Valec | Onyx Boox | 1 | 05-21-2012 05:14 AM
solution for reading .pdfs, .djvus and .tiffs? | pan.sapiens | Which one should I buy? | 5 | 04-21-2012 04:26 PM
fulltext search | guly | Calibre | 2 | 01-10-2010 06:49 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.