03-08-2023, 11:54 AM | #1 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
FTS for UTF-8 Text Formats in Non-TXT File Extensions
Kovid,
I would like FTS to index RIS "Citation" files. In Calibre these are simply UTF-8 text with a non-TXT file extension, but they have a rigidly defined content structure via RIS tags (e.g. AB - "Abstract"), for which there is already a great deal of functionality in Calibre via 3 different plugins.
I have not been able to find any way to do this other than writing a new InputFormatPlugin (and a new OutputFormatPlugin) to "convert" UTF-8 text RIS into UTF-8 text TXT, saving a .ris rather than a .txt.
Is there another way I haven't found? For example, an internal dict in an existing Calibre module that maps a format to the format it should be treated as, and that an existing plugin could update, such as: {'RIS':'TXT'} ? Or a mapping by MIME type, such as {'RIS':'application/x-research-info-systems'} and {'application/x-research-info-systems':'text/plain'}?
Or should I write 2 new conversion plugins just to do the above?
Thanks for your help. DaltonST Last edited by DaltonST; 03-08-2023 at 12:35 PM. Reason: MIME Type |
03-08-2023, 10:40 PM | #2 |
creator of calibre
Posts: 43,966
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You need an input format plugin not an output one. FTS requires a way to normalize a book to a common HTML format which it can scan.
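As a rough illustration of that normalization step, here is a minimal sketch of turning RIS text into plain HTML. It is not calibre code: the names parse_ris and ris_to_html are hypothetical, the RIS line layout (two-character tag, two spaces, hyphen, value, with ER closing a record) is assumed, and a real input plugin would still have to wrap something like this in its convert() method.

```python
import html
import re

# Assumed RIS line layout: two-character tag, two spaces, hyphen, optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Split RIS text into records, each a list of (tag, value) pairs."""
    records, current = [], []
    for raw in text.lstrip("\ufeff").splitlines():
        line = raw.rstrip()
        match = RIS_LINE.match(line)
        if match:
            tag, value = match.group(1), match.group(2)
            if tag == "ER":  # ER marks the end of a record
                if current:
                    records.append(current)
                current = []
            else:
                current.append((tag, value))
        elif line.strip() and current:
            # Treat an untagged line as a continuation of the previous value.
            tag, value = current[-1]
            current[-1] = (tag, value + " " + line.strip())
    if current:  # tolerate a final record with no ER line
        records.append(current)
    return records


def ris_to_html(text):
    """Render every record as a <div> of escaped <p>TAG: value</p> lines."""
    parts = ["<html><body>"]
    for record in parse_ris(text):
        parts.append("<div>")
        for tag, value in record:
            parts.append("<p>%s: %s</p>" % (html.escape(tag), html.escape(value)))
        parts.append("</div>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Keeping the parsing in plain functions like these means it can be exercised outside calibre before it is wired into a plugin.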
|
03-10-2023, 01:11 PM | #3 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
Deleted.
Last edited by DaltonST; 03-10-2023 at 02:59 PM. Reason: resolved |
03-10-2023, 01:45 PM | #4 | |
Grand Sorcerer
Posts: 11,770
Karma: 7029857
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
|
|
03-10-2023, 02:59 PM | #5 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
Class must be in the __init__.py and not in its own .py file
Chaley,
Thanks. As the 'Language Cleaner' demonstrates, the Class must be in the __init__.py and not in its own .py file. No wonder the "built-ins" were no help; they are not in a .zip file for me to autopsy. DaltonST |
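A quick way to check a packaged plugin against that observation is to look inside the zip. This is only a sketch using the standard library; has_top_level_init is a hypothetical helper name, and it checks nothing beyond the presence of __init__.py at the root of the archive.

```python
import zipfile


def has_top_level_init(zip_path):
    """Return True if the archive has __init__.py at its root, not in a subfolder."""
    with zipfile.ZipFile(zip_path) as archive:
        return "__init__.py" in archive.namelist()
```

An archive that only contains something like mypkg/__init__.py returns False here, which matches the layout problem described above.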
03-12-2023, 05:24 PM | #6 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
Input conversion plugin: how to get it to run in debug mode and output its log
Debugging works easily and reliably in both classes of plugins I have written so far, GUI and FileType. Now I am writing an input conversion plugin, and I cannot get a single debugging line out of it.
The documentation for that plugin class says "use the log". I have, in so many ways, and nothing ever happens. After executing calibre-customize -b as normal, I have tried the three ways that I know of to run in debug mode:
calibre-debug -g
calibre-debug --gui-debug="C:\Users\xxx\Desktop\debug.log"
calibre.exe, then restart in debug mode using the menu option to do so.
None has ever worked, except that the output shows that the FTS scanner fails because the input conversion plugin it calls, the one I am trying to develop and test, is failing. Given that I have never seen a single debug line, or any other line, printed or logged successfully, I would be shocked if it were succeeding.
I have tried using the 'log' argument passed in by whoever is calling the plugin, including: log.debug; log.message; and log.error. Results: Nada. Nichts. Nothing.
I tried using my own Python logger via logging.getLogger(), etc., but still nothing. I tried writing the accumulated debugging text to a .txt file using fd.write(), but nothing appeared in the destination folder, and since I have zero debugging output (actually, zero output of any type), I have no clue as to why. And I never will without debugging.
So, might someone please graciously tell me how to output a debugging log from an input conversion plugin? Thank you. DaltonST Last edited by DaltonST; 03-12-2023 at 05:27 PM. |
03-12-2023, 07:24 PM | #7 |
Grand Sorcerer
Posts: 11,770
Karma: 7029857
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Using this trivial plugin:
Code:
from calibre.customize.conversion import InputFormatPlugin


class TestInput(InputFormatPlugin):
    name = "TestIt Input"
    author = "me"
    version = (1, 0, 0)
    minimum_calibre_version = (2, 0, 0)
    supported_platforms = ["windows", "osx", "linux"]
    description = "Test logging"
    file_types = {"foo"}  # input file extension handled by this plugin

    def __init__(self, *args, **kwargs):
        InputFormatPlugin.__init__(self, *args, **kwargs)
        print('here in TestInput')  # shows when the plugin is instantiated

    def convert(self, stream, options, file_ext, log, accelerators):
        # Emit one line via print and one via the supplied log, then fail on purpose.
        print('Hi There printed')
        log.error("Hi There to log.error")
        raise ValueError('my exception')
Spoiler:
Both the print and the log.error() messages are there. The call FTS makes on import to get the text does seem to throw away the logs, but you can debug the plugin by running a conversion. |
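For anyone wanting to script that check, here is a minimal sketch that runs a conversion through the ebook-convert command-line tool and returns whatever the plugin printed or logged. The helper names and the sample file names are hypothetical; the only calibre features assumed are the ebook-convert executable on PATH and its repeatable -v (verbose) option.

```python
import shutil
import subprocess


def build_convert_command(input_path, output_path, verbosity=2):
    """Build an ebook-convert command line; -v is repeated to raise verbosity."""
    command = ["ebook-convert", input_path, output_path]
    command.extend(["-v"] * verbosity)
    return command


def run_conversion(input_path, output_path):
    """Run the conversion and return combined stdout and stderr, or None if
    ebook-convert is not on PATH."""
    if shutil.which("ebook-convert") is None:
        return None
    result = subprocess.run(
        build_convert_command(input_path, output_path),
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

Calling run_conversion("sample.foo", "sample.txt") with the test plugin above installed should surface its print and log.error lines along with the traceback, since the conversion is run directly rather than through the FTS scanner.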
03-12-2023, 07:40 PM | #8 |
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
|
You solved it. The FTS Scanner is eating the log and print.
Running a conversion works. The online documentation should have a "how to debug" section for input conversion plugins to avoid future wasted effort and time figuring it out. Thank you, Chaley. DaltonST Last edited by DaltonST; 03-12-2023 at 10:34 PM. |