Although I'm an old hand at providing ebooks to PG via Distributed Proofreaders, I only recently got into converting them and reading them on an actual ebook reader (Sony PRS-505). Naturally, I want the books to look as good as possible on the reader, so I've been musing about doing some additional processing when feeding the HTML files through Calibre.
My overall plan is to write a Calibre plugin that takes the HTML input on import and applies some very basic, regular-expression-based substitutions to the file, so that some browser-specific tags can be kicked out and some CSS constructs not supported by EPUB can be replaced. I have seen similar ideas floating around on this forum, but nobody seems to have tried (yet) to address it in the form of a Calibre plugin.
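As a rough illustration of the kind of substitution meant here, this is a sketch of a rule table applied with Python's `re` module. The rules themselves are hypothetical examples (dropping `<script>` blocks, stripping `<marquee>` tags, rewriting `position: fixed`), not a vetted list of constructs that EPUB readers actually choke on.

```python
import re

# Hypothetical cleanup rules: (compiled pattern, replacement) pairs.
# Illustrative only, not a tested list of EPUB problems.
CLEANUP_RULES = [
    # Drop <script> blocks entirely (non-greedy, may span several lines).
    (re.compile(r'<script\b.*?</script>', re.IGNORECASE | re.DOTALL), ''),
    # Remove browser-specific <marquee> tags but keep the text inside them.
    (re.compile(r'</?marquee\b[^>]*>', re.IGNORECASE), ''),
    # Example of swapping one CSS construct for a simpler one.
    (re.compile(r'position\s*:\s*fixed', re.IGNORECASE), 'position: static'),
]


def clean_html(text):
    """Apply every cleanup rule in order and return the resulting text."""
    for pattern, replacement in CLEANUP_RULES:
        text = pattern.sub(replacement, text)
    return text


sample = ('<p style="position:fixed">Hi <marquee>there</marquee></p>'
          '<script type="text/javascript">alert(1);</script>')
print(clean_html(sample))  # <p style="position: static">Hi there</p>
```

Regular expressions are brittle on HTML in general (unusual or nested markup can slip through), which is why each rule is kept narrow and easy to test on its own.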
My results (about three hours of playing around with writing plugins) have been mixed. Here are my findings so far (maybe the first three should go into the documentation):
1) The plugin file name must end with _plugin.py, otherwise Calibre won't find it in the zip container.
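To avoid tripping over this, the zip can be built with a small script. The sketch below uses Python's standard `zipfile` module and refuses to build the archive unless some file ends in `_plugin.py`; the helper name `build_plugin_zip()` and the file names are hypothetical.

```python
import os
import tempfile
import zipfile


def build_plugin_zip(source_files, zip_path):
    """Pack source_files flat into zip_path and return the stored names.

    Raises ValueError if no file ends in '_plugin.py', because Calibre
    would not find the plugin in that case.
    """
    names = [os.path.basename(p) for p in source_files]
    if not any(n.endswith('_plugin.py') for n in names):
        raise ValueError('no file ending in _plugin.py: %r' % names)
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for path, name in zip(source_files, names):
            zf.write(path, arcname=name)  # store at the top level of the zip
    return names


# Demo with a throwaway file in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'regex_plugin.py')  # hypothetical file name
with open(src, 'w') as f:
    f.write('# plugin code goes here\n')
zip_path = os.path.join(workdir, 'regex_plugin.zip')
print(build_plugin_zip([src], zip_path))  # ['regex_plugin.py']
```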
2) When writing plugins on a Windows machine, I had to make sure to save the .py file with UNIX line endings. Windows/DOS line endings will not work: when adding such a plugin to Calibre, it chokes right on line 1 with a weird error.
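If the editor offers no setting for UNIX line endings, a few lines of Python can convert the file in place. This is a generic sketch, not a Calibre tool; the helper name `to_unix_line_endings()` and the demo file name are hypothetical.

```python
import os
import tempfile


def to_unix_line_endings(path):
    """Rewrite path in place, turning CRLF (and stray CR) into LF.

    Returns True if the file content was changed.
    """
    with open(path, 'rb') as f:
        data = f.read()
    # Replace CRLF first so each CRLF pair becomes one LF, not two.
    fixed = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    if fixed != data:
        with open(path, 'wb') as f:
            f.write(fixed)
    return fixed != data


path = os.path.join(tempfile.mkdtemp(), 'regex_plugin.py')  # hypothetical name
with open(path, 'wb') as f:
    f.write(b'import re\r\nprint("hi")\r\n')
print(to_unix_line_endings(path))  # True: the file had CRLF endings
```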
3) The Hello World example on the web page differs from the one available as a file download (from the same page):
example page: set_metadata(file, mi, ext)
downloadable plugin: set_metadata(file, ext, mi)
Note the different argument order. (My Python knowledge is still limited. I know that arguments passed by name can be given in any order, but I don't think that applies here.)
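A tiny illustration of why the order matters if the caller passes the arguments by position, as Calibre presumably does: positional arguments are bound by position, not by name. The two functions are hypothetical stand-ins for the two versions of the example, not Calibre code.

```python
def set_metadata_web(file, mi, ext):
    """Argument order as on the example web page."""
    return {'file': file, 'mi': mi, 'ext': ext}


def set_metadata_zip(file, ext, mi):
    """Argument order as in the downloadable plugin."""
    return {'file': file, 'mi': mi, 'ext': ext}


# Positional call: the second and third values land in different parameters.
web = set_metadata_web('book.epub', 'METADATA', 'epub')
zipped = set_metadata_zip('book.epub', 'METADATA', 'epub')
print(web['mi'], zipped['mi'])  # METADATA epub

# Keyword call: the order no longer matters, both give the same result.
print(set_metadata_web(file='book.epub', ext='epub', mi='METADATA') ==
      set_metadata_zip(file='book.epub', ext='epub', mi='METADATA'))  # True
```

So whichever order Calibre actually uses for a positional call, one of the two examples receives `mi` and `ext` swapped.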
4) I managed to install the HelloWorld plugin in Calibre, but it didn't seem to do anything for me (with both versions; Calibre running under Vista with German localization).
5) Why does the example use a FileTypePlugin when it actually modifies the metadata? Shouldn't it be a metadata plugin instead?
6) I tried to write a simple plugin that replaces the word "Hello" with "World". I can install it in Calibre without problems, but alas, it doesn't do anything. Is my approach completely wrong? Do I have to do some XML tree processing, or something special to get at the content? Am I using the temporary_file() method correctly?
Code:
import re
from calibre.customize import FileTypePlugin

class CleanupLitPlugin(FileTypePlugin):
    name = 'Regular Expression plugin'  # Name of the plugin
    description = 'Apply Regular Expression to input'
    supported_platforms = ['windows', 'osx', 'linux']  # Platforms this plugin will run on
    author = 'Markus Brenner'  # The author of this plugin
    version = (1, 0, 0)  # The version number of this plugin
    file_types = set(['html'])  # The file types that this plugin will be applied to
    on_import = True  # Run when the book is added to the library

    def run(self, path_to_ebook):
        # temporary_file() is a method of the plugin base class, so it must be
        # called via self; an .html suffix presumably keeps the output recognized as HTML
        outfile = self.temporary_file('.html')
        infile = open(path_to_ebook, 'rb')  # read-only access is enough here
        for line in infile:
            outfile.write(re.sub(b'Hello', b'World', line))
        infile.close()
        outfile.close()  # flush everything to disk before Calibre picks it up
        return outfile.name
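One way to rule out the regular expression itself is to test the substitution step outside Calibre first. The sketch below pulls that step into a plain function, `substitute_file()` (a hypothetical name), which reads one file and writes the result to another; inside the plugin, run() could call it with path_to_ebook and the temporary file's name.

```python
import os
import re
import tempfile


def substitute_file(src_path, dst_path, pattern, replacement):
    """Copy src_path to dst_path, applying the regex substitution per line.

    Works on bytes, so pattern and replacement must be bytes as well.
    Returns the total number of substitutions made.
    """
    regex = re.compile(pattern)
    count = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        for line in src:
            new_line, n = regex.subn(replacement, line)
            count += n
            dst.write(new_line)
    return count


# Quick check with throwaway files.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'in.html')
dst_path = os.path.join(workdir, 'out.html')
with open(src_path, 'wb') as f:
    f.write(b'<p>Hello there</p>\n<p>Hello again, Hello!</p>\n')
print(substitute_file(src_path, dst_path, b'Hello', b'World'))  # 3
```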
7) Is there a way to debug Calibre plugins, like writing some debugging text to a console when the run() method is called?
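Independent of any Calibre-specific facility, a plugin can always write to stderr or append to a log file of its own. A minimal sketch follows, with a hypothetical `debug()` helper and log file name. As far as I know, newer Calibre versions can be started from a console with `calibre-debug -g` so that such stderr output becomes visible; whether your version has that option is worth checking.

```python
import os
import sys
import tempfile
import time

# Hypothetical log location; any writable path would do.
LOG_PATH = os.path.join(tempfile.gettempdir(), 'regex_plugin_debug.log')


def debug(message):
    """Write a timestamped message to stderr and append it to LOG_PATH."""
    line = '%s %s\n' % (time.strftime('%H:%M:%S'), message)
    sys.stderr.write(line)
    with open(LOG_PATH, 'a') as log:
        log.write(line)


# Inside the plugin this would be called from run(), for example:
#     debug('run() called with %r' % path_to_ebook)
debug('run() called with %r' % 'C:/books/test.html')
```

Appending to a file has the advantage that the output survives even if Calibre was started without any console at all.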
Any hints about what I did wrong would be very welcome! (I have a feeling other people would benefit, too.)
Thanks,
-markus