MobileRead Forums


CalibUser 07-20-2015 10:52 AM

Plugin guide
 
4 Attachment(s)
Recently I was pleased to see that development on Sigil is continuing and that the latest version of Sigil accepts plugins.

When I convert scanned PDF files into ePubs, a number of formatting errors arise, and I run a series of Find/Replace operations to correct some of them. Because I sometimes forget to run some of the searches, I decided to put these operations into a plugin. However, information about writing plugins for Sigil is sparse, so I have produced a very brief guide (attached) on how to write plugins that incorporate regular expressions, and I hope it will be useful to others.

I am also attaching a plugin that corrects some formatting errors that are produced when converting PDFs to ePubs. I am new to Python so I would be interested in comments on my code - there may be better ways of coding the plugin.

I will continue to develop the plugin (e.g. I will include code for dealing with erroneous line breaks when I get time).
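For readers who want a starting point, here is a minimal sketch of an edit plugin built around re.sub(). It is not the code from the attached plugin: the three clean-up rules are hypothetical examples, and the container calls (bk.text_iter(), bk.readfile(), bk.writefile()) are used as I understand them from the Sigil Plugin Framework documentation, so check them against your Sigil version.

```python
import re

# Hypothetical clean-up rules as (pattern, replacement) pairs.
# These are illustrations only, not the rules used in the attached plugin.
RULES = [
    (r"[ \t]+</p>", "</p>"),   # strip spaces/tabs just before a closing </p>
    (r"<p>[ \t]+", "<p>"),     # strip spaces/tabs just after an opening <p>
    (r"<p>\s*</p>\n?", ""),    # drop paragraphs that are now empty
]


def clean_html(html):
    """Apply every rule in order with re.sub() and return the new text."""
    for pattern, replacement in RULES:
        html = re.sub(pattern, replacement, html)
    return html


def run(bk):
    """Entry point Sigil calls for an edit plugin; bk is the book container."""
    for file_id, href in bk.text_iter():      # (manifest id, href) per text file
        original = bk.readfile(file_id)
        cleaned = clean_html(original)
        if cleaned != original:
            bk.writefile(file_id, cleaned)    # write back only when changed
    return 0                                  # 0 reports success to Sigil
```

Keeping the regular expressions in clean_html(), separate from run(), means the rules can be tried on plain strings outside Sigil before they are let loose on a book.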

KevinH 07-20-2015 12:20 PM

Hi,

Please see the Plugin Development thread for the documentation on writing plugins.

https://www.mobileread.com/forums/sho...d.php?t=251452

https://www.mobileread.com/forums/sho...14&postcount=4

Edits/fixes/changes to the official documentation are always welcome.

Simply use Sigil to edit and update the Plugin Framework Documentation and increment the revision number. I would be happy to review it, upload it to the main Sigil site, and link to it from the Plugin Developer's thread.

Thanks,

KevinH

CalibUser 07-20-2015 03:16 PM

Hi KevinH,

I'm not sure whether my Guide will be suitable for you to include in the document Sigil_Plugin_Framework. I am learning about both Python and how to develop plugins for Sigil, so my Guide is really a record of how I approach the development of a plugin. I will be including brief notes on Python code for Sigil plugins as I learn about it [e.g. the function re.sub()], and I'm not sure that this would be appropriate for the document Sigil_Plugin_Framework. I thought that this record might be helpful to others.

Having said that, I am willing to provide my text if you want to adapt it or include it (fully or partly) in the Sigil_Plugin_Framework. In which format would you prefer the document to be sent?

KevinH 07-20-2015 03:25 PM

Hi,

If you have an epub version that would be wonderful.

Thanks,

KevinH


CalibUser 07-22-2015 10:14 AM

I have updated the plugin and the guide for developing plugins. The new versions are in the first posting of this thread.

The new plugin has the following updates for correcting text that has been scanned in with issues:

1. Some regular expressions for correcting the formatting of ePub files have been updated, and a new one has been added for dealing with quotes that should not be together. The plugin cannot deal with paragraphs that begin with multiple tags, e.g. <p><b><i>; there are too many combinations, and handling them would require many more regular expressions.

2. A new function for fixing sentences that are incorrectly broken across paragraphs has been added. This function is not perfect and will not detect all bad breaks. There is an option in the json file to use a regular expression that automatically joins any paragraph that does not end in a full stop to the next one. Note that if the full stop was simply missing from a genuine paragraph ending, the paragraphs will be joined regardless.

3. A function to make italicised text consistent has been added. Sometimes scanned files start/end italics inconsistently, e.g. an opening quote may be in italics and the closing quote may not. I prefer to have only the text in italics, so this function achieves that. It may be disabled by editing the json file.

4. A function to replace HTML entities, e.g. &mdash;, with the corresponding character (a long hyphen in this case) has been added. Again, this may be deactivated by editing the json file.
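As a rough illustration of items 3 and 4, the sketch below shows how a json option could switch the entity replacement on or off. The key names ("replace_html", "process_italics") and the function names are made up for this example; the plugin's real json file may be laid out quite differently. The &nbsp; mapping follows the description of the later version of the plugin further down this thread.

```python
import json
import re

# Hypothetical options file content; the real plugin's keys may differ.
OPTIONS_JSON = '{"replace_html": true, "process_italics": false}'

# Entity -> replacement character. "\u2014" is the long hyphen (em dash).
ENTITY_MAP = {"&mdash;": "\u2014", "&nbsp;": " "}


def replace_entities(html):
    """Swap the entities listed in ENTITY_MAP for their characters."""
    return re.sub(r"&(?:mdash|nbsp);", lambda m: ENTITY_MAP[m.group(0)], html)


def apply_options(html, options):
    """Run only the steps that are enabled in the options dictionary."""
    if options.get("replace_html", True):
        html = replace_entities(html)
    return html


options = json.loads(OPTIONS_JSON)
print(apply_options("<p>Wait&mdash;what?&nbsp;No.</p>", options))
```

Setting "replace_html" to false in the json text leaves the entities untouched, which matches the "may be deactivated by editing the json file" behaviour described above.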


I have also produced the guide in Sigil format as requested. This has been attached to the first post.

CalibUser 08-09-2015 06:55 AM

I have updated the guide and the plugin to include a user interface.

Make sure you keep a backup of your epub file before running this plugin.

The interface provides three options.
1. Fix all broken line endings
The plugin will fix some situations where a sentence is split across two paragraphs. It will not fix some errors, e.g. where a capital letter is at the start of the second paragraph (this can happen with place names, for example). If you select this option, then all broken line endings will be repaired; however, some may be repaired incorrectly. If you don't select this option, then you will need to check for incorrect paragraph endings manually by searching for: ([a-z])</p>\s+<p>

2. Replace HTML
Choosing this option will replace &mdash; with a long hyphen and &nbsp; with a space.

3. Process italics
Sometimes the use of italics is inconsistent. For example, in quote marks, the leading quote may be italicised and the ending quote may not. Choosing this option will ensure that the quote marks are in normal text.
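Options 1 and 3 can be sketched in Python along the following lines. This is only my reading of the behaviour described above, not the plugin's actual code: the function names are invented, the "fix all" pattern is the search expression quoted under option 1, and the cautious variant (which also requires a lower-case letter at the start of the next paragraph) is an assumption. Only double quotes, straight or curly, are handled.

```python
import re

# Pattern quoted under option 1: a lower-case letter right before a paragraph break.
LOWER_END_ANY_START = re.compile(r"([a-z])</p>\s+<p>")
# Assumed cautious variant: the next paragraph must also start in lower case.
LOWER_END_LOWER_START = re.compile(r"([a-z])</p>\s+<p>([a-z])")

# Double quotes (straight or curly) sitting just inside <i>...</i>.
OPEN_QUOTE_IN_ITALICS = re.compile('<i>(["\u201c])')
CLOSE_QUOTE_IN_ITALICS = re.compile('(["\u201d])</i>')


def join_broken_paragraphs(html, fix_all=False):
    """Join paragraphs split mid-sentence; fix_all also joins before capitals."""
    if fix_all:
        return LOWER_END_ANY_START.sub(r"\1 ", html)
    return LOWER_END_LOWER_START.sub(r"\1 \2", html)


def move_quotes_out_of_italics(html):
    """Put quote marks in normal text, leaving only the words in italics."""
    html = OPEN_QUOTE_IN_ITALICS.sub(r"\1<i>", html)
    return CLOSE_QUOTE_IN_ITALICS.sub(r"</i>\1", html)


sample = "<p>He went to</p>\n<p>London and then</p>\n<p>came home.</p>"
print(join_broken_paragraphs(sample))                # leaves the break before "London"
print(join_broken_paragraphs(sample, fix_all=True))  # joins all three lines
print(move_quotes_out_of_italics('<p><i>"Run!</i>" he said.</p>'))
```

The place-name case shows why "fix all" needs a manual check afterwards: it correctly joins "He went to / London", but it would equally join a genuine paragraph ending that had merely lost its full stop.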

CalibUser 08-23-2015 10:44 AM

I have developed this plugin more fully and think that it should go in a thread of its own since I will not be using the whole plugin for the guide in this thread.

Consequently, I have put the developed file at: Plugin for tidying ePub files (https://www.mobileread.com/forums/sho...d.php?t=264378)

KevinH 08-28-2015 10:50 AM

Hi,
If your plugin is pure Python, you should think about adding osx and unx to your oslist so that your plugin will actually be useful to everyone. In fact, if you want testers on other platforms such as Linux and Mac OS X, you need to add them, because Sigil will disable/remove any plugin that is not supported on the current OS according to that list. That is why a plugin.xml error was reported when I tried to load it.
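As a quick way to check this before uploading, the sketch below parses a plugin.xml and reports which platforms are missing from its oslist. The sample plugin.xml is illustrative only (the name, author, description and version are placeholders, and the engine value is an assumption), and the helper functions are not part of Sigil. The three platform tokens assumed here are osx, unx and win.

```python
import xml.etree.ElementTree as ET

# Illustrative plugin.xml; every value except the oslist tokens is a placeholder.
SAMPLE_PLUGIN_XML = """
<plugin>
    <name>ExampleTidy</name>
    <type>edit</type>
    <author>Example Author</author>
    <description>Illustrative entry only.</description>
    <engine>python3.4</engine>
    <version>0.1.0</version>
    <oslist>osx,unx,win</oslist>
</plugin>
"""


def supported_platforms(plugin_xml):
    """Return the set of platform tokens listed in the <oslist> element."""
    root = ET.fromstring(plugin_xml.strip())
    oslist = root.findtext("oslist", default="")
    return {name.strip() for name in oslist.split(",") if name.strip()}


def missing_platforms(plugin_xml, wanted=("osx", "unx", "win")):
    """List the wanted platform tokens that the plugin does not declare."""
    return sorted(set(wanted) - supported_platforms(plugin_xml))


print(missing_platforms(SAMPLE_PLUGIN_XML))
print(missing_platforms("<plugin><oslist>win</oslist></plugin>"))
```

An empty list from missing_platforms() means the plugin declares all three platforms; a Windows-only oslist reports osx and unx as missing.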

I will add your thread to the Sigil Plugin thread this weekend when I get a free moment.

Thanks,

KevinH

CalibUser 08-31-2015 02:38 PM

I have updated the plugin at https://www.mobileread.com/forums/sho...d.php?t=264378. It should now work on the other operating systems, although I have not tested it on them.

