08-23-2015, 09:39 AM | #1 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Plugin for tidying ePub files
Hi,
I have developed this plugin as a tool to help tidy up ePub files that have been converted from PDF documents but contain OCR errors. The plugin has the following features:
The instructions for using the plugin are in the attached file named ePub tidy tool v3.0.1.0.epub.
Update 20th July 2020
The plugin has been updated to version 3.0.1.0. This version has an option to scan ePub files for hyphenated words and add them to a file of hyphenated words that must not be removed by this plugin.
Update 11th October 2020
There was an error in version 3.0.1.2 that affected lines that were commented with <!-- this is an html comment -->, corrupting the ePub. I have made a quick correction in the attached file, version 3.0.1.3, although the error reporting facility will report the following for each comment found:
"Replaced a series of short/long hyphens with one long hyphen 2
Replaced <space><long hypen><space> with one long hyphen 2"
Update 21 November 2020
Bug fixes
The number of changes reported under Replaced a series of short/long hyphens with one long hyphen and Replaced <space><long hypen><space> with one long hyphen was incorrect; this has been fixed.
A quote mark next to a speech mark (eg ’") caused one of these marks to be moved to a line by itself; this has been fixed.
Important: Please ensure that you keep a back up of your original ePub file before running this plugin.
When some old publications are OCR'd some words are frequently misspelt in the same way in every scan. I am attaching a file that can be used with the plugin to correct the spelling of these words. It is based on a file provided by martyger at https://www.mobileread.com/forums/sh...d.php?t=265830 and includes updates from Steadyhands at https://www.mobileread.com/forums/sh...&postcount=154
Gipsy has put files containing Greek words for this plugin in this thread at: https://www.mobileread.com/forums/sh...65#post3208365
Update 18 April 2022
Bug fix
A fix has been made to address the issue raised by Thasaidon
Enjoy!
Last edited by CalibUser; 04-18-2022 at 05:51 AM. Reason: Bug fixes |
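For readers curious how the two hyphen replacements can avoid touching HTML comments, here is a minimal Python sketch. It is not the plugin's actual code: the function name `tidy_dashes`, the regular expressions and the choice of an em dash as the "long hyphen" are all assumptions made for illustration. It also shows why a naive pass would presumably count changes inside every comment, since `<!--` and `-->` each contain a run of hyphens.

```python
import re

# Hypothetical patterns; the plugin's real rules are not shown in this thread.
COMMENT_RE = re.compile(r"(<!--.*?-->)", re.DOTALL)   # captured so split() keeps comments
DASH_RUN_RE = re.compile(r"[-\u2013\u2014]{2,}")      # run of short/long hyphens
SPACED_DASH_RE = re.compile(r" +\u2014 +")            # <space><long hyphen><space>


def tidy_dashes(xhtml):
    """Return (new_text, run_count, spaced_count), leaving comments untouched."""
    pieces = COMMENT_RE.split(xhtml)
    run_count = 0
    spaced_count = 0
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            # Odd indexes hold the captured comments; skip them entirely.
            continue
        piece, runs = DASH_RUN_RE.subn("\u2014", piece)
        piece, spaced = SPACED_DASH_RE.subn("\u2014", piece)
        run_count += runs
        spaced_count += spaced
        pieces[index] = piece
    return "".join(pieces), run_count, spaced_count
```

Calling `tidy_dashes()` on the text of each XHTML file returns the tidied text plus the two counts, which could then be written to the Plugin Runner report.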
08-31-2015, 01:36 PM | #2 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I have updated the plugin. It corrects a few more errors in ePub files and also has a new tool to help with formatting chapter titles. I have put the new plugin in the first post in this thread.
As always, ensure you have a backup of your ePub book before running this plugin. |
08-31-2015, 04:25 PM | #3 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
IMHO, it's a bit confusing, though, that the user has to press Cancel to close the UI. Ideally, the UI should self-destroy after the plugin is done. |
|
08-31-2015, 05:48 PM | #4 |
Guru
Posts: 878
Karma: 2457540
Join Date: Nov 2011
Device: none
|
Should this plugin work under Windows 10? I'm getting no setup screen, then if I run anyway it fails with:
TclError: Can't find a usable init.tcl in the following directories: C:/Python34/lib/tcl8.6 C:/lib/tcl8.6 C:/lib/tcl8.6 C:/library C:/library C:/tcl8.6.1/library C:/tcl8.6.1/library This probably means that Tcl wasn't installed properly. |
08-31-2015, 06:06 PM | #5 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
@CalibUser: Did you install the official Python 3.4.x build from the official Python website (python.org)? |
|
08-31-2015, 06:49 PM | #6 |
Guru
Posts: 878
Karma: 2457540
Join Date: Nov 2011
Device: none
|
Sorted. By installing the latest release of Python 3.4 from python.org.
|
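A quick way to check whether a given Python install can start Tcl at all, before running the plugin, is sketched below. This is only a diagnostic under the assumption that the plugin uses tkinter for its dialog (as the TclError above suggests); the helper name `check_tcl` is made up for this example.

```python
def check_tcl():
    """Return (ok, message) describing whether this Python can start Tcl."""
    try:
        import tkinter
    except ImportError as exc:
        return False, "tkinter is not available: {}".format(exc)
    try:
        # Creates a Tcl interpreter only; no window is opened.
        interp = tkinter.Tcl()
    except tkinter.TclError as exc:
        # A missing init.tcl is expected to surface here.
        return False, "Tcl failed to initialise: {}".format(exc)
    version = interp.eval("info patchlevel")
    return True, "Tcl {} is usable".format(version)


if __name__ == "__main__":
    print(check_tcl())
```

Run it with the same Python interpreter that Sigil is configured to use for plugins; a result starting with `False` points to the Python/Tcl install rather than to the plugin.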
09-02-2015, 12:22 PM | #7 |
Sigil Developer
Posts: 7,659
Karma: 5433388
Join Date: Nov 2009
Device: many
|
This plugin thread has been added to the official Sigil Plugin Index thread here:
https://www.mobileread.com/forums/sho...d.php?t=247431 KevinH Last edited by KevinH; 09-02-2015 at 12:59 PM. |
09-02-2015, 02:26 PM | #8 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Hi,
@Doitsu: I am using Python version 3.4.0 from the Python Software Foundation.
"it's a bit confusing, though, that the user has to press Cancel to close the UI. Ideally, the UI should self-destroy after the plugin is done"
In Windows 7 my plugin shuts itself down, although the Sigil Plugin Runner window stays open. I use this to report the changes made. Is it the Sigil Plugin Runner window that needs to be closed using the Cancel button, or is it my plugin? On my system I click the OK button to close the Sigil window. |
09-02-2015, 04:47 PM | #9 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Maybe I don't understand how to use the plugin correctly or how the plugin works.
I created the following test file: Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<p>I went to</p>
<p>California for my holiday.</p>
<p>I went to</p>
<p>my favorite bar yesterday.</p>
</body>
</html>
The plugin displayed the following message in the Plugin Runner dialog box: Code:
ID: Section0001.xhtml href: Text/Section0001.xhtml Open quote: " Close quote: " Apostrophe: '
I had to click the Cancel button in the TK window to terminate the plugin. |
09-03-2015, 01:56 PM | #10 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Thanks for the feedback.
I removed my debugging code from the plugin and this seems to have caused a problem - I probably removed something that I should have left in place. I will try to work out what has happened. |
09-03-2015, 03:06 PM | #11 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I have fixed a bug in this plugin and uploaded it to the first post in this thread.
The plugin should close automatically, update the ePub file and display the changes made in the Plugin Runner dialog box. |
09-03-2015, 07:00 PM | #12 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Can someone who uses a Linux distro other than Debian Jessie or a Mac please test the plugin with the following test file? Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<p>I went to</p>
<p>California for my holiday.</p>
<p>I went to</p>
<p>my favorite bar yesterday.</p>
</body>
</html>
(This should merge the two broken sentences.) |
|
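To make the expected result for that test file concrete, here is a rough Python sketch of one way broken paragraphs could be merged. It is an assumption for illustration, not the plugin's real logic: it joins a `<p>` to the next one whenever the text before `</p>` does not end in sentence-ending punctuation, a closing quote or a tag. It only handles plain `<p>` tags without attributes.

```python
import re

# Characters that are taken to mean "this paragraph is complete" (hypothetical list).
TERMINATORS = ".!?:;\"'\u201d\u2019>"

# "</p>", optional whitespace, "<p>", but only when the character just before
# "</p>" is not one of the terminators above.
BREAK_RE = re.compile(r"(?<![" + re.escape(TERMINATORS) + r"])</p>\s*<p>")


def merge_broken_paragraphs(xhtml):
    """Join paragraphs that stop mid-sentence; return (new_text, merge_count)."""
    return BREAK_RE.subn(" ", xhtml)
```

Applied to the body of the test file, this would turn the four paragraphs into two, "I went to California for my holiday." and "I went to my favorite bar yesterday.", and report two merges.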
09-03-2015, 10:19 PM | #13 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Doitsu -- I am running Arch Linux, so my python is the latest version (3.4.3).
Installed the plugin, entered your test file, ran the plugin.... clicked OK... Code:
ID: Section0001.xhtml href: Text/Section0001.xhtml Open quote: " Close quote: " Apostrophe: '
... Ah, but if I click Cancel it reports success. No changes, just success.
Last edited by eschwartz; 09-03-2015 at 10:22 PM. |
09-05-2015, 04:07 AM | #14 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
First of all, thanks for your work.
It saves me some time from manual editing :P
I want to ask you something: in Greek, the ePub sometimes contains 'Ε or "Ε for Έ. Is there any way to add this to the plugin's checks? It doesn't need to be added to the plugin for everyone; I'd like to try it first to see if it works well.
Thanks
EDIT: Found it :P
Last edited by gipsy; 09-05-2015 at 07:16 AM. |
09-05-2015, 10:44 AM | #15 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I believe the problem on Linux is the path specified in the plugin for the dictionary (I don't have Linux so I can't confirm this). I have updated the plugin in the first post in this thread so that when the plugin is run for the first time, it asks for the location and filename of the dictionary (see the epub in the first post for details) that is used for correcting hyphenated words that should not be hyphenated. Hopefully this will resolve the problem in Linux so that it will not run and run, nor require the Cancel button to be pressed to exit.
I have improved the plugin for working with chapter headings:
Some words such as 'an' do not normally start with a capital letter when the heading is in titlecase. I have amended the plugin so that these words are now in lower case when titlecase is selected in the plugin. If you come across any words that should be lowercase but appear in titlecase then please let me know and I will update in the next version of this plugin.
With the previous version of the plugin when titlecase is applied to a chapter heading the first Roman numeral is capitalised and the remainder are in lower case; I have added an option to the 'Format chapter titles' dialog so that the user can select the required case for Roman numerals when title case is applied.
The plugin does require version 3.4 of Python - I should have mentioned this sooner.
@DiapDealer: Please remove the posts concerning the debate on the version of Python that is used as this detracts from the purpose of this thread. Thanks.
@davidfor: This plugin is for Sigil - my user name is misleading. Originally I joined the forum when there were no plans to develop Sigil further, so I chose my user name as CalibUser; when I found out that Sigil would continue to be developed I carried on using Sigil as my preferred ePub editor - I don't think it's possible to change user names. However, I do use Calibre for other functions. |
|
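The title case behaviour described in the post above can be sketched roughly as follows. This is an illustration only: the list in `MINOR_WORDS`, the function name `title_case` and the `roman_upper` switch are assumptions, and the plugin's own word list and dialog option may work differently. Note one limitation of this simple approach: ordinary words that happen to be valid Roman numerals (such as "mix" or the pronoun "I") are treated as numerals.

```python
import re

# Hypothetical list of words kept in lower case inside a heading.
MINOR_WORDS = {"a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to"}

# Standard Roman numeral pattern; the lookahead rejects the empty string.
ROMAN_RE = re.compile(
    r"(?=[ivxlcdm])m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})"
)


def title_case(heading, roman_upper=True):
    """Title-case a chapter heading, with a selectable case for Roman numerals."""
    result = []
    for index, word in enumerate(heading.split()):
        lower = word.lower()
        core = lower.rstrip(".,:;")      # ignore trailing punctuation when testing
        tail = lower[len(core):]
        if ROMAN_RE.fullmatch(core):
            result.append((core.upper() if roman_upper else core) + tail)
        elif index > 0 and core in MINOR_WORDS:
            result.append(core + tail)   # minor words stay lower case
        else:
            result.append(core.capitalize() + tail)
    return " ".join(result)
```

For example, `title_case("chapter xiv an end to the war")` gives "Chapter XIV an End to the War", and passing `roman_upper=False` keeps the numeral as "xiv".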