Old 08-23-2015, 10:39 AM   #1
CalibUser
Zealot
 
Posts: 143
Karma: 61844
Join Date: Jul 2015
Device: Sony
ePub Tidy

Hi,

I have developed this plugin as a tool to help tidy up ePub files that have been converted from PDF documents but contain OCR errors.

EDIT 1 The plugin has been updated to version 3.0.0.2. This version includes several updates as follows:
  • removes more hidden codes from scanned files
  • has an option to format the xhtml files
  • inserts tags <colgroup> and </colgroup> in tables where they are missing
  • removes blank lines immediately following the <p...> tag
  • has new code that checks whether a new version of Sigil is available and presents a dialog box that lets you either go to the download page for the plugin or ask to be reminded again later
  • enables you to specify words that must be hyphenated.
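To illustrate one item from the list above, here is a minimal sketch of how blank lines immediately following a `<p...>` tag could be removed. It is not the plugin's actual code; the function name `strip_blank_lines_after_p` and the regular expression are assumptions made for illustration.

```python
import re

# Hypothetical helper (not taken from the plugin): match an opening
# <p ...> tag plus its line break, followed by one or more lines that
# contain only spaces or tabs.
_BLANK_AFTER_P = re.compile(r'(<p\b[^>]*>[ \t]*\r?\n)(?:[ \t]*\r?\n)+')


def strip_blank_lines_after_p(xhtml):
    """Drop whitespace-only lines that directly follow an opening <p> tag."""
    return _BLANK_AFTER_P.sub(r'\1', xhtml)
```

The `\b` after `p` keeps tags such as `<pre>` from matching, and a `<p>` tag followed by text on the very next line is left untouched.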

The instructions for using the plugin are in the attached file named ePub tidy tool v3.0.0.0.epub.

EDIT 2 The plugin has been amended so that it will not remove hyphens if one of the hyphenated words is only one character.
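The rule described in EDIT 2 can be sketched roughly as follows. This is an illustration only, assuming the plugin joins a hyphenated pair when the joined word appears in a dictionary; `join_hyphenated`, `dictionary` and `keep_hyphenated` are hypothetical names, not the plugin's own.

```python
import re

# Two runs of letters joined by a single hyphen.
_HYPHENATED = re.compile(r'\b([A-Za-z]+)-([A-Za-z]+)\b')


def join_hyphenated(text, dictionary, keep_hyphenated=frozenset()):
    """Join 'meet-ing' into 'meeting' when the joined word is in `dictionary`.

    Assumed rules (for illustration):
    - leave the hyphen alone if either half is a single character
    - leave the hyphen alone if the user listed the word in `keep_hyphenated`
    """
    def fix(match):
        left, right = match.group(1), match.group(2)
        whole = match.group(0)
        if len(left) == 1 or len(right) == 1:
            return whole
        if whole.lower() in keep_hyphenated:
            return whole
        joined = left + right
        return joined if joined.lower() in dictionary else whole

    return _HYPHENATED.sub(fix, text)
```

For example, with a dictionary containing `email`, `today` and `meeting`, the text `An e-mail about a to-day meet-ing.` keeps `e-mail` (one half is a single character) and joins the other two words.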



Important: Please ensure that you keep a backup of your original ePub file before running this plugin.

When some old publications are OCR'd, some words are frequently misspelt in the same way in every scan. I am attaching a file that can be used with the plugin to correct the spelling of these words. It is based on a file provided by martyger at https://www.mobileread.com/forums/sh...d.php?t=265830 and includes updates from Steadyhands at https://www.mobileread.com/forums/sh...&postcount=154
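As a rough idea of how a list of recurring misspellings can be applied, here is a small sketch. The layout of IncorrectWords.txt is not described in this thread, so the `wrong=right` line format used here is purely an assumption, as are the function names.

```python
import re


def parse_corrections(lines):
    """Build a {wrong: right} dict from lines assumed to look like 'tlie=the'."""
    corrections = {}
    for line in lines:
        line = line.strip()
        if '=' not in line:
            continue  # skip blank or malformed lines
        wrong, right = (part.strip() for part in line.split('=', 1))
        if wrong:
            corrections[wrong] = right
    return corrections


def apply_corrections(text, corrections):
    """Replace each whole-word misspelling with its correction."""
    if not corrections:
        return text
    # Longest keys first so a longer misspelling wins over a shorter prefix.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, keys)) + r')\b')
    return pattern.sub(lambda m: corrections[m.group(1)], text)
```

Matching on word boundaries avoids changing a misspelling that happens to occur inside a longer, correct word.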

Gipsy has put files containing Greek words for this plugin in this thread at:
https://www.mobileread.com/forums/sh...65#post3208365



Enjoy!
Attached Files
File Type: txt IncorrectWords.txt (1.3 KB, 1142 views)
File Type: epub ePub tidy tool v3.0.0.0.epub (16.4 KB, 26 views)
File Type: zip ePubTidyTool_v3.0.0.2.zip (38.3 KB, 27 views)

Last edited by CalibUser; 11-10-2018 at 03:06 PM. Reason: Plugin update
Old 08-31-2015, 02:36 PM   #2
CalibUser
Zealot
 
Posts: 143
Karma: 61844
Join Date: Jul 2015
Device: Sony
I have updated the plugin. It corrects a few more errors in ePub files and also has a new tool to help with formatting chapter titles. I have put the new plugin in the first post in this thread.

As always, ensure you have a backup of your ePub book before running this plugin.
Old 08-31-2015, 05:25 PM   #3
Doitsu
Wizard
 
 
Posts: 4,308
Karma: 14277313
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by CalibUser View Post
I have updated the plugin at https://www.mobileread.com/forums/sho...d.php?t=264378. It should work on the other Operating Systems, although I have not tested it on these.
The plugin installed fine with the latest Linux version of Sigil and appears to be working as designed.

IMHO, it's a bit confusing, though, that the user has to press Cancel to close the UI. Ideally, the UI should self-destroy after the plugin is done.
Old 08-31-2015, 06:48 PM   #4
exaltedwombat
Guru
 
Posts: 659
Karma: 1771606
Join Date: Nov 2011
Device: none
Should this plugin work under Windows 10? I'm getting no setup screen, then if I run anyway it fails with:

TclError: Can't find a usable init.tcl in the following directories:
C:/Python34/lib/tcl8.6 C:/lib/tcl8.6 C:/lib/tcl8.6 C:/library C:/library C:/tcl8.6.1/library C:/tcl8.6.1/library
This probably means that Tcl wasn't installed properly.
Old 08-31-2015, 07:06 PM   #5
Doitsu
Wizard
 
 
Posts: 4,308
Karma: 14277313
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by exaltedwombat View Post
Should this plugin work under Windows 10? I'm getting no setup screen, then if I run anyway it fails with:

TclError: Can't find a usable init.tcl in the following directories:
C:/Python34/lib/tcl8.6 C:/lib/tcl8.6 C:/lib/tcl8.6 C:/library C:/library C:/tcl8.6.1/library C:/tcl8.6.1/library
This probably means that Tcl wasn't installed properly.
I got the same error on my Windows 10 machine. Did you by any chance also install ActivePython 2.7.x and 3.4.x on your machine?

@CalibUser: Did you install the official Python 3.4.x build from the official Python website (python.org)?
Old 08-31-2015, 07:49 PM   #6
exaltedwombat
Guru
 
Posts: 659
Karma: 1771606
Join Date: Nov 2011
Device: none
Sorted. By installing the latest release of Python 3.4 from python.org.
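For anyone hitting the same TclError, a quick check along these lines may show whether a given Python install can start Tcl at all. It is only a sketch: `tk_is_usable` is a made-up name, and it assumes that a missing `init.tcl` surfaces as an exception when `tkinter.Tcl()` is called.

```python
def tk_is_usable():
    """Return True if this interpreter can import tkinter and start Tcl."""
    try:
        import tkinter
        # Starts a Tcl interpreter only; no window is opened, so no
        # display is required for this check.
        tkinter.Tcl()
        return True
    except Exception:
        # ImportError (tkinter missing) or a TclError (broken Tcl install).
        return False


if __name__ == "__main__":
    print("Tcl/Tk usable:", tk_is_usable())
```

Run it with the same interpreter that Sigil is configured to use for plugins; a different Python on the same machine can give a different answer.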
Old 09-02-2015, 01:22 PM   #7
KevinH
Wizard
 
Posts: 3,101
Karma: 1931746
Join Date: Nov 2009
Device: many
This plugin thread has been added to the official Sigil Plugin Index thread here:

https://www.mobileread.com/forums/sho...d.php?t=247431

KevinH

Last edited by KevinH; 09-02-2015 at 01:59 PM.
Old 09-02-2015, 03:26 PM   #8
CalibUser
Zealot
 
Posts: 143
Karma: 61844
Join Date: Jul 2015
Device: Sony
Hi,

@Doitsu: I am using Python version 3.4.0 from the Python Software Foundation.

"it's a bit confusing, though, that the user has to press Cancel to close the UI. Ideally, the UI should self-destroy after the plugin is done"

In Windows 7 my plugin shuts itself down, although the Sigil Plugin Runner window stays open. I use this to report the changes made. Is it the Sigil Plugin Runner window that needs to be closed using the Cancel button, or is it my plugin? On my system I click the OK button to close the Sigil window.
Old 09-02-2015, 05:47 PM   #9
Doitsu
Wizard
 
 
Posts: 4,308
Karma: 14277313
Join Date: Dec 2010
Device: Kindle PW2
Maybe I don't understand how to use the plugin correctly or how the plugin works.

I created the following test file:

Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>

<body>
  <p>I went to</p>

  <p>California for my holiday.</p>

  <p>I went to</p>

  <p>my favorite bar yesterday.</p>
</body>
</html>
I then started the plugin and selected only "Fix ALL broken line endings" and clicked OK.

The plugin displayed the following message in the Plugin Runner dialog box:

Code:
ID: Section0001.xhtml	href: Text/Section0001.xhtml
Open quote:  "
Close quote:  "
Apostrophe:  '
but nothing got changed and both the Plugin Runner dialog box and the TK dialog box remained visible.

I had to click the Cancel button in the TK window to terminate the plugin.
Attached thumbnail: dialog.png (13.4 KB)
Old 09-03-2015, 02:56 PM   #10
CalibUser
Zealot
 
Posts: 143
Karma: 61844
Join Date: Jul 2015
Device: Sony
Thanks for the feedback.
I removed my debugging code from the plugin and this seems to have caused a problem - I probably removed something that I should have left in place.

I will try to work out what has happened.
Old 09-03-2015, 04:06 PM   #11
CalibUser
Zealot
 
Posts: 143
Karma: 61844
Join Date: Jul 2015
Device: Sony
I have fixed a bug in this plugin and uploaded it to the first post in this thread.

The plugin should close automatically, update the ePub file and display the changes made in the Plugin Runner dialog box.
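One common way to get this behaviour with Tk is to have the OK handler run the work and then destroy the root window, which ends `mainloop()`. The sketch below is not the plugin's code; `make_ok_handler`, `show_dialog` and `run_fixes` are hypothetical names.

```python
def make_ok_handler(root, run_fixes):
    """Build an OK-button callback that does the work, then closes the UI."""
    def on_ok():
        try:
            run_fixes()
        finally:
            # Destroying the root window makes mainloop() return, so the
            # plugin can finish even if run_fixes() raised an error.
            root.destroy()
    return on_ok


def show_dialog(run_fixes):
    """Show a minimal OK/Cancel dialog (needs a working Tk and a display)."""
    import tkinter as tk  # imported here so make_ok_handler works without Tk

    root = tk.Tk()
    root.title("ePub Tidy (sketch)")
    tk.Button(root, text="OK", command=make_ok_handler(root, run_fixes)).pack()
    tk.Button(root, text="Cancel", command=root.destroy).pack()
    root.mainloop()
```

Because the handler only needs an object with a `destroy()` method, it can be exercised without opening a window.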
Old 09-03-2015, 08:00 PM   #12
Doitsu
Wizard
 
 
Posts: 4,308
Karma: 14277313
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by CalibUser View Post
I have fixed a bug in this plugin and uploaded it to the first post in this thread.
The new version works with Windows, but not with my Linux version (Debian Jessie). However, this is most likely caused by an incompatible library on my system, or maybe because Debian Jessie comes with Python 3.4.2 and Windows with Python 3.4.3.

Can someone who uses a Linux distro other than Debian Jessie or a Mac please test the plugin with the following test file?

Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>

<body>
  <p>I went to</p>

  <p>California for my holiday.</p>

  <p>I went to</p>

  <p>my favorite bar yesterday.</p>
</body>
</html>
Select only "Fix ALL broken line endings" and click OK.
(This should merge the two broken sentences.)
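For reference, the expected result on this test file can be reproduced with a very small sketch. This is not how the plugin does it; it simply assumes that a paragraph ending without sentence-final punctuation should be joined to the plain `<p>` that follows. The name `merge_broken_paragraphs` is hypothetical.

```python
import re

# A character that is not whitespace, closing punctuation or a tag end,
# directly before </p>, followed by a plain <p> with no attributes.
_BROKEN_END = re.compile(r'([^\s.!?:;"\'\u201d\u2019>])</p>\s*<p>')


def merge_broken_paragraphs(xhtml):
    """Join a paragraph that ends mid-sentence with the next paragraph."""
    return _BROKEN_END.sub(r'\1 ', xhtml)
```

Applied to the body of the test file, the four paragraphs become two: "I went to California for my holiday." and "I went to my favorite bar yesterday."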
Old 09-03-2015, 11:19 PM   #13
eschwartz
Ex-Helpdesk Junkie
 
 
Posts: 19,247
Karma: 83049305
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Doitsu -- I am running Arch Linux, so my python is the latest version (3.4.3).

Installed the plugin, entered your test file, ran the plugin.... clicked OK...

Code:
ID: Section0001.xhtml	href: Text/Section0001.xhtml
Open quote:  "
Close quote:  "
Apostrophe:  '
Still running and running and running.


...


Ah, but if I click Cancel it reports success. No changes, just success.

Last edited by eschwartz; 09-03-2015 at 11:22 PM.
Old 09-05-2015, 05:07 AM   #14
gipsy
Connoisseur
 
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
First of all, thanks for your work.
It saves me some time on manual editing :P

I want to ask you something. In Greek, the ePub sometimes contains 'Ε or "Ε for Έ.
Is there any way to add this to the plugin's checks? It doesn't need to be added to the plugin for everyone; I would like to try it first to see if it works well.

Thanks

EDIT: Found it :P
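For readers with the same problem, the substitution described above could look something like this. It is a sketch rather than what gipsy or the plugin actually uses; extending the rule from Ε to the other Greek capital vowels is an assumption, and `fix_greek_tonos` is a made-up name.

```python
import re

# Greek capital vowel -> the same letter with tonos (accent).
_TONOS = {
    '\u0391': '\u0386',  # Alpha
    '\u0395': '\u0388',  # Epsilon
    '\u0397': '\u0389',  # Eta
    '\u0399': '\u038a',  # Iota
    '\u039f': '\u038c',  # Omicron
    '\u03a5': '\u038e',  # Upsilon
    '\u03a9': '\u038f',  # Omega
}

# A straight apostrophe or double quote directly before one of those vowels.
_STRAY_ACCENT = re.compile('[\'"]([' + ''.join(_TONOS) + '])')


def fix_greek_tonos(text):
    """Replace a stray quote mark plus capital vowel with the accented capital."""
    return _STRAY_ACCENT.sub(lambda m: _TONOS[m.group(1)], text)
```

Note that this would also change a genuine opening quotation mark that happens to sit directly before a capital vowel, so the results are worth checking by hand.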

Last edited by gipsy; 09-05-2015 at 08:16 AM.
Old 09-05-2015, 11:44 AM   #15
CalibUser
Zealot
 
Posts: 143
Karma: 61844
Join Date: Jul 2015
Device: Sony
I believe the problem on Linux is the path specified in the plugin for the dictionary (I don't have Linux so I can't confirm this). I have updated the plugin in the first post in this thread so that when the plugin is run for the first time, it asks for the location and filename of the dictionary (see the epub in the first post for details) that is used for correcting hyphenated words that should not be hyphenated. Hopefully this will resolve the problem in Linux so that it will not run and run, nor require the Cancel button to be pressed to exit.
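The "ask once, then remember" approach described above can be sketched as follows. This is an assumption about the general pattern, not the plugin's implementation: the JSON preferences file, the `dictionary_path` key and the function name are all hypothetical.

```python
import json
import os


def load_dictionary_path(prefs_file, ask_user):
    """Return the saved dictionary path, asking the user only when needed.

    `ask_user` is any callable that returns a path, for example a
    file-open dialog.
    """
    if os.path.exists(prefs_file):
        with open(prefs_file, encoding='utf-8') as fh:
            saved = json.load(fh).get('dictionary_path')
        if saved and os.path.isfile(saved):
            return saved  # previously chosen and still present on disk

    chosen = ask_user()
    with open(prefs_file, 'w', encoding='utf-8') as fh:
        json.dump({'dictionary_path': chosen}, fh)
    return chosen
```

Storing the path the user picked, instead of hard-coding one, avoids depending on a directory layout that differs between Windows, Linux and Mac.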

I have improved the plugin's handling of chapter headings: some words, such as 'an', do not normally start with a capital letter when the heading is in title case. I have amended the plugin so that these words are now in lower case when title case is selected in the plugin. If you come across any words that should be lower case but appear in title case, please let me know and I will update them in the next version of this plugin.
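A minimal sketch of this kind of title casing is shown below. Only 'an' comes from this post; the rest of the `SMALL_WORDS` set and the function name are assumptions for illustration.

```python
# Words kept in lower case inside a heading. Only 'an' is mentioned in
# the post above; the others are assumed examples.
SMALL_WORDS = {'a', 'an', 'and', 'the', 'of', 'in', 'on', 'to', 'for',
               'or', 'but', 'at', 'by'}


def title_case(heading):
    """Capitalise each word, except small words that are not the first word."""
    result = []
    for index, word in enumerate(heading.lower().split()):
        if index > 0 and word in SMALL_WORDS:
            result.append(word)
        else:
            result.append(word[:1].upper() + word[1:])
    return ' '.join(result)
```

The first word is always capitalised, so `THE TALE OF AN OLD MAN` becomes `The Tale of an Old Man`.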

In the previous version of the plugin, when title case was applied to a chapter heading, the first Roman numeral was capitalised and the remainder were in lower case. I have added an option to the 'Format chapter titles' dialog so that the user can select the required case for Roman numerals when title case is applied.
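The Roman numeral option can be pictured like this: a plain title-case pass turns `xiv` into `Xiv`, so numerals need to be detected and cased separately. The sketch is an illustration only; the regular expression, the `roman_upper` flag and the function name are assumptions.

```python
import re

# Matches well-formed Roman numerals (case-insensitive, non-empty).
_ROMAN = re.compile(
    r'^(?=[ivxlcdm])m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$',
    re.IGNORECASE)


def title_case_roman(heading, roman_upper=True):
    """Title-case a heading, casing Roman numerals as a whole."""
    result = []
    for word in heading.split():
        core = word.rstrip('.:,')   # allow 'xiv.' or 'iv:'
        tail = word[len(core):]
        if core and _ROMAN.match(core):
            core = core.upper() if roman_upper else core.lower()
            result.append(core + tail)
        else:
            result.append(word[:1].upper() + word[1:].lower())
    return ' '.join(result)
```

Ordinary words that happen to be valid numerals, such as "mix" or the pronoun "I", would also be treated as numerals here, so a real implementation would need extra checks.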

The plugin does require version 3.4 of Python - I should have mentioned this sooner.

@DiapDealer: Please remove the posts concerning the debate on the version of Python that is used as this detracts from the purpose of this thread. Thanks.

@davidfor: This plugin is for Sigil - my user name is misleading. Originally I joined the forum when there were no plans to develop Sigil further, so I chose my user name as CalibUser; when I found out that Sigil would continue to be developed I carried on using Sigil as my preferred ePub editor - I don't think it's possible to change user names. However, I do use Calibre for other functions.
All times are GMT -4. The time now is 01:16 AM.

