MobileRead Forums

Plugin for tidying ePub files (https://www.mobileread.com/forums/showthread.php?t=264378)

CalibUser 08-23-2015 10:39 AM

Plugin for tidying ePub files
 
3 Attachment(s)
Hi,

I have developed this plugin as a tool to help tidy up ePub files that have been converted from PDF documents but contain OCR errors. The plugin has the following features:
  • processes span tags, allowing tags to be removed or changed
  • corrects false line breaks
  • corrects miscellaneous errors, for example, removing unnecessary spaces, correcting the direction of apostrophes, and inserting the tags <colgroup> and </colgroup> in tables where they are missing
  • reformats chapter titles
  • reassigns header tags
  • uses a customised list of words to correct common misspellings introduced by the OCR process
  • imports a customised CSS file
  • corrects incorrectly hyphenated words
  • has an option to format the XHTML files

The instructions for using the plugin are in the attached file named ePub tidy tool v3.0.1.0.epub.

Update 20th July 2020 The plugin has been updated to version 3.0.1.0. This version has an option to scan ePub files for hyphenated words and add them to a file of hyphenated words that must not be removed by this plugin.

Update 11th October 2020 There was an error in version 3.0.1.2 that affected lines that were commented with <!-- this is an html comment -->, corrupting the ePub. I have made a quick correction in the attached file, version 3.0.1.3, although the error reporting facility will report the following for each comment found:

"Replaced a series of short/long hyphens with one long hyphen 2
Replaced <space><long hypen><space> with one long hyphen 2"

Update 21 November 2020
Bug fixes
The number of changes reported under Replaced a series of short/long hyphens with one long hyphen and Replaced <space><long hypen><space> with one long hyphen was incorrect; this has been fixed.

A quote mark next to a speech mark (eg ’") caused one of these marks to be moved to a line by itself; this has been fixed.

Important: Please ensure that you keep a backup of your original ePub file before running this plugin.

When some old publications are OCR'd some words are frequently misspelt in the same way in every scan. I am attaching a file that can be used with the plugin to correct the spelling of these words. It is based on a file provided by martyger at https://www.mobileread.com/forums/sh...d.php?t=265830 and includes updates from Steadyhands at https://www.mobileread.com/forums/sh...&postcount=154

Gipsy has put files containing Greek words for this plugin in this thread at:
https://www.mobileread.com/forums/sh...65#post3208365


Enjoy!

CalibUser 08-31-2015 02:36 PM

I have updated the plugin. It corrects a few more errors in ePub files and also has a new tool to help with formatting chapter titles. I have put the new plugin in the first post in this thread.

As always, ensure you have a backup of your ePub book before running this plugin.

Doitsu 08-31-2015 05:25 PM

Quote:

Originally Posted by CalibUser (Post 3161518)
I have updated the plugin at https://www.mobileread.com/forums/sho...d.php?t=264378. It should work on the other Operating Systems, although I have not tested it on these.

The plugin installed fine with the latest Linux version of Sigil and appears to be working as designed.

IMHO, it's a bit confusing, though, that the user has to press Cancel to close the UI. Ideally, the UI should self-destroy after the plugin is done.

exaltedwombat 08-31-2015 06:48 PM

Should this plugin work under Windows 10? I'm getting no setup screen, then if I run anyway it fails with:

TclError: Can't find a usable init.tcl in the following directories:
C:/Python34/lib/tcl8.6 C:/lib/tcl8.6 C:/lib/tcl8.6 C:/library C:/library C:/tcl8.6.1/library C:/tcl8.6.1/library
This probably means that Tcl wasn't installed properly.

Doitsu 08-31-2015 07:06 PM

Quote:

Originally Posted by exaltedwombat (Post 3161698)
Should this plugin work under Windows 10? I'm getting no setup screen, then if I run anyway it fails with:

TclError: Can't find a usable init.tcl in the following directories:
C:/Python34/lib/tcl8.6 C:/lib/tcl8.6 C:/lib/tcl8.6 C:/library C:/library C:/tcl8.6.1/library C:/tcl8.6.1/library
This probably means that Tcl wasn't installed properly.

I got the same error on my Windows 10 machine. Did you by any chance also install ActivePython 2.7.x and 3.4.x on your machine?

@CalibUser: Did you install the official Python 3.4.x build from the official Python website (python.org)?

exaltedwombat 08-31-2015 07:49 PM

Sorted. By installing the latest release of Python 3.4 from python.org.

KevinH 09-02-2015 01:22 PM

This plugin thread has been added to the official Sigil Plugin Index thread here:

https://www.mobileread.com/forums/sho...d.php?t=247431

KevinH

CalibUser 09-02-2015 03:26 PM

Hi,

@ Doitsu: I am using Python version 3.4.0 from the Python Software Foundation.

"it's a bit confusing, though, that the user has to press Cancel to close the UI. Ideally, the UI should self-destroy after the plugin is done"

In Windows 7 my plugin shuts itself down, although the Sigil Plugin Runner Window stays open. I use this to report the changes made. Is it the Sigil Plugin Runner Window that needs to be closed using the Cancel button, or is it my plugin? On my system I click the OK button to close the Sigil Window.

Doitsu 09-02-2015 05:47 PM

1 Attachment(s)
Maybe I don't understand how to use the plugin correctly or how the plugin works.

I created the following test file:

Code:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>

<body>
  <p>I went to</p>

  <p>California for my holiday.</p>

  <p>I went to</p>

  <p>my favorite bar yesterday.</p>
</body>
</html>

I then started the plugin and selected only "Fix ALL broken line endings" and clicked OK.

The plugin displayed the following message in the Plugin Runner dialog box:

Code:

ID: Section0001.xhtml        href: Text/Section0001.xhtml
Open quote:  "
Close quote:  "
Apostrophe:  '

but nothing got changed and both the Plugin Runner dialog box and the TK dialog box remained visible.

I had to click the Cancel button in the TK window to terminate the plugin.

CalibUser 09-03-2015 02:56 PM

Thanks for the feedback.
I removed my debugging code from the plugin and this seems to have caused a problem - I probably removed something that I should have left in place.

I will try to work out what has happened.

CalibUser 09-03-2015 04:06 PM

I have fixed a bug in this plugin and uploaded it to the first post in this thread.

The plugin should close automatically, update the ePub file and display the changes made in the Plugin Runner dialog box.

Doitsu 09-03-2015 08:00 PM

Quote:

Originally Posted by CalibUser (Post 3163777)
I have fixed a bug in this plugin and uploaded it to the first post in this thread.

The new version works on Windows, but not on my Linux machine (Debian Jessie). However, this is most likely caused by some incompatible library on my system, or maybe by the Python version: Debian Jessie comes with Python 3.4.2, while the Windows setup uses Python 3.4.3.

Can someone who uses a Linux distro other than Debian Jessie or a Mac please test the plugin with the following test file?

Code:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>

<body>
  <p>I went to</p>

  <p>California for my holiday.</p>

  <p>I went to</p>

  <p>my favorite bar yesterday.</p>
</body>
</html>

Select only "Fix ALL broken line endings" and click OK.
(This should merge the two broken sentences.)

eschwartz 09-03-2015 11:19 PM

Doitsu -- I am running Arch Linux, so my python is the latest version (3.4.3). :D

Installed the plugin, entered your test file, ran the plugin.... clicked OK...

Code:

ID: Section0001.xhtml        href: Text/Section0001.xhtml
Open quote:  "
Close quote:  "
Apostrophe:  '

Still running and running and running.


...


Ah, but if I click Cancel it reports success. No changes, just success. :rolleyes:

gipsy 09-05-2015 05:07 AM

First of all... Thanks for your work :)
It saves me some time from manual editing :P

I want to ask you something... In Greek, the epub sometimes contains 'Ε or "Ε for Έ.
Is there any way to add it to the checks of the plugin? It's not necessary to add it to the plugin for everyone. I want to try it first to see if it works fine :)

Thanks

EDIT: Found it :P

CalibUser 09-05-2015 11:44 AM

I believe the problem on Linux is the path specified in the plugin for the dictionary (I don't have Linux so I can't confirm this). I have updated the plugin in the first post in this thread so that when the plugin is run for the first time, it asks for the location and filename of the dictionary (see the epub in the first post for details) that is used for correcting hyphenated words that should not be hyphenated. Hopefully this will resolve the problem in Linux so that it will not run and run, nor require the Cancel button to be pressed to exit.

I have improved the plugin for working with Chapter headings: Some words such as 'an' do not normally start with a capital letter when the heading is in titlecase. I have amended the plugin so that these words are now in lower case when titlecase is selected in the plugin. If you come across any words that should be lowercase but appear in titlecase then please let me know and I will update in the next version of this plugin.

With the previous version of the plugin when titlecase is applied to a chapter heading the first Roman numeral is capitalised and the remainder are in lower case; I have added an option to the 'Format chapter titles' dialog so that the user can select the required case for Roman numerals when title case is applied.
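
The title-case behaviour described above could be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the function name title_case, the SMALL_WORDS list and the roman_upper option are all hypothetical names chosen for the example.

```python
import re

# Hypothetical word list; the plugin's actual list is not shown in this thread.
SMALL_WORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

# Non-empty Roman numeral in either case.
# Caveat: ordinary words such as "mix" are also valid numerals and will match.
ROMAN = re.compile(
    r"^(?=[ivxlcdm]+$)m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$",
    re.IGNORECASE,
)

def title_case(heading, roman_upper=True):
    """Title-case a heading, keeping small words lower case after the first word."""
    words = []
    for index, word in enumerate(heading.split()):
        lower = word.lower()
        if ROMAN.match(word):
            # The whole numeral gets one case, chosen by the caller.
            words.append(word.upper() if roman_upper else lower)
        elif index > 0 and lower in SMALL_WORDS:
            words.append(lower)
        else:
            words.append(lower.capitalize())
    return " ".join(words)

print(title_case("chapter xiv an end to the war"))
```

Treating the numeral as a single unit avoids the mixed-case result mentioned above, where only the first letter of the numeral was capitalised.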

The plugin does require version 3.4 of Python - I should have mentioned this sooner.

@DiapDealer: Please remove the posts concerning the debate on the version of Python that is used as this detracts from the purpose of this thread. Thanks.

@davidfor: This plugin is for Sigil - my user name is misleading. Originally I joined the forum when there were no plans to develop Sigil further, so I chose my user name as CalibUser; when I found out that Sigil would continue to be developed I carried on using Sigil as my preferred ePub editor - I don't think it's possible to change user names. However, I do use Calibre for other functions.

Doitsu 09-05-2015 12:07 PM

@CalibUser: I've just tested the updated plugin with my Linux machine and it appears to be working fine. (I only tested the line break fix.)

gipsy 09-05-2015 04:01 PM

The fix for false line breaks doesn't work for Greek text.

I use the following regex to fix the lines breaks.
Code:

Find: ([\p{Greek},'–’“”][</ib>]*)</p>\s+<p>([<ib>]*[\p{Greek},'–’“”])
Replace:\1 \2

I tried to change the following code in HTMLProcessor.py:
Code:

        if allBreaks == 'Yes':
                CorrectText("Fixed false line breaks:", r'([a-z])</p>\s+<p[^>]*>([A-Z])', r'\1 \2')

to:

Code:

        if allBreaks == 'Yes':
                CorrectText("Fixed false line breaks:", r'([\p{Greek}\,\'–’“”][</ib>]*)</p>\s+<p>([<ib>]*[\p{Greek},\'–’“”])', r'\1 \2')

but the lines don't combine :(
I don't know any python. Is my code ok?
Thanks :)

CalibUser 09-05-2015 05:33 PM

@Doitsu: Thanks for testing the plugin

@gipsy: In your code:

Code:

r'([\p{Greek}\,\'–’“”][</ib>]*)</p>\s+<p>([<ib>]*[\p{Greek},\'–’“”])'
you have escaped the comma with a backslash. There is no need to do this: a comma has no special meaning in a regular expression, and the r in front of the string (it stands for raw) tells Python to pass backslashes through to the regex engine unchanged. However, you do need to escape the single quote mark, otherwise it would signify the end of the string. I presume that {Greek} represents Greek characters?
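
A small sketch of the point about raw strings and escaping, using only Python's standard re module. The variable names are just for illustration.

```python
import re

# In a raw string the backslash is kept, so r'\'' holds two characters:
# a backslash and an apostrophe. The backslash only stops Python from
# ending the string at the quote mark.
raw_quote = r'\''

# The regex engine reads \' as a literal apostrophe and \, as a literal comma,
# so escaping the comma is unnecessary but harmless.
with_escape = re.findall(r'[\,\']', "a,b'c")
without_escape = re.findall(r'[,\']', "a,b'c")

print(len(raw_quote), with_escape, without_escape)
```

Both patterns find the same two characters, so the escaped comma is not what stops the lines from combining.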

gipsy 09-05-2015 05:53 PM

You are right. But again they don't combine :(
Yes, it is for Greek characters.

Code:

  <p>ο Πυθέας ήπιε το υπόλοιπο</p>

  <p>γάλα από το κύπελλο, σκούπισε</p>

  <p>δυο σταγόνες στα χείλη του με την ανάστροφη του</p>

  <p>χεριού του και σηκώθηκε.</p>

With the regex it changes to
Code:

  <p>ο Πυθέας ήπιε το υπόλοιπο γάλα από το κύπελλο, σκούπισε δυο σταγόνες στα χείλη του με την ανάστροφη του χεριού του και σηκώθηκε.</p>
but it doesn't with the changes in the .py file :(

Doitsu 09-05-2015 09:32 PM

Quote:

Originally Posted by gipsy (Post 3165206)
The fix for false line breaks doesn't work for Greek text.

Sigil and Python use different regex engines. Sigil uses PCRE, while Python's built-in re module has its own engine that lacks some PCRE features.
AFAIK, Python's re module doesn't support the \p{Greek} syntax, i.e., Greek letters need to be explicitly expressed as Unicode ranges (U+0370–U+03FF).
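
As a minimal sketch of what explicit Unicode ranges look like with Python's re module: the function name merge_false_breaks and the simplified pattern are assumptions for illustration, not the plugin's code. The pattern only handles a paragraph ending in a Greek letter or comma followed by one starting with a Greek letter, so it will also merge paragraphs that were meant to be separate if they happen to fit that shape.

```python
import re

# Greek and Coptic (U+0370-U+03FF) plus Greek Extended (U+1F00-U+1FFF).
# Python's re module understands \uXXXX escapes inside a pattern.
GREEK = r'\u0370-\u03FF\u1F00-\u1FFF'

# A paragraph ending in a Greek letter or comma, followed by a paragraph
# that starts with a Greek letter.
FALSE_BREAK = re.compile(
    r'([' + GREEK + r',])</p>\s+<p[^>]*>([' + GREEK + r'])'
)

def merge_false_breaks(html):
    """Join paragraph pairs that look like one sentence split in two."""
    return FALSE_BREAK.sub(r'\1 \2', html)

sample = '<p>ο Πυθέας ήπιε το υπόλοιπο</p>\n\n  <p>γάλα από το κύπελλο.</p>'
print(merge_false_breaks(sample))
```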

eschwartz 09-05-2015 11:51 PM

Quote:

Originally Posted by CalibUser (Post 3165047)
I believe the problem on Linux is the path specified in the plugin for the dictionary (I don't have Linux so I can't confirm this). I have updated the plugin in the first post in this thread so that when the plugin is run for the first time, it asks for the location and filename of the dictionary (see the epub in the first post for details) that is used for correcting hyphenated words that should not be hyphenated. Hopefully this will resolve the problem in Linux so that it will not run and run, nor require the Cancel button to be pressed to exit.

Yep:

Code:

      userProfile = (os.environ['USERPROFILE'])      #Get path to user profile

      try:
              f = open(userProfile+'\\AppData\\Local\sigil-ebook\\sigil\\user_dictionaries\\WordDictionary.txt', 'r', encoding='utf-8')

There was no way that would ever work on anything other than Windows. It is highly os-specific.
For that matter, it is highly environment-specific -- it would also break hard on a PortableApps.com install, for example.


Is there any way in Sigil/the plugin container to access the value of the Sigil configuration folder? This would be a far, far better way of handling it. (If there isn't a way, then it would be a generally useful thing to have...)
Asking the user to manually select the dictionary just to get around the issue of finding the configuration directory is overkill (and slightly onerous) -- although it could be useful if one has multiple dictionaries and wants to use a specific one, that is probably an edge case. EDIT: And of course the instructions already make it clear that that won't work.

KevinH 09-06-2015 12:52 AM

FWIW,
The next release of Sigil will include an interface to the hunspell spellchecker and will provide a list of paths to the hunspell dictionaries.

If I can figure out how best to bundle sigil's version of gumbo for use by plugins, and if DiapDealer and I can fix some bugs, we should have a release out in 2 or 3 weeks.

Kevin

Doitsu 09-06-2015 07:19 AM

1 Attachment(s)
Quote:

Originally Posted by eschwartz (Post 3165372)
There was no way that would ever work on anything other than Windows. It is highly os-specific.
For that matter, it is highly environment-specific -- it would also break hard on a PortableApps.com install, for example.

This is certainly a valid point, but since CalibUser doesn't have a Mac or a Linux machine, I appreciate it that he at least made an effort to make his plugin OSX/Linux compatible, even though he doesn't have a way of testing it.

@CalibUser: Python has a boatload of built-in functions for cross-platform file handling that make it really easy to implement cross-platform file support.

Since the Sigil plugin root directory and the user_dictionary directory are sibling directories it's relatively easy to get the user_dictionary directory location.

For example, you could use the following code to get the dictionary folder:

Code:

import os, inspect
def run(bk):
    # get plugin directory path
    plugin_path = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
    print(plugin_path)
   
    # get rid of the last two directories
    tmp_path = plugin_path.split(os.path.sep)[:7]
    print(tmp_path)
   
    # add the dictionary path
    tmp_path.extend(['user_dictionaries', 'WordDictionary.txt'])
   
    # convert list back to file path
    dictionary_path = os.path.sep.join(tmp_path)
    print(dictionary_path)

The above code will produce the following output:

Windows:
Code:

C:\Users\Doitsu\AppData\Local\sigil-ebook\sigil\plugins\test
['C:', 'Users', 'Doitsu', 'AppData', 'Local', 'sigil-ebook', 'sigil']
C:\Users\Doitsu\AppData\Local\sigil-ebook\sigil\user_dictionaries\WordDictionary.txt

Linux (DiapDealer's build):
Code:

/home/doitsu/.local/share/sigil-ebook/sigil/plugins/test
['', 'home', 'doitsu', '.local', 'share', 'sigil-ebook', 'sigil']
/home/doitsu/.local/share/sigil-ebook/sigil/user_dictionaries/WordDictionary.txt

I've attached a test plugin that you can play with.

gipsy 09-06-2015 08:23 AM

Quote:

Originally Posted by Doitsu (Post 3165330)
Sigil and Python use different regex engines. Sigil uses PCRE, while Python's built-in re module has its own engine that lacks some PCRE features.
AFAIK, Python's re module doesn't support the \p{Greek} syntax, i.e., Greek letters need to be explicitly expressed as Unicode ranges (U+0370–U+03FF).

Thanks Doitsu.

I changed it to
Code:

        if allBreaks == 'Yes':
                CorrectText("Fixed false line breaks:", r'(([\x{0370}-\x{03FF}\x{1F00}-\x{1FFF},\'–’“”][</ib>]*)</p>\s+<p>([<ib>]*[\x{0370}-\x{03FF}\x{1F00}-\x{1FFF},\'–’“”]))', r'\1 \2')

from a regex sample someone had provided me here to get rid of hyphens and IT WORKED! :thanks:

Quote:

Originally Posted by eschwartz (Post 3165372)
For that matter, it is highly environment-specific -- it would also break hard on a PortableApps.com install, for example.

I run a portable installation. When you run Sigil, it copies the Sigil settings (hunspell, user dictionaries, plugins, etc.) to appdata\local\sigil-ebook, and when you exit Sigil they are copied back to the portable location.

gipsy 09-06-2015 08:32 AM

@CalibUser: Here are some fixes for Greek text if you want to put them in your plugin. I am trying to find solutions for some other things too and I'll keep you posted :D


Code:

#Greek line break fix
        if allBreaks == 'Yes':
                CorrectText("Fixed false line breaks:", r'(([\x{0370}-\x{03FF}\x{1F00}-\x{1FFF},\'–’“”][</ib>]*)</p>\s+<p>([<ib>]*[\x{0370}-\x{03FF}\x{1F00}-\x{1FFF},\'–’“”]))', r'\1 \2')
        return(0)

Code:

        #Fixes Έ when PDFd as 'Ε or "Ε
        CorrectText("Changed 'Ε,\"Ε to Έ", r'(\'Ε|\"Ε)', r'Έ')

        #Fixes Ύ when PDFd as 'Υ or "Υ
        CorrectText("Changed 'Υ,\"Υ to Ύ", r'(\'Υ|\"Υ)', r'Ύ')

        #Fixes Ί when PDFd as 'Ι or "Ι
        CorrectText("Changed 'Ι,\"Ι to Ί", r'(\'Ι|\"Ι)', r'Ί')

        #Fixes Ό when PDFd as 'Ο or "Ο
        CorrectText("Changed 'Ο,\"Ο to Ό", r'(\'Ο|\"Ο)', r'Ό')

        #Fixes Ά when PDFd as 'Α or "Α
        CorrectText("Changed 'Α,\"Α to Ά", r'(\'Α|\"Α)', r'Ά')

        #Fixes Ή when PDFd as 'Η or "Η
        CorrectText("Changed 'Η,\"Η to Ή", r'(\'Η|\"Η)', r'Ή')

        #Fixes Ώ when PDFd as 'Ω or "Ω
        CorrectText("Changed 'Ω,\"Ω to Ώ", r'(\'Ω|\"Ω)', r'Ώ')

        #Fixes ύ when PDFd as ΰ
        CorrectText("Changed ΰ to ύ", r'ΰ', r'ύ')
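
The repeated quote-plus-capital fixes above could also be driven from one lookup table. The sketch below is a hypothetical alternative, not part of the plugin: it does not use the plugin's CorrectText function (whose signature is not shown in this thread), and the names TONOS, QUOTE_CAPITAL and fix_tonos are invented for the example. Note that it would also alter a genuinely quoted word that starts with one of these capitals.

```python
import re

# Unaccented Greek capital -> the same capital with tonos.
# Written as escapes so the example does not depend on editor encoding.
TONOS = {
    '\u0391': '\u0386',  # Alpha
    '\u0395': '\u0388',  # Epsilon
    '\u0397': '\u0389',  # Eta
    '\u0399': '\u038A',  # Iota
    '\u039F': '\u038C',  # Omicron
    '\u03A5': '\u038E',  # Upsilon
    '\u03A9': '\u038F',  # Omega
}

# A straight quote (' or ") followed by one of the capitals above.
QUOTE_CAPITAL = re.compile('[\'"]([' + ''.join(TONOS) + '])')

def fix_tonos(text):
    """Return (fixed_text, number_of_replacements)."""
    return QUOTE_CAPITAL.subn(lambda m: TONOS[m.group(1)], text)

print(fix_tonos("'\u0395\u03bb\u03b1"))
```

The replacement count returned by subn could feed the kind of "changes made" report the plugin prints.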

EDIT: The line break fix doesn't work properly. I'm gonna check it again

DiapDealer 09-06-2015 10:01 AM

Quote:

Originally Posted by Doitsu (Post 3165528)
@CalibUser: Since the Sigil plugin root directory and the user_dictionary directory are sibling directories it's relatively easy to get the user_dictionary directory location.

Thanks for mentioning the 'inspect' module method of finding the current script directory. Saves me the trouble. :) Though there may be shorter ways to get the current directory of the script that's being run, it's the only one that's guaranteed to work even when a script was invoked as a module.

I would, however, suggest something other than the relatively fragile method of converting a path to a list of strings and then using the [:7] slice to strip off the last two directories. If the depth of that path ever increases, it won't point to the sigil preferences directory anymore. To be clear: it's the [:7] slice I find fragile, not the list of strings conversion and eventual re-joining.

I would suggest using [:-2] if you're going to split the path into a list of strings that later get rejoined. Or just use os.path.dirname twice without converting to a list of strings and rejoining.

It's all a bit fragile I guess (even mine), considering that the plugin directory could conceivably change in relation to the Sigil preferences directory.

Code:

import os, inspect

# get plugin directory path
plugin_path = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
print(plugin_path)
   
# get rid of the last two directories
tmp_path = os.path.dirname(os.path.dirname(plugin_path))
print(tmp_path)
   
# add the dictionary path
dictionary_path = os.path.join(tmp_path, 'user_dictionaries', 'WordDictionary.txt')
print(dictionary_path)

This will all be simplified when accessing hunspell/dictionaries is incorporated into the plugin launcher framework, but Doitsu's above suggestion could work in the meantime (cross-platform) and wouldn't break even when the new version is released.
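
For comparison, the same "two levels up" idea can be written with pathlib. This is a sketch under the same assumption as the code above, namely that the plugin folder sits at <prefs>/plugins/<name> next to user_dictionaries; the function name dictionary_path is made up for the example. The pure path classes are used only so that both layouts can be shown on any OS; in a plugin you would pass a pathlib.Path.

```python
from pathlib import PurePosixPath, PureWindowsPath

def dictionary_path(plugin_dir):
    """Go up two levels from <prefs>/plugins/<name>, then into user_dictionaries."""
    prefs_dir = plugin_dir.parents[1]  # parents[0] is .../plugins
    return prefs_dir / 'user_dictionaries' / 'WordDictionary.txt'

print(dictionary_path(PureWindowsPath(
    r'C:\Users\Doitsu\AppData\Local\sigil-ebook\sigil\plugins\test')))
print(dictionary_path(PurePosixPath(
    '/home/doitsu/.local/share/sigil-ebook/sigil/plugins/test')))
```

Like the [:-2] slice, this counts from the end of the path, so it does not depend on how deep the preferences folder is.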

You could also determine the path of the current plugin script in the run method of a plugin by using:

Code:

def run(bk):
    ppath = bk._w.plugin_dir

It's not really recommended to access those wrapper script properties/methods directly--as they could change at any time. Though in this particular instance ... I don't foresee the plugin_dir property of the wrapper script ever disappearing or having its name changed. ;)

gipsy 09-06-2015 12:43 PM

Code:

#Greek line break fix
        if allBreaks == 'Yes':
                CorrectText("Fixed false line breaks:", r'([\u0370-\u03FF,\u1F00-\u1FFF,\'–’“”][</ib>]*)</p>\s+<p[^>]*>([<ib>]*[\u0370-\u03FF,\u1F00-\u1FFF,\'–’“”])', r'\1 \2')
        return(0)

This works for line breaks in Greek text.

eschwartz 09-06-2015 03:01 PM

Quote:

Originally Posted by Doitsu (Post 3165528)
This is certainly a valid point, but since CalibUser doesn't have a Mac or a Linux machine, I appreciate it that he at least made an effort to make his plugin OSX/Linux compatible, even though he doesn't have a way of testing it.

Oh certainly it is appreciated. I just thought I'd mention it as something to keep in mind in the future, in order to avoid cross-platform issues from the beginning. :)

Quote:

Originally Posted by gipsy (Post 3165542)
I run a portable installation. When you run Sigil, it copies the Sigil settings (hunspell, user dictionaries, plugins, etc.) to appdata\local\sigil-ebook, and when you exit Sigil they are copied back to the portable location.

I guess I was wrong then. :o That should teach me to make assumptions.

(I guess it really depends on the app. I know they prefer if at all possible to not do that, it reduces the "portability" angle by potentially leaving unwanted cruft on the host computer.)



:chinscratch: It doesn't look like there is any way to override the settings folder location in Sigil.
(And it uses the deprecated-since-5.4 DataLocation, rather than AppDataLocation on Windows and AppConfigLocation on unix -- did Qt have to split it? :blink: -- which explains why the config folder is in ~/.local/share/sigil-ebook -- I have always wondered at that non-standard location.)

DiapDealer 09-06-2015 03:22 PM

Quote:

Originally Posted by eschwartz (Post 3165713)
(And it uses the deprecated-since-5.4 DataLocation, rather than AppDataLocation on Windows and AppConfigLocation on unix -- did Qt have to split it? :blink: -- which explains why the config folder is in ~/.local/share/sigil-ebook -- I have always wondered at that non-standard location.)

It's just inherited. Whether through negligence or changes in Qt over the years. *shrug*

And there's just no "real" pressing need to convert and potentially lose user-settings/plugins in an upgrade (or create a one-time script to copy stuff to the new location). Maybe someday it will change, but it's just not high on the list of priorities at the moment.

eschwartz 09-06-2015 03:40 PM

Quote:

Originally Posted by DiapDealer (Post 3165720)
It's just inherited. Whether through negligence or changes in Qt over the years. *shrug*

And there's just no "real" pressing need to convert and potentially lose user-settings/plugins in an upgrade (or create a one-time script to copy stuff to the new location). Maybe someday it will change, but it's just not high on the list of priorities at the moment.

Sorry :o that was just me being random. I happened to notice that at the same time I noticed there was no way to override the settings dir. (And I was bemused to see Qt hasn't figured out cross-platform config dirs yet.)

Whether either is *necessary*, I won't venture to say. I agree once it's been used you shouldn't break everyone's settings just to conform to more "proper" standards.

CalibUser 09-06-2015 05:05 PM

Thanks for all these suggestions and comments. When I get time, I will look at implementing some of the ideas presented above:

@Doitsu: Thanks for the directory code and experimental plugin - I will experiment with your plugin as soon as I have time.

@gipsy: Thanks for the code for Greek ePubs. I will incorporate this code in the next version of the plugin.

@DiapDealer: As you do not really recommend accessing script properties/methods directly, I will try the solution offered by Doitsu; I will update from Doitsu's solution when access to hunspell/dictionaries is incorporated into the plugin launcher framework.

Doitsu 09-06-2015 05:10 PM

Quote:

Originally Posted by CalibUser (Post 3165756)
@Doitsu: Thanks for the directory code and experimental plugin - I will experiment with your plugin as soon as I have time.

Even though my code works, you may want to use the updated version by DiapDealer, because his version is more robust and also more elegant.

JSWolf 09-06-2015 06:43 PM

Quote:

Originally Posted by Doitsu (Post 3165758)
Even though my code works, you may want to use the updated version by DiapDealer, because his version is more robust and also more elegant.

But there's no attachment for the plugin with the changes.

CalibUser 09-09-2015 03:39 PM

The plugin has been updated so that it will automatically find the folder for the spelling dictionary using code suggested by Doitsu and DiapDealer.

I have also incorporated code from gipsy to manage Greek letters.

@gipsy: I had to represent the Greek characters as Unicode numbers since my editor cannot handle Unicode characters! If you get time, please check that the code works for Greek texts in case I have mistyped the Unicode numbers.

gipsy 09-09-2015 07:25 PM

@CalibUser
Change them to this and they are fine :)

EDIT: Sorry they didn't work with the replace in unicode code

For example the "γΰρω" is changed to "γ\u03CDρω"

EDIT 2: For some reason the hyphen fix doesn't work for me now. :blink:

I think I found the reason...
In windows...
The ePubTidyTool.json has the DictFile path as
Code:

"DictFile": "C:\\Users\\pm\\AppData\\Local\\sigil-ebook\\sigil\\user_dictionaries\\WordDictionary.txt",
whereas in the previous version it was
Code:

  "DictFile": "C:/Users/pm/AppData/Local/sigil-ebook/sigil/hunspell_dictionaries/WordDictionary.txt",

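
A quick way to compare the two DictFile values is to decode them as JSON and look at the resulting paths. This is only a sketch to separate the two visible differences (slash style and folder name); whether either one is the actual cause of the hyphen problem is not established in this thread.

```python
import json
from pathlib import PureWindowsPath

# The two settings quoted above, wrapped in braces so they parse as JSON.
new_json = r'{"DictFile": "C:\\Users\\pm\\AppData\\Local\\sigil-ebook\\sigil\\user_dictionaries\\WordDictionary.txt"}'
old_json = '{"DictFile": "C:/Users/pm/AppData/Local/sigil-ebook/sigil/hunspell_dictionaries/WordDictionary.txt"}'

new_path = PureWindowsPath(json.loads(new_json)['DictFile'])
old_path = PureWindowsPath(json.loads(old_json)['DictFile'])

# JSON decoding turns each \\ into a single backslash, giving a normal Windows path.
print(new_path)

# Both slash styles name the same parent folders; what differs is the last folder.
print(new_path.parent.parent == old_path.parent.parent)
print(new_path.parent.name, old_path.parent.name)
```

So the doubled backslashes are just JSON escaping, while the folder changed from hunspell_dictionaries to user_dictionaries. It may be worth checking which of those folders actually contains WordDictionary.txt.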
Doitsu 09-10-2015 06:07 AM

Quote:

Originally Posted by gipsy (Post 3167868)
EDIT: Sorry they didn't work with the replace in unicode code

For example the "γΰρω" is changed to "γ\u03CDρω"

Because of the rather counterintuitive way that Python handles Unicode strings, you'll have to use the actual characters instead of the Unicode codes if you want to avoid the whole Python Unicode encode/decode mess.

Change the following line from:

Code:

        CorrectText("Changed \u03CD to \u03B0", r'\u03B0', r'\u03CD')
to

Code:

        CorrectText("Changed \u03CD to \u03B0", r'ΰ', r'ύ')
This'll change γΰρω to γύρω.
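
If the editor cannot store Greek characters, another option that should work is to keep the \u codes but put them in ordinary (non-raw) strings. In a raw string such as r'\u03CD' Python leaves the backslash sequence as it is, and the replacement side of re.sub does not decode it, which matches the literal \u03CD seen above (recent Python versions raise an error for it instead). A minimal sketch using plain re.sub rather than the plugin's CorrectText:

```python
import re

# U+03B0 is upsilon with dialytika and tonos, U+03CD is upsilon with tonos.
text = '\u03b3\u03b0\u03c1\u03c9'

# Non-raw strings: Python converts the \u escapes to real characters
# before the regex engine ever sees them.
fixed = re.sub('\u03B0', '\u03CD', text)

print(fixed)
```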

gipsy 09-10-2015 06:14 AM

That's correct Doitsu :P
i'm gonna send the code to CalibUser because his editor cannot handle greek characters.

gipsy 09-14-2015 05:03 AM

CalibUser if you can copy-paste them in your editor those are some fixes for now.
Or tell me how to send them to you :)
Code:

#------------------------ Greek character corrections -------------

        #Fixes '…' when PDFd as ...
        CorrectText("Changed ... to …", r'\.\.\.', r'…')

        #Fixes 'στη' when PDFd as σιη
        CorrectText("Changed σιη to στη", r'σιη', r'στη')

        #Fixes 'στον'/'στο' when PDFd as σιον/σιο
        CorrectText("Changed σιον/σιο to στον/στο", r' σι(ον|ο) ', r' στ\1 ')

        #Fixes 'στις' when PDFd as σιις
        CorrectText("Changed σιις to στις", r'σιις', r'στις')
       
        #Fixes 'Άκουσ' when PDFd as Ακόυσ
        CorrectText("Changed Ακόυσ to Άκουσ", r'Ακόυσ', r'Άκουσ')
       
        #Fixes 'γι’' when PDFd as γΓ,γΡ
        CorrectText("Changed γΓ γΡ to γι’", r'(γΓ|γΡ)', r'γι’')

        #Fixes 'ντι' when PDFd as νπ
        CorrectText("Changed νπ to ντι", r'νπ', r'ντι')
       
        #Fixes 'Γι’' when PDFd as ΓΓ
        CorrectText("Changed ΓΓ to Γι’", r'ΓΓ ', r'Γι’ ')

        #Fixes 'σχεδίαζ' when PDFd as σχέδιαζ
        CorrectText("Changed σχέδιαζ to σχεδίαζ", r'σχέδιαζ', r'σχεδίαζ')
       
        #Fixes '\u0388' when PDFd as 'E "E
        CorrectText("Changed 'E,\"E to \u0388", r'(\'|\")(\u0395)', r'Έ')

        #Fixes \u038E when PDFd as 'Y or "Y
        CorrectText("Changed 'Y,\"Y to \u038E", r'(\'|\")(\u03A5)', r'Ύ')

        #Fixes \u038A when PDFd as 'I or "I
        CorrectText("Changed 'I,\"I to \u038A", r'(\'|\")(\u0399)', r'Ί')

        #Fixes \u038C when PDFd as 'O or "O
        CorrectText("Changed 'O,\"O to \u038C", r'(\'|\")(\u039F)', r'Ό')

        #Fixes \u0386 when PDFd as 'A or "A
        CorrectText("Changed 'A,\"A to \u0386", r'(\'|\")(\u0391)', r'Ά')

        #Fixes \u0389 when PDFd as 'H or "H
        CorrectText("Changed 'H,\"H to \u0389", r'(\'|")(\u0397)', r'Ή')

        #Fixes \u038F when PDFd as '\u03A9 or "\u03A9
        CorrectText("Changed '\u03A9,\"\u03A9 to \u038F", r'(\'|\")(\u03A9)', r'Ώ')

        #Fixes \u03CD when PDFd as \u03B0
        CorrectText("Changed \u03B0 to \u03CD", r'ΰ', r'ύ')

        #Fixes έ when PDFd as ε'
        CorrectText("Changed ε' to έ", r'ε\'', r'έ')


CalibUser 09-14-2015 03:13 PM

I have updated the plugin to process Greek errors as suggested by Gipsy - I haven't been able to test the update using a Greek text as I am not familiar with this language.

gipsy 09-14-2015 03:15 PM

Quote:

Originally Posted by CalibUser (Post 3170562)
I have updated the plugin to process Greek errors as suggested by Gipsy - I haven't been able to test the update using a Greek text as I am not familiar with this language.

I'm gonna test them :P
Thanks CalibUser

They work fine. The only problem is that it doesn't process the hyphens. Maybe Windows doesn't recognize the path in ePubTidyTool.json
Code:

  "DictFile": "C:\\Users\\owner\\AppData\\Local\\sigil-ebook\\sigil\\user_dictionaries\\WordDictionary.txt",


All times are GMT -4.