11-01-2015, 05:34 PM | #106 |
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I would hope so. Not to mention using the sigil_bs4 module that all plugins should already have access to. 0.8.901 should really have everything needed to run this plugin with no extra installations on Windows and OS X (and probably Linux if Sigil was built on the machine). But I've no idea if this plugin is constructed to make use of it all or not.
Last edited by DiapDealer; 11-01-2015 at 05:37 PM. |
11-02-2015, 03:18 AM | #107 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
Hi CalibUser,
Something isn't working correctly in Greek right now. It changes "δυνατόν περισσότερους ναυαγούς" to "δυνατόό περισσότεροο ναυαγοο", but I can't figure out why. The code is the same as in 0.1.1.5. Maybe it's something in
Code:
def IsFixO(m):
    """
    This function examines a word to see whether the (ιό|οί|ιο|οι)
    characters that were misspelled need to be fixed.
    It is called by a regular expression function (re.sub) in FixCommonErrors().
    It returns the original expression if the original word is in the
    dictionary or if the corrected word is not; otherwise it returns
    the word with the ώ fixed.
    """
    FixO=m.group(1)+"ώ"+m.group(3)
    FixO2=m.group(1)+m.group(2)+m.group(3)
    if spell(FixO2):
        return(m.group(1)+m.group(2)+m.group(3))
    elif spell(FixO):
        print("FixΏ: ",FixO2, " changed to ", FixO)
        return(m.group(1)+"ώ"+m.group(3))
    else:
        return(m.group(1)+m.group(2)+m.group(3))
--------------------------------------------
#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
Code:
Changes made
===============
ώ fixes 3
Code:
#------------------------ Greek character corrections -------------
#Fixes '…' when PDFd as ...
CorrectText("Changed ... to …", r'\.\.\.', r'…')
#Fixes 'η' when PDFd as ΐ]
CorrectText("Changed ΐ] to η", r'ΐ]', r'η')
#Fixes 'στη' when PDFd as σιη
CorrectText("Changed σιη to στη", r'σιη', r'στη')
#Fixes 'στ(η|ο|ον|α|ις|ην)' when PDFd as 'οτ(η|ο|ον|α|ις|ην)'
CorrectText("Changed οτ(η|ο|ον|α|ις|ην) to στ(η|ο|ον|α|ις|ην)", r' οτ(η|ο|ον|α|ις|ην) ', r' στ\1 ')
#Fixes 'των' when PDFd as 'τ(οι|ιο)ν'
CorrectText("Changed τ(οι|ιο)ν to των", r' τ(οι|ιο)ν ', r' των ')
#Fixes 'ού' when PDFd as 'οιί'
CorrectText("Changed οιί to ού", r'οιί', r'ού')
#Fixes 'στις' when PDFd as σιις
CorrectText("Changed σιις to στις", r'σιις', r'στις')
#Fixes 'στ(η|ο|ον|ην)' when PDFd as οτ(η|ο|ον|ην)
CorrectText("Changed οτ(η|ο|ον|ην) to στ(η|ο|ον|ην)", r' οτ(η|ο|ον|ην) ', r'στ\1')
#Fixes 'στ(ο|ου|α)' when PDFd as σι(ο|ου|α)
CorrectText("Changed σι(ο|ου|α) to στ(ο|ου|α)", r' σι(ο|ου|α)', r'στ\1')
#Fixes 'ώ' when PDFd as ο'ι
CorrectText("Changed ο'ι to ώ", r'(ο\'ι|\(ί\))', r'ώ')
#Fixes 'Άκουσ' when PDFd as Ακόυσ
CorrectText("Changed Ακόυσ to Άκουσ", r'Ακόυσ', r'Άκουσ')
#Fixes 'γι’' when PDFd as γΓ,γΡ
CorrectText("Changed γΓ γΡ to γι’", r'(γΓ|γΡ)', r'γι’')
#Fixes 'ντι' when PDFd as νπ
CorrectText("Changed νπ to ντι", r'νπ', r'ντι')
#Fixes 'Γι’' when PDFd as ΓΓ
CorrectText("Changed ΓΓ to Γι’", r'ΓΓ ', r'Γι’ ')
#Fixes 'σχεδίαζ' when PDFd as σχέδιαζ
CorrectText("Changed σχέδιαζ to σχεδίαζ", r'σχέδιαζ', r'σχεδίαζ')
#Fixes '\u0388' when PDFd as 'E "E
CorrectText("Changed 'E,\"E to \u0388", r'(\'|\")(\u0395)', r'Έ')
#Fixes \u038E when PDFd as 'Y or "Y
CorrectText("Changed 'Y,\"Y to \u038E", r'(\'|\")(\u03A5)', r'Ύ')
#Fixes \u038A when PDFd as 'I or "I
CorrectText("Changed 'I,\"I to \u038A", r'(\'|\")(\u0399)', r'Ί')
#Fixes \u038C when PDFd as 'O or "O
CorrectText("Changed 'O,\"O to \u038C", r'(\'|\")(\u039F)', r'Ό')
#Fixes \u0386 when PDFd as 'A or "A
CorrectText("Changed 'A,\"A to \u0386", r'(\'|\")(\u0391)', r'Ά')
#Fixes \u0389 when PDFd as 'H or "H
CorrectText("Changed 'H,\"H to \u0389", r'(\'|")(\u0397)', r'Ή')
#Fixes \u038F when PDFd as '\u03C9 or "\u03C9
CorrectText("Changed '\u03C9,\"\u03C9 to \u038F", r'(\'|\")(\u03C9)', r'Ώ')
#Fixes \u03CD when PDFd as \u03B0
CorrectText("Changed \u03B0 to \u03CD", r'ΰ', r'ύ')
#Fixes έ when PDFd as ε'
CorrectText("Changed ε' to έ", r'ε\'', r'έ')
#Fixes ς Character when PDFd as ςCharacter
CorrectText("Changed ςCharacter to ς Character", r'ς([\u0370-\u03CE])', r'ς \1')
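For readers without the plugin source to hand, the way a helper like CorrectText could apply one rule and produce the "Changes made" tally can be sketched as follows. This is only a sketch under assumptions: `correct_text` and `changes` are hypothetical names, the real CorrectText signature and internals are not shown in this thread, and the only library call relied on is `re.subn` from the standard library.

```python
import re

# Hypothetical stand-in for the plugin's CorrectText(); the real function
# is not shown in this thread and may work differently.
changes = {}

def correct_text(text, label, pattern, replacement):
    """Apply one regex rule and record how many substitutions were made."""
    new_text, count = re.subn(pattern, replacement, text)
    if count:
        changes[label] = changes.get(label, 0) + count
    return new_text

sample = "σιη θάλασσα... και σιις ακτές"
sample = correct_text(sample, "Changed ... to …", r"\.\.\.", "…")
sample = correct_text(sample, "Changed σιη to στη", r"σιη", "στη")
sample = correct_text(sample, "Changed σιις to στις", r"σιις", "στις")

print(sample)
for label, count in changes.items():
    print(label, count)
```

Because `re.subn` also accepts a function as the replacement, the same helper would work with a callback such as IsFixO.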
11-02-2015, 03:30 AM | #108 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
The problem comes from FixW and FixO.
If I comment them out, it works. Is it possible that the plugin gets "confused" because I use the same CorrectText for both?
Code:
#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
#Fixes ω in words that are misspelled
CorrectText("ω fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)
11-02-2015, 02:31 PM | #109 | |||
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Quote:
Quote:
Quote:
Unfortunately I am not able to test the plugin with Greek texts - I will try to look at what is happening when I get time!
Does the checkbox for checking Greek code have an effect on the outcome?
Apologies - I will include these in the next update!!
|||
11-02-2015, 03:08 PM | #110 | |
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Yes. If you're importing BeautifulSoup now, then when you're ready to make the switch you should be able to use something like:
Code:
from sigil_bs4 import BeautifulSoup
If you need help making sure everything works with the bundled version of Python that comes with 0.8.9+ (while still working with an external Python 3.4), just ask. There are plenty of people who can help.
Last edited by DiapDealer; 11-02-2015 at 07:11 PM. Reason: Fix egregious typo |
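One way to keep a plugin working both with the bundled Python and with an external one is to try the bundled module name first and fall back to the standalone package. The sketch below is an illustration only: `import_first_available` is a hypothetical helper, and the demonstration uses standard-library module names so it runs anywhere; inside a plugin the list would be `["sigil_bs4", "bs4"]`.

```python
import importlib

def import_first_available(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "None of these modules could be imported: " + ", ".join(module_names)
    )

# Inside a plugin this might be used as (assumption, not tested against Sigil):
#     bs = import_first_available(["sigil_bs4", "bs4"])
#     BeautifulSoup = bs.BeautifulSoup
# The names below only demonstrate the fallback order.
mod = import_first_available(["module_that_does_not_exist_xyz", "json"])
print(mod.__name__)
```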
|
11-02-2015, 04:20 PM | #111 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
CalibUser, I think I found it. I'm going to test it tomorrow.
Code:
#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
#Fixes ω in words that are misspelled
CorrectText("ω fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)
11-07-2015, 04:17 AM | #112 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
@CalibUser
The fix is to change
Code:
#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
#Fixes ω in words that are misspelled
CorrectText("ω fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)
to
Code:
#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w+)(ιίι|\(ό|ο\)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
#Fixes ω in words that are misspelled
CorrectText("ω fixes",r"(\w+)(ιίι|\(ό|ο\)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)
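The difference between the two versions comes down to capture-group numbering: the unescaped (ό|ο) is itself a capturing group, so it becomes group 3 and the trailing (\w+) moves to group 4, while IsFixO still reads m.group(3). A small sketch with shortened patterns shows the effect; `rebuild` is a hypothetical stand-in for the fall-through branch of IsFixO, not the plugin's code.

```python
import re

# Shortened versions of the two patterns from the post above.
buggy = re.compile(r"(\w+)(ιίι|(ό|ο)|ίό)(\w+)")
fixed = re.compile(r"(\w+)(ιίι|\(ό|ο\)|ίό)(\w+)")

def rebuild(m):
    # Mirrors what IsFixO does when it keeps the word: join groups 1, 2 and 3.
    return m.group(1) + m.group(2) + m.group(3)

word = "δυνατόν"
print(buggy.groups, fixed.groups)   # number of capturing groups in each pattern
print(buggy.sub(rebuild, word))     # the tail of the word sits in group 4 and is lost
print(fixed.sub(rebuild, word))     # no match here, so the word is left alone
```

Note that the escaped form now matches a literal "(ό" or "ο)" rather than a bare ό or ο. If matching the bare letters was the intent, a non-capturing group, (?:ό|ο), would also leave the group numbers unchanged; the thread does not say which behaviour the plugin wants.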
11-07-2015, 02:22 PM | #113 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I have corrected the plugin so that it should fix words that include ώ and ω in Greek texts - thanks for the fix, gipsy.
I have also included the code supplied by gipsy for correcting Greek texts that I omitted from the last version of the plugin. The updated plugin can be found in the first post in this thread.
Thanks, DiapDealer - however, I am only a hobbyist programmer and when I test my code I sometimes make silly errors such as syntax errors. I think I would be wasting a lot of others' time if I posted code that I had not tested, so I prefer to wait until the next version of Sigil is stable so that I can release code that I have done some testing on. |
11-07-2015, 02:26 PM | #114 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
Ooops! Having just said that I will wait until the next version of Sigil is stable, I have just seen the post stating that Sigil 0.9 is available.
I will update my plugin so that it uses the in-built features of Sigil 0.9 soon. |
11-07-2015, 03:46 PM | #115 |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
Hi,
Thanks for a great plugin. I've been using an earlier release (0.1.1.1.1). I will wait for the 0.9.xx release of your plugin, as I had an issue with the Beautiful Soup install (win64 - Win 8.1) when I went to upgrade.
For years now I've been slowly building a set of saved searches to tidy up epubs. I think there are a few in my total list that could be added to ePubTidyTool. I've got searches for Joining Paragraphs, Split Names (Mr., Mrs., etc.), Broken or Split Speech, and Common OCR Spelling Mistakes. If you are interested I can send my sigil_searches.ini.
I've snipped a small bit out of the Contractions section as a sample. I need to go back and standardise these; some (but not all) include lower or upper case and punctuation.
Code:
15\Name=Common Fixes/Contractions/ard
15\Find=\\\x2018\x61rd
15\Replace=\x2019\x61rd
16\Name=Common Fixes/Contractions/bout
16\Find=\\\x2018([Bb])out
16\Replace=\x2019\\1out
17\Name=Common Fixes/Contractions/bye
17\Find=\\\x201c\\\x2018([B|b])ye([\\p{P}|\\s])
17\Replace="\x201c \x2019\\1ye\\2"
18\Name=Common Fixes/Contractions/appen
18\Find=\\\x2018([Aa])ppen([\\p{P}|\\s])
18\Replace=\x2019\\1ppen\\2
19\Name=Common Fixes/Contractions/atasad
19\Find=\\\x2018\x61([tsd])([\\p{P}|\\s])
19\Replace=\x2019\x61\\1\\2
20\Name=Common Fixes/Contractions/Ave
20\Find="\x2018([Aa])ve "
20\Replace="\x2019\\1ve "
21\Name=Common Fixes/Contractions/Cept
21\Find=\\\x2018([Cc])ept
21\Replace=\x2019\\1ept
22\Name=Common Fixes/Contractions/couse
22\Find=\\\x2018([Cc])ourse
22\Replace=\x2019\\1ourse
23\Name=Common Fixes/Contractions/cos
23\Find="\x2018([Cc])os "
23\Replace="\x2019\\1os "
24\Name=Common Fixes/Contractions/cause
24\Find=\\\x2018([Cc])ause
24\Replace=\x2019\\1ause
25\Name=Common Fixes/Contractions/cause2
25\Find=\\\x201c([Cc])ause(?![\x201d\x2019])
25\Replace=\x201c \x2019\\1ause
26\Name=Common Fixes/Contractions/Cuz
26\Find="\x201c([C|c])uz "
26\Replace="\x201c \x2019\\1uz "
27\Name=Common Fixes/Contractions/em
27\Find=([\x2018\x201c])em([\\p{P}|\\s])
27\Replace=\x2019\x65m\\2
28\Name=Common Fixes/Contractions/ell
28\Find=\\\x2018\x65ll\\s
28\Replace="\x2019\x65ll "
29\Name=Common Fixes/Contractions/Ere
29\Find=\\\x2018([Ee])re
29\Replace=\x2019\\1re
30\Name=Common Fixes/Contractions/er
30\Find=\x2018\x65r([\\p{P}|\\s])
30\Replace=\x2019\x65r\\1
31\Name=Common Fixes/Contractions/e
31\Find=\\\x2018([Ee])([\\p{P}|\\s])
31\Replace=\x2019\\1\\2
32\Name=Common Fixes/Contractions/ee
32\Find=\x2018\x65\x65([\\p{P}|\\s])
32\Replace=\x2019\x65\x65\\1
33\Name=Common Fixes/Contractions/ear
33\Find=\x2018\x65\x61r([\\p{P}|\\s])
33\Replace=\x2019\x65\x61r\\1
size=293
34\Name=Common Fixes/Contractions/eard
34\Find=\\\x2018\x65\x61rd
34\Replace=\x2019\x65\x61rd
35\Name=Common Fixes/Contractions/Fraid
35\Find=\\\x2018([Ff])raid([\\p{P}|\\s])
35\Replace=\x2019\\1raid\\2
36\Name=Common Fixes/Contractions/fore
36\Find=\x2018([Ff])ore\\s
36\Replace="\x2019\\1ore "
37\Name=Common Fixes/Contractions/im
11-08-2015, 07:59 AM | #116 | |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
You're welcome. I'm glad you find it useful.
Quote:
Although the plugin already contains code for some of the things you mentioned (e.g. Joining Paragraphs and Split Names Mr., Mrs., etc.), if your code improves on the code in the plugin, or if your code can, e.g., join paragraphs that are not covered by the plugin, then I would be very keen to include your code for these functions.
Split speeches have been problematic; I have not had time to develop code that can cope with this problem. At present I use a few manual search and replace regex expressions for this (not yet included in the plugin) but I would like to automate this if possible. I would like to adapt your expressions for fixing split speeches if possible, particularly if these can automate the process.
I have looked at your sample contractions; many of these could go in the file that contains a customised list of words to be corrected automatically. The contractions that could not go in this file are those that use the pipe (|) character - this is used to separate the incorrect word from the correct word in the customised word list. I need to consider an alternative character to use in this file so that the pipe character can be used in expressions. Can anybody see a problem if the character ¬ is used as the separator (other suggestions welcome)?
Before I add any more features to the plugin I would like to rewrite the code so that it uses the facilities provided by Sigil 0.9; I will not be able to start on this before next weekend!
Meanwhile, if you could post a file in the format that is described in the section 'Using a customised list of words that are corrected automatically' in the manual for this plugin that contains (1) common OCR spelling mistakes from your searches and (2) corrections to contractions (and anything else) that do not use the pipe character, then I can append it to the file IncorrectWords.txt that is in the first post for other users to use. |
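The separator problem described above can be illustrated with a small parser for a wrong/right word list. This is a sketch under assumptions: `parse_word_list` is a hypothetical function, not the plugin's loader, and the only thing taken from the post is that one character separates the incorrect text from its correction on each line.

```python
def parse_word_list(lines, separator="|"):
    """Split 'wrong<sep>right' lines into (wrong, right) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or separator not in line:
            continue  # skip blank or malformed lines
        wrong, right = line.split(separator, 1)
        pairs.append((wrong, right))
    return pairs

# With '|' as the separator, an expression that contains '|' is cut in the wrong place.
with_pipe = parse_word_list([r"‘([B|b])ye|’\1ye"])
# With '¬' as the separator, the same expression survives intact.
with_not_sign = parse_word_list([r"‘([B|b])ye¬’\1ye"], separator="¬")

print(with_pipe)
print(with_not_sign)
```

Any character that never occurs in a search expression or in book text would do equally well; a tab is another common choice for files like this.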
|
11-09-2015, 04:16 AM | #117 | |
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
|
Glad you think they will be useful; here are the OCR errors. I'll work on tidying the others.
The items in the list above with the pipe are usually just where the regex allowed for upper or lower case at the beginning, or where it allowed for punctuation or a space after the text: ([\p{P}|\s]). I assume you are doing this in the tool?
Quote:
|
|
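A side note on the pipe inside square brackets: within a character class, | is not alternation but a literal character, so [B|b] matches "B", "b" or "|". That means the pipes in these expressions could be dropped without changing what the searches are meant to find. The sketch below shows this with Python's `re` module; the same rule applies to the PCRE engine used by Sigil's search, although `\p{P}` itself is not supported by Python's built-in `re`, so it is left out here.

```python
import re

with_pipe = re.compile(r"[B|b]ye")
without_pipe = re.compile(r"[Bb]ye")

for candidate in ["Bye", "bye", "|ye"]:
    print(
        candidate,
        bool(with_pipe.fullmatch(candidate)),
        bool(without_pipe.fullmatch(candidate)),
    )
```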
11-10-2015, 02:35 PM | #118 |
Addict
Posts: 201
Karma: 62362
Join Date: Jul 2015
Device: Sony
|
I have updated IncorrectWords.txt with the list provided by Steadyhands except for:
al!|all
This is because the plugin only replaces words that are incorrectly spelt and ignores the surrounding punctuation marks. If it did not, and replaced the expression al! with all, then correctly spelt words would be amended (e.g. dismal! would become dismall). Also, if a book contained the expression et al! then this would become et all.
NB The plugin will examine an ePub book to determine the type of apostrophe that is used (straight or curly) and will use the appropriate type when the text is replaced in the book. A straight apostrophe should be used in IncorrectWords.txt so that the plugin can use this feature. |
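The reasoning above can be shown with a short sketch that compares a plain substring replacement with a whole-word replacement. `replace_whole_word` is a hypothetical helper written for this illustration; it is not taken from the plugin, which may match words differently.

```python
import re

def replace_whole_word(text, wrong, right):
    """Replace `wrong` only where it stands alone as a word."""
    pattern = r"\b" + re.escape(wrong) + r"\b"
    return re.sub(pattern, right, text)

# A plain substring replacement of "al!" damages a correctly spelt word.
naive = "A dismal! day".replace("al!", "all")
print(naive)

# Matching whole words leaves the surrounding punctuation untouched.
print(replace_whole_word("teh cat saw teh!", "teh", "the"))

# Word boundaries protect "dismal", but a standalone "al" is still changed,
# which is why an entry like this would also turn "et al!" into "et all!".
print(replace_whole_word("dismal! al!", "al", "all"))
```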
11-10-2015, 05:15 PM | #119 |
Enthusiast
Posts: 28
Karma: 10
Join Date: Dec 2011
Device: PRS-T1
|
Thanks for this plugin.
I can't get it to work, though. It is probably a problem on my end. I'm running Windows 7 64 bit. I've tried it with Sigil 0.8.7, 0.8.9, and 0.9. I've tried it with the auto set option and manually telling it where python 3.4.3 is and I keep getting the same result:
Status: failed
Traceback (most recent call last):
  File "C:\Program Files\Sigil\plugin_launchers\python\launcher.py", line 134, in launch
    target_script = __import__(script_module)
  File "C:\Users\Edwin\AppData\Local\sigil-ebook\sigil\plugins\ePubTidyTool\plugin.py", line 27, in <module>
    from ManualWordChecker import cManualWordCheck
  File "C:\Users\Edwin\AppData\Local\sigil-ebook\sigil\plugins\ePubTidyTool\ManualWordChecker.py", line 9, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named 'bs4'
Error: No module named 'bs4'
Anyone know what I did wrong? Thanks. |
11-10-2015, 05:59 PM | #120 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
|
You must have BeautifulSoup installed to use the ManualWordCheck.
See Appendix 1 in ePub tidy tool v0.1.1.6.epub |
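When an ImportError like the one above appears, it can help to confirm which Python interpreter is actually running and whether that interpreter can see the module, since a package installed for one Python is not visible to another. A minimal check, using only the standard library (`module_available` is a hypothetical helper name):

```python
import importlib.util
import sys

def module_available(name):
    """Return True if `name` can be found by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None

print("Interpreter:", sys.executable)
for name in ("sigil_bs4", "bs4"):
    print(name, "found" if module_available(name) else "missing")
```

Running this with the same interpreter that Sigil is configured to use shows whether BeautifulSoup was installed for that interpreter or for a different one.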