|  11-01-2015, 05:34 PM | #106 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			I would hope so. Not to mention using the sigil_bs4 module that all plugins should already have access to. 0.8.901 should really have everything needed to run this plugin with no extra installations on Windows and OS X (and probably Linux if Sigil was built on the machine). But I've no idea if this plugin is constructed to make use of it all or not.
		 Last edited by DiapDealer; 11-01-2015 at 05:37 PM. | 
|   |   | 
|  11-02-2015, 03:18 AM | #107 | 
| Connoisseur  Posts: 81 Karma: 10 Join Date: Nov 2013 Device: Kobo Aura HD | 
			
			Hi CalibUser, something isn't working properly in Greek right now. It changes "δυνατόν περισσότερους ναυαγούς" to "δυνατόό περισσότεροο ναυαγοο", but I can't figure out why. The code from 0.1.1.5 is the same. Maybe it's something in:
Code:
def IsFixO(m):
	"""
	This function examines a word to decide whether a misspelled (ιό|οί|ιο|οι) character sequence needs to be fixed.
	It is called by a regular expression function (re.sub) in FixCommonErrors().
	It returns the original word if that is already in the dictionary, or if the
	corrected form is not in the dictionary; otherwise it returns the word with ώ substituted.
	"""
	FixO=m.group(1)+"ώ"+m.group(3)
	FixO2=m.group(1)+m.group(2)+m.group(3)
	if spell(FixO2):
		return(m.group(1)+m.group(2)+m.group(3))
	elif spell(FixO):
		print("FixΏ: ",FixO2, " changed to ", FixO)
		return(m.group(1)+"ώ"+m.group(3))
	else:
		return(m.group(1)+m.group(2)+m.group(3))
--------------------------------------------
				#Fixes ώ in words that are misspelled
				CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
Code:
Changes made
===============
ώ fixes 3
Code:
	#------------------------ Greek character corrections -------------
	#Fixes '…' when PDFd as ...
	CorrectText("Changed ... to …", r'\.\.\.', r'…')
	#Fixes 'η' when PDFd as ΐ]
	CorrectText("Changed ΐ] to η", r'ΐ]', r'η')
	
	#Fixes 'στη' when PDFd as σιη
	CorrectText("Changed σιη to στη", r'σιη', r'στη')
	#Fixes 'στ(η|ο|ον|α|ις|ην)' when PDFd as 'οτ(η|ο|ον|α|ις|ην)'
	CorrectText("Changed οτ(η|ο|ον|α|ις|ην) to στ(η|ο|ον|α|ις|ην)", r' οτ(η|ο|ον|α|ις|ην) ', r' στ\1 ')
	#Fixes 'των' when PDFd as 'τ(οι|ιο)ν'
	CorrectText("Changed τ(οι|ιο)ν to των", r' τ(οι|ιο)ν ', r' των ')
	#Fixes 'ού' when PDFd as  'οιί'
	CorrectText("Changed οιί to ού", r'οιί', r'ού')
	#Fixes 'στις' when PDFd as σιις
	CorrectText("Changed σιις to στις", r'σιις', r'στις')
	#Fixes 'στ(η|ο|ον|ην)' when PDFd as οτ(η|ο|ον|ην)
	CorrectText("Changed οτ(η|ο|ον|ην) to στ(η|ο|ον|ην)", r' οτ(η|ο|ον|ην) ', r' στ\1 ')
	#Fixes 'στ(ο|ου|α)' when PDFd as  σι(ο|ου|α)
	CorrectText("Changed σι(ο|ου|α) to στ(ο|ου|α)", r' σι(ο|ου|α)', r' στ\1')
	#Fixes 'ώ' when PDFd as ο'ι
	CorrectText("Changed ο'ι to ώ", r'(ο\'ι|\(ί\))', r'ώ')
	
	#Fixes 'Άκουσ' when PDFd as Ακόυσ
	CorrectText("Changed Ακόυσ to Άκουσ", r'Ακόυσ', r'Άκουσ')
	
	#Fixes 'γι’' when PDFd as γΓ,γΡ
	CorrectText("Changed γΓ γΡ to γι’", r'(γΓ|γΡ)', r'γι’')
	#Fixes 'ντι' when PDFd as νπ
	CorrectText("Changed νπ to ντι", r'νπ', r'ντι')
	
	#Fixes 'Γι’' when PDFd as ΓΓ
	CorrectText("Changed ΓΓ to Γι’", r'ΓΓ ', r'Γι’ ')
	#Fixes 'σχεδίαζ' when PDFd as σχέδιαζ
	CorrectText("Changed σχέδιαζ to σχεδίαζ", r'σχέδιαζ', r'σχεδίαζ')
	
	#Fixes '\u0388' when PDFd as 'E "E
	CorrectText("Changed 'E,\"E to \u0388", r'(\'|\")(\u0395)', r'Έ')
	#Fixes \u038E when PDFd as 'Y or "Y
	CorrectText("Changed 'Y,\"Y to \u038E", r'(\'|\")(\u03A5)', r'Ύ')
	#Fixes \u038A when PDFd as 'I or "I
	CorrectText("Changed 'I,\"I to \u038A", r'(\'|\")(\u0399)', r'Ί')
	#Fixes \u038C when PDFd as 'O or "O
	CorrectText("Changed 'O,\"O to \u038C", r'(\'|\")(\u039F)', r'Ό')
	#Fixes \u0386 when PDFd as 'A or "A
	CorrectText("Changed 'A,\"A to \u0386", r'(\'|\")(\u0391)', r'Ά')
	#Fixes \u0389 when PDFd as 'H or "H
	CorrectText("Changed 'H,\"H to \u0389", r'(\'|")(\u0397)', r'Ή')
	#Fixes \u038F when PDFd as '\u03C9 or "\u03C9
	CorrectText("Changed '\u03C9,\"\u03C9 to \u038F", r'(\'|\")(\u03C9)', r'Ώ')
	#Fixes \u03CD when PDFd as \u03B0
	CorrectText("Changed \u03B0 to \u03CD", r'ΰ', r'ύ')
	#Fixes έ when PDFd as ε'
	CorrectText("Changed ε' to έ", r'ε\'', r'έ')
	#Fixes ς Character when PDFd as ςCharacter
	CorrectText("Changed ςCharacter to ς Character", r'ς([\u0370-\u03CE])', r'ς \1') | 
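The plugin's own CorrectText is not shown in this thread. As a rough sketch of how a helper like it could produce the "Changes made" tally quoted above, the hypothetical `correct_text` below wraps `re.subn` and counts matches per description. The function name, its signature, and the `report` dictionary are assumptions for illustration, not the plugin's actual code.

```python
import re

def correct_text(text, description, pattern, replacement, report):
    """Hypothetical stand-in for the plugin's CorrectText (assumed behaviour).

    Applies one regex substitution and records, under `description`, how many
    matches were processed. `replacement` may be a string or a callback such
    as IsFixO.
    """
    new_text, count = re.subn(pattern, replacement, text)
    if count:
        report[description] = report.get(description, 0) + count
    return new_text

report = {}
sample = "σιη θάλασσα και σιη στεριά"
fixed = correct_text(sample, "Changed σιη to στη", r"σιη", r"στη", report)
print(fixed)
print(report)
```

One thing to keep in mind with this approach: `re.subn` counts every match that is handed to a callback, even when the callback returns the text unchanged, so a tally built this way can overstate the number of real corrections.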
|   |   | 
|  11-02-2015, 03:30 AM | #108 | 
| Connoisseur  Posts: 81 Karma: 10 Join Date: Nov 2013 Device: Kobo Aura HD | 
			
			It's coming from FixW and FixO: when I comment them out, it works. Is it possible that the plugin gets "confused" because I use the same CorrectText?
Code:
				#Fixes ώ in words that are misspelled
				CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
				#Fixes ω in words that are misspelled
				CorrectText("ω fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW) | 
|   |   | 
|  11-02-2015, 02:31 PM | #109 | |||
| Addict            Posts: 203 Karma: 62362 Join Date: Jul 2015 Device: Sony | Quote: 
 Quote: 
 Quote: 
 Unfortunately I am not able to test the plugin with Greek texts - I will try to look at what is happening when I get time! Does the checkbox for checking Greek code have an effect on the outcome? Apologies - I will include these in the next update!! | |||
|   |   | 
|  11-02-2015, 03:08 PM | #110 | |
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | Quote: 
 Yes. If you're importing BeautifulSoup now, then when you're ready to make the switch, you should be able to use something like:
Code:
from sigil_bs4 import BeautifulSoup
If you need help making sure everything works with the bundled version of Python that comes with 0.8.9+ (while still working with an external Python 3.4), just ask. There are plenty of people who can help. Last edited by DiapDealer; 11-02-2015 at 07:11 PM. Reason: Fix egregious typo | |
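One way to support both the bundled module and an external install is to try the imports in order. The sketch below is only an illustration: `import_first_available` and `load_beautifulsoup` are hypothetical helper names, and it assumes that `sigil_bs4` exposes `BeautifulSoup` in the same way `bs4` does, as the import line above suggests.

```python
import importlib

def import_first_available(module_names):
    """Return the first module in `module_names` that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(
        "None of these modules could be imported: " + ", ".join(module_names)
    )

def load_beautifulsoup():
    """Prefer Sigil's bundled sigil_bs4, fall back to an external bs4 (assumed layout)."""
    return import_first_available(["sigil_bs4", "bs4"]).BeautifulSoup
```

A plugin could then call `BeautifulSoup = load_beautifulsoup()` once at start-up and use the class as before, whichever copy was found.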
|   |   | 
|  11-02-2015, 04:20 PM | #111 | 
| Connoisseur  Posts: 81 Karma: 10 Join Date: Nov 2013 Device: Kobo Aura HD | 
			
			CalibUser, I think I found it. I'm going to test it tomorrow.
Code:
				#Fixes ώ in words that are misspelled
				CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
				#Fixes ω in words that are misspelled
				CorrectText("ω fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)  | 
|   |   | 
|  11-07-2015, 04:17 AM | #112 | 
| Connoisseur  Posts: 81 Karma: 10 Join Date: Nov 2013 Device: Kobo Aura HD | 
			
			@CalibUser It works if you change the
Code:
				#Fixes ώ in words that are misspelled
				CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
				#Fixes ω in words that are misspelled
				CorrectText("ω fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)
to
Code:
				#Fixes ώ in words that are misspelled
				CorrectText("ώ fixes",r"(\w+)(ιίι|\(ό|ο\)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
				#Fixes ω in words that are misspelled
				CorrectText("ω fixes",r"(\w+)(ιίι|\(ό|ο\)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)  | 
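Why the escaped parentheses matter can be seen with a small experiment. The sketch below uses trimmed versions of the two patterns (fewer alternatives, no look-aheads) and a hypothetical `rebuild` callback that mirrors only the "no change" branch of IsFixO. In the unescaped pattern, `(ό|ο)` is a nested capture group, so `m.group(3)` is that inner group and the trailing `(\w+)` becomes group 4.

```python
import re

# Trimmed versions of the two patterns from the thread, so the group
# numbering is easy to see.
BEFORE = r"(\w+)(ιό|(ό|ο)|οί|ιο|οι)(\w+)"   # (ό|ο) is a nested capture group
AFTER = r"(\w+)(ιό|\(ό|ο\)|οί|ιο|οι)(\w+)"  # \( and \) are literal parentheses

def rebuild(m):
    # Mirrors the "no change" branch of IsFixO: groups 1 + 2 + 3.
    return m.group(1) + m.group(2) + m.group(3)

word = "δυνατόν"
before_result = re.sub(BEFORE, rebuild, word)
after_result = re.sub(AFTER, rebuild, word)
print(re.compile(BEFORE).groups, before_result)
print(re.compile(AFTER).groups, after_result)
```

With the unescaped pattern, the callback glues the inner group onto the end of the word and drops the real tail, which reproduces the "δυνατόό" seen in the earlier post. With the escaped pattern there are only three groups and this word no longer matches at all. Two caveats: after the change, the alternatives `\(ό` and `ο\)` match only text that contains literal parentheses, and in the unescaped version `m.group(3)` is `None` whenever a different alternative matches, which would make the string concatenation fail.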
|   |   | 
|  11-07-2015, 02:22 PM | #113 | 
| Addict            Posts: 203 Karma: 62362 Join Date: Jul 2015 Device: Sony | 
			
			I have corrected the plugin so that it should fix words that include ώ and ω in Greek texts - thanks for the fix, gipsy. I have also included the code supplied by gipsy for correcting Greek texts, which I omitted from the last version of the plugin. The updated plugin can be found in the first post in this thread. Thanks, DiapDealer - however, I am only a hobbyist programmer, and when I test my code I sometimes find silly mistakes such as syntax errors. I think I would be wasting a lot of other people's time if I posted code that I had not tested, so I prefer to wait until the next version of Sigil is stable so that I can release code that I have done some testing on. | 
|   |   | 
|  11-07-2015, 02:26 PM | #114 | 
| Addict            Posts: 203 Karma: 62362 Join Date: Jul 2015 Device: Sony | 
			
			Oops! Having just said that I will wait until the next version of Sigil is stable, I have just seen the post stating that Sigil 0.9 is available. I will soon update my plugin so that it uses the built-in features of Sigil 0.9. | 
|   |   | 
|  11-07-2015, 03:46 PM | #115 | 
| Connoisseur  Posts: 57 Karma: 10 Join Date: Dec 2011 Device: Samsung Tablet | 
			
			Hi, thanks for a great plugin. I've been using an earlier release (0.1.1.1.1). I will wait for the 0.9.xx release of your plugin, as I had an issue with the Beautiful Soup install (win64 - Win 8.1) when I went to upgrade. For years now I've been slowly building a set of saved searches to tidy up epubs. I think there are a few in my total list that could be added to ePubTidyTool. I've got searches for Joining Paragraphs, Split Names (Mr., Mrs., etc.), Broken or Split Speech, and Common OCR Spelling Mistakes. If you are interested, I can send my sigil_searches.ini. I've snipped a small bit out of the Contractions section as a sample. I need to go back and standardise these; some (but not all) include lower or upper case and punctuation.
Code:
15\Name=Common Fixes/Contractions/ard
15\Find=\\\x2018\x61rd
15\Replace=\x2019\x61rd
16\Name=Common Fixes/Contractions/bout
16\Find=\\\x2018([Bb])out
16\Replace=\x2019\\1out
17\Name=Common Fixes/Contractions/bye
17\Find=\\\x201c\\\x2018([B|b])ye([\\p{P}|\\s])
17\Replace="\x201c \x2019\\1ye\\2"
18\Name=Common Fixes/Contractions/appen
18\Find=\\\x2018([Aa])ppen([\\p{P}|\\s])
18\Replace=\x2019\\1ppen\\2
19\Name=Common Fixes/Contractions/atasad
19\Find=\\\x2018\x61([tsd])([\\p{P}|\\s])
19\Replace=\x2019\x61\\1\\2
20\Name=Common Fixes/Contractions/Ave
20\Find="\x2018([Aa])ve "
20\Replace="\x2019\\1ve "
21\Name=Common Fixes/Contractions/Cept
21\Find=\\\x2018([Cc])ept
21\Replace=\x2019\\1ept
22\Name=Common Fixes/Contractions/couse
22\Find=\\\x2018([Cc])ourse
22\Replace=\x2019\\1ourse
23\Name=Common Fixes/Contractions/cos
23\Find="\x2018([Cc])os "
23\Replace="\x2019\\1os "
24\Name=Common Fixes/Contractions/cause
24\Find=\\\x2018([Cc])ause
24\Replace=\x2019\\1ause
25\Name=Common Fixes/Contractions/cause2
25\Find=\\\x201c([Cc])ause(?![\x201d\x2019])
25\Replace=\x201c \x2019\\1ause
26\Name=Common Fixes/Contractions/Cuz
26\Find="\x201c([C|c])uz "
26\Replace="\x201c \x2019\\1uz "
27\Name=Common Fixes/Contractions/em
27\Find=([\x2018\x201c])em([\\p{P}|\\s])
27\Replace=\x2019\x65m\\2
28\Name=Common Fixes/Contractions/ell
28\Find=\\\x2018\x65ll\\s
28\Replace="\x2019\x65ll "
29\Name=Common Fixes/Contractions/Ere
29\Find=\\\x2018([Ee])re
29\Replace=\x2019\\1re
30\Name=Common Fixes/Contractions/er
30\Find=\x2018\x65r([\\p{P}|\\s])
30\Replace=\x2019\x65r\\1
31\Name=Common Fixes/Contractions/e
31\Find=\\\x2018([Ee])([\\p{P}|\\s])
31\Replace=\x2019\\1\\2
32\Name=Common Fixes/Contractions/ee
32\Find=\x2018\x65\x65([\\p{P}|\\s])
32\Replace=\x2019\x65\x65\\1
33\Name=Common Fixes/Contractions/ear
33\Find=\x2018\x65\x61r([\\p{P}|\\s])
33\Replace=\x2019\x65\x61r\\1
size=293
34\Name=Common Fixes/Contractions/eard
34\Find=\\\x2018\x65\x61rd
34\Replace=\x2019\x65\x61rd
35\Name=Common Fixes/Contractions/Fraid
35\Find=\\\x2018([Ff])raid([\\p{P}|\\s])
35\Replace=\x2019\\1raid\\2
36\Name=Common Fixes/Contractions/fore
36\Find=\x2018([Ff])ore\\s
36\Replace="\x2019\\1ore "
37\Name=Common Fixes/Contractions/im | 
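For anyone wanting to try one of these searches outside Sigil, entry 24 ("cause") can be written by hand for Python's `re` module. This is a sketch under the assumption that, in the .ini text, `\x2018` and `\x2019` stand for the curly quotes ‘ and ’ and a doubled backslash stands for a single one; `fix_cause` is a hypothetical name.

```python
import re

# Entry 24 ("cause"), rewritten by hand for Python's re module.
# Assumption: \x2018 / \x2019 in the .ini are the curly quotes U+2018 / U+2019.
FIND = "\u2018([Cc])ause"
REPLACE = "\u2019\\1ause"

def fix_cause(text):
    """Turn a wrongly opened quote before 'cause into an apostrophe."""
    return re.sub(FIND, REPLACE, text)

result = fix_cause("\u2018Cause I said so, \u2018cause it\u2019s true.")
print(result)
```

The capture group keeps the original capitalisation, which is what the `([Cc])` in the saved search is for.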
|   |   | 
|  11-08-2015, 07:59 AM | #116 | |
| Addict            Posts: 203 Karma: 62362 Join Date: Jul 2015 Device: Sony | 
			
			You're welcome. I'm glad you find it useful. Quote: 
 Although the plugin already contains code for some of the things you mentioned (e.g. Joining Paragraphs and Split Names Mr., Mrs., etc.), if your code improves on the code in the plugin, or if your code can, for example, join paragraphs that are not covered by the plugin, then I would be very keen to include your code for these functions.
Split speeches have been problematic; I have not had time to develop code that can cope with this problem. At present I use a few manual search-and-replace regular expressions for this (not yet included in the plugin), but I would like to automate this if possible. I would like to adapt your expressions for fixing split speeches, particularly if these can automate the process.
I have looked at your sample contractions; many of these could go in the file that contains a customised list of words to be corrected automatically. The contractions that could not go in this file are those that use the pipe (|) character, because this is used to separate the incorrect word from the correct word in the customised word list. I need to consider an alternative character to use in this file so that the pipe character can be used in expressions. Can anybody see a problem if the character ¬ is used as the separator (other suggestions welcome)?
Before I add any more features to the plugin I would like to rewrite the code so that it uses the facilities provided by Sigil 0.9; I will not be able to start on this before next weekend!
Meanwhile, if you could post a file in the format that is described in the section 'Using a customised list of words that are corrected automatically' in the manual for this plugin, containing (1) common OCR spelling mistakes from your searches and (2) corrections to contractions (and anything else) that do not use the pipe character, then I can append it to the file IncorrectWords.txt that is in the first post for other users to use. | |
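The separator problem can be illustrated with a small parser. This is only a sketch: `parse_word_list` is a hypothetical helper, since the plugin's real parser is not shown in the thread, and the two sample lines are made up from a fragment of the contraction patterns above.

```python
def parse_word_list(lines, separator="|"):
    """Parse 'incorrect<separator>correct' lines into (incorrect, correct) pairs.

    Hypothetical helper: blank lines and lines without the separator are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or separator not in line:
            continue
        # Split on the first separator only.
        incorrect, _, correct = line.partition(separator)
        pairs.append((incorrect, correct))
    return pairs

# With "|" as separator, a pattern containing a pipe is split in the wrong place.
pipe_pairs = parse_word_list(["([B|b])ye|bye"])
# With "¬" as separator, the pipe survives inside the pattern.
not_sign_pairs = parse_word_list(["([B|b])ye¬bye"], separator="¬")
print(pipe_pairs)
print(not_sign_pairs)
```

Any character that cannot plausibly occur in a word or a regular expression would work equally well here; ¬ is simply the one suggested in the post.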
|   |   | 
|  11-09-2015, 04:16 AM | #117 | |
| Connoisseur  Posts: 57 Karma: 10 Join Date: Dec 2011 Device: Samsung Tablet | 
			
			Glad you think they will be useful; here are the OCR errors. I'll work on tidying the others. The items in the list above with the pipe are usually just cases where the regex allowed for upper or lower case at the beginning, or where it allowed for punctuation or a space after the text: ([\p{P}|\s]). I assume you are doing this in the tool? Quote: 
 | |
|   |   | 
|  11-10-2015, 02:35 PM | #118 | 
| Addict            Posts: 203 Karma: 62362 Join Date: Jul 2015 Device: Sony | 
			
			I have updated IncorrectWords.txt with the list provided by Steadyhands, except for the entry al!|all. This is because the plugin only replaces words that are incorrectly spelt and ignores the surrounding punctuation marks. If the plugin did not do this, and simply replaced the expression al! with all, then correctly spelt words would be amended (e.g. dismal! would be replaced with dismall). Also, if a book contained the expression et al! then this would become et all. NB: The plugin examines an ePub book to determine the type of apostrophe that is used (straight or curly) and uses the appropriate type when the text is replaced in the book. A straight apostrophe should be used in IncorrectWords.txt so that the plugin can apply this feature. | 
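The difference between substring replacement and whole-word replacement, and the apostrophe matching described above, can be sketched as follows. Both helpers are hypothetical illustrations rather than the plugin's code, and "tbe" is a made-up OCR error used only for the demonstration.

```python
import re

def replace_whole_word(text, wrong, right):
    """Replace `wrong` only where it stands alone as a word (hypothetical helper)."""
    pattern = r"(?<!\w)" + re.escape(wrong) + r"(?!\w)"
    return re.sub(pattern, lambda m: right, text)

def match_apostrophe_style(text, replacement):
    """Use curly apostrophes in `replacement` if `text` mostly uses them."""
    if text.count("\u2019") > text.count("'"):
        return replacement.replace("'", "\u2019")
    return replacement

sentence = "A dismal! day, but tbe sun came out."
naive = sentence.replace("al!", "all")                 # substring replacement
careful = replace_whole_word(sentence, "tbe", "the")   # whole-word replacement
print(naive)
print(careful)
```

The naive version damages "dismal!", while the whole-word version leaves it alone and corrects only the stand-alone word.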
|   |   | 
|  11-10-2015, 05:15 PM | #119 | 
| Enthusiast  Posts: 28 Karma: 10 Join Date: Dec 2011 Device: PRS-T1 | 
			
			Thanks for this plugin. I can't get it to work, though. It is probably a problem on my end. I'm running Windows 7 64 bit. I've tried it with Sigil 0.8.7, 0.8.9, and 0.9. I've tried it with the auto set option and by manually telling it where Python 3.4.3 is, and I keep getting the same result:
Status: failed
Traceback (most recent call last):
  File "C:\Program Files\Sigil\plugin_launchers\python\launcher.py", line 134, in launch
    target_script = __import__(script_module)
  File "C:\Users\Edwin\AppData\Local\sigil-ebook\sigil\plugins\ePubTidyTool\plugin.py", line 27, in <module>
    from ManualWordChecker import cManualWordCheck
  File "C:\Users\Edwin\AppData\Local\sigil-ebook\sigil\plugins\ePubTidyTool\ManualWordChecker.py", line 9, in <module>
    from bs4 import BeautifulSoup
ImportError: No module named 'bs4'
Error: No module named 'bs4'
Anyone know what I did wrong? Thanks. | 
|   |   | 
|  11-10-2015, 05:59 PM | #120 | 
| Connoisseur  Posts: 81 Karma: 10 Join Date: Nov 2013 Device: Kobo Aura HD | 
			
			You must have BeautifulSoup installed to use the ManualWordCheck. See Appendix 1 in ePub tidy tool v0.1.1.6.epub. | 
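A quick way to confirm whether a given Python interpreter can see BeautifulSoup is to run a short check with that same interpreter. This is a generic diagnostic sketch, not part of the plugin; `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys

def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None

print("Interpreter:", sys.executable)
for name in ("bs4", "sigil_bs4"):
    print(name, "available" if module_available(name) else "missing")
```

If `bs4` shows as missing for the interpreter that Sigil is configured to use, the package was probably installed for a different Python installation.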
|   |   | 