Old 11-18-2015, 02:46 PM   #136
gipsy
Connoisseur
gipsy began at the beginning.
 
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
@CalibUser
I think I found it. I changed the (\w+) to (\w*|\s) and it seems to work.
Code:
#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w*|\s)(ο\'\)|ιίι|\(ό|ο\)|ίό|ο&gt;|ο'ι|ιό|οί|ιο|οι|&lt;ο|οϊ)(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
I'm going to test it more and will report back here.
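To see why the change helps, here is a small sketch comparing the two quantifiers with Python's re module. The sample word εγοι is a made-up OCR garbling of εγώ, and both patterns are cut down to a single οι alternative; neither comes from the plugin itself.

```python
import re

# Reduced versions of the two patterns (only one garbled alternative, "οι").
STRICT = re.compile(r"(\w+)(οι)(\w+)")       # needs a letter on BOTH sides
LOOSE = re.compile(r"(\w*|\s)(οι)(\w*|\s)")  # either side may be empty

garbled = "εγοι"  # hypothetical OCR output for "εγώ"

strict_match = STRICT.search(garbled)
loose_match = LOOSE.search(garbled)

print(strict_match)          # no match: nothing follows "οι"
print(loose_match.groups())  # the prefix, the garbled pair, an empty suffix
```

Because both outer groups may now be empty, the loose pattern also matches a bare οι, so it is up to the spell-check callback to decide whether anything actually changes.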
Old 11-18-2015, 06:57 PM   #137
gipsy
Connoisseur
gipsy began at the beginning.
 
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
Can someone check (and confirm) whether the plugin skips the first line of IncorrectWords (and of the custom list)?

For example, suppose IncorrectWords contains only this:
Code:
WTiat|What
Then it doesn't fix WTiat.

It does fix it when IncorrectWords looks like this:
Code:
WTiat|What
WTiat|What
Thanks
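The thread never establishes a cause for this. One thing worth ruling out (purely an assumption here, not something confirmed by anyone in the thread) is a byte-order mark at the start of the file, which would make the first entry differ from the word in the text. The sketch below reads "wrong|right" lines and strips such a mark; parse_word_pairs and apply_word_pairs are hypothetical names, not the plugin's functions.

```python
import re

def parse_word_pairs(text):
    """Parse 'wrong|right' lines into a list of (wrong, right) tuples."""
    pairs = []
    for line in text.splitlines():
        # Drop a byte-order mark that an editor may have put before line 1.
        line = line.lstrip("\ufeff").strip()
        if "|" not in line:
            continue  # blank or malformed line
        wrong, right = line.split("|", 1)
        pairs.append((wrong, right))
    return pairs

def apply_word_pairs(text, pairs):
    """Replace each wrong word with its correction, whole words only."""
    for wrong, right in pairs:
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, lambda m, r=right: r, text)
    return text

# A one-line list that starts with a BOM.
pairs = parse_word_pairs("\ufeffWTiat|What\n")
fixed = apply_word_pairs("WTiat is this?", pairs)
print(pairs, fixed)
```

When reading from disk, opening the file with encoding="utf-8-sig" has the same effect as the lstrip call.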
Old 11-18-2015, 09:11 PM   #138
ProDigit
Karmaniac
ProDigit ought to be getting tired of karma fortunes by now.
 
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
I used to scan my books and did the same thing in Notepad++.
It's a lot of work, but the errors differ from device to device and from OCR to OCR.
Old 11-19-2015, 01:27 AM   #139
gipsy
Connoisseur
gipsy began at the beginning.
 
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
@CalibUser
I changed some things in the CorrectText calls...
It works better, but I get some unnecessary fixes (4 out of a total of 200); I can use the customised word list to fix those. (For example ω and φυλάξοι, which are spelled correctly but are not the words the text actually has. I attach the custom text file.)
Every fix now finds the misspelled character anywhere in the word.
I also added a fix for φ, which comes out as "η&gt;|«ρ|ηι|&lt;ρ|4&gt;|ιρ" after OCR.
Code:
def IsFixP(m):
	"""
	FIXES Π
	This function examines a word to decide whether an OCR-garbled sequence in it should be replaced by Π.
	It is called by a regular expression function (re.sub) in FixCommonErrors()
	It returns the word unchanged if it is already in the dictionary, or if the
	Π version is not in the dictionary either; otherwise it returns the word with Π substituted
	"""
	FixP=m.group(1)+"Π"+m.group(3)
	FixP2=m.group(1)+m.group(2)+m.group(3)

	if spell(FixP2):
		return(m.group(1)+m.group(2)+m.group(3))
	elif spell(FixP):
		print("FixP: ",FixP2, " changed to ", FixP)
		return (m.group(1)+'Π'+m.group(3))
	else:
		return(m.group(1)+m.group(2)+m.group(3))

def IsFixE(m):
	"""
	FIXES έ
	This function examines a word to decide whether a ύ in it should really be έ.
	It is called by a regular expression function (re.sub) in FixCommonErrors()
	It returns the word unchanged if it is already in the dictionary, or if the
	έ version is not in the dictionary either; otherwise it returns the word with έ substituted
	"""
	FixE=m.group(1)+"έ"+m.group(2)
	FixE2=m.group(1)+"ύ"+m.group(2)
	if spell(FixE2):
		return(m.group(1)+"ύ"+m.group(2))
	elif spell(FixE):
		print("FixE: ",FixE2, " changed to ", FixE)
		return(m.group(1)+"έ"+m.group(2))
	else:
		return(m.group(1)+"ύ"+m.group(2))

def IsFixO(m):
	"""
	This function examines a word to decide whether a garbled sequence in it (ιό|οί|ιο|οι and similar) should be replaced by ώ.
	It is called by a regular expression function (re.sub) in FixCommonErrors()
	It returns the word unchanged if it is already in the dictionary, or if the
	ώ version is not in the dictionary either; otherwise it returns the word with ώ substituted
	"""
	FixO=m.group(1)+"ώ"+m.group(3)
	FixO2=m.group(1)+m.group(2)+m.group(3)
	if spell(FixO2):
		return(m.group(1)+m.group(2)+m.group(3))
	elif spell(FixO):
		print("FixΏ: ",FixO2, " changed to ", FixO)
		return(m.group(1)+"ώ"+m.group(3))
	else:
		return(m.group(1)+m.group(2)+m.group(3))
	
def IsFixW(m):
	"""
	This function examines a word to decide whether a garbled sequence in it (ιό|οί|ιο|οι and similar) should be replaced by ω.
	It is called by a regular expression function (re.sub) in FixCommonErrors()
	It returns the word unchanged if it is already in the dictionary, or if the
	ω version is not in the dictionary either; otherwise it returns the word with ω substituted
	"""
	FixW=m.group(1)+"ω"+m.group(3)
	FixW2=m.group(1)+m.group(2)+m.group(3)
	if spell(FixW2):
		return(m.group(1)+m.group(2)+m.group(3))
	elif spell(FixW):
		print("FixΩ: ",FixW2, " changed to ", FixW)
		return(m.group(1)+"ω"+m.group(3))
	else:
		return(m.group(1)+m.group(2)+m.group(3))

def IsFixF(m):
	"""
	This function examines a word to decide whether a garbled sequence in it (\(ρ|χρ|η&gt;|«ρ|ηι|&lt;ρ|4&gt;|ιρ) should be replaced by φ.
	It is called by a regular expression function (re.sub) in FixCommonErrors()
	It returns the word unchanged if it is already in the dictionary, or if the
	φ version is not in the dictionary either; otherwise it returns the word with φ substituted
	"""
	FixF=m.group(1)+"φ"+m.group(3)
	FixF2=m.group(1)+m.group(2)+m.group(3)
	if spell(FixF2):
		return(m.group(1)+m.group(2)+m.group(3))
	elif spell(FixF):
		print("FixΦ: ",FixF2, " changed to ", FixF)
		return(m.group(1)+"φ"+m.group(3))
	else:
		return(m.group(1)+m.group(2)+m.group(3))
Code:
				if useHunspellDict=="Yes":
					#Fixes Π in words that are misspelled
					CorrectText("Π fixes",r"(\w*|\s)(Ιΐ|1\ Ι|1\ Ι|1Ι|1I|ΓΙ|Γΐ|ΙΙ|II|Ι\ Ι|ΓΤ|ΙΊ|Ιί)[ ]?(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixP)
					#Fixes έ in words that are misspelled
					CorrectText("έ fixes",r"(\w+|\s)ύ(\w+|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixE)
					#Fixes ώ in words that are misspelled
					CorrectText("ώ fixes",r"(\w*|\s)(οί\)|νο'\)|α\)|οδ|οό|ιυ|άί|ο5|ο'\)|ιίι|\(ό|ο\)|ίό|ο&gt;|ο'ι|ιό|οί|ιο|οι|&lt;ο|οϊ)(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
					#Fixes ω in words that are misspelled
					CorrectText("ω fixes",r"(\w*|\s)(οί\)|νο'\)|α\)|οδ|οό|ιυ|άί|ο5|ο'\)|ιίι|\(ό|ο\)|ίό|ο&gt;|ο'ι|ιό|οί|ιο|οι|&lt;ο|οϊ)(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)
					#Fixes φ in words that are misspelled
					CorrectText("φ fixes",r"(\w*|\s)(\(ρ|χρ|η&gt;|«ρ|ηι|&lt;ρ|4&gt;|ιρ)(\w*|\s)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixF)
EDIT: I am also attaching an IncorrectWords file for Greek.
Calib, would it be possible sometime in the future to put the Greek fixes in a separate .py file, so I don't have to mess with your HTMLProcessor all the time?
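The five IsFix* callbacks above differ mainly in the replacement character, so one way to keep the Greek fixes compact (and easier to move into a file of their own) would be a small factory. This is only a sketch: make_fixer, is_word, and KNOWN_WORDS are hypothetical names, the toy word set stands in for the Hunspell lookup that the plugin does through spell(), and the pattern is a shortened version of the ώ rule.

```python
import re

# Toy stand-in for the Hunspell lookup; the real plugin calls spell().
KNOWN_WORDS = {"εγώ", "οι"}

def is_word(word):
    return word in KNOWN_WORDS

def make_fixer(replacement, label):
    """Build a re.sub callback that swaps group 2 for `replacement`
    only when that turns a non-word into a dictionary word."""
    def fixer(m):
        original = m.group(1) + m.group(2) + m.group(3)
        candidate = m.group(1) + replacement + m.group(3)
        if is_word(original) or not is_word(candidate):
            return original
        print(label, original, "changed to", candidate)
        return candidate
    return fixer

fix_omega_tonos = make_fixer("ώ", "FixΏ:")
pattern = re.compile(r"(\w*)(ιό|οί|ιο|οι)(\w*)")

result = pattern.sub(fix_omega_tonos, "εγοι και οι")
print(result)
```

Here the garbled εγοι becomes εγώ, while the article οι is left alone because it is already a word in the toy dictionary.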
Attached Files
File Type: txt custom.txt (115 Bytes, 417 views)
File Type: txt IncorrectWords.txt (2.6 KB, 446 views)

Last edited by gipsy; 11-19-2015 at 02:12 AM. Reason: add some more searches
Old 11-21-2015, 09:58 AM   #140
CalibUser
Addict
CalibUser goes to eleven.
 
Posts: 203
Karma: 62362
Join Date: Jul 2015
Device: Sony
Plugin updated to version 0.2.0.0.3

I have updated the plugin in the first post in this thread.

Updates
The update fixes the bug reported by ovinio:

Quote:
Originally Posted by ovinio View Post
Is it possible to avoid this?
Code:
  <p><a href="https://www.mobileread.com/forums/">https://www.mobileread.com/forums/</a><br/></p>
changes to
  <p><a href="https://www.mobileread.com/forums/">http:’/www.mobileread.com’forums’</a><br/></p>
Plugin Runner output
and it contains updates for processing Greek text from gipsy.
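For context, the patterns posted earlier in this thread end with the lookahead (?![^<>]*>), which keeps a rule from firing inside a tag. The sketch below suggests why that alone would not protect link text. The "/" to "’" rule is only a stand-in that damages link text in a similar way to the report, and the URL guard is one possible approach, not necessarily what version 0.2.0.0.3 does.

```python
import re

NOT_IN_TAG = r"(?![^<>]*>)"  # lookahead used by the patterns in this thread

html = ('<p><a href="https://www.mobileread.com/forums/">'
        'https://www.mobileread.com/forums/</a><br/></p>')

# Stand-in rule: the href is spared (it is inside a tag), the link text is not.
naive = re.sub(r"/" + NOT_IN_TAG, "’", html)

# Possible guard: match whole URLs first and hand them back untouched.
URL_OR_SLASH = re.compile(r'(https?://[^\s<>"]+)|/' + NOT_IN_TAG)

def keep_urls(m):
    return m.group(1) if m.group(1) else "’"

guarded = URL_OR_SLASH.sub(keep_urls, html)

print(naive)
print(guarded == html)
```

Matching the URL as the first alternative means the regex engine consumes it whole, so the slash rule never gets a chance to look inside it.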

Other queries

Quote:
Originally Posted by gipsy View Post
Can someone check (and confirm) whether the plugin skips the first line of IncorrectWords (and of the custom list)?
I have checked this on Windows 7 (32 bit) and did not find a problem. Is anybody else experiencing this issue?

@gipsy:

Quote:
Originally Posted by gipsy View Post
Calib, would it be possible sometime in the future to put the Greek fixes in a separate .py file, so I don't have to mess with your HTMLProcessor all the time?
To be honest, I don't see myself having time to think about reorganising the functions into at least two files and debugging the new files.
Old 11-30-2015, 02:34 PM   #141
CalibUser
Addict
CalibUser goes to eleven.
 
Posts: 203
Karma: 62362
Join Date: Jul 2015
Device: Sony
Updated to version V0.2.0.0.4

As Sigil version 0.9.1 has been released, I have reinstated the ability of this plugin to load a named css style sheet; I had disabled this feature because Sigil version 0.9.0 had a bug that caused this feature of the plugin to corrupt ePub books. The bug has been fixed in Sigil version 0.9.1.

Warning: Do not use this version of the plugin with Sigil 0.9.0 as the bug in Sigil 0.9.0 will corrupt your ePub.
Old 12-04-2015, 07:58 AM   #142
gipsy
Connoisseur
gipsy began at the beginning.
 
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
With Sigil 0.9.1, the 2.0.0.4 version of the plugin doesn't run well.
It gets stuck at "The selected html files do not contain span tags". If I leave it for some time and hit Close, it makes some changes and I get Status: success.
Old 12-04-2015, 02:06 PM   #143
CalibUser
Addict
CalibUser goes to eleven.
 
Posts: 203
Karma: 62362
Join Date: Jul 2015
Device: Sony
@gipsy: Thanks for the bug report. I haven't experienced this problem. I will look into it.
Old 12-04-2015, 03:02 PM   #144
gipsy
Connoisseur
gipsy began at the beginning.
 
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
@CalibUser maybe it has to do with the built-in dictionary.
I changed the name of the Greek dictionary (in HTMLProcessor.py), and the plugin doesn't display a message that the dictionary is missing. With the name changed, the plugin runs smoothly.
After that I switched the built-in dictionary to a different dictionary and it got stuck again.

EDIT: Tested on Windows 8.1 and Windows 10. I checked, and the 2.0.0.2 release of the plugin runs on both, also with Sigil 0.9.1.

Last edited by gipsy; 12-04-2015 at 03:04 PM.
Old 12-08-2015, 02:04 PM   #145
CalibUser
Addict
CalibUser goes to eleven.
 
Posts: 203
Karma: 62362
Join Date: Jul 2015
Device: Sony
@gipsy: I have not been able to reproduce the fault. Please let me know which options you are ticking in the plugin, whether you are processing header tags, and whether you are using any options for chapter headings.
Thanks.
Old 12-08-2015, 03:29 PM   #146
gipsy
Connoisseur
gipsy began at the beginning.
 
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
@CalibUser I uninstalled Sigil and the plugin, and also deleted all related folders.
Then I reinstalled both. For the following tests I ticked only "Process Greek characters only" and "Use in-build dictionary Greek*".
I didn't process header tags or chapters, and I didn't select any files for automatic/manual spell check.
  1. With "Use in-build dictionary" ticked and el_GR present in C:\Program Files\Sigil\hunspell_dictionaries, after I click Process text the plugin keeps running and running ... without making any changes.

  2. With "Use in-build dictionary" unticked...
    The plugin runs.

    Status: success
    Changes made
    ===============
  3. With "Use in-build dictionary" ticked and el_GR renamed so that the plugin cannot find it in C:\Program Files\Sigil\hunspell_dictionaries, after Process text the plugin runs again

    Status: success
    Changes made
    ===============
    Please click OK to close the Plugin Runner window.

I attach a test epub and the el_GR hunspell dictionary so that someone can test it on Windows 7, for example.
Attached Files
File Type: rar test.rar (3.44 MB, 371 views)

Last edited by gipsy; 12-08-2015 at 03:56 PM.
Old 12-09-2015, 02:04 PM   #147
CalibUser
Addict
CalibUser goes to eleven.
 
Posts: 203
Karma: 62362
Join Date: Jul 2015
Device: Sony
Updated to version V0.2.0.0.4A

@Gipsy: Thanks for providing the test code and the Greek components of the Hunspell dictionary. I traced the error to the code that "Fixes Π in words that are misspelled". There was a round bracket missing from the code. I have added this in and now the plugin works properly in the tests that I have been running.

I have updated the plugin in the first post for this thread.
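The thread does not show the offending line, but an unbalanced bracket is the kind of mistake that can be caught before any text is processed. A minimal sketch, assuming the patterns are kept in a dictionary: PATTERNS and find_broken_patterns are hypothetical names, and the two sample patterns are shortened illustrations rather than the plugin's actual rules.

```python
import re

# Shortened sample rules; the second one is missing a closing bracket.
PATTERNS = {
    "ώ fixes": r"(\w*|\s)(ιό|οί|ιο|οι)(\w*|\s)",
    "Π fixes": r"(\w*|\s)(ΓΙ|ΙΙ|II(\w*|\s)",
}

def find_broken_patterns(patterns):
    """Return {name: error message} for every pattern that fails to compile."""
    broken = {}
    for name, pattern in patterns.items():
        try:
            re.compile(pattern)
        except re.error as exc:
            broken[name] = str(exc)
    return broken

broken = find_broken_patterns(PATTERNS)
for name, message in broken.items():
    print(name, "->", message)
```

Running such a check once at start-up reports a malformed rule by name instead of leaving it to surface while a book is being processed.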
Old 12-09-2015, 02:28 PM   #148
gipsy
Connoisseur
gipsy began at the beginning.
 
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
Works fine now CalibUser.
Thanks!
Old 12-21-2015, 02:37 AM   #149
Steadyhands
Connoisseur
Steadyhands began at the beginning.
 
Steadyhands's Avatar
 
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
Here are a few suggested additions to your truncated words list for apostrophes facing the wrong direction. I also made two other small changes to suit my preferences: I added (?i) to ignore case, which catches instances where the first letter has been capitalised, and I prefer to look for a punctuation / space combination at the end.
Code:
[ ]?‘(?i)(ad|at|appen|ard|ave|bout|bye|cause|cept|cos|cuz|couse|eard|em|er|e|ee|ell|fraid|fore|im|is|isself|gainst|less|mongst|neath|nough|nother|nuff|ome|ow|ope|oney|orse|puter|round|scuse|spect|scaped|sides|specially|tween|taint|til|tis|twas|twere|twould|twill|un)([\p{P}|\s])
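If this pattern is run through Python's built-in re module rather than Sigil's own Find & Replace, two details need adjusting: re has no \p{P} class, and recent Python versions reject an inline (?i) that is not at the start of the pattern. Below is a rough equivalent, assuming a shortened word list and an explicit set of closing punctuation. Both are illustrative choices, as are the names TRUNCATED and fix_leading_apostrophes, and the optional leading space of the original is left out.

```python
import re

# Illustrative subset of the truncated words from the list above.
TRUNCATED = ("bout", "cause", "em", "fraid", "til", "tis", "twas", "un")

PATTERN = re.compile(
    r"‘(" + "|".join(TRUNCATED) + r")(?=[\s.,;:!?’”)]|$)",
    re.IGNORECASE,
)

def fix_leading_apostrophes(text):
    """Turn an opening quote before a truncated word into an apostrophe."""
    return PATTERN.sub(r"’\1", text)

sample = "‘Tis late, so tell ‘em. ‘Tilt’ stays quoted."
print(fix_leading_apostrophes(sample))
```

The lookahead plays the role of the trailing punctuation / space group: a word such as ‘Tilt’ is left alone because the letters after "Til" rule out a truncated word.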
Old 12-21-2015, 02:29 PM   #150
CalibUser
Addict
CalibUser goes to eleven.
 
Posts: 203
Karma: 62362
Join Date: Jul 2015
Device: Sony
Thanks, Steadyhands.

I will incorporate the code in the next version of the plugin.