Plugin for tidying ePub files - Page 7

gipsy · 10-07-2015, 06:13 PM

I also tested the following...

Code:

############ FIXES Π ###########
def IsFixP(m):
	"""
	Decides whether a misread Π at the start of a word should be restored.
	It is called by a regular expression function (re.sub) in FixCommonErrors().
	If the word as scanned is already in the dictionary, or the word with Π
	restored is not in the dictionary, the original text is returned unchanged;
	otherwise the word is returned with Π restored.
	"""
	FixP="Π"+m.group(2)            # candidate with Π restored
	FixP2=m.group(1)+m.group(2)    # word as scanned, without the optional space

	if spell(FixP2):
		return m.group(0)
	elif spell(FixP):
		print("FixP removed from: ", FixP)
		return FixP
	else:
		# Return the whole match: m.group(1)+m.group(2) would drop a matched space
		return m.group(0)


############ FIXES έ ###########
def IsFixE(m):
	"""
	Decides whether a ύ inside a word is really a misread έ.
	It is called by a regular expression function (re.sub) in FixCommonErrors().
	If the word as scanned (with ύ) is in the dictionary, or the word with έ
	is not in the dictionary, the original text is returned unchanged;
	otherwise the word is returned with ύ replaced by έ.
	"""
	FixE=m.group(1)+"έ"+m.group(2)     # candidate with έ
	FixE2=m.group(1)+"ύ"+m.group(2)    # word as scanned
	if spell(FixE2):
		return FixE2
	elif spell(FixE):
		print("FixE removed from: ", FixE)
		return FixE
	else:
		return FixE2

####################

		# Fixes Π in words that are misspelled (only when a dictionary is available)
		if dictExists:
			CorrectText("Π fixes",r"(1\ Ι|1\ Ι|1Ι|1I|ΓΙ|Γΐ|ΙΙ|II|Ι\ Ι|ΓΤ|ΙΊ|Ιί)[ ]?(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixP)

		# Fixes έ in words that are misspelled (only when a dictionary is available)
		if dictExists:
			CorrectText("έ fixes",r"(\w+)ύ(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixE)

Is there a simple way to sort the changes shown in the plugin runner? Or could the changes be listed the way IncorrectWords does it ("Changed "+mispelt+" to "+correctSpell), or could the messages be extracted from the plugin runner? I want this so that I can check the fixes.

I was also trying to write another one, but I can't figure out how to create the CorrectText call for the group («ρ|(ρ|4>|<ρ|ηι). The group can be at the start or in the middle of a word.

If I can work out the regex, I have another group that I can use to make a FixSomething.

I am also attaching an IncorrectWords file for Greek words.

CalibUser... FixP doesn't seem to fix the word when a lowercase letter follows the (1\ Ι|ΓΙ|Γΐ|ΙΙ|II|Ι\ Ι|ΓΤ|ΙΊ|Ιί) group. When you have time, could you check the code for both?

Thanks!

EDIT: A suggestion for the future, if it's possible... I add about 400-500 words to a user dictionary per ePub, and I then edit the WordDictionary to add the new ones. Would it be possible to change the plugin so that, instead of using the WordDictionary, it gets the words from Sigil's dictionary and the selected user dictionaries?
Something like the way Sigil finds misspelled words in its spellcheck.

CalibUser · 10-08-2015, 03:32 PM

I have uploaded a new version of the plugin in the first thread. The following changes have been made:

The HTML entities for the characters ‘, ’, “, ”, ', – and ' (for example &lsquo; for ‘) will be replaced by the single characters themselves when the option Replace HTML code eg – is ticked.
When using the customised error list to replace words containing apostrophes or quote marks, ePubTidy will replace them with either straight or curly quotes, depending on the most common type of quote mark found in the selected files in the ePub. The most common type of quote mark used in the ePub is determined accurately if BeautifulSoup is installed; otherwise it is estimated.
Bug fix: Word files that contain Greek characters are processed correctly.
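
The quote-style rule in the list above can be pictured as a simple count. The following is only a rough sketch, assuming a hypothetical helper named `dominant_quote_style`; it is not the plugin's own code, and it strips tags with a crude regex rather than using BeautifulSoup:

```python
import re

STRAIGHT = "'\""
CURLY = "\u2018\u2019\u201c\u201d"

def dominant_quote_style(html_text):
    """Return 'curly' or 'straight', whichever style is more common in the visible text."""
    # Crude tag stripping so that quotes inside attributes are not counted.
    text = re.sub(r"<[^>]*>", "", html_text)
    straight = sum(text.count(ch) for ch in STRAIGHT)
    curly = sum(text.count(ch) for ch in CURLY)
    # Ties fall back to straight quotes (an arbitrary choice for this sketch).
    return "curly" if curly > straight else "straight"
```

A replacement word containing an apostrophe could then be written with ’ or ' depending on the returned value.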

CalibUser · 10-08-2015, 03:48 PM

Quote:

Originally Posted by gipsy

Is there a simple way to sort the changes shown in the plugin runner? Or could the changes be listed the way IncorrectWords does it ("Changed "+mispelt+" to "+correctSpell), or could the messages be extracted from the plugin runner? I want this so that I can check the fixes.

You could select and copy the text in the plugin runner and paste it into a program eg Textpad that provides a facility for sorting the text. If there is enough interest in your suggestion then I could add a feature to write the changes to a log file and then sort it.
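
The copy-and-sort approach can also be done in a few lines of Python once the plugin runner text has been pasted into a string. This is a minimal sketch, assuming the fix messages start with the labels `FixP` and `FixE` as in gipsy's print statements; the function name `sorted_change_log` is made up for illustration:

```python
def sorted_change_log(runner_output, prefixes=("FixP", "FixE")):
    """Return the fix messages from the copied output, de-duplicated and sorted."""
    lines = (ln.strip() for ln in runner_output.splitlines())
    # str.startswith accepts a tuple, so one call checks every label.
    return sorted({ln for ln in lines if ln.startswith(prefixes)})
```

Writing the returned list to a text file would give the kind of sorted log file mentioned above.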

Quote:

Originally Posted by gipsy

I was also trying to write another one, but I can't figure out how to create the CorrectText call for the group («ρ|(ρ|4>|<ρ|ηι). The group can be at the start or in the middle of a word.

If I can work out the regex, I have another group that I can use to make a FixSomething.

CalibUser... FixP doesn't seem to fix the word when a lowercase letter follows the (1\ Ι|ΓΙ|Γΐ|ΙΙ|II|Ι\ Ι|ΓΤ|ΙΊ|Ιί) group. When you have time, could you check the code for both?

Thanks!

I will try to find time to look at this over the weekend...

Quote:

Originally Posted by gipsy

I also tested the following...

EDIT: A suggestion for the future, if it's possible... I add about 400-500 words to a user dictionary per ePub, and I then edit the WordDictionary to add the new ones. Would it be possible to change the plugin so that, instead of using the WordDictionary, it gets the words from Sigil's dictionary and the selected user dictionaries?
Something like the way Sigil finds misspelled words in its spellcheck.

I don't think this could be made to work. The words that are added to the WordDictionary are ones that are frequently misspelt in the same way again and again and the WordDictionary contains a very specific correction that corresponds to each of these words. I don't know how to ensure that the correct word is selected from Sigil's dictionary and selected user dictionaries.

gipsy · 10-08-2015, 04:04 PM

Quote:

Originally Posted by CalibUser

You could select and copy the text in the plugin runner and paste it into a program eg Textpad that provides a facility for sorting the text. If there is enough interest in your suggestion then I could add a feature to write the changes to a log file and then sort it.

I use Notepad++.
But I wanted to check the words that were fixed by FixP & FixE, to see whether the code is working properly. But don't worry, I think I have found a way to do it.

martyger · 10-09-2015, 08:09 AM

CalibUser,

I have been testing the plug-in and I think it is on its way to becoming a very useful tool. However, IMO it needs one important modification -- a means of "stepping through" certain types of changes. Some modifications can be automatic -- character replacement, tag changes, etc. However, some changes need to be monitored.

For example, sometimes an OCR will miss periods at the end of a paragraph or add spurious lowercase letters to the end of sentences -- the correct fix is to add a period or delete the character...*not* to join paragraphs. Also, many words (like arid/and, modem/modern, etc) may or may not be errors -- the user needs to make that decision based on context.

Adding the ability to step through word lists and paragraph joins-- rather than implementing them *all* automatically -- will prevent the tool from generating a new set of errors while correcting the old ones.

As far as I can see, this change will make the plug-in the most useful item in my pulp-conversion toolbox.

Thanks.

gipsy · 10-09-2015, 01:06 PM

From my tests and edits I can say the following:
IncorrectWords works fine with the latest version. It finds the whole word.
Joining paragraphs works fine. There are only some errors with subtitles within the text, but there aren't many.
I have commented out some replacements in the plugin file (the sup 5, \ etc.) because in Greek the "λ" is sometimes recognized as \
The hyphen fix works fine with the dictionary support.
The spans need some work. There is an upper-case-only option, but sometimes there is also italic text within the small-caps span, so maybe an upper-case italics option could be added to the selection menu.
The Greek FixP and FixE seem to work fine in my tests. The count of corrections made is off, but that's OK: it counts all the matches found and not only the ones that were changed. I will attach the code here, CalibUser; could you add some text to the plugin output window telling the user to check that the changed words are OK?

gipsy · 10-09-2015, 01:20 PM

@CalibUser

Those are OK. Could you add a message for the user, either like the one shown when Fix line breaks has not been ticked, or in the Plugin Runner message window?
Something like... "Please check the FixP & FixE words!!!"

Code:

############ FIXES Π ###########
def IsFixP(m):
	"""
	Decides whether a misread Π at the start of a word should be restored.
	It is called by a regular expression function (re.sub) in FixCommonErrors().
	If the word as scanned is already in the dictionary, or the word with Π
	restored is not in the dictionary, the original text is returned unchanged;
	otherwise the word is returned with Π restored.
	"""
	FixP="Π"+m.group(2)            # candidate with Π restored
	FixP2=m.group(1)+m.group(2)    # word as scanned, without the optional space

	if spell(FixP2):
		return m.group(0)
	elif spell(FixP):
		print("FixP: ",FixP2, " changed to ", FixP)
		return FixP
	else:
		# Return the whole match: m.group(1)+m.group(2) would drop a matched space
		return m.group(0)


############ FIXES έ ###########
def IsFixE(m):
	"""
	Decides whether a ύ inside a word is really a misread έ.
	It is called by a regular expression function (re.sub) in FixCommonErrors().
	If the word as scanned (with ύ) is in the dictionary, or the word with έ
	is not in the dictionary, the original text is returned unchanged;
	otherwise the word is returned with ύ replaced by έ.
	"""
	FixE=m.group(1)+"έ"+m.group(2)     # candidate with έ
	FixE2=m.group(1)+"ύ"+m.group(2)    # word as scanned
	if spell(FixE2):
		return FixE2
	elif spell(FixE):
		print("FixE: ",FixE2, " changed to ", FixE)
		return FixE
	else:
		return FixE2

Code:

		# Fixes Π in words that are misspelled (only when a dictionary is available)
		if dictExists:
			CorrectText("Π fixes",r"(1\ Ι|1\ Ι|1Ι|1I|ΓΙ|Γΐ|ΙΙ|II|Ι\ Ι|ΓΤ|ΙΊ|Ιί)[ ]?(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixP)

		# Fixes έ in words that are misspelled (only when a dictionary is available)
		if dictExists:
			CorrectText("έ fixes",r"(\w+)ύ(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixE)
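
To try the callback idea outside the plugin, the sketch below runs a cut-down version on its own. Everything in it is an assumption made for illustration: `spell` is a stand-in backed by a one-word set instead of the plugin's dictionary, `fix_p` is a simplified relative of IsFixP, and the pattern keeps only two of the misreadings and none of the tag lookaheads.

```python
import re

# Hypothetical stand-in for the plugin's spell(); the real one checks a dictionary.
KNOWN_WORDS = {"Πάτρα"}

def spell(word):
    return word in KNOWN_WORDS

def fix_p(m):
    """re.sub callback: restore Π only when that yields a known word."""
    scanned = m.group(1) + m.group(2)
    candidate = "Π" + m.group(2)
    if spell(scanned) or not spell(candidate):
        return m.group(0)   # leave the text exactly as it was
    return candidate

# Simplified pattern: Greek ΙΙ (written with escapes) and Latin II only.
PATTERN = re.compile("(\u0399\u0399|II)[ ]?(\\w+)")

fixed = PATTERN.sub(fix_p, "\u0399\u0399άτρα, IIxyz")
print(fixed)
```

The first word becomes Πάτρα because the restored form is in the stand-in dictionary; the second is left alone because neither form is known.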

CalibUser · 10-10-2015, 07:07 AM

Quote:

Originally Posted by martyger

CalibUser,

...sometimes an OCR will miss periods at the end of a paragraph or add spurious lowercase letters to the end of sentences -- the correct fix is to add a period or delete the character...*not* to join paragraphs. Also, many words (like arid/and, modem/modern, etc) may or may not be errors -- the user needs to make that decision based on context.

Adding the ability to step through word lists and paragraph joins-- rather than implementing them *all* automatically -- will prevent the tool from generating a new set of errors while correcting the old ones.

Taking the word list issue first, two different situations arise when correcting misspelt words: some words may be misspelt in the same way every time by OCR readers/converters where there is only one possible way of spelling these types of word correctly (eg presendy|presently, vou|you). Other words that are misspelt may or may not be errors or there may be alternative corrections that are applicable and these need to be looked at on a case-by-case basis.

Currently this plugin resolves the first situation as this was relatively straightforward to implement; it uses a word list to automatically correct words that are misspelt in the same way every time by OCR readers/converters that have only one possible way of spelling the misspelt word correctly.
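
As an illustration of that first situation, a word list of wrong|right pairs can be applied with whole-word matching along these lines. This is a sketch under assumptions, not the plugin's code: the `wrong|right` line format is borrowed from the notation used above, and both function names are hypothetical.

```python
import re

def load_corrections(lines):
    """Parse 'wrong|right' entries into a dict, skipping lines without a '|'."""
    pairs = (ln.strip().split("|", 1) for ln in lines if "|" in ln)
    return {wrong: right for wrong, right in pairs}

def apply_corrections(text, corrections):
    """Replace whole words only, so 'vou' is fixed but 'vouch' is left alone."""
    if not corrections:
        return text
    # Longest entries first so that a longer entry is tried before any prefix of it.
    words = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda m: corrections[m.group(1)], text)
```

Because every entry has exactly one replacement, no user decision is needed; ambiguous pairs such as modem/modern do not fit this scheme.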

I will consider adding a feature that offers alternative words for corrections to resolve the second situation; however, I don't have much time to develop the plugin (at the moment I am only carrying out 'tweaks'), so it may be a while before I can add this feature to the plugin.

Similarly paragraph joins can be an issue and some manual searching is necessary. The plugin will automatically join paragraphs that end with a hyphen to the next paragraph, paragraphs that begin with a lowercase letter to the previous one, paragraphs that end with Mrs.|Mr.|Dr.|St. to the next one and - if you tick the option 'Fix all broken line endings' - it will join paragraphs that end with a lowercase letter to those that begin with an upper case letter. If you do not tick this option then the plugin should not join paragraphs that have any other types of errors (eg it should not join paragraphs that end with lower case letters to the next paragraph if the next paragraph begins with a capital letter or punctuation mark unless this option is ticked - if you find that when you untick this option it does join paragraphs with other types of errors together then please let me know and give an example of two paragraphs that are incorrectly being joined together).

You can use the following regular expression to do a manual Find/Replace for paragraphs that have not been corrected automatically:

Find: ([a-z])</p>\s+<p>
Replace:\1 {There is a space after \1}
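
For anyone who prefers to test the expression outside Sigil first, the same Find/Replace pair can be tried with Python's `re` module. This is just a sketch; the function name is made up, and like the Find expression above it only matches a bare `<p>` with no attributes.

```python
import re

JOIN_PATTERN = re.compile(r"([a-z])</p>\s+<p>")

def join_broken_paragraphs(html_text):
    """Join a paragraph ending in a lowercase letter to the one that follows."""
    # r"\1 " keeps the captured letter and adds the single space noted above.
    return JOIN_PATTERN.sub(r"\1 ", html_text)
```

Applied blindly this joins every match, so it is best run on a copy and the result checked, for the reasons martyger gives.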

I may, in a future version, show each incorrectly terminated paragraph and provide the option to correct it manually if there is enough demand for this feature.

CalibUser · 10-10-2015, 07:08 AM

Quote:

Originally Posted by gipsy

I have commented out some replacements in the plugin file (the sup 5, \ etc.) because in Greek the "λ" is sometimes recognized as \

Would it help if I added an option for checking Greek texts so that these types of checks are bypassed?

Quote:

Originally Posted by gipsy

The spans need some work. There is an upper-case-only option, but sometimes there is also italic text within the small-caps span, so maybe an upper-case italics option could be added to the selection menu.

I will add an option for a small-sized font in italics in upper case to the span tag replacement options in the next version of the plugin.

Quote:

Originally Posted by gipsy

The Greek FixP and FixE seem to work fine in my tests.

Quote:

Originally Posted by gipsy

I will attach the code here, CalibUser; could you add some text to the plugin output window telling the user to check that the changed words are OK?

I will include your code in the next version of the plugin - thanks for doing this.

Quote:

Originally Posted by gipsy

The count of corrections made is off, but that's OK: it counts all the matches found and not only the ones that were changed.

This is a bug that I will need to fix! Thanks for pointing it out.
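
One possible way to count only real changes when a callback is used with re.sub is to wrap the callback and compare its result with the matched text. This is a sketch of the general technique, not the fix that went into the plugin; `sub_counting_changes` is a hypothetical name.

```python
import re

def sub_counting_changes(pattern, callback, text):
    """Like re.sub with a callback, but count only matches whose text actually changed."""
    changed = 0

    def wrapper(m):
        nonlocal changed
        replacement = callback(m)
        if replacement != m.group(0):
            changed += 1
        return replacement

    return re.sub(pattern, wrapper, text), changed
```

By contrast, `re.subn` reports every match it handed to the callback, which is the over-count described in the quote.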

gipsy · 10-10-2015, 07:53 AM

Quote:

Originally Posted by CalibUser

Would it help if I added an option for checking Greek texts so that these types of checks are bypassed?

Yes!!!

I commented out these lines in the HTMLProcessor:

Code:

	CorrectText("Corrected <sup>5 and <sup>9", r"""<sup>[59]</sup>""", r'’')
	CorrectText("Corrected <sup>6</sup>", r"""<sup>6</sup>""", r'‘')
	CorrectText("Corrected / with quote mark", r"""(?s)([^<|>])(/)(?![^<>]*>)(?!.*<body[^>]*>)""", r'\1’')
	CorrectText("Corrected / with quote 'I'", r""" / """, r' I ')	#NB Could be 1 on rarer occasions

The <sup>something is very often an interpunct (·) in Greek.
And the / appears because it is really a period followed by an apostrophe on a Greek vowel character.
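
One way a Greek option could bypass rules like these is to flag each rule and skip the flagged ones when the option is on. The sketch below is only an assumption about how that might look: `RULES`, `apply_rules` and `greek_mode` are invented names, and plain `re.sub` stands in for the plugin's CorrectText.

```python
import re

# (description, pattern, replacement, skip_for_greek) - hypothetical rule table
RULES = [
    ("Corrected <sup>5 and <sup>9", r"<sup>[59]</sup>", "\u2019", True),
    ("Corrected <sup>6</sup>", r"<sup>6</sup>", "\u2018", True),
    ("Corrected three dots", r"\.\.\.", "\u2026", False),
]

def apply_rules(text, greek_mode=False):
    """Apply every rule in order, skipping the flagged ones for Greek text."""
    for _description, pattern, replacement, skip_for_greek in RULES:
        if greek_mode and skip_for_greek:
            continue
        text = re.sub(pattern, replacement, text)
    return text
```

With this layout nothing has to be commented out by hand; the rules stay in the file and the option decides which ones run.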

Quote:

Originally Posted by CalibUser

This is a bug that I will need to fix! Thanks for pointing it out.

I mean the counter in FixP and FixE. That is where I noticed the difference between the number of changes and the counter. But it's OK.

gipsy · 10-10-2015, 04:41 PM

@CalibUser

Here is some text explaining which Greek corrections the plugin makes:

Code:

  <h1>Greek Corrections</h1>

  <ol>
    <li>Corrects three dots (<b>...</b>) to <b>…</b><br/></li>

    <li>Corrects <b>ΐ]</b> to <b>η</b><br/></li>

    <li>Corrects <b>σιη</b> to <b>στη</b> even when it is part of a word; <i>from the tests I have done this is safe.</i><br/></li>

    <li>Corrects <b>οτη</b>, <b>οτο</b>, <b>οτον</b>, <b>οτα</b>, <b>οις</b>, <b>οην</b> to <b>στη, στο, στον, στα, στις, στην</b> when they stand alone, not as parts of words.<br/></li>

    <li>Corrects <b>τοιν</b>, <b>τιον</b> to <b>των</b>.<br/></li>

    <li>Corrects <b>οιί</b> to <b>ού</b><br/></li>

    <li>Corrects <b>σιις</b> to <b>στις</b><br/></li>

    <li>Corrects <b>σιο</b>, <b>σιου</b>, <b>σια</b> to <b>στο</b>,<b> σου</b>,<b> σα</b><br/></li>

    <li>Corrects <b>ο'ι</b> to <b>ώ</b><br/></li>

    <li>Corrects <b>γΓ</b>, <b>γΡ</b> to&nbsp;<b>γι’</b><br/></li>

    <li>Corrects <b>νπ</b> to <b>ντι</b><br/></li>

    <li>Corrects <b>ΓΓ</b> to <b>Γι’</b><br/></li>

    <li>Converts capital vowels to their accented forms, e.g. <b>'Α</b>, <b>"Α</b> to <b>Ά</b><br/></li>

    <li>Converts <b>ΰ</b> to <b>ύ</b>; <i>some of these may be wrong, but very few.</i><br/></li>

    <li>Corrects <b>ε'</b> to <b>έ</b><br/></li>

    <li>Inserts a space after a final sigma (<b>ς</b>) that is followed by a letter.<br/></li>

    <li>Corrects <b>Π</b> that appears as <b>1 Ι,1Ι, ΓΙ, Γΐ, II, Ι Ι, ΓΤ, ΙΊ, Ιί</b> <u>provided that the word containing it exists in the dictionary</u>. Otherwise it is left as it is.<br/></li>

    <li>Corrects <b>έ</b> that appears as <b>ύ</b>&nbsp;<u>provided that the word containing it exists in the dictionary</u>. Otherwise it is left as it is.<br/></li>
  </ol>

  <p></p>

  <p><b>WARNING: </b>For corrections 17 and 18 <u>it is a good idea to check the words that were changed. They are shown in the Plugin window as <b>FixP:</b> <b>FixE:</b></u><br/></p>

gipsy · 10-13-2015, 04:18 AM

Two more, CalibUser:

Code:

############ FIXES ώ ###########
def IsFixO(m):
	"""
	Decides whether a misread group such as (ιό|οί|ιο|οι) inside a word should be replaced by ώ.
	It is called by a regular expression function (re.sub) in FixCommonErrors().
	If the word as scanned is in the dictionary, or the word with ώ is not in
	the dictionary, the original text is returned unchanged;
	otherwise the word is returned with the group replaced by ώ.
	"""
	FixO=m.group(1)+"ώ"+m.group(3)               # candidate with ώ
	FixO2=m.group(1)+m.group(2)+m.group(3)       # word as scanned
	if spell(FixO2):
		return m.group(0)
	elif spell(FixO):
		print("FixΏ: ",FixO2, " changed to ", FixO)
		return FixO
	else:
		return m.group(0)
		
############ FIXES ω ###########
def IsFixW(m):
	"""
	Decides whether a misread group such as (ιό|οί|ιο|οι) inside a word should be replaced by ω.
	It is called by a regular expression function (re.sub) in FixCommonErrors().
	If the word as scanned is in the dictionary, or the word with ω is not in
	the dictionary, the original text is returned unchanged;
	otherwise the word is returned with the group replaced by ω.
	"""
	FixW=m.group(1)+"ω"+m.group(3)               # candidate with ω
	FixW2=m.group(1)+m.group(2)+m.group(3)       # word as scanned
	if spell(FixW2):
		return m.group(0)
	elif spell(FixW):
		print("FixΩ: ",FixW2, " changed to ", FixW)
		return FixW
	else:
		return m.group(0)
		

--------------------------------------------------------------------

		# Fixes ώ in words that are misspelled
		# (?:ό|ο) is non-capturing so that group(3) is the rest of the word, as IsFixO expects
		if dictExists:
			CorrectText("ώ fixes",r"(\w+)(ιίι|(?:ό|ο)|ίό|ο&gt;|ο'ι|ιό|οί|ιο|οι|&lt;ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)

		# Fixes ω in words that are misspelled
		# (?:ό|ο) is non-capturing so that group(3) is the rest of the word, as IsFixW expects
		if dictExists:
			CorrectText("ω fixes",r"(\w+)(ιίι|(?:ό|ο)|ίό|ο&gt;|ο'ι|ιό|οί|ιο|οι|&lt;ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixW)

EDIT: How can I modify the regex so that it also matches the first and last characters of a word? I noticed that it only works inside the word.

Thanks!
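
On the EDIT question: `\w+` needs at least one letter on each side of the group, which is why only the inside of a word is matched. A possible approach is to use `\w*` (zero or more) together with word boundaries. The sketch below shows the idea on its own; the two-word `KNOWN_WORDS` set, `spell` and `fix_omega` are stand-ins invented for the example, and only four of the misreadings are kept.

```python
import re

# Hypothetical stand-in for the plugin's dictionary lookup.
KNOWN_WORDS = {"ώρα", "εγώ"}

def spell(word):
    return word in KNOWN_WORDS

def fix_omega(m):
    """re.sub callback: replace the misread group with ώ only if that gives a known word."""
    original = m.group(0)
    candidate = m.group(1) + "ώ" + m.group(3)
    if spell(original) or not spell(candidate):
        return original
    return candidate

# \w* lets the prefix or the suffix be empty, so the group may start or end the word.
PATTERN = re.compile(r"\b(\w*)(ιό|οί|ιο|οι)(\w*)\b")

fixed = PATTERN.sub(fix_omega, "οίρα εγιό")
print(fixed)
```

Here the group is at the start of the first word and at the end of the second, and both are corrected. The tag lookaheads from the plugin's pattern are left out to keep the sketch short and would still be needed on real HTML.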

CalibUser · 11-01-2015, 02:34 PM

Update for the ePub Tidy Tool

A new version, v0.1.1.6, has been attached to the first article in this thread and the manual has been updated.
This plugin has been tested on Windows 7 and requires that Python 3 is installed on your computer.

The following features have been added:

Has a new customised word list. Some words may be accepted by the spell checker because they are spelt correctly but an incorrect word is used. For example, sometimes the word "modern" is read by an OCR package as "modem". In this case the word needs to be checked manually. You can provide a list of these words for the plugin to process. It will find each of these words and present the paragraph that contains it, together with an alternative word. You can then select the alternative word or retain the original word.
Has an option tick box for processing Greek text
Fixes incorrect Greek words that have Π, ώ, ω and έ missing (fix provided by gipsy)
Has an extra option for processing span tags: Change to small uppercase italics. This can change the text that has the style "font-variant:small-caps" to upper case italics in a smaller font than normal and puts the span tag <uCaseSmallItalics> around the capitalised text. This allows you to define a class uCaseSmallItalics in a css file to allow a smaller size font to be applied to the italicised capitalised text
Has an option for importing a CSS file so that you can use your preferred format for text.
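
The manual word check in the first item of the list can be pictured roughly as follows. This is a sketch under stated assumptions rather than the plugin's implementation: it uses regular expressions where the plugin uses Beautiful Soup, and `suspect_occurrences` and the `suspects` mapping are hypothetical names.

```python
import re

def suspect_occurrences(html_text, suspects):
    """Yield (paragraph_text, word, alternative) for each suspect word found in a <p> element."""
    for para in re.findall(r"<p(?:\s[^>]*)?>(.*?)</p>", html_text, flags=re.S):
        plain = re.sub(r"<[^>]+>", "", para)   # drop inline tags such as <i>
        for word, alternative in suspects.items():
            if re.search(r"\b" + re.escape(word) + r"\b", plain):
                yield plain, word, alternative
```

A user interface would show each yielded paragraph with buttons to accept the alternative or keep the original word.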

To use the customised word list you need to install Beautiful Soup. Instructions for this are given in the manual for Windows 7; for other systems (Mac, Linux) please search the web.

Important: Beautiful Soup will change all html mark-ups (eg &lsquo;) to a single character (in this case, a left single quote mark) when it processes text. To ensure that the text processed by Beautiful Soup matches the html file exactly, it is necessary to tick the box Replace HTML code eg &msdash; to find all suspect words. This will change html characters in the ePub to single characters that are used in the search.
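
The mismatch comes from entity decoding. A small illustration follows, using the standard library's `html.unescape` as a stand-in for what an HTML parser does to entities; it is not Beautiful Soup itself, and the sample sentence is invented.

```python
import html

raw = "&lsquo;Tis modem&rsquo; &mdash; he said"
decoded = html.unescape(raw)

# A search term taken from the decoded text no longer occurs in the raw markup.
term = "\u2018Tis"
found_in_decoded = term in decoded
found_in_raw = term in raw
```

Replacing the entities in the ePub first puts both sides of the search into the same single-character form.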

The code that implements the manual word check is slow compared to the automatic word search. When you press a button to accept/reject changing a word, there may be a brief pause while the plugin finds the next paragraph that contains a suspect word. Despite this, it is faster to use the plugin than to use the normal Find/Search facility that is built into Sigil, where you would need to manually enter each word that could be suspect and also risk leaving some out!

exaltedwombat · 11-01-2015, 04:36 PM

Is this version supposed to work with the Python that comes along with Sigil 0.8.901?

Thanks.

eschwartz · 11-01-2015, 04:50 PM

Now that Sigil's plugin launcher includes an interface to libhunspell and a way to retrieve hunspell dictionaries, is this plugin going to learn how to read those directly?
