Hi CalibUser,
Something isn't working right in Greek at the moment.
It changes "δυνατόν περισσότερους ναυαγούς" to "δυνατόό περισσότεροο ναυαγοο", but I can't figure out why. The code from 0.1.1.5 is the same.
Maybe it's something in
Code:
def IsFixO(m):
    """
    This function examines a word to see whether a misrecognized character sequence (ιό|οί|ιο|οι) needs to be fixed to ώ.
    It is called by a regular expression function (re.sub) in FixCommonErrors().
    It returns the matched text unchanged if it is already in the dictionary, or if the ώ variant is not;
    otherwise it returns the word with the ώ fixed.
    """
    FixO = m.group(1) + "ώ" + m.group(3)
    FixO2 = m.group(1) + m.group(2) + m.group(3)
    if spell(FixO2):
        # The word as matched is already valid, so leave it alone
        return FixO2
    elif spell(FixO):
        print("FixΏ: ", FixO2, " changed to ", FixO)
        return FixO
    else:
        # Neither spelling is known, so leave it alone
        return FixO2
--------------------------------------------
#Fixes ώ in words that are misspelled
CorrectText("ώ fixes",r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)", IsFixO)
because I get
Code:
Changes made
===============
ώ fixes 3
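A guess at the cause, in case it helps: in that pattern the inner (ό|ο) is itself a capturing group, so it becomes group 3 and the trailing (\w+) becomes group 4. IsFixO then joins group 2 with the inner group and drops the word ending. The sketch below tries this with plain re on the sentence above. rebuild() is a hypothetical stand-in for the branches of IsFixO that return the word unchanged (no spell() involved), and the (?:ό|ο) variant is only one possible fix, not necessarily the one the plugin should use.

```python
import re

# Pattern copied from the CorrectText call above; the inner (ό|ο) is a capturing group.
BROKEN = r"(\w+)(ιίι|(ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)"

# Same pattern with the inner group made non-capturing,
# so that group 3 is the word ending again.
FIXED = r"(\w+)(ιίι|(?:ό|ο)|ίό|ο>|ο'ι|ιό|οί|ιο|οι|<ο|οϊ)(\w+)(?![^<>]*>)(?!.*<body[^>]*>)"


def rebuild(m):
    # Hypothetical stand-in for IsFixO's "leave the word alone" branches:
    # it reassembles the word from groups 1, 2 and 3, exactly as IsFixO does.
    return m.group(1) + m.group(2) + m.group(3)


text = "δυνατόν περισσότερους ναυαγούς"

broken_result = re.sub(BROKEN, rebuild, text)  # group 3 is the inner (ό|ο)
fixed_result = re.sub(FIXED, rebuild, text)    # group 3 is the trailing (\w+)

print(broken_result)
print(fixed_result)
```

If I read the group numbering correctly, the first line printed is garbled the same way as the text above and the second is the untouched sentence. Using m.group(4) inside IsFixO instead would be another way to get the same effect.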
And you forgot to add these :P
Code:
#------------------------ Greek character corrections -------------
#Fixes '…' when PDFd as ...
CorrectText("Changed ... to …", r'\.\.\.', r'…')
#Fixes 'η' when PDFd as ΐ]
CorrectText("Changed ΐ] to η", r'ΐ]', r'η')
#Fixes 'στη' when PDFd as σιη
CorrectText("Changed σιη to στη", r'σιη', r'στη')
#Fixes 'στ(η|ο|ον|α|ις|ην)' when PDFd as 'οτ(η|ο|ον|α|ις|ην)'
CorrectText("Changed οτ(η|ο|ον|α|ις|ην) to στ(η|ο|ον|α|ις|ην)", r' οτ(η|ο|ον|α|ις|ην) ', r' στ\1 ')
#Fixes 'των' when PDFd as 'τ(οι|ιο)ν'
CorrectText("Changed τ(οι|ιο)ν to των", r' τ(οι|ιο)ν ', r' των ')
#Fixes 'ού' when PDFd as 'οιί'
CorrectText("Changed οιί to ού", r'οιί', r'ού')
#Fixes 'στις' when PDFd as σιις
CorrectText("Changed σιις to στις", r'σιις', r'στις')
#Fixes 'στ(η|ο|ον|ην)' when PDFd as οτ(η|ο|ον|ην)
#(the replacement keeps the surrounding spaces so neighbouring words are not joined)
CorrectText("Changed οτ(η|ο|ον|ην) to στ(η|ο|ον|ην)", r' οτ(η|ο|ον|ην) ', r' στ\1 ')
#Fixes 'στ(ο|ου|α)' when PDFd as σι(ο|ου|α)
CorrectText("Changed σι(ο|ου|α) to στ(ο|ου|α)", r' σι(ο|ου|α)', r' στ\1')
#Fixes 'ώ' when PDFd as ο'ι
CorrectText("Changed ο'ι to ώ", r'(ο\'ι|\(ί\))', r'ώ')
#Fixes 'Άκουσ' when PDFd as Ακόυσ
CorrectText("Changed Ακόυσ to Άκουσ", r'Ακόυσ', r'Άκουσ')
#Fixes 'γι’' when PDFd as γΓ,γΡ
CorrectText("Changed γΓ γΡ to γι’", r'(γΓ|γΡ)', r'γι’')
#Fixes 'ντι' when PDFd as νπ
CorrectText("Changed νπ to ντι", r'νπ', r'ντι')
#Fixes 'Γι’' when PDFd as ΓΓ
CorrectText("Changed ΓΓ to Γι’", r'ΓΓ ', r'Γι’ ')
#Fixes 'σχεδίαζ' when PDFd as σχέδιαζ
CorrectText("Changed σχέδιαζ to σχεδίαζ", r'σχέδιαζ', r'σχεδίαζ')
#Fixes \u0388 when PDFd as 'E or "E
CorrectText("Changed 'E,\"E to \u0388", r'(\'|\")(\u0395)', r'Έ')
#Fixes \u038E when PDFd as 'Y or "Y
CorrectText("Changed 'Y,\"Y to \u038E", r'(\'|\")(\u03A5)', r'Ύ')
#Fixes \u038A when PDFd as 'I or "I
CorrectText("Changed 'I,\"I to \u038A", r'(\'|\")(\u0399)', r'Ί')
#Fixes \u038C when PDFd as 'O or "O
CorrectText("Changed 'O,\"O to \u038C", r'(\'|\")(\u039F)', r'Ό')
#Fixes \u0386 when PDFd as 'A or "A
CorrectText("Changed 'A,\"A to \u0386", r'(\'|\")(\u0391)', r'Ά')
#Fixes \u0389 when PDFd as 'H or "H
CorrectText("Changed 'H,\"H to \u0389", r'(\'|")(\u0397)', r'Ή')
#Fixes \u038F when PDFd as '\u03A9 or "\u03A9
CorrectText("Changed '\u03A9,\"\u03A9 to \u038F", r'(\'|\")(\u03A9)', r'Ώ')
#Fixes \u03CD when PDFd as \u03B0
CorrectText("Changed \u03B0 to \u03CD", r'ΰ', r'ύ')
#Fixes έ when PDFd as ε'
CorrectText("Changed ε' to έ", r'ε\'', r'έ')
#Fixes ς Character when PDFd as ςCharacter
CorrectText("Changed ςCharacter to ς Character", r'ς([\u0370-\u03CE])', r'ς \1')
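If anyone wants to try these rules outside the plugin, here is a rough sketch. correct_text() and the report dict are hypothetical stand-ins: I'm assuming the real CorrectText does something like re.subn on the book text and counts the hits for the "Changes made" summary, which may not be how it is actually written.

```python
import re


def correct_text(description, pattern, replacement, text, report):
    """Hypothetical stand-in for CorrectText: apply one rule and count its hits."""
    new_text, count = re.subn(pattern, replacement, text)
    if count:
        # Accumulate per-rule counts, similar to the "Changes made" listing
        report[description] = report.get(description, 0) + count
    return new_text


# Three of the rules from the list above
rules = [
    ("Changed ... to …", r'\.\.\.', r'…'),
    ("Changed σιη to στη", r'σιη', r'στη'),
    ("Changed σιις to στις", r'σιις', r'στις'),
]

report = {}
sample = "σιη θάλασσα σιις πέντε..."
result = sample
for description, pattern, replacement in rules:
    result = correct_text(description, pattern, replacement, result, report)

print(result)
for description, count in report.items():
    print(description, count)
```

Running each new rule on a short sample like this makes it easy to spot a rule that eats a space or fires on the wrong word before it goes into the plugin.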