MobileRead Forums > E-Book Software > Sigil

01-19-2017, 05:22 PM #16
AnselmD
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
Quote:
Originally Posted by varlog
There is this piece of code in Sigil:
Code:
QString Utility::getSpellingSafeText(const QString &raw_text)
{
    // There is currently a problem with Hunspell if we attempt to pass
    // words with smart apostrophes from the CodeView encoding.
    // There are likely better ways to solve this, but this one does
    // get the job done until someone can implement something better.
    QString text(raw_text);
    return text.replace(QString::fromUtf8("\u2019"), "'");
}
I don't know if it's related; it is used three times in SpellCheck.cpp. Just FYI.
Maybe this is (and should be) done by the .aff file?

If I remove the line
ICONV ’ '
from de_DE_OLDSPELL.aff,
it works.
I can leave the following line untouched:
OCONV ' ’
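For readers without Qt at hand, here is a minimal standard C++ sketch of the normalization that both getSpellingSafeText() and an ICONV ’ ' entry perform: every U+2019 (the UTF-8 bytes E2 80 99) is replaced by an ASCII apostrophe before the word is looked up. It assumes UTF-8 input, and the name toAsciiApostrophes is hypothetical, not part of Sigil or Hunspell.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Sigil/Hunspell API): replace every
// RIGHT SINGLE QUOTATION MARK (U+2019) in a UTF-8 string with an
// ASCII apostrophe, similar in effect to an "ICONV ’ '" entry.
std::string toAsciiApostrophes(const std::string& utf8) {
    static const std::string kCurly = "\xE2\x80\x99";  // U+2019 encoded as UTF-8
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size();) {
        if (utf8.compare(i, kCurly.size(), kCurly) == 0) {
            out += '\'';           // curly apostrophe -> straight apostrophe
            i += kCurly.size();
        } else {
            out += utf8[i];        // copy every other byte unchanged
            ++i;
        }
    }
    return out;
}
```

This only normalizes the text being checked; whether gab's is then treated as one word or two is still decided by the dictionary's own settings.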
01-19-2017, 06:44 PM #17
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by AnselmD
Where can I get a working version of:
German (de-DE-1901) old spelling dictionaries-2016.04.03 | Apache OpenOffice Extensions
http://extensions.openoffice.org/de/...aries-20160403
@AnselmD: These dictionary files were already converted to utf-8 by the maintainers. Since you've already figured out how to solve the problem by editing the .aff file, there's nothing for the developers to do.

@KevinH: There was a spelling reform in Germany in 1996, and many word processors and text editors offer spelling support for both "unreformed German" and "reformed German." Since many German MR contributors use Sigil to prepare German Public Domain MR books written according to the pre-1996 spelling rules, it'd be nice if you could officially add the GPL-licensed, utf-8 encoded de_DE_OLDSPELL.dic/aff spellcheck dictionaries from dict-de_de-1901_oldspell_2016-04-03.oxt to one of the next Sigil builds.

The OpenOffice developers have also released a new version of the post 1996 German spelling dictionaries in 01/2017.
Could you please extract the de_DE_frami.dic/aff files from dict-de_de-frami_2017-01-12.oxt, convert both files to utf-8, change the SET ISO8859-1 line to SET UTF-8, and include these dictionary files (instead of the de_DE dictionaries) in one of the next Sigil builds?
(As I've already mentioned, the OLDSPELL dictionaries for "unreformed German" have already been converted to utf-8 by the maintainers, i.e., all you have to do is add them to the build scripts.)
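The conversion requested above is mechanical for a genuine ISO-8859-1 file, because every Latin-1 byte value equals its Unicode code point. Below is a minimal sketch, assuming the file really is ISO-8859-1 (not Windows-1252, whose bytes 0x80 to 0x9F mean something different) and is processed line by line; latin1ToUtf8 and fixSetLine are hypothetical helper names, not part of Sigil or Hunspell.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8.
// Bytes 0x00-0x7F are copied unchanged; bytes 0x80-0xFF become
// two-byte UTF-8 sequences (e.g. 0xDF 'ß' -> C3 9F).
std::string latin1ToUtf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: once the bytes are UTF-8, the SET directive
// must say so, otherwise the file would still be read as Latin-1.
std::string fixSetLine(const std::string& line) {
    if (line.rfind("SET ", 0) == 0) {  // line starts with "SET "
        return "SET UTF-8";
    }
    return line;
}
```

Applying latin1ToUtf8 to every line of both the .aff and .dic files, and fixSetLine to the .aff lines, covers both steps of the request.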
01-19-2017, 06:49 PM #18
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
That line should only be removed if you also decrement the count in the ICONV header line. Also, the actual text encoding used in the .aff file matters: it has to match the encoding specified in the SET line.

The byte value of the smart single quote depends on the encoding and must be correct for the encoding named in the SET line.
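To illustrate why the byte value depends on the SET encoding, here is a minimal sketch that encodes one code point for the two encodings mentioned in this thread. It assumes only UTF-8 and ISO8859-1 matter, and encodeCodePoint is a hypothetical name. The smart single quote U+2019 becomes the three bytes E2 80 99 in UTF-8 and cannot be represented in ISO-8859-1 at all (Windows-1252 uses byte 0x92 for it, but that is a different code page).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: encode one Unicode code point in the encoding
// named by an .aff SET line. Only two encodings are handled here;
// std::nullopt means "not representable / not covered by this sketch".
std::optional<std::string> encodeCodePoint(char32_t cp, const std::string& set) {
    if (set == "UTF-8") {
        std::string out;
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        return out;
    }
    if (set == "ISO8859-1") {
        if (cp > 0xFF) return std::nullopt;  // e.g. U+2019 has no Latin-1 byte
        return std::string(1, static_cast<char>(cp));
    }
    return std::nullopt;  // encoding not covered by this sketch
}
```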
01-19-2017, 07:02 PM #19
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Quote:
Originally Posted by Doitsu
@KevinH: There was a spelling reform in Germany in 1996, and many word processors and text editors offer spelling support for both "unreformed German" and "reformed German." Since many German MR contributors use Sigil to prepare German Public Domain MR books written according to the pre-1996 spelling rules, it'd be nice if you could officially add the GPL-licensed, utf-8 encoded de_DE_OLDSPELL.dic/aff spellcheck dictionaries from dict-de_de-1901_oldspell_2016-04-03.oxt to one of the next Sigil builds.
Users can simply install this dictionary on their own. I do not want to add the extra meg or so to include it on every download when it is of use only for older works and does not represent current spelling in German. Users can install their own dictionaries quite easily. The .oxt is pretty much just a zip file.

Quote:
The OpenOffice developers have also released a new version of the post 1996 German spelling dictionaries in 01/2017.
Could you please extract the de_DE_frami.dic/aff files from dict-de_de-frami_2017-01-12.oxt, convert both files to utf-8, change the SET ISO8859-1 line to SET UTF-8, and include these dictionary files (instead of the de_DE dictionaries) in one of the next Sigil builds?
If the dictionary license is compatible with Sigil then yes we can do this.

KevinH
01-19-2017, 07:03 PM #20
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by KevinH
That line should only be removed if you decrement the count in the iconv header line. Also, the actual text encoding used in the .aff file matters. It has to match the encoding specified in the SET line.

The byte value for the smart single quotes depends on encoding and must be correct for the SET line encoding.
Neither the OP nor I am familiar with the exact syntax of AFF/DIC files, so we are presumably ill-equipped to provide any useful feedback to you.

Quote:
Originally Posted by KevinH
If the dictionary license is compatible with Sigil then yes we can do this.
They're GPL2/3 licensed.
01-19-2017, 09:32 PM #21
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
The hunspell man pages and docs describe what can be used in an aff file. It was hunspell that added the ICONV and OCONV features. The input character conversions are preceded by an ICONV header line whose number tells how many ICONV pattern lines follow it. If you remove one of the ICONV pattern lines, you need to decrement that count. Also, if you edit the aff file in the wrong encoding, you can mess it up.
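One way to catch the mistake described above is to compare the count on the header line with the number of pattern lines that follow it. This is a minimal sketch, assuming the .aff file has already been split into lines and that the header is the first ICONV (or OCONV) line with exactly one field after the keyword; convTableIsConsistent is a hypothetical helper, not a Hunspell API.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical checker: does the count on an "ICONV n" / "OCONV n" header
// match the number of pattern lines directly after it?
// Returns true if the table is consistent or absent.
bool convTableIsConsistent(const std::vector<std::string>& lines,
                           const std::string& keyword) {
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::istringstream header(lines[i]);
        std::string word, second, extra;
        header >> word >> second;
        if (word != keyword || second.empty()) continue;
        // Pattern lines have two fields ("ICONV ’ '"); the header has one.
        if (header >> extra) continue;
        int expected = 0;
        try {
            expected = std::stoi(second);
        } catch (...) {
            return false;  // header count is not a number
        }
        int found = 0;
        for (std::size_t j = i + 1; j < lines.size(); ++j) {
            std::istringstream row(lines[j]);
            std::string w;
            row >> w;
            if (w != keyword) break;  // table ends at the first other line
            ++found;
        }
        return found == expected;
    }
    return true;
}
```

For example, a table whose header still says ICONV 2 after one of its two pattern lines was deleted fails this check, which is exactly the case of removing a line without decrementing the count.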

Here is a good hunspell man page I found online that describes the contents of the aff file and its meaning:

http://manpages.ubuntu.com/manpages/...unspell.4.html

I will take a look at the dictionary and see why removing that line seems to help.

Thanks,

KevinH
01-20-2017, 07:38 AM #22
AnselmD
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
Quote:
Originally Posted by KevinH
The hunspell man pages and docs describe what can be used in an aff file. It was hunspell that added the iconv and oconv features. The input character conversions are preceded by an iconv header line whose number tells how many iconv patterns follow it. If you remove one of the iconv pattern lines, you need to decrement that line count. Also, if you edit the aff in the wrong encoding, you can mess it up.

Here is a good hunspell man page I found online that describes the contents of the aff file and its meaning:

http://manpages.ubuntu.com/manpages/...unspell.4.html

I will take a look at the dictionary and see why removing that line seems to help.

Thanks,

KevinH
Hi Kevin,

My changes to the .aff files do not seem to help at all. In the sample book I posted above, the word keinen in the following sentence is flagged as misspelled, although it is spelled correctly:
Da gab’s keinen!
01-20-2017, 08:59 AM #23
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
@AnselmD: Since Sigil uses the same dictionaries as LibreOffice, I just tested the text from your test epub with curly and straight apostrophes in LibreOffice, and in both versions the contracted words were flagged as typos.
I.e., the German dictionary designers most likely didn't define the contraction 's for verbs.
You'll have to add these contractions, which are mostly found in colloquial German, to the default user dictionary.
BTW, I got the same results with Calibre Editor.

Quote:
Originally Posted by AnselmD
In the sample book I posted above, the word keinen in the following sentence is flagged as misspelled, although it is spelled correctly:
Da gab’s keinen!
I wasn't able to reproduce this error with the default de_DE dictionary and LibreOffice. Keinen was only flagged with non-German spelling dictionaries.
Attached Thumbnails: LOCurly.png (29.7 KB), LOStraight.png (28.6 KB)
01-20-2017, 09:46 AM #24
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
With the OLDSPELL dictionary, it has the wrong WORDCHARS for handling anything related to possessives or contractions:

WORDCHARS ß-.'’

That makes it hard to split words at each apostrophe (').

This makes no sense to me, and the modern version of the dictionary uses the following:

WORDCHARS ß-.

I think that should be fixed in the OLDSPELL version, but I do not know the language well enough to be sure. Doitsu, what happens in your testing if the apostrophes are removed from WORDCHARS? Are correct German contractions handled properly then?
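The effect of WORDCHARS on word splitting can be pictured with a simplified tokenizer. This is not Hunspell's or Sigil's real tokenizer, which handles more cases; it only assumes that letters plus the WORDCHARS characters stay inside a word and everything else is a delimiter. isLetter and tokenize are hypothetical names, and the letter test is a rough approximation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Rough letter test for this sketch only: ASCII letters plus the
// Latin-1 Supplement and Latin Extended-A letters (covers äöüß),
// excluding the multiplication and division signs.
bool isLetter(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
           (c >= 0x00C0 && c <= 0x017F && c != 0x00D7 && c != 0x00F7);
}

// Hypothetical tokenizer: letters and characters listed in `wordchars`
// are kept inside a word; any other character ends the current word.
std::vector<std::u32string> tokenize(const std::u32string& text,
                                     const std::u32string& wordchars) {
    std::vector<std::u32string> words;
    std::u32string current;
    for (char32_t c : text) {
        if (isLetter(c) || wordchars.find(c) != std::u32string::npos) {
            current += c;
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

Under this model, the OLDSPELL setting WORDCHARS ß-.'’ keeps gab’s as one token that is looked up as a whole, while the modern WORDCHARS ß-. splits it into gab and s.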

Thanks,

KevinH
01-20-2017, 09:52 AM #25
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi Doitsu,
FWIW, I have grabbed the newest modern de_DE dictionary and converted it to utf-8 with Python 3 (reading it as binary, re-encoding it to utf-8, and writing it out as binary); no errors were reported. Its aff file uses neither ICONV nor OCONV, which certainly affects whether the dictionary generates suggestions with smart single quotes or only with dumb ones. Do you want me to add ICONV and OCONV entries to force suggestions to use smart single quotes too?

I will use this version to update the modern German dictionary once you let me know your preferred ICONV/OCONV settings.
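The ICONV/OCONV question above can be pictured as a table-driven substitution: ICONV rewrites the checked word before lookup, and OCONV rewrites the suggestions on the way out. A minimal sketch under that assumption; ConvRule and applyConv are hypothetical names, and here the first matching rule wins, which may differ from how Hunspell itself picks among overlapping patterns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One conversion rule as in an "ICONV from to" or "OCONV from to" line.
using ConvRule = std::pair<std::string, std::string>;

// Hypothetical helper: scan the UTF-8 text left to right; at each
// position, apply the first rule whose pattern matches there,
// otherwise copy the byte unchanged.
std::string applyConv(const std::string& text, const std::vector<ConvRule>& rules) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        for (const auto& [from, to] : rules) {
            if (!from.empty() && text.compare(i, from.size(), from) == 0) {
                out += to;
                i += from.size();
                matched = true;
                break;
            }
        }
        if (!matched) out += text[i++];
    }
    return out;
}
```

With an output rule ' to ’ (U+2019), a suggestion such as gab's would be handed back as gab’s, and the matching input rule ’ to ' maps it back before lookup.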

Thanks,

KevinH
01-20-2017, 11:18 AM #26
AnselmD
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
FYI:
I installed the hunspell command-line program for Cygwin; its manual documents the following option:

--check-apostrophe
Check and force Unicode apostrophes (U+2019), if one of the ASCII or Unicode apostrophes is specified by the spelling dictionary, as a word character (see WORDCHARS, ICONV and OCONV in hunspell(5)).
01-20-2017, 11:18 AM #27
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Hi KevinH,

Quote:
Originally Posted by KevinH
With the OLDSPELL dictionary, it has the wrong WORDCHARS for handling anything related to possessives or contractions:

WORDCHARS ß-.'’
If I understand the definition of WORDCHARS correctly, the characters in this list are special characters that would otherwise be considered word delimiters. If that's the case, the German WORDCHARS list should be identical to the English WORDCHARS list, because the 1996 spelling reform officially sanctioned the use of English-style possessive constructions with straight and curly apostrophes in German. (The pre-1996 spelling rules only allowed straight and curly apostrophes in contracted words.)

I.e., WORDCHARS ß-.'’ should be OK for German, because properly spelled German words might contain full stops, hyphens and straight or curly apostrophes. They also may contain the ß character, but only in the middle and at the end of words.
I'm a bit puzzled as to why the dictionary developers specifically added the ß character, but not the German umlauts (äöü).

However, when I added WORDCHARS ß-.'’ to a utf-8 version of the latest de_DE_frami.aff file, it appeared to have no effect: contracted words such as gabs were still flagged as typos.

Gab, which is the first part of gabs (= gab + es), is in the spelling dictionary, which you can test by inserting a space before the apostrophe. I'd have expected the spell checker to ignore this word because gab is in the dictionary.

Since German words may contain curly apostrophes, the German affix file should also contain:

Code:
ICONV 1
ICONV ’ '
OCONV 1
OCONV ' ’
However, users who want to save curly apostrophes in user dictionaries would have to comment out this section.

D.

Last edited by Doitsu; 01-20-2017 at 01:47 PM.
01-20-2017, 11:43 AM #28
AnselmD
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
Quote:
Originally Posted by Doitsu
@AnselmD:
You'll have to add these contractions, which are mostly found in colloquial German, to the default user dictionary.
Yes, but this does not work. I can add them, but they are still recognized as misspelled.
01-20-2017, 11:50 AM #29
AnselmD
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
Quote:
Originally Posted by KevinH
With the OLDSPELL dictionary, it has the wrong WORDCHARS for handling anything related to possessives or contractions:

WORDCHARS ß-.'’

That makes it hard to split words at each apostrophe (').

This makes no sense to me, and the modern version of the dictionary uses the following:

WORDCHARS ß-.

I think that should be fixed in the OLDSPELL version, but I do not know the language well enough to be sure. Doitsu, what happens in your testing if the apostrophes are removed from WORDCHARS? Are correct German contractions handled properly then?

Thanks,

KevinH
I checked my text with the Cygwin hunspell command-line tool, using the OLDSPELL .dic and .aff files; I saved the text as UTF-8 (using PSPad/Notepad++). The words with apostrophes are shown as misspelled; I can add them to the user dictionary, and afterwards they are shown as correctly spelled.
This should be the correct behavior.
01-20-2017, 12:42 PM #30
AnselmD
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
Quote:
Originally Posted by Doitsu
@AnselmD: You'll have to add these contractions, which are mostly found in colloquial German, to the default user dictionary.
BTW, I got the same results with Calibre Editor.
If I add the words to the user-defined dictionary in LibreOffice (using the OLDSPELL dictionary), they are still flagged as misspelled. I will check this with the default German dictionary later. Has anyone managed to add them so that they are no longer flagged?

Tags
bug report, feature request, punctuation, sigil, unicode

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.