#16 | |
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
|
Quote:
If I remove the line ICONV ’ ' from de_DE_OLDSPELL.aff, it works. I can leave the following line untouched: OCONV ' ’ |
|
#17 | |
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
@KevinH: There was a spelling reform in Germany in 1996, and many word processors and text editors offer spelling support for both "unreformed German" and "reformed German." Since many German MR contributors use Sigil to prepare German Public Domain MR books written according to the pre-1996 spelling rules, it'd be nice if you could officially add the GPL licensed, utf-8 encoded de_DE_OLDSPELL.dic/aff spellcheck dictionaries from dict-de_de-1901_oldspell_2016-04-03.oxt in one of the next Sigil builds. The OpenOffice developers have also released a new version of the post-1996 German spelling dictionaries in 01/2017. Could you please extract the de_DE_frami.dic/aff files from dict-de_de-frami_2017-01-12.oxt, convert both files to utf-8, change the SET ISO8859-1 line to SET UTF-8 and include these dictionary files (instead of the de_DE dictionaries) in one of the next Sigil builds? (As I've already mentioned, the OLDSPELL dictionaries for "unreformed German" have already been converted to utf-8 by the maintainers. I.e., all you have to do is add them to the build scripts.) |
|
#18 |
Sigil Developer
Posts: 8,448
Karma: 5703586
Join Date: Nov 2009
Device: many
|
That line should only be removed if you also decrement the count in the ICONV header line. Also, the actual text encoding used in the .aff file matters: it has to match the encoding specified in the SET line.
The byte value for the smart single quotes depends on encoding and must be correct for the SET line encoding. |
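Both conditions can be checked mechanically. What follows is only a minimal Python sketch, not anything that ships with hunspell or Sigil: `check_aff` is a hypothetical helper, and the fallback to ISO8859-1 when no SET line is present is an assumption about hunspell's default.

```python
import re

def check_aff(aff_bytes):
    """Return a list of problems found in the raw bytes of a .aff file."""
    # The SET line itself is plain ASCII, so it can be located before decoding.
    match = re.search(rb"^SET\s+(\S+)", aff_bytes, re.MULTILINE)
    # Assumption: treat a missing SET line as ISO8859-1.
    encoding = match.group(1).decode("ascii") if match else "ISO8859-1"
    try:
        text = aff_bytes.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        return ["cannot decode file as %s: %s" % (encoding, exc)]

    problems = []
    for keyword in ("ICONV", "OCONV"):
        rows = [line.split() for line in text.splitlines()
                if line.split()[:1] == [keyword]]
        if not rows:
            continue
        # The first ICONV/OCONV line is the header carrying the pattern count.
        header, patterns = rows[0], rows[1:]
        if len(header) != 2 or not header[1].isdigit():
            problems.append("%s header line has no count" % keyword)
        elif int(header[1]) != len(patterns):
            problems.append("%s header says %s, but %d pattern line(s) follow"
                            % (keyword, header[1], len(patterns)))
    return problems
```

A file that declares SET UTF-8 but was saved in a Windows code page fails at the decode step, because the single byte used there for ’ is not valid UTF-8.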
#19 | ||
Sigil Developer
Posts: 8,448
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Quote:
Quote:
KevinH |
||
#20 | |
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
They're GPL2/3 licensed. |
|
#21 |
Sigil Developer
Posts: 8,448
Karma: 5703586
Join Date: Nov 2009
Device: many
|
The hunspell man pages and docs describe what can be used in an aff file. It was hunspell that added the ICONV and OCONV features. The input character conversions are preceded by an ICONV header line whose number tells how many ICONV patterns follow it. If you remove one of the ICONV pattern lines, you need to decrement that count. Also, if you edit the aff file in the wrong encoding, you can corrupt it.
Here is a good hunspell man page I found online that describes the contents of the aff file and its meaning: http://manpages.ubuntu.com/manpages/...unspell.4.html I will take a look at the dictionary and see why removing that line seems to help. Thanks, KevinH |
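To make the "decrement the count" step concrete, here is a rough Python sketch that removes one pattern line and adjusts the header in the same pass. The helper name `drop_conv_pattern` is hypothetical, the text is assumed to be already decoded with the encoding named in the SET line, and dropping the header entirely when no patterns remain is my assumption about what is safest, not documented behaviour.

```python
def drop_conv_pattern(aff_text, keyword, source):
    """Remove the `keyword source ...` pattern line and decrement the header count."""
    lines = aff_text.splitlines()
    header_idx = None
    drop_idx = None
    for i, line in enumerate(lines):
        fields = line.split()
        if fields[:1] != [keyword]:
            continue
        if header_idx is None:
            header_idx = i  # the first ICONV/OCONV line is the count header
        elif drop_idx is None and len(fields) >= 2 and fields[1] == source:
            drop_idx = i
    if header_idx is None or drop_idx is None:
        return aff_text  # nothing matched, leave the file as it is
    count = int(lines[header_idx].split()[1]) - 1
    del lines[drop_idx]  # the pattern always comes after the header
    if count == 0:
        del lines[header_idx]  # assumption: drop a header with no patterns left
    else:
        lines[header_idx] = "%s %d" % (keyword, count)
    return "\n".join(lines) + "\n"
```

For the OLDSPELL case discussed here, `drop_conv_pattern(text, "ICONV", "’")` would remove both the ICONV ’ ' line and its ICONV 1 header, leaving the OCONV lines untouched.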
#22 | |
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
|
Quote:
My changes to the .aff files do not seem to help at all. In the sample book I posted above, the word keinen in the following sentence is flagged as misspelled, although it is spelled correctly: Da gab’s keinen! |
|
#23 |
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
@AnselmD: Since Sigil uses the same dictionaries as LibreOffice, I just tested the text from your test epub with curly and straight apostrophes with LibreOffice and in both versions the contracted words were flagged as typos.
I.e., the German dictionary designers most likely didn't define the contraction 's for verbs. You'll have to add these contractions, which are mostly found in colloquial German, to the default user dictionary. BTW, I got the same results with Calibre Editor. I wasn't able to reproduce this error with the default de_DE dictionary and LibreOffice. Keinen was only flagged with non-German spelling dictionaries. |
#24 |
Sigil Developer
Posts: 8,448
Karma: 5703586
Join Date: Nov 2009
Device: many
|
With the OLDSPELL dictionary, the aff file has the wrong WORDCHARS for handling anything related to possessives or contractions:
WORDCHARS ß-.'’
It is hard splitting words after every (').
This makes no sense to me, and the modern version of the dictionary uses the following:
WORDCHARS ß-.
I think that should be fixed in the OLDSPELL version, but I do not know the language well enough to be sure. Doitsu, what happens in your testing if the apostrophes are removed from WORDCHARS? Are correct German contractions handled properly then?
Thanks, KevinH |
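To illustrate what is at stake with the two WORDCHARS variants, here is a deliberately simplified Python sketch. It is a toy regex tokenizer, not the word splitting that hunspell, Sigil or LibreOffice actually perform, and `split_words` is a hypothetical name.

```python
import re

def split_words(text, wordchars):
    """Split text into words, treating `wordchars` as extra word characters."""
    # Letters and digits always count as word characters; `wordchars` adds more.
    pattern = "[\\w" + re.escape(wordchars) + "]+"
    return re.findall(pattern, text)

sentence = "Da gab’s keinen!"
with_apostrophes = split_words(sentence, "ß-.'’")   # OLDSPELL variant
without_apostrophes = split_words(sentence, "ß-.")  # modern variant
```

Under this toy model the OLDSPELL setting keeps gab’s together as one word, which then has to exist in the dictionary as a whole, while the modern setting yields gab and s as separate pieces.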
#25 |
Sigil Developer
Posts: 8,448
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Hi Doitsu,
FWIW, I have grabbed the newest modern de_DE dictionary and converted it to utf-8 with python3, reading it as binary, re-encoding it to utf-8 and writing it out as binary, with no errors reported. Its aff file uses neither ICONV nor OCONV, which would certainly affect whether the dictionary generates suggestions with smart single quotes or only dumb ones. Do you want me to add an ICONV and OCONV to force suggestions to use smart single quotes too? I will use this version to update the modern German dictionary once you let me know about the preferred ICONV/OCONV settings. Thanks, KevinH |
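For anyone wanting to repeat the conversion, a minimal sketch along those lines could look like this. It is not the exact script used for the Sigil build: `aff_to_utf8` is a hypothetical name, and the source encoding is assumed to be the ISO8859-1 named in the original SET line.

```python
import re

def aff_to_utf8(raw, source_encoding="iso8859-1"):
    """Re-encode the bytes of a .aff or .dic file to UTF-8 and fix the SET line."""
    text = raw.decode(source_encoding)
    # Only the first SET line is rewritten; a .dic file has none and passes through.
    text = re.sub(r"^SET\s+\S+", "SET UTF-8", text, count=1, flags=re.MULTILINE)
    return text.encode("utf-8")

def convert_file(src_path, dst_path):
    # Binary mode on both sides, so line endings are preserved byte for byte.
    with open(src_path, "rb") as src:
        converted = aff_to_utf8(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(converted)
```

One caveat: decoding as ISO8859-1 can never raise an error, because every byte value maps to a character, so "no errors reported" does not by itself prove that the source file really was ISO8859-1. A quick look at ß and the umlauts in the output is a better check.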
#26 |
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
|
FYI:
I installed the hunspell command line program for Cygwin. Its manual lists the following option:
--check-apostrophe
Check and force Unicode apostrophes (U+2019), if one of the ASCII or Unicode apostrophes is specified by the spelling dictionary, as a word character (see WORDCHARS, ICONV and OCONV in hunspell(5)). |
#27 | |
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
Hi KevinH,
Quote:
I.e., WORDCHARS ß-.'’ should be OK for German, because properly spelled German words might contain full stops, hyphens and straight or curly apostrophes. They may also contain the ß character, but only in the middle and at the end of words. I'm a bit puzzled as to why the dictionary developers specifically added the ß character, but not the German umlauts (äöü).
However, when I added WORDCHARS ß-.'’ to a utf-8 version of the latest de_DE_frami.aff file, it appeared to have no effect, because contracted words such as gab’s were still flagged as typos. Gab, which is the first part of gab’s (= gab + es), is in the spelling dictionary, which you can test by inserting a space before the apostrophe. I'd have expected the spell checker to ignore this word because gab is in the dictionary.
Since German words may contain curly apostrophes, the German affix file should also contain:
Code:
ICONV 1
ICONV ’ '
OCONV 1
OCONV ' ’
D.
Last edited by Doitsu; 01-20-2017 at 01:47 PM. |
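As a rough illustration of what such an ICONV/OCONV pair is meant to do, here is a small Python sketch. It uses plain string replacement on a made-up word list; hunspell's real conversion tables do more than this, and the names `convert`, `is_known`, `present` and the toy dictionary entry are all assumptions for the example.

```python
ICONV = {"’": "'"}  # input conversion: curly apostrophe becomes straight
OCONV = {"'": "’"}  # output conversion: straight apostrophe becomes curly

def convert(word, table):
    """Apply every source -> target replacement in `table` to `word`."""
    for source, target in table.items():
        word = word.replace(source, target)
    return word

def is_known(word, dictionary):
    # The word is converted before the lookup, so only the straight-apostrophe
    # spelling needs to be stored in the word list.
    return convert(word, ICONV) in dictionary

def present(suggestion):
    # Suggestions come out of the word list with straight apostrophes and are
    # converted back before being shown to the user.
    return convert(suggestion, OCONV)

toy_dictionary = {"gab's", "keinen"}  # hypothetical entries, not from de_DE
```

With this toy word list, both gab’s and gab's are accepted, and a suggestion stored as gab's is shown as gab’s. None of this helps, of course, when the contraction is missing from the dictionary in the first place.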
|
#28 |
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
|
|
#29 | |
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
|
Quote:
This should be the correct behavior. |
|
#30 |
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
|
If I add the words to the user-defined dictionary in LibreOffice (using the OLDSPELL dictionary), they are still flagged as misspelled. I will check this with the default German dictionary later. Has anyone managed to add them so that they are no longer flagged?
|
Tags |
bug report, feature request, punctuation, sigil, unicode |
|