Old 01-21-2017, 08:02 PM   #46
KevinH
Sigil Developer
Posts: 7,647
Karma: 5433388
Join Date: Nov 2009
Device: many
The manual is old. The hyphenation dictionaries have not been removed from the install (yet), but nothing in Sigil uses them. In fact, they were improperly being loaded into the hunspell spellchecker, but proper hyphenation dictionaries have hyphenation rule characters, including digits, embedded in their entries, so those entries are not valid words in and of themselves.

The only reason I haven't removed them yet is that I have considered adding a hyphenation library to Sigil, but I am unconvinced it is needed. I will fix the manual when the documentation GitHub site opens.

That said, the best way to handle dictionary installation is with a plugin to extract it, parse the .xcu xml to get the file names and copy the files.
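
A minimal Python sketch of the "parse the .xcu" step, assuming the usual layout of a LibreOffice dictionaries.xcu (nested `node` elements with `Locations` and `Format` properties). The sample file names are hypothetical, and unpacking the extension and copying the files are left out. This is not Sigil plugin code, just an illustration of the parsing:

```python
import xml.etree.ElementTree as ET

# Namespace behind the oor:name attribute in LibreOffice registry files.
OOR = "{http://openoffice.org/2001/registry}"


def spell_dictionary_files(xcu_text):
    """Return the file names listed for DICT_SPELL entries in a dictionaries.xcu."""
    root = ET.fromstring(xcu_text)
    files = []
    for node in root.iter("node"):
        # Collect this node's direct <prop> children as name -> value text.
        props = {
            prop.get(OOR + "name"): (prop.findtext("value") or "").strip()
            for prop in node.findall("prop")
        }
        if props.get("Format") != "DICT_SPELL":
            continue  # skip hyphenation/thesaurus entries and container nodes
        for location in props.get("Locations", "").split():
            # Paths are written relative to the extension root.
            files.append(location.replace("%origin%/", "", 1))
    return files


# Hypothetical sample; a real file comes from inside the dictionary extension.
SAMPLE_XCU = """<oor:component-data xmlns:oor="http://openoffice.org/2001/registry"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    oor:name="Linguistic" oor:package="org.openoffice.Office">
  <node oor:name="ServiceManager">
    <node oor:name="Dictionaries">
      <node oor:name="HunSpellDic_xx_XX" oor:op="fuse">
        <prop oor:name="Locations" oor:type="oor:string-list">
          <value>%origin%/xx_XX.aff %origin%/xx_XX.dic</value>
        </prop>
        <prop oor:name="Format" oor:type="xs:string">
          <value>DICT_SPELL</value>
        </prop>
      </node>
      <node oor:name="HyphDic_xx_XX" oor:op="fuse">
        <prop oor:name="Locations" oor:type="oor:string-list">
          <value>%origin%/hyph_xx_XX.dic</value>
        </prop>
        <prop oor:name="Format" oor:type="xs:string">
          <value>DICT_HYPH</value>
        </prop>
      </node>
    </node>
  </node>
</oor:component-data>"""

print(spell_dictionary_files(SAMPLE_XCU))
```

Filtering on `DICT_SPELL` keeps the hyphenation entry out of the result, which matches the point above that hyphenation dictionaries should not be fed to the spellchecker.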
Old 01-22-2017, 04:41 AM   #47
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by BetterRed
@Doitsu - do you know of any utilities to add words to an existing hunspell dictionary? I've looked around a couple of times; all I've ever found were instructions on how to do it manually, which isn't exactly suited to occasional use.

AFAIK, there are only low-level utilities. For example, KevinH already mentioned munch and unmunch. However, they're not intended for end users. Moreover, the hunspell file format documentation appears to have been written with professional programmers in mind and is therefore not very accessible to non-programmers.
The best and easiest solution for end-users is to add words to a custom word list.
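
For anyone who wants to script that, here is a small Python sketch that merges new words into a custom word list. It assumes the list is a plain UTF-8 text file with one word per line; that format is an assumption for illustration, so check how your own list is stored before overwriting it:

```python
def merge_word_list(existing_text, new_words):
    """Merge new words into a one-word-per-line list, dropping duplicates."""
    words = {line.strip() for line in existing_text.splitlines() if line.strip()}
    words.update(w.strip() for w in new_words if w.strip())
    # Sort case-insensitively so the file stays easy to review by hand.
    return "\n".join(sorted(words, key=lambda w: (w.casefold(), w))) + "\n"


print(merge_word_list("epub\nSigil\n", ["xhtml", "epub"]), end="")
```

No affix flags are involved, which is what makes a plain word list so much safer to edit than a .dic file.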


Quote:
Originally Posted by BetterRed
@Doitsu - my enquiry re a utility still stands.

IMHO, such a single-use utility would be overkill. The fact that there's no standalone GUI editor for OpenOffice/LibreOffice Hunspell dictionaries also seems to indicate that the majority of end users are quite happy with the default dictionaries, even though some of them are actually somewhat buggy, as KevinH found out.
While we're on the topic, there is one relatively safe AFF file hack for getting better suggestions for OCRed text, but I definitely wouldn't recommend any other changes to Hunspell dictionaries.
Old 01-22-2017, 09:32 AM   #48
AnselmD
Zealot
Posts: 105
Karma: 10
Join Date: Oct 2013
Device: none
If the user dictionary uses the .aff file from the chosen dictionary, it is not language independent.

Shouldn't there be different ones for German, English etc.? e.g. mydic_de, mydic_en

ICONV and OCONV convert between curly and straight apostrophes. Is this necessary for UTF-8, or is it a leftover from ISO 8859-1?
Old 01-22-2017, 10:02 AM   #49
KevinH
Sigil Developer
The ICONV/OCONV conversion is needed for all dictionaries that do not want to duplicate contractions and possessives. It can only be used if the characters exist in the current encoding.
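
A simplified Python sketch of what input conversion buys you: the word list only stores the straight-apostrophe form, and a curly apostrophe is mapped to it before lookup. The `ICONV` lines follow the .aff syntax (a count line, then pattern/replacement pairs); the tiny word set and the plain `str.replace` matching are assumptions for illustration, not how the hunspell library does it internally:

```python
def parse_iconv(aff_text):
    """Collect ICONV pattern -> replacement pairs from .aff text."""
    table = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # "ICONV 1" is the count header; real entries have two fields after the keyword.
        if len(parts) == 3 and parts[0] == "ICONV":
            table[parts[1]] = parts[2]
    return table


def convert_input(word, table):
    """Apply the input conversion before the word is looked up."""
    for pattern, replacement in table.items():
        word = word.replace(pattern, replacement)
    return word


AFF = "SET UTF-8\nICONV 1\nICONV \u2019 '\n"
WORDS = {"it's", "don't"}  # stand-in for the dictionary's word list

table = parse_iconv(AFF)
print(convert_input("it\u2019s", table) in WORDS)
```

Without the conversion, every contraction and possessive would have to appear twice in the word list, once per apostrophe style.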

As for a .dic dictionary editor, it is simply not easy to do, because most languages need prefix and suffix compression to keep the working set size viable.
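
To make "suffix compression" concrete, here is a deliberately simplified Python sketch. A .dic entry such as `try/B` carries a flag, and the .aff file defines what that flag expands to. The rule table below is modeled on hunspell-style `SFX` lines (strip, add, condition) but is a hand-written assumption, and real affix handling (prefixes, cross products, multi-character flags) is much richer:

```python
import re

# Simplified suffix rules in the style of hunspell's SFX lines:
# flag -> list of (strip, add, condition)
SUFFIX_RULES = {
    "B": [("", "ed", "[^y]"), ("y", "ied", "y")],
}


def expand(entry, rules=SUFFIX_RULES):
    """Expand one .dic-style entry such as 'try/B' into its surface forms."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for flag in flags:
        for strip, add, condition in rules.get(flag, []):
            # The condition must match the end of the stem.
            if not re.search(condition + "$", stem):
                continue
            base = stem[: len(stem) - len(strip)] if strip else stem
            forms.append(base + add)
    return forms


for entry in ["hello", "try/B", "work/B"]:
    print(entry, "->", expand(entry))
```

An editor that adds a word therefore has to ask which flags apply to it, which is the hard part for a non-programmer.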

Yes, you can create multiple user wordlists, and as a general rule they should match the main dictionary language being used. One good exception is to include foreign words commonly used in another language. For example, my user wordlists include some Latin terms and abbreviations, some French terms, etc. I also have a scientific word list that has a number of Latin terms as well.
Old 01-22-2017, 11:56 AM   #50
AnselmD
Zealot
Is the hunspell algorithm designed to deal with more than one dictionary?

With this command-line tool it is possible to select several dictionaries (the -d parameter); I did not test whether this really works:
hunspell(1) - Linux man page
https://linux.die.net/man/1/hunspell
Old 01-22-2017, 12:12 PM   #51
KevinH
Sigil Developer
FWIW, the modern German dictionary would mark "geht's" as incorrect since it is not in the wordlist as far as I can tell.

Technically the apostrophe (single quote) is needed and correct in the following line, is it not?

Quote:
Wie geht's?

Perhaps a German speaker can say whether the current German dictionary is acceptable and whether there are any better hunspell German dictionaries we should be using instead.

Quote:
Originally Posted by KevinH
@Doitsu,
Understood.

KevinH
Old 01-22-2017, 12:19 PM   #52
KevinH
Sigil Developer
The hunspell command-line tool may support it, but that is very different from how the hunspell library is used inside Sigil. Right now Sigil supports one main hunspell dictionary (you can select it and change it any time you want) and multiple user-based wordlists.

Calibre supports multiple language dictionaries open at once and smartly uses xhtml lang attributes to know what language to check each word in.
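
A rough Python sketch of the general idea of using lang attributes, not Calibre's actual implementation: walk the XHTML tree, let each element inherit the language of its parent unless it sets `lang` or `xml:lang`, and tag each text run with the language it should be checked in. The function name and the default language are assumptions made for this example:

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_runs_by_language(xhtml, default_lang="en"):
    """Return (language, text) pairs in document order."""
    runs = []

    def walk(elem, inherited):
        # An element's own lang/xml:lang overrides what it inherits.
        lang = elem.get(XML_LANG) or elem.get("lang") or inherited
        if elem.text and elem.text.strip():
            runs.append((lang, elem.text.strip()))
        for child in elem:
            walk(child, lang)
            # Text after a child belongs to the parent, so it uses the parent's language.
            if child.tail and child.tail.strip():
                runs.append((lang, child.tail.strip()))

    walk(ET.fromstring(xhtml), default_lang)
    return runs


sample = '<p lang="en">He said <span lang="de">Wie geht\'s?</span> and left.</p>'
for lang, text in text_runs_by_language(sample):
    print(lang, repr(text))
```

Each run could then be handed to the dictionary for its language. Note that this needs the whole element tree, which is exactly what is missing when only one line of context is available.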

There is also varlog's mlspell Sigil branch that adds that to Sigil, but it has not been accepted/merged yet because of open issues with on-the-fly spellchecking during live editing: highlighting in multiple languages is hard when only the current line of context is provided and not the entire document.

KevinH
Old 01-22-2017, 12:24 PM   #53
AnselmD
Zealot
Quote:
Originally Posted by KevinH
FWIW, the modern German dictionary would mark "geht's" as incorrect since it is not in the wordlist as far as I can tell.

Technically the apostrophe (single quotes) is needed and correct in the following line, is it not?

Wie geht's?

Perhaps someone German can say if the current German Dictionary is acceptable or not and if there are any better hunspell German dictionaries we should be using instead.

I am a native German speaker, but I still had to look it up (I grew up with the old spelling).

The Duden, an important dictionary of the German language (Duden - Wikipedia https://en.wikipedia.org/wiki/Duden), says:

Duden | Apostroph
http://www.duden.de/sprachwissen/rec...geln/apostroph

Wie gehts (auch: geht's) dir?

This means both are correct!
Old 01-22-2017, 12:31 PM   #54
AnselmD
Zealot
I took a look at the old Duden (old spelling):
Geht's gut?

The apostrophe is a must.
Old 01-22-2017, 12:49 PM   #55
KevinH
Sigil Developer
So under the modern German spelling the apostrophe is not needed ("gehts" is acceptable alongside "geht's"), which is why "geht's" is left out of the dictionary, but under the old spelling you really must use "geht's" instead of "gehts".

So our current German dictionary is okay in that regard.

Thanks,

KevinH
Old 01-22-2017, 12:59 PM   #56
AnselmD
Zealot
Apostroph – Wikipedia
https://de.wikipedia.org/wiki/Apostr...assungszeichen

If you put it into Google Translate, it is (more or less) understandable:
===
Omission mark

One function of the apostrophe is to mark omitted letters, predominantly when spoken language is written down, and especially in words that would otherwise be hard to read or open to misunderstanding:

Heute ist’s kalt. – Heute ist es kalt. (It's cold today. – It is cold today.)
Hast du noch ’nen Euro? also: Hast du noch nen Euro? – Hast Du noch einen Euro?
Das ist so’ne Sache. also: Das ist sone Sache. – Das ist so eine Sache.
Was für ’n Blödsinn!/Kommen S’ nur herein! – Was für ein Blödsinn. Kommen Sie nur herein.

For omissions inside a word:

D’dorf for Düsseldorf
Lu’hafen for Ludwigshafen
M’gladbach for Mönchengladbach
Ku’damm for Kurfürstendamm
E’ler for Eschweiler
A’dam for Amsterdam;
however: GMHütte for Georgsmarienhütte

Occasionally the apostrophe is also used, contrary to the rules, in contractions of preposition + definite article, for example in’s, an’s, um’s, zu’r. According to the current rules, however, an apostrophe may only be used if the contraction without an apostrophe would be "opaque" (for example mit’m Fahrrad). [18] Also contrary to the rules is the apostrophe where the e of the ending -en is dropped for reasons of verse or sentence rhythm in the 1st and 3rd person plural of the present indicative active and of the subjunctive I.

=====


Auslassungszeichen

Eine Funktion des Apostrophs ist die Kennzeichnung ausgelassener Buchstaben; vorwiegend in der Verschriftlichung gesprochener Sprache, vor allem bei Wörtern, die sonst schwer lesbar oder missverständlich wären:

Heute ist’s kalt. – Heute ist es kalt.
Hast du noch ’nen Euro? auch: Hast du noch nen Euro? – Hast Du noch einen Euro?
Das ist so’ne Sache. auch: Das ist sone Sache. – Das ist so eine Sache.
Was für ’n Blödsinn!/Kommen S’ nur herein! – Was für ein Blödsinn. Kommen Sie nur herein.

Bei Auslassungen im Wortinnern:

D’dorf für Düsseldorf
Lu’hafen für Ludwigshafen
M’gladbach für Mönchengladbach
Ku’damm für Kurfürstendamm
E’ler für Eschweiler
A’dam für Amsterdam;
jedoch: GMHütte für Georgsmarienhütte

Gelegentlich wird der Apostroph regelwidrig auch bei der Zusammensetzung Präposition + bestimmter Artikel benutzt, beispielsweise in’s, an’s, um’s, zu’r. Nach den gültigen Regeln darf ein Apostroph aber nur gesetzt werden, wenn die Zusammensetzung ohne Apostroph „undurchsichtig“ wäre (beispielsweise mit’m Fahrrad).[18] Ebenfalls regelwidrig ist der Apostroph beim vers- und satzrhythmischen Wegfall des e der Endung -en in der 1. und 3. Person Plural Indikativ des Präsens Aktiv sowie des Konjunktivs I.
Old 01-22-2017, 01:19 PM   #57
AnselmD
Zealot
Quote:
Originally Posted by KevinH
So for the modern German dictionary the apostrophe is not needed for "gehts" vs "geht's" which is why it is left out of the dictionary, but for old German, you really must use "geht's" instead of "gehts".

So our current German dictionary is okay in that regard.

Thanks,

KevinH

But "geht's" is not a misspelling.
And even with the old-spelling dictionary it is flagged as misspelled.

And as you can see, in this German learning course for beginners they say:

Karin: Hallo Eva! Wie geht’s
Deutsch üben - Einstieg - Hallo, wie geht es dir? - Goethe-Institut
http://www.goethe.de/lrn/prj/wnd/deu...wg/deindex.htm

I am not saying it should be solved in Sigil, because I think it does not work in any program that uses Hunspell.

Otherwise, someone has to contact the maintainer of the dictionary. I think the maintainers are mentioned in the .aff file.
Old 01-22-2017, 02:10 PM   #58
AnselmD
Zealot
For your interest, I took a quick look at some of these German ebooks:

ePub Books - MobileRead Forums
https://www.mobileread.com/forums/fo...play.php?f=130

"This work is assumed to be in the Life+70 public domain OR the copyright holder has given specific permission for distribution. "

So most of them use the old spelling, and they use apostrophes.
Old 01-22-2017, 06:38 PM   #59
BetterRed
null operator (he/him)
Posts: 20,575
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Doitsu
IMHO, such a single-use utility would be overkill. The fact that there's no standalone GUI Editor for OpenOffice/LibreOffice Hunspell dictionaries, also seems to indicate that the majority of end-users are quite happy with the default dictionaries, even though some of them are actually somewhat buggy as KevinH found out.
While we're at the topic, there is one relatively safe AFF file hack for getting better suggestions for OCRed text, but I definitely wouldn't recommend any other changes to Hunspell dictionaries.

My 'problem' is that standard dictionaries (including paper ones) tend to be sparse when it comes to domain-specific words. I wasn't envisaging a GUI tool. Instead, I had a question/answer dialogue on a dumb terminal model in mind (à la Eliza):

"Add <word from list> ?" N (word gets written to discard list)
"Add <next word from list> ?" Y
"A series of questions to create the affix entries" Not the whole enchilada, but a practical subset.

Back when pragmatics trumped perfection, PROFS/DISOSS (or something similar) had a dictionary creator along these lines. Algol springs to mind, so it might have been on MCP - salad days.
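
The yes/no part of such a dialogue could be sketched in Python roughly like this. The function name and prompt wording are made up for the example, and the harder third step (the questions that would produce affix entries) is left out entirely:

```python
def triage(candidates, ask=input):
    """Ask about each candidate word; return (accepted, discarded)."""
    accepted, discarded = [], []
    for word in candidates:
        answer = ask(f"Add {word}? [y/N] ").strip().lower()
        # Anything other than an explicit "y" sends the word to the discard list.
        (accepted if answer == "y" else discarded).append(word)
    return accepted, discarded


# Scripted answers stand in for a person at the terminal.
scripted = iter(["n", "y"])
print(triage(["teh", "epub"], ask=lambda prompt: next(scripted)))
```

Passing `ask` as a parameter keeps the loop testable; calling `triage(words)` with the default uses the real terminal prompt.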

BR
Old 01-23-2017, 05:23 AM   #60
AnselmD
Zealot
Quote:
Originally Posted by BetterRed
My 'problem' is that standard dictionaries (including paper ones) tend to be sparse when it comes to knowledge domain specific words. I wasn't envisaging a GUI tool. Instead, I had a question/answer dialogue on a dumb terminal model in mind (a'la Eliza):

"Add <word from list> ?" N (word gets written to discard list)
"Add <next word from list> ?" Y
"A series of questions to create the affix entries" Not the whole enchilada, but a practical subset.

Back when pragmatics trumped perfection, PROFS/DISSOS (or something similar) had a dictionary creator along these lines. Algol springs to mind so it might have on MCP - salad days.

BR

Maybe as a workaround you can use the hunspell command-line tool. I did a small test with Cygwin (Windows) a few days ago.

hunspell -d de_DE_OLDSPELL /cygdrive/c/books/ApostropheTest.txt

(See also -H: the input file is in SGML/HTML format.)

hunspell(1) - Linux man page
https://linux.die.net/man/1/hunspell
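
The same kind of check can be scripted. This Python sketch only builds on options described in the man page linked above (`-d` for the dictionary list, `-l` to list misspelled words, `-H` for SGML/HTML input) and pipes the text in on standard input. The function names are made up for the example, and it assumes hunspell and the named dictionary are installed where the tool can find them:

```python
import subprocess


def hunspell_command(dictionaries, html=False):
    """Build the argument list for listing misspelled words."""
    command = ["hunspell", "-d", ",".join(dictionaries), "-l"]
    if html:
        command.append("-H")  # treat the input as SGML/HTML
    return command


def misspelled_words(text, dictionaries, html=False):
    """Pipe text through hunspell and return the words it reports."""
    result = subprocess.run(
        hunspell_command(dictionaries, html),
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.split()


print(hunspell_command(["de_DE_OLDSPELL"]))
```

Calling `misspelled_words(open(path, encoding="utf-8").read(), ["de_DE_OLDSPELL"])` would then return the flagged words as a list that can be reviewed or appended to a user word list.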
Attached Thumbnails
2017-01-23 11_16_45-hunspell-cygwin.png (7.5 KB)

Tags
bug report, feature request, punctuation, sigil, unicode

