#1
Groupie
Posts: 167
Karma: 196896
Join Date: Oct 2021
Location: Germany
Device: Tolino Vision 5, Tolino Tab 8", Pocketbook Era (16GB)
New single-source German hyphenation — please test!
German readers: Parallel thread at E-Reader Forum
I’m working on new single-source German hyphenation patterns for Linux and many types of e-readers, including Kobo. The patterns are generated from a text corpus of roughly 600,000 words and target the reformed German spelling (1996/2006). Since I don’t own a Kobo, I can’t test this myself and would appreciate your help with testing. I’d be interested in:
I tried to prepare a Kobo-compatible file. As far as I have read, you’ll have to:
Here is one of the books I used for testing: Hans Dominik - Atlantis. It has really crazy, manually constructed "ellipses" at word endings, built like (NNBSP)(.)(NNBSP)(.)(NNBSP)(.).
Screenshots: left, bad hyphenation; right, new version hyph_de.dic
Last edited by Moonbase59; 07-12-2025 at 05:09 AM.
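For anyone who wants to check whether their own test book contains the same kind of hand-built ellipsis, here is a minimal Python sketch. It assumes the sequence is exactly three repetitions of NARROW NO-BREAK SPACE (U+202F) followed by a full stop, as described above; the function name is made up for illustration.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# Three repetitions of (NNBSP)(.), i.e. the hand-built "ellipsis" described in the post.
SPACED_ELLIPSIS = re.compile(r"(?:\u202f\.){3}")


def find_spaced_ellipses(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every hand-built ellipsis in text."""
    return [m.span() for m in SPACED_ELLIPSIS.finditer(text)]


sample = "Atlantis" + (NNBSP + ".") * 3 + " Ende"
print(find_spaced_ellipses(sample))
```

Running this over the extracted text of a book gives a quick list of places where a hyphenation engine may see the word and the trailing characters as one unit.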
#2
Resident Curmudgeon
Posts: 79,756
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Would you mind downloading the German hyphenation dictionary from the first post at https://www.mobileread.com/forums/sh...d.php?t=252405 and letting us know what you think of it? Thanks.
I would like to know which one is better, as we don't need two of them. The one in that post is a LibreOffice German hyphenation dictionary.
#3
Groupie
Posts: 167
Karma: 196896
Join Date: Oct 2021
Location: Germany
Device: Tolino Vision 5, Tolino Tab 8", Pocketbook Era (16GB)
Quote:
Starting out with "what we have in Linux" is never a bad idea. Many years ago I did the same (using igerman, aspell, myspell and later hunspell files), and of course that’s what LibreOffice does. Using good building blocks makes sense.

The one you mention doesn’t say where it comes from or which text corpus it was built from. It apparently uses an older file format and the ISO8859-1 character set (instead of UTF-8). Unfortunately, I have already overwritten the hyph_de_DE.dic that LibreOffice currently uses on my Linux system, but I think it was from 2017.

I built mine from a text corpus of almost 600,000 words containing preferred (primary) and secondary hyphenation points, as well as common word beginnings and endings, current as of 2025-07-09. Reading devices typically run really old software that doesn’t understand the Hyphen 2.7+ NOHYPHEN command, so I also created large exception lists. With these, word boundaries are recognized correctly when characters such as punctuation, apostrophes, brackets or invisible non-breaking spaces are directly adjacent to what should be considered a "word". See the screenshots above.

So yes, I expect mine to give much better hyphenation, and that has proven true on Linux (including LibreOffice, Sigil, et al.) and on the devices I own and could test, a Tolino Vision 5 and a Pocketbook Era. Since I don’t own a Kobo, I need to know whether my "Kobo version" works correctly before I can release it officially; see the questions in the first post.

My goal is to provide single-sourced, top-quality, uniform hyphenation for most software and e-readers that can use it, all generated from the current and extremely well-maintained corpora that the German Dante e.V. Trennmustermannschaft provides for use with LaTeX. Many thanks to them for laying such fantastic groundwork!

Last edited by Moonbase59; 07-12-2025 at 09:36 AM.
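On the character-set point, a minimal Python sketch of how one might check what an existing dictionary declares and re-encode it. It assumes the file follows the usual Hyphen layout, in which the first line names the character set. The function names are hypothetical, and only the ISO8859-1 case mentioned above is handled.

```python
def declared_charset(dic_bytes: bytes) -> str:
    """Return the charset named on the first line of a Hyphen-style dictionary."""
    first_line = dic_bytes.split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace").strip()


def to_utf8(dic_bytes: bytes) -> bytes:
    """Re-encode an ISO8859-1 dictionary as UTF-8 and rewrite its header line."""
    charset = declared_charset(dic_bytes).upper()
    if charset == "UTF-8":
        return dic_bytes  # already UTF-8, nothing to do
    if charset != "ISO8859-1":
        # Assumption: other charsets are out of scope for this sketch.
        raise ValueError(f"unhandled charset: {charset!r}")
    _, _, body = dic_bytes.partition(b"\n")
    return b"UTF-8\n" + body.decode("latin-1").encode("utf-8")


# Tiny made-up dictionary: a header line plus one pattern containing an umlaut.
sample = "ISO8859-1\n.ä1b\n".encode("latin-1")
print(declared_charset(sample))
print(to_utf8(sample).decode("utf-8"))
```

This only changes the encoding. It does not upgrade an older-format file to use newer Hyphen features such as NOHYPHEN.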
#4
Member
Posts: 13
Karma: 10
Join Date: Jun 2025
Location: Germany
Device: paperwhite 11gen
I know German.
#5
Resident Curmudgeon
Posts: 79,756
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
#6
Groupie
Posts: 167
Karma: 196896
Join Date: Oct 2021
Location: Germany
Device: Tolino Vision 5, Tolino Tab 8", Pocketbook Era (16GB)
Thanks for testing!
How did you define "better"? What did you test?