Old 07-12-2025, 04:49 AM   #1
Moonbase59
Groupie
Posts: 167
Karma: 196896
Join Date: Oct 2021
Location: Germany
Device: Tolino Vision 5, Tolino Tab 8", Pocketbook Era (16GB)
New single-source German hyphenation — please test!

German readers: Parallel thread at E-Reader Forum

I’m working on new single-source German hyphenation patterns for Linux and many types of e-readers, including Kobo. These are based on a ~600,000 word text corpus and generate patterns for the German reformed spelling (1996/2006).

Since I don’t own a Kobo, I cannot test this, and ask for your help in testing, please.

I’d be interested in:
  • Does it work at all?
  • Can the Kobo software use UTF-8 hyphenation dictionaries?
  • Are special word boundary cases recognized and handled correctly, as shown in the screenshots?

I tried to prepare a Kobo-compatible file. As far as I have read, you’ll have to:
  • Unpack the ZIP archive.
  • Connect the Kobo via USB.
  • Copy the unpacked KoboRoot.tgz into the .kobo folder on the device.
  • Safely eject the device.
  • The Kobo reader should then install the new German hyphenation dictionary.
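For anyone who prefers to script the copy step, here is a minimal Python sketch of the procedure above. It assumes the ZIP contains exactly one member named KoboRoot.tgz and that you know where the Kobo is mounted; the function name `install_kobo_root` and the example mount path are hypothetical, and the script does not eject the device for you.

```python
import shutil
import zipfile
from pathlib import Path


def install_kobo_root(zip_path, mount_point):
    """Extract KoboRoot.tgz from the ZIP and copy it into <mount_point>/.kobo."""
    kobo_dir = Path(mount_point) / ".kobo"
    if not kobo_dir.is_dir():
        raise FileNotFoundError(f"no .kobo folder under {mount_point}")
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds exactly one member named KoboRoot.tgz.
        members = [n for n in archive.namelist() if Path(n).name == "KoboRoot.tgz"]
        if len(members) != 1:
            raise ValueError("expected exactly one KoboRoot.tgz in the archive")
        target = kobo_dir / "KoboRoot.tgz"
        with archive.open(members[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

Usage would look like `install_kobo_root("Moonbase59-hyph_de_DE.zip", "/media/user/KOBOeReader")`, where the mount path is only an example. Safely eject the device afterwards.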

Here is one of the books I used for testing: Hans Dominik - Atlantis. It has really crazy, manually constructed "ellipses" at word endings, built like (NNBSP)(.)(NNBSP)(.)(NNBSP)(.).
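To make the construct concrete: NNBSP is U+202F (NARROW NO-BREAK SPACE). The Python sketch below builds such a token and separates the letters from the trailing debris. It only illustrates what the "word" part of the token should be; it is not how the dictionary or any reader firmware finds word boundaries, and `split_word_core` is a hypothetical helper name.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def split_word_core(token):
    """Return (prefix, core, suffix), where core runs from the first to the last letter."""
    first = next((i for i, ch in enumerate(token) if ch.isalpha()), None)
    if first is None:
        # No letters at all: everything counts as prefix.
        return token, "", ""
    last = max(i for i, ch in enumerate(token) if ch.isalpha())
    return token[:first], token[first:last + 1], token[last + 1:]


# A word followed by the hand-made ellipsis described above.
token = "Atlantis" + (NNBSP + ".") * 3
prefix, core, suffix = split_word_core(token)
print(repr(core), repr(suffix))
```

A hyphenator that treats the whole token as one word sees six extra characters after the last letter, which is the kind of case the screenshots are about.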

Screenshots: Left—bad hyphenation; Right—new version hyph_de.dic
Attached Thumbnails: scr0020.png (161.4 KB), scr0019.png (159.3 KB)
Attached Files
File Type: zip Moonbase59-hyph_de_DE.zip (175.1 KB, 9 views)

Last edited by Moonbase59; 07-12-2025 at 05:09 AM.
Old 07-12-2025, 07:21 AM   #2
JSWolf
Resident Curmudgeon
Posts: 79,756
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Would you mind going to https://www.mobileread.com/forums/sh...d.php?t=252405 and downloading the German Hyphenation dictionary from the first post and see what you think of it? Thanks.

I would like to know which one is better, as we don't need two of them. The one in that post is a LibreOffice German hyphenation dictionary.
Old 07-12-2025, 09:25 AM   #3
Moonbase59
Groupie
Quote:
Originally Posted by JSWolf View Post
The one that's in that post is a LibreOffice German hyphenation dictionary.
The problem with these is that you can’t really tell from the end result, because the patterns are highly processed and compressed, and you can’t easily deduce the original words from them. Also, unfortunately, license and date comments are often removed.
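For readers who haven't looked inside such a file, here is a minimal Python sketch of the Liang-style pattern scheme that TeX and Hyphen dictionaries are based on. Digits between letters are weights: at each gap the highest weight wins, and an odd result allows a break. The two patterns in the usage note are toy patterns made up for illustration, not taken from any real dictionary, and the function names are hypothetical.

```python
import re


def parse_pattern(pattern):
    """Split e.g. 'l1b' into letters 'lb' and gap weights [0, 1, 0]."""
    letters = re.sub(r"\d", "", pattern)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight of the gap before letters[pos]
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the word split at every gap whose winning weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[k] is the gap before padded[k]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights:
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    parts, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:  # the gap before word[i] is padded index i + 1
            parts.append(word[last:i])
            last = i
    parts.append(word[last:])
    return parts
```

With the toy pattern `1b`, `hyphenate("silbe", ["1b"])` gives `['sil', 'be']`. Adding the toy pattern `l2b` raises the same gap to an even weight, so `hyphenate("silbe", ["1b", "l2b"])` gives `['silbe']`. Thousands of such overlapping fragments interact in a real file, which is why the source words are so hard to recover.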

Starting out with "what we have in Linux" is never a bad idea. Many years ago I did the same (using igerman, aspell, myspell and later hunspell files), and of course that’s what LibreOffice does. It makes sense to use good building blocks.

The one you’re mentioning doesn’t show where it’s from or which text corpus it was built from. It apparently uses an older-style format and the ISO8859-1 character set (instead of UTF-8).
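A quick way to check which character set a Hyphen-style .dic file declares: the file's first line names the encoding. A minimal Python sketch, assuming the file follows that convention (`dictionary_charset` is a hypothetical helper name):

```python
def dictionary_charset(path):
    """Return the character-set name declared on the first line of a .dic file."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    # The charset name itself is plain ASCII, so decode leniently and trim.
    return first_line.decode("ascii", errors="replace").strip()
```

Calling it on each of the two dictionaries shows at a glance whether a file declares `ISO8859-1` or `UTF-8`.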

Unfortunately, I already overwrote the (Linux system’s) hyph_de_DE.dic that LibreOffice uses nowadays, but I think it was from 2017.

I built mine from a text corpus of almost 600,000 words containing preferred (primary) and secondary hyphenation points, as well as common word beginnings and endings, current as of 2025-07-09. Reading devices typically run really old software that doesn’t understand the Hyphen 2.7+ NOHYPHEN command, so I also created large exception lists. With these, word boundaries are recognized correctly when characters like punctuation, apostrophes, brackets, invisible non-breaking spaces etc. are directly adjacent to what should be considered a "word". See screenshots above.
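To see whether a given dictionary relies on NOHYPHEN at all, you can scan its lines for that keyword. A minimal Python sketch, assuming the directive sits on its own line as the keyword followed by its value; the sample lines in the usage note are made up for illustration and `find_nohyphen` is a hypothetical helper name:

```python
def find_nohyphen(lines):
    """Return the value of the first NOHYPHEN line, or None if there is none."""
    for line in lines:
        keyword, _, value = line.strip().partition(" ")
        if keyword == "NOHYPHEN":
            return value.strip()
    return None
```

For example, `find_nohyphen(["UTF-8", "LEFTHYPHENMIN 2", "NOHYPHEN ',-", ".ab1c"])` returns `"',-"`, while a file without the directive returns `None`. A dictionary aimed at old reader software would be expected to fall into the second group and carry exception patterns instead.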

So yes, I expect mine to result in much better hyphenation, and this holds true on Linux (including LibreOffice, Sigil, et al.) and on the devices I own and could test, a Tolino Vision 5 and a Pocketbook Era.

Since I don’t own a Kobo, I need to know whether my "Kobo version" works correctly before I can release it officially; see the questions in the first post.

My goal is to provide single-sourced, top-quality, uniform hyphenation for most software and e-readers that can use it, all generated from the most current and extremely well-maintained corpora that the German Dante e.V. Trennmustermannschaft provides for use with LaTeX. Many thanks to them for laying such fantastic groundwork!

Last edited by Moonbase59; 07-12-2025 at 09:36 AM.
Old 07-12-2025, 09:28 AM   #4
SilentHex
Member
Posts: 13
Karma: 10
Join Date: Jun 2025
Location: Germany
Device: paperwhite 11gen
I know German.
Old 07-12-2025, 11:08 AM   #5
JSWolf
Resident Curmudgeon
If it turns out that your dictionary is better on a Kobo than the one I made, I'll remove mine and post a link to yours in the hyphenation thread.
Old 07-12-2025, 12:10 PM   #6
Moonbase59
Groupie
Thanks for testing!

How did you define "better"? What did you test?