12-01-2021, 07:43 PM   #34
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by klover137006
And does someone have a link to a page that explains how the patterns are constructed – for instance, what ".aan5" means?
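Yes. The number 5 is a "rank" (a priority) for a potential hyphen at that position. The leading "." marks the start of a word, so ".aan5" says a hyphen may go right after a word-initial "aan".

Odd numbers allow a hyphen and even numbers forbid one. When several patterns overlap at the same position, the highest number wins, which is how certain rules override other rules.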
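
To make the notation concrete, here is a minimal Python sketch that decodes a single pattern. The function names parse_pattern and describe are made up for this illustration, and it assumes single-digit ranks with "." as the word-boundary marker, as in the TeX-style pattern files.

```python
def parse_pattern(pattern):
    """Split a pattern into its letters and the rank of each gap."""
    letters = []
    ranks = [0]              # ranks[i] is the rank of the gap before letters[i]
    for ch in pattern:
        if ch.isdigit():
            ranks[-1] = int(ch)   # a digit sets the rank of the current gap
        else:
            letters.append(ch)
            ranks.append(0)       # the gap after this letter defaults to 0
    return "".join(letters), ranks


def describe(pattern):
    """List what each non-zero rank in the pattern means."""
    letters, ranks = parse_pattern(pattern)
    notes = []
    for gap, rank in enumerate(ranks):
        if rank:
            verdict = "allows" if rank % 2 else "forbids"
            notes.append(f"rank {rank} {verdict} a break after {letters[:gap]!r}")
    return notes


print(parse_pattern(".aan5"))
print(describe(".aan5"))
```

An unnumbered gap counts as rank 0, which is why a word with no matching patterns gets no hyphens at all.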

I explained Hyphenation Dictionaries in a simpler form a few months ago:

Quote:
Note: Hyphenation Dictionaries work by patterns.

They list combinations of letters where hyphens can occur, then apply that across the entire text.

They don't list every form of the hundreds of thousands of words known to man:

- hyphenate
- hyphenated
- hyphenates
- hyphenation
- hyphenations
- hyphenating

Instead, these hyphenation dictionaries list hundreds to thousands of patterns/rules like:

- "If a word ends in -ing OR -tion, you can stick a hyphen there."
- "If a word begins with anti- or semi-, you can stick a hyphen there."

Every language is going to have different patterns/rules, and people have already created these dictionaries for many of the main languages... even smaller ones like Welsh.

So even if you came up with some super cool new English word like:

- superduperliciousness

the device will auto-hyphenate correctly:

- su-per-duper-li-cious-ness
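
As a rough sketch of how a handful of patterns can cover a whole family of words, the Python snippet below applies a tiny pattern list to a word. The nine patterns are the ones commonly quoted for "hyphenation" (treat them as an illustrative subset, not a real dictionary), hyphenate is a made-up helper, and no Left/Right minimums are applied.

```python
# Illustrative subset only; a real English dictionary has far more patterns.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern into its letters and the rank of each gap."""
    letters, ranks = [], [0]
    for ch in pattern:
        if ch.isdigit():
            ranks[-1] = int(ch)
        else:
            letters.append(ch)
            ranks.append(0)
    return "".join(letters), ranks


def hyphenate(word, patterns=PATTERNS):
    padded = "." + word.lower() + "."     # dots mark the word boundaries
    points = [0] * (len(padded) + 1)      # points[i]: rank of the gap before padded[i]
    for pattern in patterns:
        letters, ranks = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:                # apply the pattern at every match
            for offset, rank in enumerate(ranks):
                points[start + offset] = max(points[start + offset], rank)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for split in range(1, len(word)):     # only gaps inside the word
        if points[split + 1] % 2 == 1:    # odd rank: a hyphen is allowed
            pieces.append(word[last:split])
            last = split
    pieces.append(word[last:])
    return "-".join(pieces)


for w in ["hyphenation", "hyphenations", "hyphenated"]:
    print(hyphenate(w))
```

The same nine patterns handle all three words, which is the point: nothing in the list is a whole word.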
Side Note: If you need all the extreme technical details, Hyphenation.org is the place to go.

Especially see the "Documentation" section:
  • Frank Liang's thesis "Word Hy-phen-a-tion by Com-put-er" (1983)
    • This explains all the numbers + patterns + how they work.
  • patgen is the tool used to generate all the patterns.
  • In ~2008, these hyphenation dictionaries were expanded to support UTF-8.
    • All currently supported languages + their hyphenation dictionaries can be found at Hyphenation.org.
      • These dictionaries are the basis for most programs (Firefox, LibreOffice, Kobo, etc.).
    • They also list typographically proper Left/Right hyphenmin numbers for every supported language.
      • English = 2/3
      • Swedish = 2/2
      • Hindi = 1/1
      • [...]

Quote:
Originally Posted by shalym
It should actually be "mem-ory". Words are supposed to break at the syllable break.
In American English, yes. Words for the most part follow their syllables.

In British English, many words hyphenate based on the root words.

In 2014, I posted a few examples of American vs. British Hyphenation differences:
  • "hy-phen-ation" or "hyphena-tion"?
  • "cryptog-raphy" or "crypto-graphy"?
  • "ex-actly" or "ex-act-ly" or "exact-ly"?
  • "ap-pearance" or "appear-ance"?
  • "di-minish" or "dimin-ish"?

A great way to find valid hyphenation points is to search your word on M-W.com:

https://www.merriam-webster.com/dictionary/memory

"mem-o-ry" is the syllables.

But you're correct with "mem-ory". In English typography, it's best to have >=3 letters after the final hyphen, so the break before "ry" gets dropped.

If you look up "cryptography", you can see the American hyphenation is "cryp-tog-ra-phy".

Not all words follow the syllables though, and there are exceptions.

"therapist" is a funny example... "the-rapist". Seeing that in a book would make you look twice! If you look it up, the correct hyphenation is actually "ther-a-pist".

Side Note: For British English hyphenation, the Oxford Dictionary (now known as Lexico) used to have actual hyphenation points listed like Merriam-Webster... but maybe 5 or 6 years ago, they redid their website and removed it.

I was in the middle of researching detailed American vs. British differences when it happened. Really makes me wish I had backed that stuff up when I had the chance.

Quote:
Originally Posted by Simboubou
Ah, thank you for that! That's interesting: I'm French, and while when speaking English I would definitely pronounce that "mem'ry", when reading my eyes are expecting hyphens based on a French "me-mo-ry".

A couple more examples like this I came across while reading were "alarm-ing", "usu-ally" or "Riv-iera". Do they all seem correct to you?
According to syllables at M-W.com:
  • alarm-ing
  • usu-al-ly
  • Riv-i-era

If we apply the "need to end with >=3 characters" rule, the bad "-ly" hyphen disappears... so your "usu-ally" is correct.

Now let's say you wanted to hyphenate:
  • alarm-ing-ly

Same thing. The ending "-ly" is only 2 characters, so the word shouldn't split there, and you'd get:
  • alarm-ingly
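
The Left/Right minimum can be sketched in a few lines of Python. This assumes the syllable breaks were already looked up (for example at M-W.com) and written with "-"; apply_hyphenmin is a hypothetical helper for illustration, not part of any real hyphenation library.

```python
def apply_hyphenmin(syllables, left_min, right_min):
    """Keep only the breaks that leave enough letters on each side."""
    word = syllables.replace("-", "")
    kept, pos, last = [], 0, 0
    for part in syllables.split("-")[:-1]:
        pos += len(part)                  # position of this candidate break
        if pos >= left_min and len(word) - pos >= right_min:
            kept.append(word[last:pos])
            last = pos
    kept.append(word[last:])
    return "-".join(kept)


# English uses 2/3 (Left/Right)
for entry in ["mem-o-ry", "usu-al-ly", "alarm-ing-ly"]:
    print(apply_hyphenmin(entry, 2, 3))

# French uses 2/2, so a break before a 2-letter ending survives
print(apply_hyphenmin("me-mo-ry", 2, 2))
```

With 2/3 the three English entries come out as "mem-ory", "usu-ally" and "alarm-ingly", matching the results worked out by hand above.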

Note: In French, 2/2 characters to the Left/Right is valid. See Hyphenation.org + my descriptions way above.

Quote:
Originally Posted by Simboubou
It may have been the book I was reading, tho. Maybe something around the language metadata not being correct.
Most likely. The EPUB's metadata language + HTML's language must match the Hyphenation Dictionary on the device.

So an "en" dictionary only applies to English, a "de" dictionary only to German, "fr" only to French, etc.
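
A quick way to check a book for this kind of mismatch is to compare the languages it declares. The Python sketch below is an assumption-heavy illustration: the sample OPF/XHTML strings are invented, the helper names are hypothetical, and comparing only the primary subtag ("en" from "en-US") is a guess at how a reader might pick a dictionary, not documented device behaviour.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Invented samples: the metadata says English, the HTML says French.
SAMPLE_OPF = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:language>en-US</dc:language>"
    "</metadata></package>"
)
SAMPLE_XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">'
    "<body><p>memory</p></body></html>"
)


def primary_subtag(tag):
    """Reduce a tag like 'en-US' to its primary subtag, 'en'."""
    return tag.strip().lower().replace("_", "-").split("-")[0]


def opf_language(opf_text):
    """Return the first dc:language value in the OPF, or None."""
    element = ET.fromstring(opf_text).find(f".//{{{DC_NS}}}language")
    return element.text if element is not None and element.text else None


def html_language(xhtml_text):
    """Return xml:lang (or lang) from the root <html> element, or None."""
    root = ET.fromstring(xhtml_text)
    return root.get(f"{{{XML_NS}}}lang") or root.get("lang")


def dictionary_applies(opf_text, xhtml_text, dictionary_lang):
    """True only if both declared languages match the dictionary's language."""
    declared = [opf_language(opf_text), html_language(xhtml_text)]
    return all(
        lang is not None and primary_subtag(lang) == dictionary_lang
        for lang in declared
    )


print(dictionary_applies(SAMPLE_OPF, SAMPLE_XHTML, "en"))
```

In this invented sample the "en" dictionary would not apply, because the HTML file declares French.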

On Kindles, this language matching even goes down to the word level. See jhowell's "Kindle hyphenation" post from 2021. I'm unsure about most other devices/apps.

Quote:
Originally Posted by Buhaj47
I have the same issue with a Polish hyphenation dictionary right now. When I set left and right characters minimum limit to 3, KEPUB files somehow render 2 characters before a hyphen - should I increase the limit to 4?
For Polish hyphenation, 2/2 are the true Left/Right numbers you want.

Hmmm... but I'm not sure about Kobo's quirks. According to JSWolf's Post #14, KEPUB has an off-by-one bug... so you'd have to specify 3/3(???) in your dictionary.

JSWolf, has this bug been fixed? Has it been reported to Kobo? Does it still apply to the latest firmware?

Quote:
Originally Posted by JSWolf
There is a better solution. Try my better hyphenation dictionary. I use it and so do a lot of other people.

https://www.mobileread.com/forums/sh...d.php?t=252405


Might want to also add Finnish.

I recently promoted Kobo's openness (with Hyphenation Dictionaries) in a Reddit post. (The original poster wanted to create a two-column bilingual book... turns out he's Finnish. When I went looking for Finnish support in Kobos, I couldn't find any info.)

Last edited by Tex2002ans; 12-01-2021 at 08:44 PM.