MobileRead Forums


wallcraft 08-23-2008 02:01 PM

Soft Hyphens
 
8 Attachment(s)
The thread Problems reading epub on prs-505 indicates that soft hyphens are a problem in ePub ebooks. From Robin’s HTML 4.0 Conformance Test:

Quote:

A soft hyphen indicates where an optional word break may occur. When a soft hyphen breaks a word between one line and the next, a hyphen character is displayed at the end of the first line. When a soft hyphen does not break a word between lines, the hyphen must not be displayed.

Soft hyphens are vital for text that must be displayed on a tiny screen or in a narrow frame. Web browsers have no excuse for rendering them incorrectly, when they can be minimally compliant by ignoring them completely.
However, the ebook readers I tested don't handle soft hyphens well.

The attached ebooks are based on http://www.cs.tut.fi/~jkorpela/shytest.html, which is from Soft hyphen (SHY) – a hard problem?. I enclose a single-file HTML (ZIP), MOBI (via MobiPocket Creator) and ePub (via BookGlutton) versions. The screenshots are from a Windows PC using Adobe Digital Editions, Sony Ebook Library (PRS-505 like), MobiPocket Reader, FBReader and uBook.

The uBook version (last screenshot) appears to do the best job, but it does not display the "-" when a soft hyphen falls at the end of a line in the actual document, and it might in fact be ignoring all the soft hyphens and using its own hyphenation (it can give discre-tionary, which isn't from the soft hyphens). Adobe Digital Editions (ePub) breaks on a soft hyphen, but does not add a "-" when it does so. Sony is based on ADE; it breaks on a soft hyphen, but it also shows "?" at every soft hyphen. MobiPocket shows all soft hyphens as "-" and does not break words. FBReader does break words, but shows all soft hyphens as "-".

Soft hyphens could provide a viable alternative (or augmentation) to on-the-fly hyphenation, but only if ebook readers either use them for hyphenation or ignore them completely.
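
To make the two acceptable behaviours concrete, here is a minimal Python sketch, not taken from any of the readers tested. The names strip_shy and wrap_with_shy are hypothetical, and the layout model (greedy wrapping, fixed-width characters) is an assumption made only for illustration. The first function ignores soft hyphens completely; the second breaks words only at soft hyphens, shows a "-" only where a break is actually taken, and drops every unused soft hyphen.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_shy(text):
    """Minimal compliance: ignore soft hyphens completely."""
    return text.replace(SHY, "")


def wrap_with_shy(text, width):
    """Greedy wrap that may split a word only at its soft hyphens.

    Assumes fixed-width characters; a real reader would measure glyphs.
    """
    lines, current = [], ""
    for word in text.split():
        pieces = word.split(SHY)
        while pieces:
            sep = " " if current else ""
            whole = "".join(pieces)
            if len(current) + len(sep) + len(whole) <= width:
                # The rest of the word fits: no visible hyphen is emitted.
                current += sep + whole
                pieces = []
                continue
            # Try the longest leading part that fits together with a "-".
            placed = False
            for k in range(len(pieces) - 1, 0, -1):
                head = "".join(pieces[:k]) + "-"
                if len(current) + len(sep) + len(head) <= width:
                    lines.append(current + sep + head)
                    current = ""
                    pieces = pieces[k:]
                    placed = True
                    break
            if placed:
                continue
            if current:
                # Nothing fits on this line: start a new one and retry.
                lines.append(current)
                current = ""
            else:
                # Too long even for an empty line: let it overflow.
                current = whole
                pieces = []
    if current:
        lines.append(current)
    return lines


sample = "dis\u00adcre\u00adtion\u00adary hy\u00adphen\u00ada\u00adtion"
for line in wrap_with_shy(sample, 10):
    print(line)
```

With a width of 10 the sample wraps as "discre-", "tionary", "hyphena-", "tion". No line contains a leftover soft hyphen, which is the behaviour the quoted text asks for.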

DaleDe 08-25-2008 12:58 AM

The eBookwise 1150 handles soft hyphens just fine, exactly as it is supposed to.

kovidgoyal 08-25-2008 02:09 AM

I don't really see the point of soft hyphens. Since some automated process is going to put them in place anyway, why not just let an automated process in the reader software handle hyphenation?

Hadrien 08-25-2008 02:54 AM

Quote:

Originally Posted by kovidgoyal (Post 240548)
I don't really see the point of soft hyphens. Since some automated process is going to put them in place anyway, why not just let an automated process in the reader software handle hyphenation?

I agree: soft hyphens should be the exception rather than the rule. It's a much better idea to use hyphenation patterns, and to allow these patterns to be embedded in the ePub file, rather than adding soft hyphens in every XML flow. For example, hyphenation rules differ between British and American English.

Take a look at: http://www.w3.org/TR/css3-gcpm/#hyphenation

WillAdams 08-25-2008 09:09 AM

Soft hyphens are invaluable for indicating a valid/correct hyphenation point in a word-phrase which one knows an automated system will (or probably / likely will) handle incorrectly.

William

tompe 08-25-2008 09:36 AM

Quote:

Originally Posted by WillAdams (Post 240616)
Soft hyphens are invaluable for indicating a valid/correct hyphenation point in a word-phrase which one knows an automated system will (or probably / likely will) handle incorrectly.

Exactly. And this is especially important in languages where you create new words by concatenating two words. Alternatively, what you want is to create a word list of exceptions, like you do in LaTeX.
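
A word list of exceptions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entry format ("Stau-becken", loosely modelled on what LaTeX's \hyphenation command accepts), the function names build_exceptions and insert_soft_hyphens, and the German sample word are all chosen for the example and do not come from any existing tool.

```python
import re

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def build_exceptions(entries):
    """Turn entries like 'Stau-becken' into {'staubecken': [4]}.

    Each value lists the character offsets where a break is allowed.
    """
    table = {}
    for entry in entries:
        parts = entry.lower().split("-")
        offsets, pos = [], 0
        for part in parts[:-1]:
            pos += len(part)
            offsets.append(pos)
        table["".join(parts)] = offsets
    return table


def insert_soft_hyphens(text, table):
    """Insert a soft hyphen at every listed offset of every listed word."""

    def mark(match):
        word = match.group(0)
        offsets = table.get(word.lower())
        if not offsets:
            return word  # not in the exception list: leave untouched
        out, last = [], 0
        for off in offsets:
            out.append(word[last:off])
            last = off
        out.append(word[last:])
        return SHY.join(out)

    # [^\W\d_]+ matches runs of letters, including non-ASCII ones.
    return re.sub(r"[^\W\d_]+", mark, text)


exceptions = build_exceptions(["Stau-becken"])
marked = insert_soft_hyphens("Das Staubecken ist voll", exceptions)
print(marked.replace(SHY, "|"))  # show break points visibly
```

The same table could equally be consulted by the reader at display time instead of being baked into the text as soft hyphens; the sketch only shows the lookup itself.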

Hadrien 08-25-2008 11:39 AM

Quote:

Originally Posted by tompe (Post 240621)
Exactly. And this is especially important in languages where you create new words by concatenating two words. Alternatively, what you want is to create a word list of exceptions, like you do in LaTeX.

You can do this with CSS3, and that's exactly what I meant when I said "soft hyphens should be the exception, not the rule". Most of the time it's better to let the reading system handle the hyphenation with hyphenation patterns (this way, you can select different patterns if you'd like). But for some words (in technical documentation, for example) you'll have to specify the break points manually.

marcinJ13 08-10-2009 05:05 PM

I agree that it is better to have an automatic mechanism for hyphenation, but it would have to be different for different languages, and on top of that, exception lists would have to be created for many ebooks.

CSS3 is another option, but let me know how many browsers support CSS3, let alone ebook reading devices.

So it seems much easier to implement correct display of soft hyphens than any other option. Yet the problem persists.

I will try soft hyphens on my Cybook and let you know about results.

MJ

ahi 08-12-2009 01:56 AM

Quote:

Originally Posted by kovidgoyal (Post 240548)
I don't really see the point of soft hyphens. Since some automated process is going to put them in place anyway, why not just let an automated process in the reader software handle hyphenation?

... because it is literally impossible for software automation to get hyphenation correct without considerable human intervention for a book of any meaningful size.

I find it incredible, by the way, that this question was posed at all!

- Ahi

ahi 08-12-2009 01:58 AM

Quote:

Originally Posted by Hadrien (Post 240680)
You can do this with CSS3, and that's exactly what I meant when I said "soft hyphens should be the exception, not the rule". Most of the time it's better to let the reading system handle the hyphenation with hyphenation patterns (this way, you can select different patterns if you'd like). But for some words (in technical documentation, for example) you'll have to specify the break points manually.

In some languages, like those tompe mentioned, the same word can be both a non-compound conjugated word and a non-conjugated compound word. Depending on which it is, a different hyphenation pattern is called for. (And depending on the grammar of the given language, it might be difficult or even impossible for the software to guess correctly from context which sense the word is being used in.)

It is impossible for software automation to get hyphenation completely right even in a language like English, never mind languages that pose challenges like that.

- Ahi

kovidgoyal 08-12-2009 03:15 AM

Quote:

Originally Posted by ahi (Post 551244)
... because it is literally impossible for software automation to get hyphenation correct without considerable human intervention for a book of any meaningful size.

I find it incredible, by the way, that this question was posed at all!

- Ahi

You know any humans who will get all the spelling, let alone hyphenation correct for a book of any meaningful size?

Jellby 08-12-2009 05:39 AM

An algorithm that only hyphenates words that can be fairly safely hyphenated would already be an improvement.

rogue_ronin 08-18-2009 11:31 PM

Arguments for auto-hyphenation in book readers are not arguments against properly functioning &shy; tags.

Both should work, particularly since properly functioning &shy; tags are easier to implement and demand much less processing power. It sometimes makes sense to offload the processing of hyphens from the hardware reader.

Besides, it's part of the (X)HTML spec. Meeting the spec should be a minimum goal of anyone building a reader, hardware or software.

m a r

Ankh 08-23-2009 09:09 PM

Quote:

Originally Posted by kovidgoyal (Post 240548)
I don't really see the point of soft hyphens. Since some automated process is going to put them in place anyway, why not just let an automated process in the reader software handle hyphenation?

Automated hyphenation might be decently implemented for the English language, but I doubt that the same is true for the myriad of living (and dead?) languages.

In the future, a professional hyphenation tool (based on OED database?) might emerge that will do a better job than the automated process. If the author wants to invest effort to fix hyphenation (or to purposely tweak it), why not allow them to do so?

IMHO, the automated hyphenation should be clever enough to shut itself off when soft hyphen characters are present in the source of the word at the edge of the screen.
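
A rough Python sketch of that rule, under the assumption that the reader calls one function per word to find its legal break points. The name break_offsets and the auto_hyphenate callback are hypothetical; the callback stands in for whatever automatic algorithm the reader actually uses.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def break_offsets(word, auto_hyphenate):
    """Return (clean_word, offsets) for one word.

    If the word carries soft hyphens, only those positions are used and
    the automatic hyphenator is not consulted at all. Otherwise the
    result of auto_hyphenate(word) is used.
    """
    if SHY not in word:
        return word, list(auto_hyphenate(word))

    pieces = word.split(SHY)
    clean = "".join(pieces)
    offsets, pos = [], 0
    for piece in pieces[:-1]:
        pos += len(piece)
        # Skip leading, trailing and doubled soft hyphens.
        if 0 < pos < len(clean) and (not offsets or offsets[-1] != pos):
            offsets.append(pos)
    return clean, offsets


def naive_auto(word):
    """Placeholder automatic hyphenator: one break in the middle."""
    return [len(word) // 2]


print(break_offsets("dis\u00adcre\u00adtion\u00adary", naive_auto))
print(break_offsets("reader", naive_auto))
```

The first call returns the author's break points only; the second falls back to the placeholder algorithm because the word has no soft hyphens.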

Ankh 08-23-2009 09:39 PM

Quote:

Originally Posted by ahi (Post 551244)
... because it is literally impossible for software automation to get hyphenation correct without considerable human intervention for a book of any meaningful size.

Literally impossible?

The set of words (and their derivative forms) used in any language is a huge but still finite set. The current tools for automated hyphenation might not be up to the task, but it is definitely theoretically possible to create a complete database, and from there a "perfect" tool for automating that task.

