08-23-2008, 01:01 PM   #1
wallcraft
reader
Posts: 6,975
Karma: 5183568
Join Date: Mar 2006
Location: Mississippi, USA
Device: Kindle 3, Kobo Glo HD
Soft Hyphens

The thread "Problems reading epub on prs-505" indicates that soft hyphens cause problems in ePub ebooks. From Robin’s HTML 4.0 Conformance Test:

Quote:
A soft hyphen indicates where an optional word break may occur. When a soft hyphen breaks a word between one line and the next, a hyphen character is displayed at the end of the first line. When a soft hyphen does not break a word between lines, the hyphen must not be displayed.

Soft hyphens are vital for text that must be displayed on a tiny screen or in a narrow frame. Web browsers have no excuse for rendering them incorrectly, when they can be minimally compliant by ignoring them completely.
However, the ebook readers I tested don't handle soft hyphens well.
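The rule in the quote can be made concrete with a small line filler. This is a minimal Python sketch, not how any of the tested readers work: it assumes a display measured in characters (the `width` parameter and the `layout` name are hypothetical), breaks lines greedily at spaces and at U+00AD SOFT HYPHEN, draws a "-" only when a line actually ends at a soft hyphen, and otherwise does not display the soft hyphen at all.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN, the character that &shy; stands for


def layout(text: str, width: int) -> list[str]:
    """Greedily fill lines of at most `width` characters, honouring soft hyphens."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        pieces = word.split(SHY)  # fragments between permitted break points
        while pieces:
            sep = " " if current else ""
            whole = "".join(pieces)  # soft hyphens inside a line are not displayed
            if len(current) + len(sep) + len(whole) <= width:
                current += sep + whole
                break
            # Longest run of fragments that still fits when followed by a visible "-".
            fit = 0
            for k in range(1, len(pieces)):
                if len(current) + len(sep) + len("".join(pieces[:k])) + 1 <= width:
                    fit = k
            if fit == 0 and current:
                lines.append(current)  # nothing fits here: move the word to a new line
                current = ""
                continue
            if fit == 0:
                if len(pieces) == 1:
                    current = whole  # unbreakable and too long: let it overflow
                    break
                fit = 1  # even one fragment overflows: break at the first soft hyphen
            lines.append(current + sep + "".join(pieces[:fit]) + "-")
            current = ""
            pieces = pieces[fit:]
    if current:
        lines.append(current)
    return lines


print(layout("a dis\u00adcre\u00adtion\u00adary word", 8))  # ['a dis-', 'cretion-', 'ary word']
print(layout("dis\u00adcre\u00adtion\u00adary", 20))  # ['discretionary']
```

When the word fits, the soft hyphens simply vanish; when the line is broken at one, the "-" appears, which is the combination none of the tested readers got fully right.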

The attached ebooks are based on http://www.cs.tut.fi/~jkorpela/shytest.html, which comes from the page "Soft hyphen (SHY) – a hard problem?". I enclose single-file HTML (zipped), MOBI (via MobiPocket Creator) and ePub (via BookGlutton) versions. The screenshots are from a Windows PC using Adobe Digital Editions, Sony Ebook Library (PRS-505 like), MobiPocket Reader, FBReader and uBook.

The uBook version (last screenshot) appears to do the best job, but it does not display the "-" when a soft hyphen is positioned at the end of a line in the actual document, and it might in fact be ignoring all the soft hyphens and using its own hyphenation (it can produce discre-tionary, which doesn't come from the soft hyphens). Adobe Digital Editions (ePub) breaks at a soft hyphen but does not add a "-" when it does so. Sony's software is based on ADE; it breaks at a soft hyphen, but it also shows "?" at every soft hyphen. MobiPocket shows every soft hyphen as "-" and does not break words. FBReader does break words, but shows every soft hyphen as "-".

Soft hyphens could provide a viable alternative (or augmentation) to on-the-fly hyphenation, but only if ebook readers either use them for hyphenation or ignore them completely.
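The two acceptable outcomes can be contrasted with the faulty one reported above. In this minimal Python sketch (function names are hypothetical), `ignore_soft_hyphens` is the "minimally compliant" fallback that never displays or breaks at U+00AD, while `show_soft_hyphens` mimics what MobiPocket Reader and FBReader were observed to do.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def ignore_soft_hyphens(text: str) -> str:
    """Minimally compliant fallback: never display, and never break at, a soft hyphen."""
    return text.replace(SHY, "")


def show_soft_hyphens(text: str) -> str:
    """The faulty behaviour reported for MobiPocket Reader and FBReader: every soft hyphen drawn as "-"."""
    return text.replace(SHY, "-")


word = "dis\u00adcre\u00adtion\u00adary"
print(ignore_soft_hyphens(word))  # discretionary
print(show_soft_hyphens(word))    # dis-cre-tion-ary
```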
Attached Thumbnails
shytest_ADE.gif (165.5 KB), shytest_PRS.gif (155.3 KB), shytest_WMR.gif (196.2 KB), shytest_FBR.gif (175.0 KB), shytest_uBK.gif (84.5 KB)
Attached Files
shytest.epub (2.4 KB)
shytest.prc (3.5 KB)
shytest_tidy.html.zip (691 bytes)
08-24-2008, 11:58 PM   #2
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
The eBookwise 1150 handles soft hyphens just fine, exactly as it is supposed to.
08-25-2008, 01:09 AM   #3
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I don't really see the point of soft hyphens: since some automated process is going to put them in place anyway, why not just let an automated process in the reader software handle hyphenation?
08-25-2008, 01:54 AM   #4
Hadrien
Feedbooks.com Co-Founder
Posts: 2,263
Karma: 145123
Join Date: Nov 2006
Location: Paris, France
Device: Sony PRS-t-1/350/300/500/505/600/700, Nexus S, iPad
Quote:
Originally Posted by kovidgoyal View Post
I don't really see the point of soft hyphens: since some automated process is going to put them in place anyway, why not just let an automated process in the reader software handle hyphenation?
I agree: soft hyphens should be the exception rather than the rule. It's a much better idea to use hyphenation patterns, and to allow these patterns to be embedded in the ePub file, than to add soft hyphens throughout every XML flow. For example, hyphenation rules differ between British and American English.

Take a look at: http://www.w3.org/TR/css3-gcpm/#hyphenation
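To show what "hyphenation patterns" means in practice, here is a minimal Python sketch of Liang-style pattern matching, the approach TeX uses. The nine patterns are a tiny illustrative set chosen so that "hyphenation" works out; real language pattern files contain thousands, and the `left_min`/`right_min` defaults are assumptions. Digits between letters are votes: after all matching patterns are applied, the maximum value at each position wins, and odd values mark permitted breaks.

```python
def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a TeX-style pattern such as 'hen5at' into its letters and the digit
    values between them (values[j] is the value before letter j)."""
    letters = ""
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters += ch
            values.append(0)
    return letters, values


def hyphenate(word: str, patterns: list[str], left_min: int = 2, right_min: int = 3) -> list[str]:
    """Liang-style hyphenation: odd values allow a break, even values forbid it."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = f".{word.lower()}."  # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)  # points[i] is the value before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values is not None:
                for j, v in enumerate(values):
                    points[start + j] = max(points[start + j], v)
    pieces, last = [], 0
    # A break before word[k] corresponds to points[k + 1] (the padding shifts by one).
    for k in range(left_min, len(word) - right_min + 1):
        if points[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return pieces


PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print(hyphenate("hyphenation", PATTERNS))  # ['hy', 'phen', 'ation']
```

Because the patterns are just data, a reader could load a different list per language or locale (for instance British versus American English), which is the flexibility argued for above.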
08-25-2008, 08:09 AM   #5
WillAdams
Wizard
Posts: 1,234
Karma: 3350652
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
Soft hyphens are invaluable for indicating a valid/correct hyphenation point in a word-phrase which one knows an automated system will (or probably / likely will) handle incorrectly.

William
08-25-2008, 08:36 AM   #6
tompe
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
Quote:
Originally Posted by WillAdams View Post
Soft hyphens are invaluable for indicating a valid/correct hyphenation point in a word-phrase which one knows an automated system will (or probably / likely will) handle incorrectly.
Exactly. And this is especially important in languages where you create new words by concatenating two words. Alternatively, what you want is a word list of exceptions, like the one you can create in LaTeX.
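An exception list in the spirit of LaTeX's `\hyphenation{...}` can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function names, the Swedish compound "sjuk-hus" (sjuk + hus) as an entry an automatic method might mis-split, and `no_hyphenation` as a hypothetical stand-in for whatever pattern-based hyphenator the reader would otherwise use.

```python
from typing import Callable


def parse_exceptions(spec: str) -> dict[str, list[int]]:
    """Parse entries written as for LaTeX's \\hyphenation{...}, e.g. 'sjuk-hus ta-ble',
    into a map from the bare lowercase word to the lengths of its fragments."""
    table: dict[str, list[int]] = {}
    for entry in spec.split():
        table[entry.replace("-", "").lower()] = [len(part) for part in entry.split("-")]
    return table


def hyphenate(word: str, exceptions: dict[str, list[int]],
              fallback: Callable[[str], list[str]]) -> list[str]:
    """Use the exception list when the word is in it; otherwise defer to the automatic method."""
    lengths = exceptions.get(word.lower())
    if lengths is None:
        return fallback(word)
    pieces, pos = [], 0
    for n in lengths:
        pieces.append(word[pos:pos + n])  # slice the original word to keep its capitalisation
        pos += n
    return pieces


def no_hyphenation(word: str) -> list[str]:
    """Hypothetical stand-in for a pattern-based hyphenator."""
    return [word]


exceptions = parse_exceptions("sjuk-hus ta-ble")
print(hyphenate("Sjukhus", exceptions, no_hyphenation))  # ['Sjuk', 'hus']
print(hyphenate("husvagn", exceptions, no_hyphenation))  # ['husvagn']
```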
08-25-2008, 10:39 AM   #7
Hadrien
Feedbooks.com Co-Founder
Quote:
Originally Posted by tompe View Post
Exactly. And this is especially important in languages where you create new words by concatenating two words. Alternatively, what you want is a word list of exceptions, like the one you can create in LaTeX.
You can do this with CSS3, and that's exactly what I meant when I said "soft hyphens should be the exception, not the rule". Most of the time it's better to let the reading system handle hyphenation with hyphenation patterns (this way, you can select different patterns if you'd like). But for some words (in technical documentation, for example) you'll have to specify the hyphenation "manually".
08-10-2009, 04:05 PM   #8
marcinJ13
Member
Posts: 13
Karma: 288
Join Date: Mar 2008
Device: Cybook Gen3
I agree that it is better to have an automatic mechanism for hyphenation, but it would have to be different for different languages, and on top of that, exception lists would have to be created for many ebooks.

CSS3 is another option, but tell me how many browsers support CSS3, let alone ebook reading devices.

So it seems much easier to implement correct display of soft hyphens than any other option. Yet the problem remains.

I will try soft hyphens on my Cybook and let you know about results.

MJ
08-12-2009, 12:56 AM   #9
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by kovidgoyal View Post
I don't really see the point of soft hyphens: since some automated process is going to put them in place anyway, why not just let an automated process in the reader software handle hyphenation?
... because it is literally impossible for software automation to get hyphenation correct without considerable human intervention for a book of any meaningful size.

I find it incredible, by the way, that this question was posed at all!

- Ahi
08-12-2009, 12:58 AM   #10
ahi
Wizard
Quote:
Originally Posted by Hadrien View Post
You can do this with CSS3, and that's exactly what I meant when I said "soft hyphens should be the exception, not the rule". Most of the time it's better to let the reading system handle hyphenation with hyphenation patterns (this way, you can select different patterns if you'd like). But for some words (in technical documentation, for example) you'll have to specify the hyphenation "manually".
In some languages, like those mentioned by tompe, the same word can be either a non-compound conjugated word or a non-conjugated compound word. Depending on which it is, a different hyphenation is called for. (And depending on the grammar of the given language, it might be difficult or impossible for the software to guess correctly from context which sense the word is being used in.)

It is impossible for software automation to get hyphenation completely right even in a language like English, never mind languages that pose challenges like that.
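English homographs show the same problem in miniature: a lookup keyed on spelling alone cannot choose between two valid hyphenations. In this Python sketch the dictionary entries follow common American dictionary practice (rec-ord / re-cord, pres-ent / pre-sent), and the names and the "leave ambiguous words unbroken" policy are assumptions; it only demonstrates the ambiguity, it does not solve the compound-versus-conjugated case described above.

```python
from typing import Optional

# Hypothetical dictionary: the same spelling hyphenates differently by part of speech.
HYPHENATIONS = {
    ("record", "noun"): "rec-ord",
    ("record", "verb"): "re-cord",
    ("present", "noun"): "pres-ent",
    ("present", "verb"): "pre-sent",
}


def candidates(word: str) -> set[str]:
    """Every hyphenation a lookup keyed on spelling alone could return."""
    return {h for (spelling, _pos), h in HYPHENATIONS.items() if spelling == word.lower()}


def hyphenate(word: str, part_of_speech: Optional[str] = None) -> Optional[str]:
    if part_of_speech is not None:
        return HYPHENATIONS.get((word.lower(), part_of_speech))
    options = candidates(word)
    # Without grammatical context, an ambiguous or unknown word is safest left unbroken.
    return options.pop() if len(options) == 1 else None


print(sorted(candidates("record")))  # ['re-cord', 'rec-ord']
print(hyphenate("record"))           # None
print(hyphenate("record", "verb"))   # re-cord
```

A soft hyphen placed by someone who knows which sense is meant sidesteps the problem entirely.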

- Ahi
08-12-2009, 02:15 AM   #11
kovidgoyal
creator of calibre
Quote:
Originally Posted by ahi View Post
... because it is literally impossible for software automation to get hyphenation correct without considerable human intervention for a book of any meaningful size.

I find it incredible, by the way, that this question was posed at all!

- Ahi
Do you know any humans who will get all the spelling, let alone the hyphenation, correct for a book of any meaningful size?
08-12-2009, 04:39 AM   #12
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
An algorithm that only hyphenates words that can be fairly safely hyphenated would already be an improvement.
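One possible reading of "fairly safely" is a filter applied to whatever break points a hyphenator proposes. In this Python sketch the thresholds (`min_word`, `left_min`, `right_min`) are assumptions, similar in spirit to TeX's `\lefthyphenmin` and `\righthyphenmin`, and skipping capitalised or non-alphabetic words is a hypothetical policy for avoiding names and numbers.

```python
def safe_breaks(word: str, breaks: list[int], min_word: int = 7,
                left_min: int = 3, right_min: int = 3) -> list[int]:
    """Filter candidate break offsets (from any hyphenator) down to cautious ones."""
    if len(word) < min_word or not word.isalpha() or not word.islower():
        return []  # short words, capitalised names, numbers, etc.: never hyphenate
    # Keep only breaks that leave enough letters on both sides of the hyphen.
    return [b for b in breaks if left_min <= b <= len(word) - right_min]


print(safe_breaks("hyphenation", [2, 6]))  # [6]
print(safe_breaks("London", [3]))          # []
```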
08-18-2009, 10:31 PM   #13
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
An argument for auto-hyphenation in the book readers is not an argument against properly functioning &shy; entities.

Both should work, particularly since properly functioning &shy; entities are easier to implement and demand much less processing power. It sometimes makes sense to offload the processing of hyphens from the hardware reader.

Besides, it's part of the (X)HTML spec. Meeting the spec should be a minimum goal of anyone building a reader, hardware or software.
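To illustrate how little work honouring soft hyphens needs, this Python sketch decodes character references with the standard `html.unescape` and records where the soft hyphens fall, in a single linear pass with no dictionary or patterns. The `break_points` name is hypothetical. One caveat: in an ePub parsed as XML, the numeric forms `&#173;` or `&#xAD;` avoid depending on the XHTML DTD to define the named `&shy;` entity.

```python
import html

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def break_points(markup: str) -> tuple[str, list[int]]:
    """Decode character references, then return the visible word and the
    offsets at which its soft hyphens permit a break."""
    text = html.unescape(markup)
    visible: list[str] = []
    offsets: list[int] = []
    for ch in text:
        if ch == SHY:
            offsets.append(len(visible))
        else:
            visible.append(ch)
    return "".join(visible), offsets


print(break_points("dis&shy;cre&shy;tion&shy;ary"))  # ('discretionary', [3, 6, 10])
print(break_points("dis&#173;cre&#xAD;tion"))         # ('discretion', [3, 6])
```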

m a r
08-23-2009, 08:09 PM   #14
Ankh
Guru
Posts: 714
Karma: 2003751
Join Date: Oct 2008
Location: Ottawa, ON
Device: Kobo Glo HD
Quote:
Originally Posted by kovidgoyal View Post
I don't really see the point of soft hyphens: since some automated process is going to put them in place anyway, why not just let an automated process in the reader software handle hyphenation?
Automated hyphenation might be decently implemented for the English language, but I doubt that the same is true for the myriad of living (and dead?) languages.

In the future, a professional hyphenation tool (based on OED database?) might emerge that will do a better job than the automated process. If the author wants to invest effort to fix hyphenation (or to purposely tweak it), why not allow them to do so?

IMHO, the automated hyphenation should be clever enough to shut itself off when soft hyphen characters are present in the source of the word at the edge of the screen.
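The rule proposed here fits in a few lines. In this Python sketch (names hypothetical), a word containing any author-supplied soft hyphen uses exactly those break points and skips automatic hyphenation; only a word without soft hyphens is handed to the automatic hyphenator, for which `midpoint_stub` is a placeholder, not a real algorithm.

```python
from typing import Callable

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def break_offsets(word: str, auto: Callable[[str], list[int]]) -> list[int]:
    """Return offsets (in visible characters) where `word` may be broken.

    If the author put soft hyphens in the word, use exactly those and skip the
    automatic hyphenator; otherwise ask the automatic hyphenator."""
    if SHY not in word:
        return auto(word)
    offsets, visible = [], 0
    for ch in word:
        if ch == SHY:
            offsets.append(visible)
        else:
            visible += 1
    return offsets


def midpoint_stub(word: str) -> list[int]:
    """Hypothetical placeholder for a real automatic hyphenator."""
    return [len(word) // 2]


print(break_offsets("sjuk\u00adhus", midpoint_stub))  # [4]
print(break_offsets("hyphenation", midpoint_stub))   # [5]
```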
08-23-2009, 08:39 PM   #15
Ankh
Guru
Quote:
Originally Posted by ahi View Post
... because it is literally impossible for software automation to get hyphenation correct without considerable human intervention for a book of any meaningful size.
Literally impossible?

The set of words (and their derivative forms) used in any language is huge, but still a finite set. The current tools for automated hyphenation might not be up to the task, but it is definitely theoretically possible to create a complete database, and from there a "perfect" tool for automating that task.
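A database approach has to cope with compounding, which keeps producing words no list contains (tompe's point). This Python sketch looks a word up in a finite list and, failing that, tries to build it from listed words, preferring the longest known first part. The tiny Swedish word list (sjuk "sick", hus "house", tak "roof") is an illustrative assumption, results are returned in lowercase, and the splitter inherits the ambiguity problem raised earlier in the thread whenever a word splits more than one way.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical word list: each entry maps a word to its own hyphenation.
WORDS = {"sjuk": "sjuk", "hus": "hus", "tak": "tak", "sjukhus": "sjuk-hus"}


def hyphenate(word: str) -> Optional[str]:
    """Look the word up; if it is missing, try to assemble it from listed words."""

    @lru_cache(maxsize=None)
    def split(rest: str) -> Optional[str]:
        if rest in WORDS:
            return WORDS[rest]
        for i in range(len(rest) - 1, 0, -1):  # prefer the longest known first part
            head, tail = rest[:i], rest[i:]
            if head in WORDS:
                tail_hyphenated = split(tail)
                if tail_hyphenated is not None:
                    return WORDS[head] + "-" + tail_hyphenated
        return None  # not in the list and not a compound of listed words

    return split(word.lower())


print(hyphenate("sjukhustak"))  # sjuk-hus-tak
print(hyphenate("bil"))         # None
```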
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.