#466
The Grand Mouse 高貴的老鼠
Posts: 73,955
Karma: 315160596
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Oasis
There is an easy solution to one of the hyphenation problems. When the ebook is being created, optional hyphens should be added by the creator to all the valid hyphenation positions in all the words in the book.
The creator would use software to automate the insertion of the hyphens, and said software would ask for help on words it hadn't already been told about, or which might have different hyphenations depending on context. This hyphenation position marking should be a quick and simple process, as most words will already be in the software's hyphenation dictionary. It can even suggest hyphenations to the operator for unknown words using language-specific algorithms. Most of the time the operator will just have to agree. The problem of valid hyphenation positions is now solved - the rendering software needs no intelligence; it has only to hyphenate at the optional hyphen positions.
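A minimal sketch of the workflow described above, assuming a hypothetical operator-approved dictionary (`HYPHENATION_DICT`) that maps lowercase words to hyphenated forms. Known words get U+00AD SOFT HYPHEN characters (the character the `&shy;` entity encodes) at each break point; unknown words of six or more letters are collected so the operator can be asked about them. The threshold and all names here are illustrative assumptions, not part of any real tool, and context-dependent words are not handled.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, the character that the &shy; entity encodes

# Hypothetical operator-approved dictionary: lowercase word -> hyphenated form.
HYPHENATION_DICT = {
    "creator": "cre-a-tor",
    "hyphenation": "hy-phen-a-tion",
    "software": "soft-ware",
}

WORD_RE = re.compile(r"[A-Za-z]+")

def insert_soft_hyphens(text, dictionary, min_unknown_len=6):
    """Put soft hyphens at every dictionary break point.

    Returns the marked-up text plus a sorted list of unknown words of at least
    `min_unknown_len` letters, i.e. the words to ask the operator about.
    """
    unknown = set()

    def mark(match):
        word = match.group(0)
        entry = dictionary.get(word.lower())
        if entry is None:
            if len(word) >= min_unknown_len:
                unknown.add(word.lower())
            return word
        # Walk the dictionary entry, copying letters from the original word
        # (to keep its capitalisation) and turning each '-' into a soft hyphen.
        out, i = [], 0
        for ch in entry:
            if ch == "-":
                out.append(SOFT_HYPHEN)
            else:
                out.append(word[i])
                i += 1
        return "".join(out)

    return WORD_RE.sub(mark, text), sorted(unknown)
```

For example, running it on "The creator ran software on Hyphenation markup." marks the three known words and returns `["markup"]` as the list to put to the operator.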

#467
Wizard
Posts: 1,258
Karma: 3439432
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
jbjb,
The appropriate cite for ``typography is not a machine solvable problem'' would be the Knuth-Plass paper ``Breaking Paragraphs into Lines'', D.E. Knuth and M.F. Plass, chapter 3 of _Digital Typography_, CSLI Lecture Notes #78. Please note that there is no H&J algorithm which can successfully detect and prevent ``stacks'' or rivers --- it seems to be (to use the formal computing term) ``NP Complete'' --- and I'd be very interested in any research or algorithm which makes this a solvable problem. There are even fewer efforts to solve typographic problems at a level larger than a page (I've frequently had to re-lay an entire chapter because of how the last page fell out). Here's a list of the current research on this: http://groups.google.com/group/comp....9?dmode=source
So, unless someone has an example of an implementation which will automatically paginate a text and _not_ allow stacks, orphans or other bad breaks, I believe that the above references should stand as the requested citation to demonstrate that ``typography is not a machine solvable problem''.
William
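The central idea of the cited Knuth-Plass paper is to choose line breaks for a whole paragraph at once ("total fit") rather than greedily, line by line. The sketch below is a heavily simplified version of that idea, assuming fixed-width characters, a line width counted in characters, squared leftover space as the only cost, and a free last line. It has no stretchable glue, penalties, hyphenation or fitness classes, and it does not look for rivers or stacks at all, so it says nothing either way about the harder problems raised in the post.

```python
def break_lines(words, width):
    """Choose breaks that minimise the summed squared slack of all but the last line.

    Simplified total-fit dynamic programme in the spirit of Knuth-Plass:
    fixed-width characters, single spaces, no glue stretch, no hyphenation.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest way to set words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: first word of the line after the one starting at i
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1   # the word plus one interword space
            if length > width and j > i:  # an overlong single word is still allowed
                break
            slack = width - length
            cost = (0 if j == n - 1 else slack ** 2) + best[j + 1]
            if cost < best[i]:
                best[i], nxt[i] = cost, j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

With `width=6` and the words `aaa bb cc ddddd`, a greedy breaker sets `aaa bb` / `cc` / `ddddd`, while this version prefers `aaa` / `bb cc` / `ddddd`, whose non-final lines are more evenly filled.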

#468
Wizard
Posts: 1,258
Karma: 3439432
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
Pdurrant --- if you think inserting all possible appropriate hyphens is easy, please provide a rate quote for doing this to text on a per-file basis, per thousand characters of text. Please note that ``present'' has different hyphenation points depending on its pronunciation (whether it's a gift, the act of presenting something, or the current time) and that any method for doing this would need to take into account all such words and insert only the appropriate and correct hyphenation.
William
Last edited by WillAdams; 09-02-2009 at 09:52 AM.
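One way to represent the "present" problem is an exception table keyed by part of speech, so that the break depends on a grammatical tag rather than on the spelling alone. In this hypothetical sketch the tag is simply passed in; producing that tag reliably for every occurrence is exactly the part the post says is hard, so the code moves the problem rather than solving it. The table entries and tag names are illustrative assumptions.

```python
# Hypothetical exception table: the correct break depends on the part of speech.
HETERONYMS = {
    "present": {"NOUN": "pres-ent", "ADJ": "pres-ent", "VERB": "pre-sent"},
    "record": {"NOUN": "rec-ord", "VERB": "re-cord"},
}

def hyphenate_tagged(word, pos):
    """Return the (lowercase) hyphenated form of `word` for part of speech `pos`.

    Returns None when the word is not a known heteronym or the tag is not
    covered, so the caller can fall back to ordinary hyphenation.
    """
    senses = HETERONYMS.get(word.lower())
    if senses is None:
        return None
    return senses.get(pos)
```

A gift or the current time (`"NOUN"`, `"ADJ"`) gives `pres-ent`; the act of presenting (`"VERB"`) gives `pre-sent`.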

#469
Somewhat clueless
Posts: 772
Karma: 9999999
Join Date: Nov 2008
Location: UK
Device: Kindle Oasis
Quote:
Furthermore, the problem itself is sufficiently ill-defined that the assertion is meaningless. As has been pointed out to you several times, different people have different opinions of the level of typography required for the problem to be classed as "solved". I assume you'd be happy to concede that there is no single perfect typographical layout that would be universally recognised, and that even "experts" would disagree about which was superior out of a selection of hand-made layouts? Given that, if the criterion for success is the perfect layout, then you could trivially say that the problem is not machine-solvable, but it's also unsolvable, period. The only meaningful yardstick for "solved" (it seems to me) is when the typography is sufficiently good that the reader is completely happy with it. Different people will have different thresholds for this, and many people's can be met with an automated solution. Furthermore, and this is key, just because you claim expertise in this field doesn't make your opinion of what is acceptable to any given reader any more valid than their own. Different people want different things.
/JB
|
#470
Exwyzeeologist
Posts: 535
Karma: 3261
Join Date: Jun 2009
Device: :PRS-505::iPod touch:
Quote:
I find the thought of inserting at least 5 extra characters into every single multisyllabic word to be a ridiculously poor way of handling the problem. For one, it would make the HTML nearly unreadable due to the amount of extra markup involved (not exactly a huge problem, but one that could certainly be a major nuisance to some). And let's not even think about what it would do to the file sizes.
Last edited by Abecedary; 09-02-2009 at 10:03 AM.
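A rough way to put numbers on the file-size concern: the `&shy;` entity adds five characters per break point, while the literal U+00AD character it stands for adds two bytes in UTF-8. The helper below is a hypothetical sketch that counts the raw byte overhead for words written with `-` at their break points; it ignores the compression inside an ePub's ZIP container, which would shrink both figures.

```python
def markup_overhead(hyphenated_words, marker):
    """Extra UTF-8 bytes from writing `marker` at every break point.

    Each word is given with '-' at its break points, e.g. 'hy-phen-a-tion'.
    """
    plain = sum(len(w.replace("-", "").encode("utf-8")) for w in hyphenated_words)
    marked = sum(len(w.replace("-", marker).encode("utf-8")) for w in hyphenated_words)
    return marked - plain

words = ["hy-phen-a-tion", "soft-ware"]   # four break points in total
print(markup_overhead(words, "&shy;"))    # 20: entity form, 5 bytes per break
print(markup_overhead(words, "\u00ad"))   # 8: literal SOFT HYPHEN, 2 bytes per break
```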
|
#471
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Even if genuinely solving the hyphenation problem were possible and practical, would all eBook readers include all hyphenation patterns for all languages? All popular languages? All less popular languages? All languages that have no states attached to them? (Or at least the ones with more speakers than some of the world's smaller countries.) ... and, of course, some of these languages will be such that they will have words whose meaning, and therefore correct hyphenation, depends entirely on the semantic context. Which takes us to the eBook reading software having to try to at least figure out the grammar for X number of languages (where X might be very large indeed). The truth is, it would be easier (and quite possibly achievable with a remarkably high degree of accuracy) to have eBook readers replace dumb quotes with smart quotes in ePubs on the fly... But I don't see that ever happening. And until it does, or something very like it is successfully implemented, I find it difficult to take seriously any suggestion that eBook readers will ever even try to hyphenate in a way that has any chance of producing correct hyphenation for a genuinely large percentage of all eBooks (regardless of language). But, of course, I do not believe it is possible for them to actually succeed, even if they do try.
- Ahi
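For comparison, the quote-smartening task mentioned above can be approximated with a couple of regular-expression passes. This is a minimal, English-only heuristic sketch, not any reader's actual implementation: a straight quote at the start of the text or after whitespace, an opening bracket or a dash becomes an opening quote, and every other straight quote becomes a closing quote or apostrophe.

```python
import re

# Characters after which a straight quote is assumed to open a quotation.
OPENERS = r"(^|[\s(\[{\u2014-])"

def smarten_quotes(text):
    """Naively replace straight quotes with typographic ones (English conventions)."""
    text = re.sub(OPENERS + '"', "\\1\u201c", text)  # opening double quote
    text = text.replace('"', "\u201d")               # everything else closes
    text = re.sub(OPENERS + "'", "\\1\u2018", text)  # opening single quote
    text = text.replace("'", "\u2019")               # closing quote or apostrophe
    return text
```

It handles `He said "it's fine."` correctly, but it turns the elided `'Tis` into an opening quote (‘Tis) instead of an apostrophe, which illustrates how such heuristics can be highly accurate without being perfect.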
|
#472
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Wow, such overreaction.
LaTeX already knows the hyphenation of most words, as has already been not only stated but demonstrated. The fact that a million words exist, the vast majority of which are almost never used, is completely irrelevant. A script could find the exceptions to what the system knows and flag the words that are known to be ambiguous (though grammar-check-like software could handle most such cases); handing just those words to the book designer to mark at book creation likely wouldn't take more than 10 minutes per book. If the trade-off is between no hyphenation anywhere, or a non-reflowable format, vs. the occasional wrong pre-sent vs. pres-ent, I would certainly put up with the latter. There's no reason in principle why a computer can't identify stacks and rivers, and try to do something about them. LaTeX can already be programmed to completely avoid widows and orphans, though on a sufficiently small page, doing so would be a bad idea. Maybe a perfect algorithm isn't possible, and a computer can't do these things perfectly (though I'm not sure they're humanly perfectible either), but let me just say this... Using the failure of perfectibility as an excuse for pooh-poohing the push for software that does these things much better than what we currently have is, well... incredibly silly.
Last edited by frabjous; 09-02-2009 at 10:22 AM.
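TeX's hyphenation, which LaTeX inherits, is based on Frank Liang's pattern method: each pattern is a letter string with digits between letters, every matching pattern votes on the positions it covers, the highest digit wins, and an odd winner allows a break while an even one forbids it. The sketch below uses three made-up patterns (not taken from TeX's real English pattern file) that happen to hyphenate "typesetting", plus an exception dictionary in the role of TeX's `\hyphenation` list, which is where the script-flagged words described above would go. The minimum fragment lengths of 2 and 3 mirror plain TeX's `\lefthyphenmin` and `\righthyphenmin`; a dictionary lookup over all substrings stands in for the packed trie TeX actually uses.

```python
import re

def parse_pattern(pattern):
    """Split a Liang pattern such as 'e2t' into ('et', [0, 2, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    i = 0
    for ch in pattern:
        if ch.isdigit():
            values[i] = int(ch)  # the digit sits before letter i
        else:
            i += 1
    return letters, values

def build_table(patterns):
    return dict(parse_pattern(p) for p in patterns)

def hyphenate(word, table, exceptions=None, left_min=2, right_min=3):
    """Return `word` with '-' at each break the patterns allow."""
    if exceptions and word.lower() in exceptions:
        return exceptions[word.lower()]
    work = "." + word.lower() + "."   # dots mark the word boundaries
    points = [0] * (len(work) + 1)    # points[i]: winning value before work[i]
    for start in range(len(work)):
        for end in range(start + 1, len(work) + 1):
            values = table.get(work[start:end])
            if values is not None:
                for offset, v in enumerate(values):
                    points[start + offset] = max(points[start + offset], v)
    pieces, last = [], 0
    # A break before word[k] corresponds to points[k + 1] (shifted by the leading dot).
    for k in range(left_min, len(word) - right_min + 1):
        if points[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)

PATTERNS = build_table(["e1s", "1t", "e2t"])  # made-up, illustrative only
```

Here `1t` proposes a break before every "t", and `e2t` vetoes the one after an "e", giving `type-set-ting`. The same toy patterns turn "present" into `pre-sent`; passing `exceptions={"present": "pres-ent"}` overrides that, exactly the kind of per-word fix a book designer would supply.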

#473
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
The different opinions mostly come from individuals whose work and profession have nothing whatsoever to do with bookmaking. To be frank, I am not going to pretend they ought to be considered on an equal footing with definitions of typography not debased by an ardent desire to read only HTML books in the future. But let's not even worry about defining typography's solvability, unless you have some cogent argument to make about how hyphenation at display time can be solved in a way that works for practically (but, to be fair, not literally) all of humanity, not just the anglosphere or the western world. (Comprehensive [as opposed to superficial] hyphenation patterns for Gikuyu, anyone? [Presumably with autodetection of English and Swahili words included, to which their respective hyphenation ought to be applied.] So)
Quote:
The fact that even experts cannot agree on perfect typography is irrelevant. Experts, and even reasonably intelligent and knowledgeable amateurs, will be able to recognize good, high-quality typography when they see it. But even this is irrelevant at this point. The problem with display-time typography is that without perfect hyphenation (and sometimes even with perfect hyphenation) there may well be no straightforward way to render without committing blatant, obvious, and egregious typographic errors. And surely you will concede that whether or not experts can agree on what is perfect hyphenation, there are objective standards in most written languages as to what is incorrect hyphenation.
Quote:
Quote:
Clearly. I don't think this has such broad implications as you suggest, though. --- Having said all this... let me somewhat give you what you want me to say: Yes, I do believe it is possible to get hyphenation and typography right enough for a lot of people to be satisfied. Primarily because it has already happened, since a lot of people are fine with no hyphenation and utterly broken typography. Any improvements will doubtless be welcomed and celebrated by those people as much as anyone. What I do not believe, though, is that either hyphenation or typography in general can be brought to a state where it is objectively of professional quality (note, I did not say perfect).
- Ahi

#474
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Me?
Quote:
In Hungarian, the hyphenation of certain words (not a small, enumerable list) depends on semantic context... there is literally no way to know the correct hyphenation without understanding the word/sentence. In addition, the Hungarian double digraphs "ssz" (a long "sz"), "ccs" (a long "cs"), "zzs", "ggy" and "nny" are treated unorthodoxly. For example, "massza" is hyphenated as "masz-sza"... however, "ssz" could also be "s+sz", as in "vasszarv", which is correctly hyphenated as "vas-szarv". The LaTeX solution is to manually mark double digraphs... so that if hyphenation needs to occur there, they are not mistakenly split the wrong way. Oh... and, of course, this is also an issue with single digraphs. Is a "cs" sequence a digraph, or merely "c+s"? Is an "sz" or a "zs" sequence a digraph, or "s+z" or "z+s"? Tolerable hyphenation that is right most of the time will not forever be impossible to do at display time. Professional hyphenation, correct to the standards of books published by reputable publishers, however, I believe will remain impossible perpetually, because of the myriad complications (most of which you and I do not even know about, on account of their being language-specific issues) on top of the already formidable challenges.
- Ahi
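The double-digraph case is awkward for the soft-hyphen approach too, because the hyphenated form is longer than the unhyphenated word: a break through "massza" produces "masz-sza", restoring the full "sz" on both sides, which a soft hyphen (it only makes a hyphen appear) cannot express. This hypothetical sketch performs the break for the clusters listed above, but the `long_digraph` flag, meaning whether the cluster is one long digraph or a single letter plus a digraph across a word boundary as in "vasszarv", has to be supplied by the caller; that is the manual marking the post describes.

```python
# Long (doubled) Hungarian digraphs are written with only the first letter doubled;
# a line break through one restores both full digraphs: "massza" -> "masz-sza".
LONG_DIGRAPHS = {"ssz": "sz", "ccs": "cs", "zzs": "zs", "ggy": "gy", "nny": "ny"}

def break_at(word, index, long_digraph):
    """Hyphenate `word` at the three-letter cluster starting at `index`.

    `long_digraph` must come from the caller: True if the cluster is a long
    digraph (massza), False if it is a letter plus a digraph (vas + szarv).
    """
    cluster = word[index:index + 3]
    if long_digraph:
        digraph = LONG_DIGRAPHS[cluster]
        return word[:index] + digraph + "-" + digraph + word[index + 3:]
    # Single letter followed by a digraph: break after the first letter.
    return word[:index + 1] + "-" + word[index + 1:]
```

The same letters "ssz" at the same position give `masz-sza` or `vas-szarv` depending only on the flag, so the spelling alone cannot decide the break.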
|
#475
Banned
Posts: 2,094
Karma: 2682
Join Date: Aug 2009
Device: N/A
Quote:
Also, sorry, but I don't believe your statement about typography without at least an informal proof. (Make a statement on a math problem, provide a proof or a demonstration. Thanks!)
Frabjous - a LaTeX install is also, at a minimum, hundreds of megabytes in size. This is one of the things I'm on about - it's not suitable as a typographic processor in a low-resource environment. Typography is demonstrably mostly solvable by brute force, but that solution is not applicable to low-power devices.
|
#476
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
It's a worthy task done the wrong way and at the wrong stage of processing. ... and then there are the other issues I mentioned one post above.
- Ahi
|
#477
Wizard
Posts: 1,258
Karma: 3439432
Join Date: Feb 2008
Device: Amazon Kindle Paperwhite (300ppi), Samsung Galaxy Book 12
ahi wrote:
> some of these languages will be such that they will have words whose meaning, and therefore correct hyphenation, depends entirely on the semantic context.
and Dawnfalcon asked:
> Which language is that?
English, for one; see my example for ``present'' in the post just above. The Knuth & Plass paper which I cited has a formal proof and discussion of the impossibility of finding the perfect set of breaks for a paragraph. jbjb --- you keep saying that something is possible (machine-done, perfect page composition) and asking people to prove that it's not possible --- yet you can't prove that it is possible by showing us a single implementation, while a large number of people, some of whom work in this field, are stating that it isn't possible and have pointed you to research papers on the difficulties of this task. I can't even find a grammar checker which can reliably disambiguate between the two different forms of ``present'', let alone every other such word in the English language --- and that's only a small part of the problem.
William

#478
Liseuse Lover
Posts: 869
Karma: 1035404
Join Date: Jul 2008
Location: Netherlands
Device: PRS-505
Quote:
edit: and of course, good enough for which set of people?
Last edited by acidzebra; 09-02-2009 at 11:19 AM.
|
#479
Somewhat clueless
Posts: 772
Karma: 9999999
Join Date: Nov 2008
Location: UK
Device: Kindle Oasis
Quote:
Quote:
It seems to me that simply detecting and preventing stacks, for example, should be fairly straightforward if you're allowed to arbitrarily add space and break lines wherever you want. I know that's a pedantic point, but non-computability is a formal thing and needs to be treated formally (i.e. with a better definition of what constitutes a solution before claiming one can't be found).
/JB
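Detection, at least, is easy to sketch once the lines have been set. The hypothetical helpers below flag runs of more than `max_run` consecutive lines ending in a hyphen (a hyphen "ladder"; the default of 2 is an arbitrary assumption) and pairs of consecutive lines that begin with the same word. They do not prevent anything: prevention means feeding such cases back into the line breaker as an extra cost, which is roughly what TeX's `\doublehyphendemerits` parameter does for consecutive hyphenated lines. Rivers are not covered.

```python
def find_stacks(lines, max_run=2):
    """Return (start, length) for each run of more than `max_run`
    consecutive lines that end with a hyphen."""
    runs, start = [], None
    for i, line in enumerate(lines + [""]):  # the sentinel closes a trailing run
        if line.rstrip().endswith("-"):
            if start is None:
                start = i
        elif start is not None:
            if i - start > max_run:
                runs.append((start, i - start))
            start = None
    return runs

def find_repeated_starts(lines):
    """Return indices i where line i and line i + 1 begin with the same word."""
    firsts = [line.split()[0].lower() if line.split() else None for line in lines]
    return [i for i in range(len(firsts) - 1) if firsts[i] and firsts[i] == firsts[i + 1]]
```

A renderer could run checks like these after each layout pass and re-break any paragraph that trips them.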

#480
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
I think one fundamental problem is that people assume that some of these problems are beyond them, but surely not beyond some people out there smart enough, or computers out there fast enough. When in fact... in a way, they truly are. The problem has been solved the only way it can be: with intelligent and educated people doing a good bit of work in advance. Any other solution will be recognizably poorer in quality for the foreseeable future... even if not forever. Shall we say: the fact that I don't know how to do something does not mean that somebody else out there does, or even that it is practically doable at all.
- Ahi
|