09-08-2009, 12:25 PM | #556
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Once you have that, you have a program that can "find" all the possible patterns for a paragraph. The next, bigger, hurdle is choosing between these patterns. If you mean something that can find the "ideal" hyphenation patterns for an entire paragraph, I think LaTeX does that reasonably well when it has a decent line length to work with. As good as a human? I know of no reason to think any but the very best human typographers can do better. Getting something that will do as well as the very best typographers may be a long way off, but I haven't really heard--at least for English--any argument to the effect that it isn't practically possible, even within a few years.
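(As an aside, TeX's paragraph-at-a-time line breaker can also be nudged for the narrow measures typical of reader screens; a minimal sketch, where the specific values are illustrative assumptions rather than recommended settings:)

```latex
% Loosening TeX's line breaker for a short measure; values illustrative only.
\documentclass{article}
\begin{document}
\hyphenpenalty=50     % more willing to hyphenate
\tolerance=2000       % accept looser interword spacing before giving up
\emergencystretch=1em % last-resort stretch that avoids overfull boxes
Some paragraph text long enough to wrap across several lines\ldots
\end{document}
```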
Last edited by frabjous; 09-08-2009 at 12:33 PM.
09-08-2009, 03:54 PM | #557
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
1. Those that LaTeX knows how to hyphenate correctly.
2. (ERROR) Those that LaTeX thinks it knows how to hyphenate but--the word being an exception to whatever hyphenation pattern matches it--in fact hyphenates incorrectly.
3. Those that LaTeX has no hyphenation patterns for, and rightly so, because the word should not be hyphenated.
4. (ERROR) Those that LaTeX has no hyphenation patterns for, but which should be hyphenated.

The "traditional" approach is to not worry about all this nonsense, and instead to proofread the book to catch #2's and manually fix badboxes to catch #4's. This is particularly sensible, since LaTeX has no way whatsoever to autodetect #2-type hyphenation errors, and no guaranteed-correct way of separating #4's from #3's. Not to mention that the number of words needing to be fixed is likely to be smaller than an exhaustive list of hyphenation errors and unhyphenatable words.

Basically, I have yet to be convinced that any alternative way of handling hyphenation in LaTeX beats the traditional way without needlessly compromising quality or actually increasing the necessary manual work.
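(The "traditional" fixes described above amount to a few lines of LaTeX; the \hyphenation entries below are hypothetical examples, not a real exception list:)

```latex
% Sketch of the traditional approach: a global exception list for words
% the patterns get wrong (cases #2 and #4), plus a one-off fix in place.
\documentclass{article}
% Hypothetical entries; the hyphens mark the permitted break points.
\hyphenation{data-base ma-nu-script}
\begin{document}
A single occurrence can also be patched with a discretionary hyphen:
anti\-dis\-establishment\-arianism.
\end{document}
```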
As for the unsolved problem... it's unsolved, but not a problem. People who are fine with reflow formats do not complain about poor hyphenation.
Such a renderer, unless you only care about English-language books, would have to be several orders of magnitude more complex than the most sophisticated typesetting systems that exist today.

I can, with remarkable ease, use LaTeX to create typographically correct documents in English, French, Hungarian, et cetera. Hanzi documents, set either horizontally or vertically, that respect the rules of Chinese typography. Documents with Thai, Georgian, Korean, or Ethiopian text that respect those languages' typographic and hyphenation rules (or lack thereof). Etruscan and Old Hungarian runic texts running left-to-right, right-to-left, or even in boustrophedon. Documents that contain a mixture of Greek, Hebrew, Arabic, and Syriac text. Or even Klingon, Tengwar, or Shavian.

The fact that I can do all these things, and infinitely more, is what makes the LaTeX/PDF/fixed-layout option basically (given some small improvements in resolution and contrast) as good as paper... and anything that cannot at least offer the same all-but-limitless "functionality" of paper (independent of whatever else it may be capable of) is not a viable replacement for paper (or paper books). Unless of course one holds that in eBooks function should follow form, instead of the other way around.

- Ahi

Last edited by ahi; 09-08-2009 at 03:58 PM.
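(For readers wanting to try the multilingual setup described above, a minimal XeLaTeX sketch using the polyglossia package; the Greek font name is an assumption--substitute any Greek-capable font installed locally:)

```latex
% Each language gets its own hyphenation patterns and typographic
% conventions; compile with XeLaTeX.
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{french}
\setotherlanguage{greek}
% Assumed font; any font with Greek coverage will do.
\newfontfamily\greekfont[Script=Greek]{GFS Didot}
\begin{document}
English text, then \textfrench{un peu de français}, then
\textgreek{λίγα ελληνικά}, each broken by its own language's rules.
\end{document}
```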
09-08-2009, 04:40 PM | #558
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
I don't know if a tool presently exists to parse a LaTeX document and return a list of words that it doesn't know how to hyphenate, but if it doesn't, I cannot imagine that such a tool would be at all difficult to create, even if it meant digging into (La)TeX's source code a little bit. My thought is that at book creation this would be run once to generate a list, and the person writing the TeX code would then use a \hyphenation command to deal with all of them.

But you raise a good point as to how easy it would be to get that algorithm to distinguish between your cases #3 and #4. I'll admit I don't know enough about LaTeX's hyphenation algorithm to know how easy this would be, but even if it does pattern matching rather than word matching (--actually my own experience makes me think that LaTeX stores its hyphenation rules at the word level rather than the pattern level, but I'm not sure--) I don't think it would be that hard. Most unhyphenatable words would be common one-syllable words, and a list of such words to check against does not seem like it would be difficult to generate. (And if a few got through during this process it wouldn't be a problem... the book creator would just specify that they can't be hyphenated...)

(Again, I'm restricting my comments to English and similar languages. The market for English is big enough to make this worthwhile...)

And LaTeX is not the only software out there that does hyphenation... there's also Scribus, InDesign (though I think their algorithm is based on TeX's), etc. Surely, this is not such an unreachable goal.

I'd be very surprised if more than a few paragraphs per book on average are "hand-hyphenated" now, even with good presses, to be honest, though I don't have any first-hand knowledge of such things.
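(For what it's worth, LaTeX already ships a primitive that could serve as the core of such an audit tool: \showhyphens prints the break points it would use into the .log file. A minimal sketch:)

```latex
% \showhyphens writes each word's hyphenation points to the .log file.
% A word that comes back with no hyphens is either genuinely
% unhyphenatable (case #3) or unknown to the patterns (case #4);
% as discussed above, telling those apart still takes a human
% (or a list of one-syllable words to check against).
\documentclass{article}
\begin{document}
\showhyphens{manuscript typography strengths}
\end{document}
```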
Geez, now you have me wondering whether Tengwar and Klingon, etc., have hyphenation rules... Anyway, I admit that there are some assumptions I'm making here that may be wrong. I just haven't seen what I would consider compelling evidence against the possibility of such things.

Last edited by frabjous; 09-08-2009 at 04:46 PM.
09-08-2009, 08:30 PM | #559
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
And, personally, eBook-reading software or a format whose primary (and perhaps sole realistic) aim is to support the English/Western world for the foreseeable future is... well, of absolutely no interest/worth/value to me... or to the majority of humanity (most of whom are not eBook-device customers today, and, if their needs are rendered basically impossible to meet within the established industry standards, almost certainly never will be).

That is my view.

- Ahi
09-08-2009, 08:34 PM | #560
Banned
Posts: 2,094
Karma: 2682
Join Date: Aug 2009
Device: N/A
(And I'm sure most constructed languages do; one I'm familiar with--the ironically-named PlusThink, a Newspeak "derivative"--does.) Also, right, TeX would need new conventions... let me ask again, is anyone actually working on this?
09-08-2009, 09:00 PM | #561
Guru
Posts: 714
Karma: 2003751
Join Date: Oct 2008
Location: Ottawa, ON
Device: Kobo Glo HD
Fully justified text without hyphenation looks bad to me. I would rather stick with "jagged right edge" until hyphenation is solved.
09-08-2009, 10:18 PM | #562
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
There was a research talk about TeX as an eBook reader at the TUG conference this past summer, so apparently, yes. I don't have any first-hand knowledge of their work, though.
09-09-2009, 06:29 AM | #563
Liseuse Lover
Posts: 869
Karma: 1035404
Join Date: Jul 2008
Location: Netherlands
Device: PRS-505
Having played with LyX over the weekend all the way until today, I must revise my opinion of PDF as an ebook format. Here is my revised view:
Letter- or A4-sized PDFs, especially complex ones, tend to look like crap on the (smaller) reader screens. Specially-formatted PDFs look way better than anything else I have ever seen on the Sony, and are a faithful representation of the content as it was (intended to be) laid out. This includes the pretty full justification, hyphenation (both fantastic features of LyX/LaTeX, and I don't have any reason to complain about how it hyphenates, even in SF books with lots of made-up words), fonts, TOC, graphics, and whatnot.

The drawback is you lose some layout flexibility; press the zoom button and it all goes awry. Of course, formatting the PDF with a decent font and font size takes away the need to zoom to a great extent (though when my eyes are tired I like to do this). I don't see publishers putting out several PDFs of each book, at different font sizes formatted for each mainstream screen size, any time soon if ever, though.

So all in all, I am pretty happy with this rather combative thread; I have found LyX, which takes some getting used to, but once you have a decent profile/layout set up you can process books quickly and with minimal intervention (I am big on automation) and make them look great. I'd post some results here, but most of my conversions go against the MR rules (copyright), and they would probably make the professional typographer's eyes bleed. Muahahahaaa!

Last edited by acidzebra; 09-09-2009 at 06:31 AM.
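(A sketch of what "specially-formatted" can mean in practice: the page and margin values below are rough guesses for a ~6" screen such as the PRS-505, not measured numbers:)

```latex
% Reader-sized page: large type on a small page removes most need to zoom.
\documentclass[14pt]{extarticle} % extsizes class; allows sizes beyond 12pt
\usepackage[paperwidth=90mm,paperheight=120mm,margin=2mm]{geometry}
\usepackage{microtype} % protrusion/expansion; helps on narrow measures
\begin{document}
Body text at a size readable without zooming\ldots
\end{document}
```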
09-09-2009, 10:15 AM | #564
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
- Ahi
09-09-2009, 10:43 AM | #565
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
As long as I have to order from Europe the same damn Fisher-Price toys being sold in the local Walmart, because I want the damn plush dog to count in Hungarian rather than English... instead of being able to (as you rightly suggest in the case of eBook reader devices) simply update/change the firmware... this seems unlikely to me. And, yes, the key is multilingualism. Multilingualism that is, by the way, the norm in humans, as there are more multilingual people in the world than monolingual ones.

It seems to me that your approach, separately from whatever other merits or failings it may have, firmly places (potential) readers into the hands/mercy of manufacturers that have little tangible motivation to even update firmware, never mind create endless variations thereof for a myriad different scripts and languages. And while doubtless your approach would not leave readers of Hanzi, whether vertical or horizontal, disenfranchised, what is the likelihood of the particulars of the Yi script being supported? Or of the Chuvash language? (Consider these rhetorical questions relevant more for their type than for their specific content... I do not know what complications there would be with either the Yi script or with Chuvash... but the world is full of languages that are probably too minor for a large multinational corporation to ever care to do much work to support.)

Not to mention stuff that starts heading toward the fanciful... how much will Sony work to professionally support all the peculiarities of ancient/Koine Greek, Hebrew, Latin, Arabic, and Syriac for biblical/koranic/classical scholarship? How much will they work to support either Etruscan or Runic Hungarian (or Greek or Latin, for that matter) in boustrophedon? Will I be able to implement a fanciful way of writing modern English with Old English runes?

Will there be full and professional support for all the different scripts that Albanian has been written with in recent memory (each of which doubtless has books originally written/prepared/published using it)? How high a priority will Inuktitut support be for them? It is a co-official language of one of Canada's territories... along with its related language/dialect Inuinnaqtun, which may have quirks of its own. And then there's the Klingon, Tengwar, et cetera tomfoolery. And will there only be firmware for writing Quenya with Tengwar? Or both Quenya and Sindarin? Or even English? Will Dutch written with Tengwar never be supported? And the Voynich manuscript?

Nothing that I mentioned is a problem with real paper or with PDF, assuming the author/typesetter knows what they want. However, it could very easily become all but impossible to do (or at least, to do well) using a system where most of the work is left to the display engine... which knows only what the proprietor put money into. Unless you see firmware going open source in a big way... but part of me thinks that even then, writing firmware is a good deal greater a barrier to entry than simply learning LaTeX and getting the right fonts. The latter a determined language revivalist could far more readily do than the former, unless they already have a software development background.

- Ahi

Last edited by ahi; 09-09-2009 at 03:11 PM.
09-09-2009, 10:50 AM | #566
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Regarding putting out multiple PDFs for different font sizes and different screen sizes... I maintain that it sounds more onerous than it really is. Not to mention that as long as devices support proper resizing of PDFs, there can be useful sharing of sorts... e.g., the 10pt/6" screen version could double as the 8" screen large-print version.

Anyways though... good luck with your future PDF endeavours!

Edit: You could take screenshots and post those!

- Ahi

Last edited by ahi; 09-09-2009 at 10:54 AM.
09-09-2009, 11:16 AM | #567
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
acidzebra -- I'm glad you're enjoying LyX. Do consider making the move to standard LaTeX editing, however. LyX is more of a stepping stone, in my mind, anyway.
I wasn't really suggesting an approach for how to handle other languages, but mainly just trying to limit my positive proposal for a better renderer to languages I actually know something about. I only know English and a smattering of French and German. I'm simply not qualified to make recommendations about what would be an ideal renderer for other languages. I don't know of a reason to think this is a matter of the format itself, however. Again, my arguments weren't really about whether the markup was LaTeX or (Math)(X)(HT)ML, but about pushing for a renderer that does a better job with what it's given. I don't see why the file format itself couldn't be suitably multinational--HTML and TeX are both so, as far as I know. LaTeX's, and even more so XeLaTeX's, support for other languages is extensive, and I certainly wouldn't be opposed to that being the standard.

Perhaps your point is that it would be much more difficult to implement a renderer that could reflow well on the fly for some other languages, and for those, having fixed formats to fall back on would therefore be all the more important. To repeat, I'm all for including the possibility of fixed-format fallbacks. You yourself (I think it was you... too lazy to check) pointed out already early in the thread that even with current renderers for ePub you can get the exact look you want with a series of PNGs, and no one is suggesting moving to a format that wouldn't allow the insertion of arbitrary (but obviously nonreflowable) images. But surely it's not some horrendous multinational tragedy if, for those languages in which a renderer that does decent typography and allows for arbitrary reflow at the same time IS possible, we actually use such a renderer.
I'm surprised at your skepticism about the firmware though: one possibility that is still live in my mind is using (pdf)LaTeX itself, or a tweaked derivative, as the renderer -- if it does the good job with these other languages that you say, why couldn't it do as good a job on our devices? (It would merely be a matter of preloading certain packages as default for different markets.) And even if there are legal barriers to this, to quote myself from earlier in the thread, the fact that the wheel has been invented once gives us all the more reason to think that reinventing it is not an impossibility. |
09-09-2009, 11:38 AM | #568
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
The key, for me, is that LaTeX is great primarily because the people doing the typesetting with LaTeX know more than LaTeX does by itself. The moment you move all that work to render time, LaTeX (or whatever you use) suddenly needs to know a hell of a lot more... because there's no human to fix things before it gets into the reader's hands. It's this transfer of miscellaneous knowledge that isn't encoded into software automation (because it is vastly better handled by a human being) that I see as a practical stumbling block, even if it is not a downright computational/mathematical one.
And why? Because multimillion-dollar publishing houses should be spared the few-hour travail of generating 2-6 PDFs from a common LaTeX source (which they could [or might already] also be using for their printing)? Or because we don't want two to six 200 KB to 400 KB PDF files bundled along with the 100 KB to 200 KB HTML in an eBook, despite the cost of memory storage distinctly heading toward dirt-cheap, and people seemingly having no inclination to store more than 200 books at a time on their eBook reading device? This doesn't make sense to me.

Let my skepticism not stop work/research on automating more and more typographical/layout tasks... but a lot of it seems to me like a fool's errand that only exists because of a tacit disinterest in doing the work the right way and at the right stage.

- Ahi
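(One way those 2-6 PDFs could come out of a common source: a tiny wrapper file per target, each defining its parameters and inputting the shared body. The file and macro names here are hypothetical:)

```latex
% book-6in-10pt.tex (hypothetical wrapper): one such file per target.
\def\targetscreen{six}    % consumed by conditionals in the shared source
\def\targetfontsize{10pt}
\input{book-body}         % the common LaTeX source for the whole book
```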
09-09-2009, 02:07 PM | #569
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Well, let me say too that I certainly hope readers will continue to support PDF for a long time to come... if for no other reason than that there are so many around now that even if a better alternative comes along, I'll want my current stash of PDFs not to become useless. If anything I wrote suggested otherwise, I'll happily take it back. (And with my Hebrew letter problem, as you know, of course, I do distribute things in PDF and there things are fine -- I just wanted to do a .mobi version too since Kindle has such a share of the market, and at least the first two generations don't support PDF.)
I think we may have different estimates of how much work it would take to change existing technologies to deliver high-quality typesetting that could be automatically reflowed (and you're of course right that LaTeX was designed with a different purpose in mind and would have to be changed, updated, or augmented in various ways) vs. how much work it would save and what the other benefits would be. But there's little point in trying to quantify such things; too much is unknown until it is seriously attempted.

I'll admit that part of the desire to create such technology is selfish. For one, I could use it for my own writing. I like what I write, even pre-publication, to look good. This is why I write in LaTeX, and distribute to colleagues for comments in PDF, whereas most others in my field write and distribute in Word, and wait until publication to get nice-looking documents. Most of the reading I do is pre-publication stuff: student papers, work by colleagues who want feedback, or submissions to a press that I'm refereeing. It would be nice if I could take the files they sent me, do at most one universal conversion without deciding in advance on a fixed layout and font size, put them on my reader, and get decent-looking results. I'll certainly settle in the meantime for TeX advocacy, hoping to be sent the source to create my own appropriately sized PDFs.

Last edited by frabjous; 09-09-2009 at 02:11 PM.
09-09-2009, 02:42 PM | #570
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Agreed, with most everything you wrote in your last message.
1) A tool that would create a TeX source bundle--a .zip, basically, containing all dependencies that aren't part of the TeX base and are actually referenced/used during the compilation. (I.e., a compilation under a valid TeX variant [unless there are necessary ties to XeLaTeX, of course] should never result in failure due to missing dependencies.)

2) The creation of a TeX distiller, if you will, that can take one of these TeX source bundles and, with some minimal run-time control (over display size and/or font size), generate a PDF out of it with little more than drag and drop by the user.

The combination of the two would basically mean that the typesetter could pre-customize the TeX source for specific-sized outputs via the ifthen package (your idea, I believe)... meaning that PDFs of whatever size could be readily distilled, but anticipated size/font-size combinations would look even better than merely auto-generated ones.

Even this, I do not think is ideal for run-time use... but it would make for an eBook distribution system (with a minor intermediary step that could even be integrated into the download workflow of the particular file type, once default parameters have been set) superior to basically anything else. Albeit without addressing the needs of people who demand different font sizes for different lighting conditions, instead of resigning themselves to having to read with adequate light.

Edit: Although, it might not be impossible to get LaTeX to output reasonably accurate HTML... not that I've seen any tool that does so tolerably. Usually the output is really quite ugly... though that might have more to do with the given approach's specific aims not simply being the generation of pretty (even if simplified/less than 100% accurate) HTML output from the LaTeX source.

- Ahi

Last edited by ahi; 09-09-2009 at 02:44 PM.
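(The ifthen-based pre-customization mentioned above might look like this in the preamble of the shared source; the dimensions and the \targetscreen macro are illustrative assumptions:)

```latex
% Branch on a target-screen macro, set either by a wrapper file or on
% the command line, e.g.:
%   pdflatex "\def\targetscreen{six}\input{book-body}"
\usepackage{ifthen}
\usepackage{geometry}
\providecommand{\targetscreen}{six} % default if nothing defined it
\ifthenelse{\equal{\targetscreen}{six}}%
  {\geometry{paperwidth=90mm,paperheight=120mm,margin=3mm}}%
  {\geometry{paperwidth=122mm,paperheight=163mm,margin=5mm}}
```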