Old 09-02-2009, 02:31 PM   #511
Ankh
Guru
Posts: 714
Karma: 2003751
Join Date: Oct 2008
Location: Ottawa, ON
Device: Kobo Glo HD
Quote:
Originally Posted by ahi View Post
Do you really consider feasible a hyphenation database that contains all actual words it addresses, along with all their compounded, conjugated/declined (and apostrophe-laden?) forms? Even for languages unlike English, where prefixes, suffixes, conjugations, and declensions can create over 100 valid and sensible words from a 3-letter root word?

Given how utterly technically simple this is, you'd think all word processors would be using it by now. (Though "present" in English, and many words in other languages would continue to be incorrectly auto-hyphenated.)

- Ahi
Feasible? Yes. Massive OED dictionaries are available for PC, and this database will most likely be smaller than that.

The process would be SLOW, most likely. But it is done once, on a PC, when the book is made. Our desktops are very capable machines.

The situation with "present" and ambiguous cases in other languages IS NOT a concern, since we are NOT doing machine-only hyphenation here. The database can recognize such cases and ask for human intervention. You resolve it once, during book creation.

I really don't see any problems with the method.
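
A minimal sketch of the lookup-and-ask idea described above, in Python. The tiny HYPHENATION_DB, the function names, and the resolve callback are all hypothetical and made up for illustration; a real database would be far larger and its entries would come from reviewed dictionary sources. Known words get soft hyphens (U+00AD), words missing from the database are left alone, and a word with more than one recorded break is handed to a human exactly once.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Hypothetical toy database: word -> known hyphenations.
# More than one entry means the break depends on meaning, so a human decides.
HYPHENATION_DB = {
    "database": ["data-base"],
    "present": ["pres-ent", "pre-sent"],
}

def lookup(word):
    """Return (status, candidates); status is 'ok', 'ambiguous' or 'missing'."""
    candidates = HYPHENATION_DB.get(word.lower(), [])
    if not candidates:
        return "missing", []
    if len(candidates) > 1:
        return "ambiguous", candidates
    return "ok", candidates

def apply_pattern(word, pattern):
    """Copy the break points of pattern (e.g. 'data-base') onto word, keeping its case."""
    out, i = [], 0
    for ch in pattern:
        if ch == "-":
            out.append(SOFT_HYPHEN)
        else:
            out.append(word[i])
            i += 1
    return "".join(out)

def hyphenate_text(text, resolve):
    """Insert soft hyphens; resolve(word, candidates) picks a break for ambiguous words."""
    def fix(match):
        word = match.group(0)
        status, candidates = lookup(word)
        if status == "missing":
            return word  # unknown word: leave untouched
        pattern = candidates[0] if status == "ok" else resolve(word, candidates)
        return apply_pattern(word, pattern)
    return re.sub(r"[A-Za-z]+", fix, text)
```

In a real tool, resolve would show the sentence to the person making the book and store the answer; here it can be any function that returns one of the candidates, for example `lambda word, candidates: candidates[1]`.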
Old 09-02-2009, 02:49 PM   #512
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Never mind. What I wrote here doesn't make sense.

Last edited by ahi; 09-02-2009 at 02:52 PM.
Old 09-02-2009, 02:57 PM   #513
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by Ankh View Post
Feasible? Yes. Massive OED dictionaries are available for PC, and this database will most likely be smaller than that.

The process would be SLOW, most likely. But it is done once, on a PC, when the book is made. Our desktops are very capable machines.

The situation with "present" and ambiguous cases in other languages IS NOT a concern, since we are NOT doing machine-only hyphenation here. The database can recognize such cases and ask for human intervention. You resolve it once, during book creation.

I really don't see any problems with the method.
Ok... let me put aside for a moment that I don't think this idea is feasible... particularly because I might have run out of cogent arguments as to why...

How would you go about building such a database, Ankh?

Just process oodles and oodles of PG eTexts, and manually hyphenate the words therefrom?

- Ahi
Old 09-02-2009, 03:09 PM   #514
Ankh
Guru
Posts: 714
Karma: 2003751
Join Date: Oct 2008
Location: Ottawa, ON
Device: Kobo Glo HD
Quote:
Originally Posted by ahi View Post
How would you go about building such a database, Ankh?

Just process oodles and oodles of PG eTexts, and manually hyphenate the words therefrom?
Start with the source of nrapallo's Webster 1913 dictionary.

Then yes, expect users to help with the growth of the database. The database-assisted hyphenation engine can ask for intervention whenever a word is not in the database. When the job is done, process the database, extract the words that were added into a basic text file, one line per hyphenated word, and submit that file back to the maintainer. The maintainer reviews it (using dictionaries and any other tools available), merges the changes, and releases a new version of the database.

Open source.
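
The contribution loop above can be sketched roughly as follows, again in Python. Everything here is an assumption for illustration: the function names, the in-memory dict standing in for the database, and the ask callback standing in for the human. The only detail taken from the description is the submission format, one hyphenated word per line.

```python
def collect_missing(words, database, ask):
    """Ask the user once for every word that is not in the database yet."""
    additions = {}
    for word in words:
        key = word.lower()
        if key not in database and key not in additions:
            additions[key] = ask(key)  # e.g. the user types "read-er"
    return additions

def export_additions(additions):
    """Serialize the user's additions, one hyphenated word per line."""
    return "\n".join(sorted(additions.values())) + "\n"

def merge_submission(database, submission_text):
    """Merge a submitted file; return entries that disagree with existing ones."""
    conflicts = []
    for line in submission_text.splitlines():
        pattern = line.strip().lower()
        if not pattern:
            continue
        word = pattern.replace("-", "")
        known = database.setdefault(word, [])
        if known and pattern not in known:
            conflicts.append(pattern)  # left for the maintainer to review
        elif pattern not in known:
            known.append(pattern)      # brand-new word: accept it
    return conflicts
```

Holding back conflicting entries instead of overwriting is one way to enforce a review step; a real maintainer would also want to check new words against dictionaries before accepting them.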
Old 09-02-2009, 03:33 PM   #515
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by Ankh View Post
Start with the source of nrapallo's Webster 1913 dictionary.

Then yes, expect users to help with the growth of the database. The database-assisted hyphenation engine can ask for intervention whenever a word is not in the database. When the job is done, process the database, extract the words that were added into a basic text file, one line per hyphenated word, and submit that file back to the maintainer. The maintainer reviews it (using dictionaries and any other tools available), merges the changes, and releases a new version of the database.

Open source.
I'm highly skeptical... but such a thing would be helpful to the sorts of projects I work on. Once I get my text formatting program into a bit better shape, perhaps I'll try to implement the framework for this idea.

- Ahi
Old 09-02-2009, 04:02 PM   #516
Ankh
Guru
Posts: 714
Karma: 2003751
Join Date: Oct 2008
Location: Ottawa, ON
Device: Kobo Glo HD
Quote:
Originally Posted by ahi View Post
I'm highly skeptical... but such a thing would be helpful to the sorts of projects I work on. Once I get my text formatting program into a bit better shape, perhaps I'll try to implement the framework for this idea.

- Ahi
Beware, it is a massive undertaking, especially for the database maintainer.

A rigid review policy before submission is needed, or everything can easily fall apart. In the early stage, the tool and the database will be almost useless (too many misses), but that can quickly change, since the most frequently used words and their incarnations (whatever the reason for the slightly different form) will soon find their way into the database.

A "perfect" database might never emerge, but a pretty complete one (hit ratio above 99%) would be more than useful.
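
For what it's worth, the hit ratio is easy to measure. A small Python sketch, with a hypothetical function name and a plain set of words standing in for the database: it counts every occurrence of a word rather than every distinct word, which is why covering the most frequent words first raises the ratio so quickly.

```python
import re

def hit_ratio(text, database):
    """Fraction of word occurrences in text that the database already covers."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in database)
    return hits / len(words)
```

Running this over a few books before and after a database update would give the maintainer a rough measure of progress toward the 99% mark.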

Your call; I am not ready to make such a commitment. Once soft hyphens are implemented on the PRS-505, I promise that I will use the tool and contribute to the growth of the database.

Last edited by Ankh; 09-02-2009 at 04:05 PM.
Old 09-02-2009, 04:30 PM   #517
Kostas
Still wondering why
Posts: 253
Karma: 800
Join Date: Jun 2009
Location: Athens, Greece
Device: PRS 505, (BlackBerry Bold ?)
Quote:
Originally Posted by ahi View Post
or

λοπαδο*τεμαχο*σελαχο*γαλεο*κρανιο*λειψανο*δριμ*υπο *τριμματο*σιλφιο*καραβο*μελιτο*κατακεχυ*μενο*κιχλ* επι*κοσσυφο*φαττο*περιστερ*αλεκτρυον*οπτο*κεφαλλιο *κιγκλο*πελειο*λαγῳο*σιραιο*βαφη*τραγανο*πτερύγ ων

(Will eBook readers know how to hyphenate Greek words correctly, even if they won't concern themselves with Gikuyu?)

- Ahi

Ps.: Sorry.. not sure why the Greek is broken. All are single words, by the way... despite the forum display forcing word breaks.
Forgiven! Actually, accents are missing too...
Old 09-02-2009, 04:32 PM   #518
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by Kostas View Post
Forgiven! Actually, accents are missing too...
Must be some weird Unicode mangling... Oh well... Nordöstersjökustartilleriflygspaningssimulatoranläggningsmaterielunderhållsuppföljningssystemdiskussionsinläggsförberedelsearbeten made my admittedly tongue-in-cheek point well enough.

- Ahi
Old 09-02-2009, 04:34 PM   #519
Kostas
Still wondering why
Posts: 253
Karma: 800
Join Date: Jun 2009
Location: Athens, Greece
Device: PRS 505, (BlackBerry Bold ?)
Quote:
Originally Posted by ahi View Post
Must be some weird Unicode mangling... Oh well... Nordöstersjökustartilleriflygspaningssimulatoranläggningsmaterielunderhållsuppföljningssystemdiskussionsinläggsförberedelsearbeten made my admittedly tongue-in-cheek point well enough.

- Ahi

Is it Finnish?
Old 09-02-2009, 04:39 PM   #520
Elfwreck
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
Quote:
Originally Posted by ahi View Post
Do you really consider feasible a hyphenation database that contains all actual words it addresses, along with all their compounded, conjugated/declined (and apostrophe-laden?) forms? Even for languages unlike English, where prefixes, suffixes, conjugations, and declensions can create over 100 valid and sensible words from a 3-letter root word?

Given how utterly technically simple this is, you'd think all word processors would be using it by now. (Though "present" in English, and many words in other languages would continue to be incorrectly auto-hyphenated.)
It's not possible to build a hyphenation database for every word, especially not every possible word in agglutinative languages, nor patchwork languages like English that create new words by smashing other words together.

However, a good, if not perfect, hyphenation algorithm could be made, based on linguistic analysis of the language. And it could be combined with a dictionary, so it would automatically put up flags for words that could be either compound words or identically-spelled words with different meanings. (I'd say "homonyms," but they might not have the same pronunciation.) It wouldn't fix all mishy-phens, but it'd allow the formatting person (whoever that is, author or editor) to quickly identify the possibilities, rather than doing a line-by-line proof every time the formatting shifts a bit.
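
To make the algorithm-plus-dictionary combination concrete, here is a rough Python sketch. The rule in rule_breaks (break between two consonants that sit between vowels) is a deliberately crude stand-in, not real linguistic analysis, and it will get plenty of English words wrong. The FLAGGED dictionary, its single entry, and the function names are likewise assumptions for illustration. The point is only the shape: the rules handle ordinary words, and the dictionary raises a flag with a line number wherever the break depends on meaning.

```python
import re

VOWELS = set("aeiouy")

# Hypothetical flag dictionary: spellings whose break depends on meaning.
FLAGGED = {
    "present": ["pres-ent", "pre-sent"],
}

def rule_breaks(word):
    """Crude stand-in rule: break between two consonants flanked by vowels."""
    w = word.lower()
    parts, start = [], 0
    for i in range(1, len(w) - 2):
        if (w[i - 1] in VOWELS and w[i] not in VOWELS
                and w[i + 1] not in VOWELS and w[i + 2] in VOWELS):
            parts.append(word[start:i + 1])
            start = i + 1
    parts.append(word[start:])
    return "-".join(parts)

def hyphenate_and_flag(text):
    """Return (rule-hyphenated words, flags); each flag is (line, word, options)."""
    results, flags = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            key = word.lower()
            if key in FLAGGED:
                flags.append((lineno, word, FLAGGED[key]))  # needs a human
            else:
                results.append(rule_breaks(word))
    return results, flags
```

The flags list is what the formatting person would review, instead of proofing every line after each reflow.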

As far as I know, "present" is always split "pre-sent," with any of its three possible pronunciations. However, unless I was using it in a sentence like
"we knew the authorization would arrive later that week, so we pre-
sent the package," I'd avoid hyphenating it to avoid confusion, because ending with "pre-" implies the long-e pronunciation.

There's no reason hyphenation software couldn't be as good as current spellcheck software--not perfect, but good enough to remove a lot of the gruntwork of proofreading, and good enough to reflow a book to avoid almost all troublesome hyphenations.

As amusing or sometimes annoying as bad hyphenations are, I'd rather publishers spent more time on actual typos, and apostrophe use, and a good table of contents. Oh, and an index for nonfiction books.

I need good content before I need great formatting.
Old 09-02-2009, 04:45 PM   #521
Abecedary
Exwyzeeologist
Posts: 535
Karma: 3261
Join Date: Jun 2009
Device: :PRS-505::iPod touch:
Quote:
Originally Posted by Elfwreck View Post
As far as I know, "present" is always split "pre-sent," with any of its three possible pronunciations.
I'm generally sitting out of this discussion, but I did want to point out that Merriam-Webster breaks "present" as pres-ent for 3 of the 4 main usages of the word (all but the verb form).
Old 09-03-2009, 12:06 AM   #522
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by Kostas View Post

Is it Finnish?
Norwegian, if I recall correctly.
Old 09-03-2009, 04:21 AM   #523
jbjb
Somewhat clueless
Posts: 788
Karma: 11000001
Join Date: Nov 2008
Location: UK
Device: Kindle Oasis
Quote:
Originally Posted by ahi View Post
What makes you think that it's an either-or proposition? It seems a counter-intuitive assumption to me. And, to be frank, any time the publisher decides to save on getting typography or even just the hyphenation part right will not be time re-allotted to additional proofreading.

There is, I'm rather sure, no chance of improving the content accuracy of eBooks by arguing for relaxing the typographic standards thereof.

And as I've noted a good few times though, good quality typography is not difficult or time-consuming to do with the right tools and technologies, and even great typography is not prohibitively so, even for a small publisher.
What I meant was that any benefits of improved typography over what can currently be achieved on a reflowed epub or similar pale into insignificance (N.B. to me, with my particular priorities) next to the improvement that would result if publishers improved their basic proofreading of ebooks (and, indeed, all books). The current standard in this area is so poor that it doesn't really strike me as at all gainful to try to convince publishers to produce hand-crafted layouts for various sizes. I'd rather concentrate on encouraging them to improve the basic accuracy of the product.

Striving for perfect (for somebody's arbitrary definition of perfect) typography on a book which is riddled with errors is simply polishing the proverbial you-know-what.

For me (and I emphasise that this is for me - as I've said, different people want different things), if the content is accurate then I'm happy with a very basic layout - minor hyphenation errors or even stacks etc. don't really bother me.

Having said that, I take your point that a format which mixed a reflowable format with specialised layouts for specific sizes would be ideal. My concern is that I can't see how publishers who can't even get the basics right are going to do it well - I'd rather they just concentrated on the fundamentals.

/JB
Old 09-03-2009, 07:45 AM   #524
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by jbjb View Post
My concern is that I can't see how publishers who can't even get the basics right are going to do it well - I'd rather they just concentrated on the fundamentals.
Hmmm... and I, on the other hand, do not see publishers who are uninterested in the relatively easier work of making their books' typography not plainly broken being courageous enough to take on the considerably more difficult problem of general proofreading and rigorous content correction, which is a lengthier and more arduous set of tasks.

But, to be honest, if I did think it was an either-or proposition, I'd follow you in preferring them to do the proofing properly. After all, I can typeset the book myself more easily than I can proofread it.

- Ahi
Old 09-03-2009, 07:53 AM   #525
Abecedary
Exwyzeeologist
Posts: 535
Karma: 3261
Join Date: Jun 2009
Device: :PRS-505::iPod touch:
This completely ignores that proofreading and typesetting duties are generally handled by entirely different people or departments, especially in an organization of, oh, say, five or more people. To me, this further emphasizes that it's not an either/or proposition.


All times are GMT -4.

