Old 11-11-2014, 07:45 PM   #2
Tex2002ans
Quote:
Originally Posted by Psymon
PROBLEM #1 - More Selective Hyphenation

I'm primarily an iPad user (forgive me), but I hate the way that it automatically hyphenates words willy-nilly all over the place, even shorter words that didn't need to be, and so what I did to counter that was initially turn hyphenation off in my book completely, by adding this in my styles (wherever I wanted hyphenation to be turned off)...
This is a bad, bad idea. Hyphenation is better left to the reading program and to the hyphenation algorithms of the device itself. Yes, SOME devices might be crap at certain words, and some of the algorithms might be junky, but those can be fixed with a software update (or will already be fixed on some other device).

The ONLY spot where I can MAYBE see disabling hyphenation being useful is in headings. Besides that, it is not recommended. I DEFINITELY don't recommend disabling it everywhere and then ENABLING it on certain words (if anything, you would do the exact opposite).

Side Note: This reminds me a lot of the soft-hyphenation talk. There is even a Calibre plugin, "Hyphenate This!", dedicated to adding soft hyphens everywhere under the sun:

https://www.mobileread.com/forums/sho...d.php?t=208534

Here is more talk on the soft-hyphen problem:

https://www.mobileread.com/forums/sho...d.php?t=230358

Best to leave hyphenation as a user choice, which readers can enable or disable in their reading app.

Other Side Note: Here is Wikipedia's page on Widows/Orphans (italics mine):

https://en.wikipedia.org/wiki/Widows_and_orphans

Quote:
Similarly, a single orphaned word at the end of a paragraph can be cured by forcing one or more words from the preceding line into the orphan's line. In web-publishing, this is typically accomplished by concatenating the words in question with a non-breaking space and, if available, by utilizing the orphans: and widows: attributes in Cascading Style Sheets. Sometimes it can also be useful to add non-breaking spaces to the first two (or few) short words of a paragraph to avoid that a single orphaned word is placed to the left or right of a picture or table, while the remainder of the text (with longer words) would only appear after the table.

[...]

In technical writing where a single source may be published in multiple formats, and now in HTML5 with the expectation of viewing content at different sizes/resolutions, use the word processor settings that automatically prevent widows and orphans. Manual overrides like inserted empty lines or extra spaces can cause unexpected white space in the middle of pages.
Quote:
Originally Posted by Psymon
And then, call me crazy, but I actually went through my whole, entire book(s) looking for "problematic words", and wrapping them with that class...[...]
This is going to cause you a ton of headaches for very little gain. What happens when the user changes the font size, or the dimensions change (landscape vs. portrait, or devices become larger or higher resolution), etc. etc.? Every one of those changes moves the line breaks, so the words you hand-picked are no longer the ones sitting at the end of a line.

IF, and that is a big IF, you wanted that much control over the look, you might as well just go "Fixed Format". (Which brings along its own host of problems).

Quote:
Originally Posted by Psymon
[space] + [a word with at least 8 characters] + [a space OR any number of alphanumeric characters]
This is the best Regex Tutorial:

http://www.regular-expressions.info/tutorial.html

To do the above, you would want something along these lines:

Search: (\b\w{8,}\b)
Replace: <span class="hyph">\1</span>

\b = a "Word Boundary", you can read up on that here: http://www.regular-expressions.info/wordboundaries.html
\w = any "Word Character", you can read up on that here: http://www.regular-expressions.info/shorthand.html
{8,} = 8 or more characters

So, in English, this says "Find a Word Boundary, then any 8 or more Word Characters in a row, followed by another Word Boundary". Since the entire thing is surrounded by parentheses, it also says "stick the entire match in capture group \1". (Keep in mind that \w also matches digits and underscores, not just letters.)

Then take everything in \1, and "wrap that entire thing with <span class="hyph"></span>".
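
If you want to try the search/replace outside of an editor first, here is a minimal sketch in Python. It uses the same pattern and replacement as above; the function name wrap_long_words and the sample sentence are made up for illustration. It assumes you feed it plain text. Run against raw XHTML, the same pattern would also match long tag and attribute names (for example "blockquote"), so treat it as a starting point only.

```python
import re

# Same pattern as the Search field above: a whole word of 8 or more word characters.
LONG_WORD = re.compile(r"(\b\w{8,}\b)")


def wrap_long_words(text: str) -> str:
    """Wrap every word of 8+ word characters in <span class="hyph">...</span>."""
    # \1 in the replacement refers to capture group 1, i.e. the matched word.
    return LONG_WORD.sub(r'<span class="hyph">\1</span>', text)


# Hypothetical sample sentence, just to see the effect.
sample = "A short word and an extraordinary one."
print(wrap_long_words(sample))
```

Only "extraordinary" is long enough to get wrapped; the shorter words are left alone.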

Quote:
Originally Posted by Psymon
PROBLEM #2 - Selectively Preventing Word Wrap

Another "typographically-annoying" thing is whenever a line happens to end with the first word of a new sentence (or a phrase after punctuation mark) which starts with a single-letter word -- which, as far as I can come up with, would be "I" or "A" or "a", or, in rarer instances, "O".

Here's a made-up example of an especially annoying paragraph...
Again, you are going to cause yourself lots of problems. The only way to keep certain words together is to add non-breaking spaces all over the place, and I highly recommend against that.

Look, at a certain point, you have to accept that reflowable ebooks ARE NOT PRINT.

#1: Give up trying to make them print.
#2: The EPUB standards are just not there to support a lot of the complex typographical decisions.

If you want to do all of that typographical nitpicking in EPUB, you will have to go Fixed Format, OR, just create a PDF using whatever tools (LaTeX, Quark, InDesign, etc. etc.).

Side Note: For example, in French Typography, there seems to be this weird rule of "the last line of a paragraph should not be shorter than the double of the indentation of the next paragraph.":

https://tex.stackexchange.com/questi...ne/28361#28361

These sorts of weird conventions are just NOT POSSIBLE in EPUB.

Other Side Note: Sometimes, I really wonder how typographers survive on the Internet. Their eyeballs/brains must be going crazy from websites not following typography rules, and users reading at all different font sizes + device/monitor sizes. Where they want pixel/mm perfect typography, the rest of us want resizable/reflowable/customizable.

Quote:
Originally Posted by Psymon
I hope you all don't think I'm crazy for nit-picking over hyphenation and word wrap like this, but, well, maybe I actually am crazy.
Meh, it is not "crazy", but a reflowable EPUB is not the format for that. Either go PDF/DjVu, or Fixed Format.

Quote:
Originally Posted by Psymon
Nevertheless, I've been doing this "manually" all along so far, and wow, what an enormous time saver it would make if I could come up with a regex expression that could do this with a simple search & replace instead!
Oh yeah, Regex is a super time saver. Now I can do stuff in a few clicks that used to take me many hours (for example, fixing up Indexes, catching typos in page numbers, adding en dashes between numbers, etc. etc.).
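
To give a rough idea of the "en dashes between numbers" kind of fix, here is a small Python sketch. The pattern is my own guess at one way to do it (a hyphen sitting directly between two digits), and the function name en_dash_ranges is hypothetical. Note that it would also convert hyphens in things like numeric dates or phone numbers, so you would still want to eyeball each match.

```python
import re

# Assumed pattern: a hyphen with a digit immediately before and after it.
# The lookbehind/lookahead check the digits without consuming them.
DIGIT_HYPHEN = re.compile(r"(?<=\d)-(?=\d)")


def en_dash_ranges(text: str) -> str:
    """Replace a hyphen between two digits with an en dash (U+2013)."""
    return DIGIT_HYPHEN.sub("\u2013", text)


# Hypothetical index entry with page ranges.
print(en_dash_ranges("Hyphenation, 12-14, 101-3"))
```

Hyphens between letters (such as "well-known") are not touched, since neither side is a digit.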

Last edited by Tex2002ans; 11-11-2014 at 08:20 PM.