09-01-2014, 01:49 PM #16
Wizard Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook
Quote:
https://tex.stackexchange.com/questi...ween-sentences
https://english.stackexchange.com/qu...riod-full-stop

There are also these Wikipedia articles I just stumbled upon (plus check out their Related Articles):

https://en.wikipedia.org/wiki/Senten..._digital_media
https://en.wikipedia.org/wiki/Histor...ntence_spacing
https://en.wikipedia.org/wiki/Senten...d_style_guides

Side Note: It is quite interesting to learn that there used to be double spacing after colons and semicolons as well.

I tended to agree with the Heraclitean arguments... He makes QUITE the case for the double space. The double space DOES serve a purpose that is different from a single space between words. BUT (and this is a big but), I think the economic argument he gave is the strongest argument against using the double space:

Quote:
Also, keep in mind that you may have something like a footnote symbol, a page number, or a reference in parentheses after the period that "ends a sentence". How the heck are you going to automate fixing the spacing in THAT situation?

Now, if you DO have a clean source that you KNOW uses double spacing correctly, I would agree with the Heraclitean article again:

Quote:
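To make the footnote problem concrete, here is a minimal Python sketch of automated sentence spacing. The function name, the set of footnote markers (`*`, `†`, `‡`, or digits), and the use of an en space (U+2002) as a "wider" sentence space are all assumptions for illustration. It tolerates closing quotes, brackets and a marker between the period and the spaces, but it still cannot tell "Mr.  Smith" from a real sentence end.

```python
import re

# A sentence end: . ! or ?, then any closing quotes/brackets, then an optional
# footnote-style marker (*, †, ‡ or digits). The marker set is an assumption.
SENTENCE_END = r"[.!?][’”\"')\]]*(?:[*†‡]|\d+)?"

# Either a sentence end followed by 2+ spaces, or just a bare run of 2+ spaces.
RUN = re.compile(rf"({SENTENCE_END})? {{2,}}")


def normalize_spacing(text, sentence_sep=" "):
    """Put sentence_sep after sentence ends; collapse other space runs to one space."""
    def repl(m):
        # Group 1 is only set when the run follows something that looks like a sentence end.
        return m.group(1) + sentence_sep if m.group(1) else " "
    return RUN.sub(repl, text)


print(normalize_spacing("as noted.*  Next  step", "\u2002"))
```

The last case in the tests below shows the limitation: an abbreviation followed by two spaces is treated as a sentence end, so a human (or an abbreviation list) is still needed.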
You also have entire generations "brainwashed" into single-space usage, so who knows who is using the double-space method correctly. There are so many errors in so many source documents that it is easier for me to toss everything out and start from scratch. Like with Toxaris's EPUB Tools, just strip EVERYTHING down to the bare bones (h1-6, p, b, i, blockquote) and continue from something you KNOW is clean/consistent. It would take too long to figure out the intricacies of THIS particular author's (crappy or not) usage of the tools, or to reverse engineer THIS particular set of unique calibre## classes, or THIS particular set of InDesign/Quark classes, and figure out how to implement them in my workflow.

Perhaps if you had a workflow that you could completely trust... like an editor/typographer who knew what they were doing, you would get used to THEIR exact style/workflow. You could hand them a document, they would clean it for you, and you would KNOW that you are getting consistent input. I guess a tightly knit group of workers would be able to pull something like that off. That is sort of what I do: I work in a handful of small teams (2-5 people), and we get used to each other's styles. Although even in that case, I don't trust ANY source fully (I am always running Regex and catching mistakes that were made).

Quote:
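A minimal sketch of the "strip to the bare bones" step, using Python's standard `html.parser`. This is not how Toxaris's EPUB Tools works internally; the `KEEP` whitelist simply mirrors the tags listed above, and the class and function names are hypothetical. Every other tag is dropped along with all attributes (so `calibre##` classes disappear), while the text is kept.

```python
from html import escape
from html.parser import HTMLParser

# Tags to keep, taken from the list in the post; everything else is unwrapped.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "b", "i", "blockquote"}


class BareBones(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KEEP:
            self.out.append(f"<{tag}>")  # attributes (class, style, id) are dropped

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Entities were decoded by the parser, so re-escape &, < and >.
        self.out.append(escape(data, quote=False))


def strip_to_bare_bones(html_text):
    parser = BareBones()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_to_bare_bones('<div class="calibre12"><p class="calibre3"><span>Fish &amp; <i>chips</i></span></p></div>')` keeps only the paragraph and italic tags. Note that anything outside the whitelist, including `<br/>` and images, is discarded, so it only suits text you intend to rebuild by hand.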
With a word processor or text editor, meh, I don't see it as too big of a deal whether you use a double space or a single space between sentences. Do whatever you are comfortable with. If you are dealing with something typographically more advanced than a word processor, you most likely already have access to more advanced tools, like variable-width spaces, variable-width fonts, and more advanced microtypography (like squeezing/stretching characters by tiny fractions). Although again, I would take the advantages of easily searchable/readable/maintainable code over adding in too many manual interventions. I would leave the style decisions up to the heuristics of the program, and I would take variable-length spacing between sentences over strict "double spaces" any day of the week.

Perhaps my mind will change the more I learn about typography. I must admit, I am currently barely scratching the surface of the "physical" side of things, and I still have a ton to learn. Most of my work is focused on getting the text out of locked-down formats and getting these books into a reflowable format! Also, working from OCR doesn't help. I just strip out ALL the nbsps generated by FineReader, because in 99.9% of cases they are trash; then I can add them back in if/when needed.

Last edited by Tex2002ans; 09-01-2014 at 01:56 PM.
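Stripping every non-breaking space, as described above, is a one-line substitution. This Python sketch assumes the nbsps can appear either as the raw character U+00A0 or as the usual entity spellings in the XHTML; the function name is hypothetical.

```python
import re

# A non-breaking space as a raw character (U+00A0) or in its usual entity forms.
NBSP = re.compile(r"\u00a0|&nbsp;|&#160;|&#[xX]0*[aA]0;")


def strip_nbsp(text):
    """Replace every non-breaking space with an ordinary space."""
    return NBSP.sub(" ", text)
```

Any nbsps that are genuinely needed (for example before units or in French punctuation) can then be reinserted deliberately with targeted rules.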
09-01-2014, 04:40 PM #17
Grand Sorcerer Posts: 28,862 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD
			
			I seem to see a lot of blame being laid at the smartening algorithms' doorstep instead of at the feet of the users who apply them without discrimination. Most have options to granularly choose which elements you want to "smarten." For example, I tend to leave ellipses out of my smartening attempts, focusing only on quotation marks and dashes. They can be dangerous tools, sure, but that's mostly the fault of people using them indiscriminately/lazily, no?
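To illustrate the per-element switches, here is a deliberately naive Python smartener with separate flags for quotes, dashes and ellipses (ellipses off by default, matching the habit described above). It is a hypothetical sketch, not SmartyPants or calibre's code, and its quote rule is intentionally simple so the failure modes discussed in this thread are easy to see.

```python
import re


def smarten(text, quotes=True, dashes=True, ellipses=False):
    """A deliberately naive smartener with per-element switches."""
    if dashes:
        text = text.replace("--", "\u2014")   # -- becomes an em dash
    if ellipses:
        text = text.replace("...", "\u2026")  # ... becomes a single ellipsis character
    if quotes:
        # A quote counts as opening only at the very start of the text or after
        # whitespace or an opening bracket; every other quote becomes closing.
        text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text)
        text = text.replace('"', "\u201d")
        text = re.sub(r"(^|[\s(\[])'", "\\1\u2018", text)
        text = text.replace("'", "\u2019")
    return text
```

With this rule, a leading apostrophe as in 'tis also becomes an opening quote, which is exactly the kind of case a human still has to check.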
09-01-2014, 09:33 PM #18
null operator (he/him) Posts: 22,006 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none
			
			To be frank, I don't give a rat's tail what the HTML looks like; I don't read HTML, I read what the author wrote. Nor am I a perfectionist: if there's the odd extra space here and there where it didn't oughta be, I promise my sky won't fall in. It's only me and a few colleagues who read what I edit (btw, I start with marked-up text). My colleagues wouldn't know if they were run over by a truckload of HTML or bombed by a mountain of CSS dropped from a Bezos drone (they'd probably prefer a ship full of CCS), so they too are unlikely to get their knickers in a knot about such things. Roll on the day when the reader can configure all this stuff to whatever pleases their eyeballs, rather than being saddled with what others foist upon them. I'll try replacing "non-breaking space" with a "no-width non-break" and a "1/4 em space" and see if I like what I see when I copy the EPUB to my Note.

BR
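One way to try that experiment, sketched in Python. It assumes "no-width non-break" means U+2060 WORD JOINER (which prohibits a line break on either side of it) and "1/4 em space" means U+2005 FOUR-PER-EM SPACE; the joiners wrap the narrow space so it should still not break. Whether a given reader's font renders U+2005 at all is device-dependent.

```python
WORD_JOINER = "\u2060"   # zero width; prohibits a line break on either side
FOUR_PER_EM = "\u2005"   # FOUR-PER-EM SPACE, i.e. a 1/4 em space


def narrow_nbsp(text):
    """Swap each U+00A0 for a 1/4 em space held in place by word joiners."""
    return text.replace("\u00a0", WORD_JOINER + FOUR_PER_EM + WORD_JOINER)
```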
09-01-2014, 09:34 PM #19
Wizard Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook
Quote:
I also want to pull my hair out at a lot of these Content Management Systems (CMS), or sites that automatically apply their smartening algorithms to text. People will just use the built-in tools to type, or copy/paste their document in, and it gets auto-smartified once it is published to their WordPress or whatever, whether the input text used smart/dumb quotes or not. This is a dilemma, though: do you want the algorithm to start completely from scratch to fix mistakes, or do you want it to leave alone the correct quotes you deliberately put in certain positions? Now that I have developed a hawk eye for mismatched punctuation, I see the minor errors caused by these algorithms left and right!!! I would rather just have dumb quotes (like the MobileRead forums) than auto-smartified stuff.

You also have measurements like 4'6"20° (to be PROPER, you would use a PRIME character (′) + DOUBLE PRIME character (″)), where the Smarten Punctuation algorithms insert a RIGHT SINGLE QUOTE (’) + RIGHT DOUBLE QUOTE (”).

Proper: 4′6″20°
Stay Dumb (still ok): 4'6"20°
Smartened (wrong): 4’6”20°

So if you have these measurements in your paragraph, the smartening algorithms will also get confused and mangle the quotation marks further along in the paragraph. Also, I have seen many of these algorithms take into account ONLY the "dumb quotes", instead of starting completely from scratch. So any text which accidentally holds some smart quotes will throw them off (think back again to copying/pasting material from another source). OR, I have seen certain algorithms get mangled when the quote is right next to an opening/closing HTML tag. Calibre's Smarten Punctuation algorithm causes a handful of these, if my memory serves me right; the next time I stumble across a book with it, I will definitely have to gather real examples.

Quote:
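A pre-pass sketch in Python for both problems above, with a hypothetical function name. Step 1 takes the "start from scratch" side of the dilemma by flattening any curly quotes back to straight ones (which also discards any correct quotes placed deliberately). Step 2 turns a quote directly after a digit into a prime, so a later smartening pass skips it; an apostrophe followed by a letter, as in 1990's, is left alone. A closing quote right after a number ("I counted to 10") would be misread as a double prime, so the result still needs review.

```python
import re

# Curly quotes mapped back to their straight ("dumb") equivalents.
DUMB = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def prepare_for_smartening(text):
    # 1. Start from scratch: undo any smart quotes pasted in from elsewhere.
    text = text.translate(DUMB)
    # 2. Quotes directly after a digit become PRIME / DOUBLE PRIME.
    #    A ' followed by a letter (as in 1990's) is not touched.
    text = re.sub(r"(?<=\d)'(?![A-Za-z])", "\u2032", text)
    text = re.sub(r'(?<=\d)"', "\u2033", text)
    return text
```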
I save Smarten Punctuation as one of the final steps, and I always do a Before/After EPUB. I then do a very thorough code compare to see EXACTLY what punctuation was smartened, and fix up any mistakes it caused. Luckily, FineReader is able to OCR a lot of the smart quotes to match the quotes in the source document, so I only have to double-check the handful of spots where the smartening algorithm's output differs from the OCR text. I used to use Modify ePub's Smarten Punctuation, but I have shifted over to Calibre's Smarten Punctuation, because it handles a few of those edge cases better. Most people would just push the Smarten button and move on, never seeing exactly what it changed.
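A rough Python stand-in for that Before/After compare, using the standard `difflib` module. It assumes you have already extracted the text of the same XHTML file from both EPUBs (an EPUB is a zip archive, so the standard `zipfile` module can do that); the function name and the context width are arbitrary choices.

```python
import difflib


def punctuation_changes(before, after, context=15):
    """List (old, new, snippet) for every spot where `after` differs from `before`."""
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        snippet = before[max(0, i1 - context):i2 + context]  # surrounding original text
        changes.append((before[i1:i2], after[j1:j2], snippet))
    return changes
```

Comparing character by character keeps each changed quote mark as its own entry, which makes a wrong-way quote easy to spot.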
09-02-2014, 11:12 AM #20
Grand Sorcerer Posts: 28,862 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD
Quote:
Now if you want to throw back-tick quote conversion into the mix, that really screws the pooch. I turn that support off completely. My calibre editor plugin isn't "official" by any means, but it's posted in the Editor subforum in the following thread if you want to experiment: https://www.mobileread.com/forums/sho...d.php?t=243817

For all intents and purposes, those two are the same thing. Modify ePub calls the same internal calibre routines that calibre's own smartening code calls (which is, in fact, the SmartyPants script). Modify ePub just enabled you to do so without doing a full-blown conversion, before calibre's editor came along. I suppose the methods for calling SmartyPants could have deviated slightly in recent versions (between Modify ePub and calibre's conversion parameters, and now the smartening feature of calibre's editor), but last time I checked, they were identical.

Last edited by DiapDealer; 09-02-2014 at 12:46 PM.
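Why back-tick conversion causes trouble can be shown with the simplest possible version of the ``like this'' convention that SmartyPants can convert. This Python sketch is hypothetical and far cruder than SmartyPants, but the collision it shows is real: two straight single quotes are also how a double prime is typed.

```python
def convert_backticks(text):
    """Naive ``double backtick'' quote conversion."""
    return text.replace("``", "\u201c").replace("''", "\u201d")
```

`convert_backticks("``Hello,'' she said.")` gives proper curly quotes, but `convert_backticks("t''' = 0")` turns the triple prime into a closing double quote followed by a stray apostrophe.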
09-02-2014, 05:47 PM #21
Wizard Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook
Quote:
 Quote: 
That is the thing I love about Toxaris's EPUB Tools: the "Check Dialogue" functionality is just above and beyond anything else I have used so far. So what I do now is run Smarten Punctuation, toss the result into "Check Dialogue", and fix up as much as I can.

Quote:
https://bugs.launchpad.net/calibre/+bug/1285351

I could have SWORN there was a topic a few months back too, about fixing up ’em, ’tis, ’twas, ..., because I got this helpful Regex from someone, which was a more condensed version of the one I used before (I did a search and couldn't find the topic!!):

Search: ‘(Em|em|Tis|tis|Twas|twas)
Replace: ’\1

That is when I decided to finally hop over to Calibre's instead of Modify ePub's, because of those few minor adjustments!
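The same Search/Replace in Python, as a sketch. The `\b` word boundary is an addition, not part of the regex from the thread: without it, an opening quote before a name such as ‘Emma or ‘Emily would also be flipped into an apostrophe.

```python
import re

# The forum regex plus a \b word boundary (an addition), so that words which
# merely start with "Em", "Tis" or "Twas" are left alone.
ELISION = re.compile(r"\u2018(Em|em|Tis|tis|Twas|twas)\b")


def fix_elisions(text):
    return ELISION.sub("\u2019\\1", text)
```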
09-02-2014, 06:56 PM #22
Grand Sorcerer Posts: 28,862 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD
Quote:
  It handles nested quotes and continuation quotes (no closing quote for the previous para) just fine in my experience. Quote: 
 Quote: 
09-02-2014, 06:57 PM #23
null operator (he/him) Posts: 22,006 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none
Quote:
I searched for threads in the Editor with posts by Tex....

BR
09-02-2014, 07:13 PM #24
Wizard Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook
Quote:
https://www.mobileread.com/forums/sho...d.php?t=171920

Which ALSO reminded me of another case where I have seen it break: when a closing quote is right before/after an em or en dash. Again, I don't have any specific examples on hand, but I can recall it happening.

And I thought of another example while I was OCRing last night, where "quotations" just get MANGLED. I deal with a lot of equations in text as well, and there are many cases of "prime", "double prime", "triple prime", etc. So x', y', m'', t'''. Again, I would avoid using the actual "prime" characters and stick with the dumb equivalents (because of font issues on certain devices). In some cases, there are HUNDREDS of "primes" throughout the text, and running the smartening algorithms will completely mangle those (and the subsequent quotation marks).

Quote:
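One workaround for math primes is to mask them before smartening and restore them afterwards, so they stay dumb and cannot throw off later quotes. This Python sketch assumes a prime always follows a single-letter variable (x', m'', t''') that is not part of a longer word or preceded by an apostrophe; the private-use characters U+E000/U+E001 serve as placeholder delimiters, and the smartening function is passed in, so any tool can be plugged in.

```python
import re

# A lone letter (not preceded by a letter or an apostrophe) followed by one or
# more straight single quotes and no further letter: x', m'', t'''.
MATH_PRIME = re.compile(r"(?<![A-Za-z'])([A-Za-z])('+)(?![A-Za-z])")
PLACEHOLDER = re.compile("\ue000(\\d+)\ue001")  # private-use delimiters


def smarten_protecting_primes(text, smarten):
    """Mask math primes, run the given smartening function, then restore them."""
    saved = []

    def stash(m):
        saved.append(m.group(2))  # remember the run of primes
        return f"{m.group(1)}\ue000{len(saved) - 1}\ue001"

    masked = MATH_PRIME.sub(stash, text)
    smartened = smarten(masked)
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], smartened)
```

Because the pattern requires a lone letter, possessives like boys' and contractions like don't are not masked and still reach the smartener.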
  Quote: 
 I really have to get around to organizing/categorizing older posts. So much good information just gets lost in the abyss!
09-03-2014, 11:02 AM #25
Grand Sorcerer Posts: 28,862 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD
Quote:
 Quote: 
 Quote: 
 We may be talking about very different needs here, too. You seem to be in search of "fixup" tools to assist in the conversion of existing texts from physical to digital, whereas I'm more focused on tools that will allow an "editor" to take content (from creative types) that is essentially correct (just typed with traditionally dumb characters/keyboards) and "smarten" it.
09-03-2014, 07:19 PM #26
Wizard Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook
Quote:
 It adds a left double quote after an em dash. Quote: 
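A possible after-the-fact fix for that bug, sketched in Python with a hypothetical function name. It assumes that in ordinary English prose, an opening double quote that sits right after an em or en dash and is followed by whitespace (or the end of the paragraph) was meant to be a closing quote; a quote that opens a new speech after a dash is followed by a letter and is left alone.

```python
import re

# An opening double quote directly after an em/en dash, followed by whitespace
# or the end of the text, is assumed to be a misplaced closing quote.
WRONG_AFTER_DASH = re.compile("([\u2014\u2013])\u201c(?=\\s|$)")


def fix_quote_after_dash(text):
    return WRONG_AFTER_DASH.sub("\\1\u201d", text)
```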
Again, I just fall back on the example of Toxaris's fantastic EPUB Tools. It will double-check the smart quotes and then ask for user input on these "unsure" cases (Is this supposed to be two opening quotation marks in a row? There is a closing quote with no opening quote? The algorithm detected two/three quotation marks in a row (like in that "prime" example); is this correct?). One approach is full automation, while the other is 90% automation, 10% human assistance. :P

Now that I think about it, are there any tools out there that would do similar checks for all the other punctuation? For example, you could choose whether ellipses/em dashes/en dashes should be set open or closed. I don't spend much time going from Style Guide A to Style Guide B, but there MUST be some tools out there to help editors speed up the process besides just a long list of Search/Replace.

Side Note: Now you have sent me down another rabbit hole of research... curse you, DiapDealer!
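The kind of "unsure case" report described above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not Toxaris's actual implementation: it walks each paragraph's double quotes, allows an unclosed quote when the next paragraph opens with a quote (the continuation convention), and flags the suspicious cases for a human instead of fixing them.

```python
import re

OPEN, CLOSE = "\u201c", "\u201d"
# Three or more quote marks of any kind in a row, as in a mangled t'''.
QUOTE_RUN = re.compile("['\"\u2018\u2019\u201c\u201d]{3,}")


def check_dialogue(paragraphs):
    """Return (paragraph index, message) pairs for spots a human should review."""
    issues = []
    for n, para in enumerate(paragraphs):
        depth = 0
        for ch in para:
            if ch == OPEN:
                if depth > 0:
                    issues.append((n, "two opening quotes in a row"))
                depth += 1
            elif ch == CLOSE:
                if depth == 0:
                    issues.append((n, "closing quote with no opening quote"))
                else:
                    depth -= 1
        # An unclosed quote is fine if the next paragraph continues the speech.
        next_para = paragraphs[n + 1] if n + 1 < len(paragraphs) else ""
        if depth > 0 and not next_para.startswith(OPEN):
            issues.append((n, "unclosed quote with no continuation paragraph"))
        if QUOTE_RUN.search(para):
            issues.append((n, "three or more quote marks in a row"))
    return issues
```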
09-03-2014, 09:28 PM #27
Grand Sorcerer Posts: 28,862 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD
09-03-2014, 10:25 PM #28
null operator (he/him) Posts: 22,006 Karma: 30277294 Join Date: Mar 2012 Location: Sydney Australia Device: none
Quote:
 My theory is that it's a bug inherited from Parc Place: cut 'n paste, no coffee & pasta...

BR

Last edited by BetterRed; 09-03-2014 at 10:31 PM.
09-04-2014, 04:13 AM #29
Wizard Posts: 4,520 Karma: 121692313 Join Date: Oct 2009 Location: Heemskerk, NL Device: PRS-T1, Kobo Touch, Kobo Aura
			
			There are even more, though. In Dutch, for example, you have things like 's avonds (and many more like it). Word will always give you the wrong quotation mark there by default.
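A small Python sketch of a fix for the Dutch case, assuming the elided articles to handle are 's (des), 't (het) and 'n (een) and that they stand as separate words. It turns a left single quote in front of them into the right single quote (apostrophe) they need, while a quoted word that merely starts with s, t or n is left alone.

```python
import re

# Dutch elided articles 's, 't and 'n as standalone words; a word processor
# tends to give them a left single quote where an apostrophe is needed.
DUTCH_ELISION = re.compile("\u2018([stn])\\b")


def fix_dutch_elisions(text):
    return DUTCH_ELISION.sub("\u2019\\1", text)
```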
09-04-2014, 06:50 AM #30
Wizard Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook
			
			Thanks for the Dutch example, Toxaris. I haven't ventured too far outside the purview of English books... the first foreign book I did was a Canadian-French book, which is where I first ran into those non-breaking spaces around quotation marks, and non-breaking spaces before colons and semicolons:

Quote:
 I had to look up a Canadian style guide (OF COURSE, with slightly different rules from your normal English (US) or French typography). Luckily I was working from quite a clean DOC, so I didn't have to do the work. Even though I can't read French, I was still able to catch inconsistencies in the patterns I recognized by observing the code, so I was able to use Regex to fix 15 mistakes. Now just be happy most of us work on the nice, clean English books, and not those dirty French books full of spaces.

Side Note: It would be interesting to learn more about typography in the international scene... I know I learned some interesting differences in math typography while going through LaTeX topics.

Last edited by Tex2002ans; 09-04-2014 at 07:14 AM.
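As an illustration of the sort of Regex pass involved, here is a hypothetical Python sketch covering only the cases mentioned above: a non-breaking space (plain U+00A0 assumed) inside « » and before colons and semicolons. Conventions differ between France and Canada, so check the relevant style guide before adopting any rule. The colon rule only fires when whitespace or the end of the text follows, so URLs and times like 10:30 are left alone, and running it twice gives the same result.

```python
import re

NBSP = "\u00a0"


def french_spacing(text):
    """Insert non-breaking spaces inside guillemets and before : and ;."""
    text = re.sub(r"«\s*", "«" + NBSP, text)                  # after «
    text = re.sub(r"\s*»", NBSP + "»", text)                  # before »
    text = re.sub(r"\s*([:;])(?=\s|$)", NBSP + r"\1", text)   # before : and ;
    return text
```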