MobileRead Forums > E-Book Software > Sigil
09-01-2014, 01:49 PM #16
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by BetterRed View Post
@Tex2002ans - the author of the Heraclitean River blog and I are like-minded in that he and I would prefer 1.5 spaces between sentences rather than one or two.
There was a bunch of "double-space after sentence" discussion in these TeX topics too:

https://tex.stackexchange.com/questi...ween-sentences
https://english.stackexchange.com/qu...riod-full-stop

There are also these Wikipedia articles I just stumbled upon (plus check out those Related Articles):

https://en.wikipedia.org/wiki/Senten..._digital_media
https://en.wikipedia.org/wiki/Histor...ntence_spacing
https://en.wikipedia.org/wiki/Senten...d_style_guides

Side Note: It is quite interesting to learn that there used to be double spacing after colons and semicolons as well.

I tended to agree with the Heraclitean arguments... He makes QUITE the case for the double-space. The double-space DOES serve a purpose that is different from a single space between words.

BUT (and this is a big but), I tend to think the economic argument he gave is just the strongest argument against using the double space:

Quote:
The death knell for the large sentence space, still imitated by a few Linotype operators in the mid-1900s with a double space, was more new technology. Further automation developed in the era of phototypesetting led to a situation where line breaks and extra spaces were truly problematic. Automatic line breaks could occur between two spaces, thereby beginning a new line with an undesirable empty space. The solution for programmers was simple: phototypesetters would simply ignore extra white space and treat it as an error. All white space would be collapsed to a single, multipurpose space.

[...]

In sum, the primary rationale behind the shift was probably not aesthetic, since printers had accepted the same conventions for centuries. Instead, it was a move generated by economic concerns. Publishers wanted cheaper books with less whitespace and less time and expertise to typeset, and the technology they developed required simpler and lazier methods of spacing.
It is just a complete pain in the butt to add an nbsp after every single sentence, and it is VERY hard to automate. I would probably lump it in with the "Smarten Punctuation" algorithms: you would get a huge number of false positives (think abbreviations, shortened names, etc., etc.).

Also, keep in mind that you may have something like a footnote symbol, a page number, or a reference in parentheses after the period that "ends a sentence". How the heck are you going to automate fixing the spacing in THAT situation?
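A minimal Python sketch makes the false-positive problem concrete. Everything in it is hypothetical (the function name, the deliberately short abbreviation list, and the choice of a no-break space plus a regular space as the "wide" gap); it is not taken from any existing tool. It guesses a sentence end wherever a word ends in `.`, `!` or `?` and the next word starts with a capital, and it still gets initials wrong and misses a sentence end hidden behind a footnote number.

```python
import re

# Hypothetical, deliberately incomplete exception list.
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "St.", "etc.", "e.g.", "i.e."}

# Guessed sentence end: a word ending in . ! or ?, an optional closing
# quote or bracket, one space, then an uppercase letter or opening quote.
SENTENCE_END = re.compile(r'(\S+[.!?]["\'”’)\]]?) (?=[A-Z“"])')

def widen_sentence_spaces(text, wide="\u00a0 "):
    """Replace the single space after each guessed sentence end with `wide`."""
    def repl(m):
        word = m.group(1)
        if word in ABBREVIATIONS:
            return m.group(0)        # known abbreviation: leave it alone
        return word + wide
    return SENTENCE_END.sub(repl, text)
```

With it, "By J. Smith." gets a wide gap after the initial (a false positive), while "The end.12 Next part." gets none at all because the footnote number sits between the period and the space. Each new entry in `ABBREVIATIONS` fixes one false positive, but the list never ends.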

Now, if you DO have a clean source that you KNOW uses double spacing correctly, I would agree with the Heraclitean article again:

Quote:
Typographers could actually make good use of all those people who still insist on double-spacing. They could use a find-and-replace to turn those double spaces into custom spaces that provide a nice respite after ends of sentences. Whether it’s actually double or 1.5 times or whatever would be a matter of taste, considered with the typeface, leading, etc.
In my whole experience, though... most of the actual source material you get CANNOT be trusted. Just as 99% of the Word documents you receive will not use Styles (and when they do, in many cases the Styles are not applied 100% consistently), I wouldn't touch the authors' spacing with a ten-foot pole, given the horrors I have seen.

You also have entire generations "brainwashed" into single-space usage, so who knows who is using the double-space method correctly?

There are so many errors in so many source documents that it is easier for me to toss everything out and start from scratch. Like with Toxaris's EPUB Tools, just strip EVERYTHING down to the bare bones h1-6, p, b, i, blockquote, and continue from something you KNOW is clean/consistent.
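A rough sketch of that "strip to the bare bones" step, using only Python's standard `html.parser`. This is an illustration, not how Toxaris's EPUB Tools works internally, and the class and function names are hypothetical. Whitelisted tags are kept with all their attributes dropped, any other tag is unwrapped, and the text is kept. It assumes you feed it the contents of `<body>` only, since text inside `<title>`, `<style>` or `<script>` would otherwise leak into the output; images, links and line breaks are simply discarded.

```python
from html import escape
from html.parser import HTMLParser

# Whitelist matching the "bare bones" set named above.
KEEP = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "b", "i", "blockquote"}

class BareBones(HTMLParser):
    """Rebuilds markup keeping only KEEP tags, with every attribute dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KEEP:
            self.out.append(f"<{tag}>")      # class/style/id are discarded

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser decoded entities, so re-escape <, > and & on output.
        self.out.append(escape(data, quote=False))

def strip_to_bare_bones(markup):
    parser = BareBones()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Everything calibre## or InDesign-specific disappears along with the classes, which is the point: what remains is a small, predictable tag set to build on.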

It would take too long to figure out the intricacies of THIS particular author's (crappy or not) usage of the tools, or to reverse engineer THIS particular set of unique calibre## classes, or reverse engineer THIS particular set of InDesign/Quark classes, and figure out how to implement them in my workflow.

Perhaps if you had a workflow that you could completely trust... like an editor/typographer who knew what they were doing, you would get used to THEIR exact style/workflow. So you could hand them a document, they would clean it for you, and you would KNOW that you are getting consistent input. I guess a tightly knit group of workers would be able to pull something like that off. It is sort of what I do: I work in a handful of small teams (2-5 people), and we get used to each other's styles.

Although even in that case, I don't trust ANY source fully (I am always running Regex and catching mistakes that were made).

Quote:
Originally Posted by BetterRed View Post
I just feel more comfortable if the space between sentences exceeds the space between words. I edit to two spaces using a regular space & a non-breaking space. If I wanted 1.5 spaces, what would you suggest I use?
Bleh, with nbsps all over the place, the HTML would get too mangled, and in my opinion it would be too much of a pain to search/edit directly in the code. So in the HTML case, I would just abandon it.

With a word processor or text editor, meh, I don't see too big of a deal if you use the double-space or single-space between sentences. Do whatever you are comfortable with.

If you are dealing with something typographically more advanced than a word processor, you most likely already have access to more advanced tools, like variable-width spaces, variable-width fonts, and more advanced microtypography (like squeezing/stretching characters by tiny fractions).

Although again, I would take the advantages of easily searchable/readable/maintainable code over adding in too many manual interventions. I would leave the style decisions up to the heuristics of the program, and I would take variable-length spacing between sentences over strict "double-spaces" any day of the week.

Perhaps my mind will change the more I learn about typography. I must admit, I am currently barely scratching the surface in the "physical" side of things, I still have a ton to learn. Most of my work is just focused on getting the text out of locked down formats, and getting these books into a reflowable format!

Also, working from OCR doesn't help. I just strip out ALL nbsps generated by Finereader, because in 99.9% of cases they are trash; then I can add them back in if/when needed.
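That blanket removal is nearly a one-liner. A minimal Python sketch, assuming the no-break spaces show up in the XHTML either as the raw U+00A0 character or as `&nbsp;`, `&#160;` or `&#xA0;` (the function name is hypothetical):

```python
import re

# U+00A0 itself, plus its named and numeric entity spellings.
NBSP = re.compile(r"\u00a0|&nbsp;|&#0*160;|&#[xX]0*[aA]0;")

def strip_nbsp(markup):
    """Turn every no-break space into an ordinary space."""
    return NBSP.sub(" ", markup)
```

Any doubled spaces this leaves behind collapse when the HTML is rendered, so the nbsps that are genuinely needed can then be re-added by hand.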

Last edited by Tex2002ans; 09-01-2014 at 01:56 PM.
09-01-2014, 04:40 PM #17
DiapDealer
Grand Sorcerer
Posts: 28,862
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I seem to see a lot of blame being laid at smartening algorithms' doorsteps instead of the users who apply them without discrimination. Most have options to granularly choose which elements you want to "smarten." For example, I tend to leave ellipses out of my smartening attempts--focusing only on quotation marks and dashes. They can be dangerous tools, sure, but that's mostly the fault of people indiscriminately/lazily using them, no?
09-01-2014, 09:33 PM #18
BetterRed
null operator (he/him)
Posts: 22,006
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none
To be frank, I don't give a rat's tail what the HTML looks like; I don't read HTML, I read what the author wrote. Nor am I a perfectionist: if there's the odd extra space here and there where it didn't oughta be, I promise my sky won't fall in.

It's only me and a few colleagues who read what I edit (btw, I start with marked-up text). My colleagues wouldn't know if they were run over by a truckload of HTML or bombed by a mountain of CSS dropped from a Bezos drone (they'd probably prefer a ship full of CCS), so they too are unlikely to get their knickers in a knot about such things.

Roll on the day when the reader can configure all this stuff to what pleases their eyeballs, rather than being saddled with what others foist upon them.

I'll try replacing the "non-breaking space" with a "no-width non-break" and a "1/4 em space" and see if I like what I see when I copy the EPUB to my Note.
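If the sentence gaps are currently typed as a regular space followed by a no-break space, the swap described here could be scripted roughly as below. Assumptions, not confirmed by the post: "no-width non-break" is taken to mean U+2060 WORD JOINER (U+FEFF ZERO WIDTH NO-BREAK SPACE is the older, now-discouraged choice for that job), and "1/4 em space" to mean U+2005 FOUR-PER-EM SPACE. Whether either character renders, and how wide the result looks, depends on the reader's font and device.

```python
import re

# Assumed meanings of the two characters named in the post.
WORD_JOINER = "\u2060"   # "no-width non-break"
FOUR_PER_EM = "\u2005"   # "1/4 em space"

# Existing gap: end punctuation, an optional closing quote or bracket,
# a regular space, then a no-break space (raw character or entity).
GAP = re.compile(r"([.!?][\"'\u201d\u2019)]?) (?:\u00a0|&nbsp;)")

def retune_sentence_gaps(text):
    """Swap the no-break space in each sentence gap for WJ + 1/4 em space."""
    return GAP.sub(lambda m: m.group(1) + " " + WORD_JOINER + FOUR_PER_EM, text)
```

Because only the no-break space is touched, the gap keeps its ordinary space as a line-break opportunity, and trying a different width later means changing one constant.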

BR
09-01-2014, 09:34 PM #19
Tex2002ans
Wizard
Quote:
Originally Posted by DiapDealer View Post
I seem to see a lot of blame being laid at smartening algorithms' doorsteps instead of the users who apply them without discrimination.
Word Processors/Microsoft Word will most likely auto-Smartify while you are typing, or not fix the smart quotations if you delete/add information later (or copy/paste text from another source). Yes, this can be disabled, but most people won't.

I also want to pull my hair out at a lot of these Content Management Systems (CMS), or these sites which automatically apply their smartening algorithms to text. People will just use the built-in tools to type, or copy/paste their document in, and it will get auto-smartified once it gets published to their WordPress or whatever, whether the input text used smart/dumb quotes or not.

This is a dilemma, though: would you want the algorithm to start completely from scratch to fix mistakes, or would you want it to leave alone the correct quotes you deliberately put in certain positions?

Now that I have gotten a hawk eye for mismatched punctuation, I see the minor errors caused by them left and right!!! I would rather just have dumb quotes (like the MobileRead forums), than to have auto-smartified stuff.

You also have measurements like: 4'6"20° (to actually be PROPER, you would use a PRIME character (′) + DOUBLE PRIME character (″)), where the Smarten Punctuation algorithms insert a RIGHT SINGLE QUOTE (’) + RIGHT DOUBLE QUOTE (”).

Proper: 4′6″20°
Stay Dumb (still ok): 4'6"20°
Smartened (wrong): 4’6”20°

So if you have these measurements in your paragraph, the smarten algorithms will also get confused, and mangle the quotation marks further in the paragraph.
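For plain feet-and-inches style measurements, the straight marks can be turned into primes before any smartening runs, so the smartener never sees them. A minimal sketch, assuming a measurement mark is simply a straight ' or " immediately after a digit; it will also convert a closing quote that happens to follow a digit (for example at the end of a quoted year), so each hit still needs checking. The function name is hypothetical, and the font-support caveat for real prime characters raised later in this thread still applies.

```python
import re

PRIME, DOUBLE_PRIME = "\u2032", "\u2033"   # prime and double prime

def primes_for_measurements(text):
    # A straight double quote right after a digit becomes a double prime
    # (inches/seconds); a straight single quote becomes a prime (feet/minutes).
    text = re.sub(r'(?<=\d)"', DOUBLE_PRIME, text)
    return re.sub(r"(?<=\d)'", PRIME, text)
```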

Also, I have seen many of these algorithms that take into account ONLY the "dumb quotes", instead of starting completely from scratch. So any text that accidentally holds some smart quotes will throw them off (think back again to copying/pasting material from another source).
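One way to force any smartener to "start from scratch" is to flatten every curly quote back to its straight form first, so stray smart quotes pasted in from elsewhere cannot throw it off. A minimal sketch (hypothetical function name); note that it also throws away any correct curly quotes placed deliberately, which is exactly the dilemma described above.

```python
# Curly single and double quotes mapped back to their ASCII forms.
STRAIGHTEN = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # left/right single quote
    "\u201c": '"', "\u201d": '"',   # left/right double quote
})

def straighten_quotes(text):
    return text.translate(STRAIGHTEN)
```

Attribute values in the markup already use straight quotes, so running this over XHTML does not disturb the tags.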

OR, I have seen certain algorithms mangle quotes that sit right next to an opening/closing HTML tag. Calibre's Smarten Punctuation algorithm causes a handful of these, if my memory serves me right; the next time I stumble across a book with it, I will definitely gather real examples.

Quote:
Originally Posted by DiapDealer View Post
Most have options to granularly choose which elements you want to "smarten." For example, I tend to leave ellipses out of my smartening attempts--focusing only on quotation marks and dashes.
Hmmm... what are the Smarten Punctuation tools you use that allow such granularity? I would LOVE to upgrade!

I save Smartening Punctuation as one of the final steps, and I always do a Before/After EPUB. I then do a very thorough code compare to see EXACTLY what punctuation was smartened, and fix up any mistakes caused. Luckily, Finereader is able to OCR a lot of the Smart Quotes to match the quotes in the source document, so I only have to double-check a handful of comparisons where the Smarten Algorithm =/= the OCR text.
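The Before/After comparison can also be scripted. A minimal sketch using Python's `difflib` (the function name and the context width are arbitrary choices, not part of any tool mentioned here): it yields every span where the smartened text differs from the original, plus a little surrounding context to judge each change. `autojunk=False` stops difflib's heuristic for long inputs from treating very common characters as junk.

```python
import difflib

def punctuation_changes(before, after, context=15):
    """Yield (old, new, snippet) for every span that differs."""
    sm = difflib.SequenceMatcher(None, before, after, autojunk=False)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            continue
        snippet = after[max(0, j1 - context):j2 + context]
        yield before[i1:i2], after[j1:j2], snippet
```

Running it per XHTML file keeps the output short enough to read through every change, rather than pushing the Smarten button and moving on.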

I used to use Modify ePub's Smarten Punctuation, but I have shifted over to Calibre's Smarten Punctuation, because it handles a few of those edge cases better.

Most people would just push the Smarten button and move on, never seeing exactly what it changed.
09-02-2014, 11:12 AM #20
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Tex2002ans View Post
Hmmm... what are the Smarten Punctuation tools you use that allow such granularity? I would LOVE to upgrade!
I use the same smartening algorithm that calibre does (Smarty Pants). It (the python script) has many configurable options that calibre just doesn't choose to present to the user. I either use my own wrapper script to call SmartyPants, or I also have a plugin for calibre's editor that I use that offers more granular control of the algorithm. As far as starting with a mixture of smart and "dumb" quotations, I've never had too many problems myself. The algorithm isn't really predicated on matching pairs of quotes to do what it does. It's based more on a situational/positional logic.
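To illustrate what "situational/positional logic" means, here is a much-simplified Python sketch. It is NOT SmartyPants' actual rule set, which has many more special cases; the function name and the `OPENERS` set are assumptions for illustration. Each straight quote is judged only by the character just before it, with no attempt to pair quotes up. Dashes are deliberately left out of `OPENERS` because a quote after a dash is ambiguous, and leading apostrophes (’tis, ’60s) come out as opening quotes, which is why such words need a separate fix.

```python
# Characters after which a straight quote is treated as opening.
OPENERS = set(" \t\n([{")

def educate_quotes(text):
    out = []
    for i, ch in enumerate(text):
        if ch not in "\"'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "   # start of text acts like a space
        opening = prev in OPENERS
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
        else:
            # After a letter, digit or punctuation a single quote becomes
            # U+2019, which serves as both closing quote and apostrophe.
            out.append("\u2018" if opening else "\u2019")
    return "".join(out)
```

Because nothing is paired, a continuation paragraph that opens with a quote and never closes it is handled the same way as any other paragraph.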

Now if you want to throw back-tick quote conversion into the mix--that really screws the pooch. I turn that support off completely.

My calibre editor plugin isn't "official" by any means, but it's posted in the Editor subforum in the following thread if you want to experiment.
https://www.mobileread.com/forums/sho...d.php?t=243817

Quote:
Originally Posted by Tex2002ans View Post
I used to use Modify ePub's Smarten Punctuation, I have shifted over to Calibre's Smarten Punctuation, because it handles a few of those edge cases better.
For all intents and purposes, those two are the same thing. Modify ePub calls the same internal calibre routines that calibre's own Smartening code calls (which is in fact, the SmartyPants script)--it (Modify ePub) just enabled you to do so without doing a full-blown conversion--before calibre's editor came along. I suppose the methods for calling SmartyPants could have deviated slightly (between Modify ePub and calibre's conversion parameters--and now the Smartening feature of calibre's editor) in recent versions, but last time I checked, they were identical.

Last edited by DiapDealer; 09-02-2014 at 12:46 PM.
09-02-2014, 05:47 PM #21
Tex2002ans
Wizard
Quote:
Originally Posted by DiapDealer View Post
I either use my own wrapper script to call SmartyPants, or I also have a plugin for calibre's editor that I use that offers more granular control of the algorithm.

[...]

My calibre editor plugin isn't "official" by any means, but it's posted in the Editor subforum in the following thread if you want to experiment.
https://www.mobileread.com/forums/sho...d.php?t=243817
I will definitely check out your plugin and do some testing. All I really need is the Smart Quote functionality (I despise it messing with my ellipses), and that sounds perfect.

Quote:
Originally Posted by DiapDealer View Post
The algorithm isn't really predicated on matching pairs of quotes to do what it does. It's based more on a situational/positional logic.
Therein lies a large problem: you need an algorithm that is not just situational, but looks at a whole paragraph at a time and catches odd/even counts or mismatched quotation marks (is there only a closing quote, with no opening quote in this paragraph? Are there two opening quotes in this paragraph?). And in hard cases, it should ask for manual intervention (really the only way to fix some of these).
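The paragraph-level check described here can be sketched in Python. This is a rough imitation of the idea, not the actual code of "Check Dialogue", and the function name is hypothetical. It walks each paragraph's curly double quotes and reports a quote that opens while one is already open, a quote that closes when none is open, and a quote left open at the end of a paragraph unless the next paragraph starts with an opening quote (the usual continuation-quote convention). Single quotes are skipped because they cannot be told apart from apostrophes this simply.

```python
LEFT, RIGHT = "\u201c", "\u201d"   # left and right double quotes

def check_dialogue(paragraphs):
    """Return (paragraph index, problem) pairs for suspicious double quotes."""
    problems = []
    for n, para in enumerate(paragraphs):
        is_open = False
        for ch in para:
            if ch == LEFT:
                if is_open:
                    problems.append((n, "two opening quotes in a row"))
                is_open = True
            elif ch == RIGHT:
                if not is_open:
                    problems.append((n, "closing quote with no opening quote"))
                is_open = False
        if is_open:
            nxt = paragraphs[n + 1] if n + 1 < len(paragraphs) else ""
            if not nxt.lstrip().startswith(LEFT):
                problems.append((n, "quote left open at end of paragraph"))
    return problems
```

The function only reports; deciding whether each flagged paragraph is really wrong stays a manual step.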

That is the thing that I love about Toxaris's EPUB Tools, the "Check Dialogue" functionality is just above and beyond anything else I have used so far. So what I do now is Smarten Punctuation, toss into "Check Dialogue", and fix up as much as I can.

Quote:
Originally Posted by DiapDealer View Post
I suppose the methods for calling SmartyPants could have deviated slightly (between Modify ePub and calibre's conversion parameters--and now the Smartening feature of calibre's editor) in recent versions, but last time I checked, they were identical.
Here was a bug where Kovid fixed "measurements" and "years" (like ’60s):

https://bugs.launchpad.net/calibre/+bug/1285351

I could have SWORN there was a topic a few months back too, fixing up ’em, ’tis, ’twas, ... because I got this helpful Regex from someone which was a more condensed version of the one I used before (I did a search and couldn't find the topic!!):

Search: ‘(Em|em|Tis|tis|Twas|twas)
Replace: ’\1
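The same Search/Replace in Python's `re` syntax, as a sketch. One difference from the regex above is an added suggestion rather than part of the original: a `\b` after the group, so that names which merely begin with those letters (‘Emma) are not given an apostrophe.

```python
import re

# Left single quote (U+2018) before em/tis/twas -> apostrophe (U+2019).
# The trailing \b (added suggestion) keeps names such as ‘Emma untouched.
LEADING_APOSTROPHE = re.compile(r"\u2018(Em|em|Tis|tis|Twas|twas)\b")

def fix_leading_apostrophes(text):
    return LEADING_APOSTROPHE.sub(lambda m: "\u2019" + m.group(1), text)
```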

That is when I decided to finally hop over to Calibre's instead of Modify EPUB's, because of the few minor adjustments!
09-02-2014, 06:56 PM #22
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Tex2002ans View Post
Therein lies a large problem: you need an algorithm that is not just situational, but looks at a whole paragraph at a time and catches odd/even counts or mismatched quotation marks (is there only a closing quote, with no opening quote in this paragraph? Are there two opening quotes in this paragraph?). And in hard cases, it should ask for manual intervention (really the only way to fix some of these).
I disagree (about positional logic being a problem), but I'm OK with other opinions. Do give the plugin a workout. I simply stopped running into situations where the quotation conversion didn't work as I expected it to. The algorithm just doesn't care if there are two opening quotes (or no closing quote). It's not designed to "fix," or match, quotation marks. It just guesses what should be an opening quote or a closing quote or an apostrophe. It's guessing extremely well in my experience. Granted, I'm sure there are certain complex quotation situations (or extra spacing between words and quotation marks) where the algorithm may fail ... but I gave up worrying about it. I don't deal with anything that complex (quotationally speaking). It handles nested quotes and continuation quotes (no closing quote for the previous para) just fine in my experience.

Quote:
Here was a bug where Kovid fixed "measurements" and "years" (like ’60s):

https://bugs.launchpad.net/calibre/+bug/1285351
Thanks for that, I need to make sure the version of SmartyPants I'm using in my plugin has that change incorporated.

Quote:
I could have SWORN there was a topic a few months back too, fixing up ’em, ’tis, ’twas, ... because I got this helpful Regex from someone which was a more condensed version of the one I used before (I did a search and couldn't find the topic!!):
You may appreciate the plugin's ability to consult a user-defined, custom list of words that start with apostrophes.
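A user-editable list like that is easy to approximate. A minimal sketch, assuming a hypothetical plain-text list with one word per line; this is not the plugin's actual file format or code. It builds one case-insensitive alternation from the list and turns a left single quote in front of any listed word into an apostrophe, with `\b` keeping longer words such as ‘Emma intact.

```python
import re

def build_apostrophe_fixer(word_list):
    """word_list: one word per line, e.g. "tis\\ntwas\\nem" (hypothetical format)."""
    words = sorted({w.strip() for w in word_list.splitlines() if w.strip()},
                   key=len, reverse=True)              # longest first
    alternation = "|".join(re.escape(w) for w in words)
    # Case-insensitive, so "tis" also covers "Tis".
    pattern = re.compile(rf"\u2018({alternation})\b", re.IGNORECASE)
    return lambda text: pattern.sub(lambda m: "\u2019" + m.group(1), text)
```

Adding a new contraction then means adding a line to the list rather than editing a regex.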
09-02-2014, 06:57 PM #23
BetterRed
null operator (he/him)
Quote:
Originally Posted by Tex2002ans View Post
I could have SWORN there was a topic a few months back too, fixing up ’em, ’tis, ’twas, ... because I got this helpful Regex from someone which was a more condensed version of the one I used before (I did a search and couldn't find the topic!!):
@Tex2002ans - Is this it ==>> Find straight quotes in the text

I searched for threads in the Editor with posts by Tex....

BR
09-02-2014, 07:13 PM #24
Tex2002ans
Wizard
Quote:
Originally Posted by DiapDealer View Post
[...] It just guesses what should be an opening quote or a closing quote or an apostrophe. It's guessing extremely well in my experience. Granted, I'm sure there are certain complex quotation situations (or extra spacing between words and quotation marks) where the algorithm may fail ... but I gave up worrying about it. I don't deal with anything that complex (quotationally speaking). It handles nested quotes and continuation quotes (no closing quote for the previous para) just fine in my experience.
There was also this topic a few years back which discussed Smarten Punctuation breaking due to spaces before/after quotation marks:

https://www.mobileread.com/forums/sho...d.php?t=171920

That ALSO reminded me of another case where I have seen it break: when a closing quote is right before/after an em or en dash. Again, I don't have any specific examples on hand, but I can recall it happening.

And I thought of another example while I was OCRing last night, where "quotations" just get MANGLED. I deal with a lot of equations in text as well, and there are many cases of using "prime", "double prime", "triple prime", etc. etc. So x', y', m'', t'''. Again, I would avoid using the actual "prime" characters, and stick with the dumb equivalent (because of font issues on certain devices).

In some cases, there are HUNDREDS of "primes" throughout the text, and running the Smartening Algorithms will also just completely mangle those (and mangle subsequent quotation marks).
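One workaround, sketched in Python under the assumption that primes only ever follow a standalone single-letter variable (x', m'', t'''), is to swap those runs for placeholder tokens before smartening and restore them afterwards. The placeholder format and the `smarten` argument are hypothetical; pass in whatever smartening step you actually use. A single letter in single quotes ('a') is a known miss, so the result still needs review.

```python
import re

# A standalone letter followed by one to three straight apostrophes, not
# followed by another letter (so I'm, don't and students' are left alone).
PRIMED = re.compile(r"\b([A-Za-z])('{1,3})(?![A-Za-z'])")
# Placeholders built from Private Use Area characters.
TOKEN = re.compile("\uE000(\\d+)\uE001")

def smarten_protecting_primes(text, smarten):
    saved = []

    def stash(m):
        saved.append(m.group(0))
        return "\uE000%d\uE001" % (len(saved) - 1)

    protected = PRIMED.sub(stash, text)
    result = smarten(protected)       # any smartening function: str -> str
    return TOKEN.sub(lambda m: saved[int(m.group(1))], result)
```

Since the primes never reach the smartener, they cannot shift its guesses for the quotation marks later in the paragraph.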

Quote:
Originally Posted by DiapDealer View Post
You may appreciate the plugin's ability to consult a user-defined, custom list of words that start with apostrophes.
Sounds fantastic, next time I have to run it, I will let you know. Currently, I have another large journal I am OCRing. This time, instead of a ~2 million word journal, it is just a lowly ~1.1 million words.

Quote:
Originally Posted by BetterRed View Post
@Tex2002ans - Is this it ==>> Find straight quotes in the text

I searched for threads in the Editor with posts by Tex....
Yes yes, I believe that might have been the topic. I knew it was hiding there somewhere. Usually I am good at hunting down these older posts. (or stumbling upon other posts, like that one you mentioned Tex2002ans + LaTeX!)

I really have to get around to organizing/categorizing older posts. So much good information just gets lost in the abyss!
09-03-2014, 11:02 AM #25
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Tex2002ans View Post
There was also this topic a few years back which discussed Smarten Punctuation breaking due to spaces before/after quotation marks:
Yes, that would absolutely mess with most current smartening algorithms. However, I would think that kind of "typesetting" preference would be best applied post-production anyway, after creation/smartening/whathaveyou.

Quote:
Which ALSO reminded me of another case where I have seen it break, is when a closing quote is right before/after an em or en dash. Again, I don't have any specific examples on hand, but I can recall it happening.
I have no doubt this may have happened in the past, but I've not seen any instances of this in a long, long time. It is my contention that SmartyPants was/is often getting the blame for something that MS Word's on-the-fly smartening feature does by default. Turn that feature on in Word, type a line that starts with a quote and finishes with an em dash, and watch what happens to the closing quote after it's added.

Quote:
And I thought of another example while I was OCRing last night, where "quotations" just get MANGLED.
I have no doubt. But surely you're not suggesting it should be the responsibility of a "smartening" algorithm to detect/correct OCR errors, are you? I don't consider that part of its purview myself. I consider it the user's responsibility to hand any automated smartening routine an essentially "correct" (just punctuationally "dumb") text. Garbage In/Garbage Out still very much applies.

We may be talking about very different needs here, too. You seem to be in search of "fixup" tools to assist in the conversion of existing texts from physical to digital. Whereas I'm more focused on tools that will allow an "editor" to take content (from creative types) that is essentially correct (just typed with traditionally dumb characters/keyboards) and "smarten" it.
09-03-2014, 07:19 PM #26
Tex2002ans
Wizard
Quote:
Originally Posted by DiapDealer View Post
It is my contention that SmartyPants was/is often getting the blame for something that MS Word's on-the-fly smartening feature does by default. Turn that feature on in Word, type a line that starts with a quote and finishes with an em dash, and watch what happens to the closing quote after it's added.
Wow, I couldn't believe it!!! Same exact thing with LibreOffice!

It adds a left double quote after an em dash.
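One plausible explanation (an assumption; the actual Word and LibreOffice code is not examined here) is a context rule that makes a straight quote "opening" whenever the previous character is whitespace, line start, an opening bracket, or a dash. The minimal Python sketch below reproduces the bug under that assumed rule, and shows one possible fix: after a dash, look ahead, and treat the quote as closing when it is followed by whitespace or the end of the text. `smarten_quotes` and `dash_lookahead` are hypothetical names.

```python
# Characters after which the naive rule assumes a quote opens (assumed rule).
OPENING_CONTEXT = set(" \t\n([{\u2014\u2013")
DASHES = "\u2014\u2013"  # em dash, en dash

def smarten_quotes(text: str, dash_lookahead: bool = False) -> str:
    """Convert straight double quotes to curly ones using a context rule.

    dash_lookahead=False mimics the naive behavior: any quote after a dash
    becomes an opening quote. dash_lookahead=True makes a quote after a dash
    closing when it is followed by whitespace or the end of the text.
    """
    out = []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if dash_lookahead and prev in DASHES:
            opening = not nxt.isspace()
        else:
            opening = prev in OPENING_CONTEXT
        out.append("\u201c" if opening else "\u201d")
    return "".join(out)

sample = '"Wait\u2014"'
print(ascii(smarten_quotes(sample)))                       # '\u201cWait\u2014\u201c' (the bug)
print(ascii(smarten_quotes(sample, dash_lookahead=True)))  # '\u201cWait\u2014\u201d'
```

The lookahead still opens a quote that starts an interruption (a dash followed by a quote and a letter), so only the end-of-line case changes.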

Quote:
Originally Posted by DiapDealer View Post
We may be talking about very different needs here, too. You seem to be in search of "fixup" tools to assist in the conversion of existing texts from physical to digital. Whereas I'm more focused on tools that will allow an "editor" to take content (from creative types) that is essentially correct (just typed with traditionally dumb characters/keyboards) and "smarten" it.
Indeed, I guess we are talking about slightly different tools. You are talking about straight Smartening Punctuation, while I am talking more about a Smartening Punctuation "Plus".

Again, I just fall back on the example of Toxaris's fantastic EPUB Tools. It double-checks the Smart Quotes and then asks for user input on the "unsure" cases: Is this supposed to be two opening quotation marks in a row? Is there a closing quote with no opening quote? The algorithm detected two/three quotation marks in a row (like in that "prime" example); is this correct?

One case is about full automation, while the other is 90% automation, 10% human assistance. :P
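The "10% human assistance" part can be sketched as a pass that fixes nothing and only reports suspicious positions for review. This is a minimal Python illustration assuming already-curled double quotes; it is not how Toxaris's EPUB Tools works internally, and `flag_quote_problems` is a hypothetical name whose checks simply mirror the cases listed above.

```python
OPEN, CLOSE = "\u201c", "\u201d"
QUOTE_MARKS = (OPEN, CLOSE, '"')

def flag_quote_problems(paragraph: str) -> list[tuple[int, str]]:
    """Return (index, message) pairs for quote patterns a human should check."""
    problems = []
    depth = 0  # number of currently open double quotes
    for i, ch in enumerate(paragraph):
        if ch == OPEN:
            if depth > 0:
                problems.append((i, "opening quote while a quote is already open"))
            depth += 1
        elif ch == CLOSE:
            if depth == 0:
                problems.append((i, "closing quote with no opening quote"))
            else:
                depth -= 1
        # Two or more quote marks in a row (e.g. a misread prime or ditto mark).
        if ch in QUOTE_MARKS and i > 0 and paragraph[i - 1] in QUOTE_MARKS:
            problems.append((i, "consecutive quotation marks"))
    if depth > 0:
        # Legitimate in multi-paragraph quotations, which is why it is only flagged.
        problems.append((len(paragraph), "quote left open at end of paragraph"))
    return problems
```

Every report is a question, not a correction: an open quote at the end of a paragraph is correct English style when the quotation continues into the next paragraph, so the decision stays with the editor.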

Now that I think about it, are there any tools out there that would do similar checks for all the other punctuation? For example, letting you choose whether ellipses/em dashes/en dashes should be set open or closed.

I don't spend much time going from Style Guide A to Style Guide B, but there MUST be some tools out there to help editors speed up the process besides just a long list of Search/Replace.
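Short of a dedicated tool, one way to avoid maintaining a long ad hoc list of Search/Replace passes is to describe each house style as data and generate the regexes from it. A minimal Python sketch; `HouseStyle`, `apply_style`, and the two open/closed options are hypothetical examples, not taken from any particular style guide or tool.

```python
import re
from dataclasses import dataclass

EM_DASH = "\u2014"
ELLIPSIS = "\u2026"

@dataclass
class HouseStyle:
    """Hypothetical per-style settings; extend with more fields as needed."""
    em_dash_open: bool   # True: spaced ("word \u2014 word"); False: closed
    ellipsis_open: bool  # True: spaced ("word \u2026 word"); False: closed

def apply_style(text: str, style: HouseStyle) -> str:
    for mark, is_open in ((EM_DASH, style.em_dash_open), (ELLIPSIS, style.ellipsis_open)):
        # Only touch marks between two word characters, with any mix of regular
        # or non-breaking spaces around them; marks next to quotes or at the
        # edges of a paragraph are left alone for a human to decide.
        pattern = r"(?<=\w)[ \u00a0]*" + re.escape(mark) + r"[ \u00a0]*(?=\w)"
        text = re.sub(pattern, f" {mark} " if is_open else mark, text)
    return text

closed = HouseStyle(em_dash_open=False, ellipsis_open=False)
print(apply_style("yes \u2014 no", closed) == "yes\u2014no")  # True
```

Converting from Style Guide A to Style Guide B then means swapping one `HouseStyle` value instead of editing a list of patterns.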

Side Note: Now you have sent me down another rabbit hole of research... curse you DiapDealer!
Old 09-03-2014, 09:28 PM   #27
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.
 
 
Posts: 28,862
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Tex2002ans View Post
Side Note: Now you have sent me down another rabbit hole of research... curse you DiapDealer!
Glad I could be of assistance.
Old 09-03-2014, 10:25 PM   #28
BetterRed
null operator (he/him)
BetterRed ought to be getting tired of karma fortunes by now.
 
Posts: 22,006
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Tex2002ans View Post
Wow, I couldn't believe it!!! Same exact thing with LibreOffice!

It adds a left double quote after an em dash.
That 'feature' has been in every WP product I've used in the last 10-15 years, maybe longer. I know it's in Word 1997/2003/2007/2010 (not sure about 2013), it's always been in OpenOffice, it was in the last version of WordPerfect I used, and I'm pretty sure it is/was in Angel Writer too. And IIRC it was in calibre's smarten punctuation; I think Kovid fixed it earlier this year.

My theory is that it's a bug inherited from Parc Place - cut 'n paste, no coffee & pasta...

BR

Last edited by BetterRed; 09-03-2014 at 10:31 PM.
Old 09-04-2014, 04:13 AM   #29
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
 
 
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
There are even more cases, though. In Dutch, for example, you have words like 's avonds ("in the evening") and many more like it. Word will always give you the wrong quotation mark there by default.
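Presumably the wrong mark is an opening single quote, U+2018 (‘), where 's avonds needs the apostrophe, U+2019 (’), since a word-initial straight apostrophe looks like the start of a quotation to the smartening rule. A minimal Python post-pass that repairs this with a whitelist; the word list is a small hypothetical sample (Dutch 's, 't, 'n; English 'tis, 'twas, and decades like '90s), not a complete inventory.

```python
import re

# Hypothetical whitelist of forms that legitimately start with an apostrophe.
# Dutch: 's (des), 't (het), 'n (een); English: 'tis, 'twas, '90s and similar.
ELIDED = r"(?:s|t|n|tis|twas|\d0s)"

# A left single quote directly followed by a whitelisted form and then a word
# boundary is treated as a mis-curled apostrophe; only the quote is replaced.
_PATTERN = re.compile("\u2018(?=" + ELIDED + r"\b)", re.IGNORECASE)

def fix_leading_apostrophes(text: str) -> str:
    return _PATTERN.sub("\u2019", text)

print(ascii(fix_leading_apostrophes("\u2018s avonds")))  # '\u2019s avonds'
```

The word-boundary check keeps genuine quotations such as ‘the end’ untouched, because "t" followed by another letter does not match the whitelist.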
Old 09-04-2014, 06:50 AM   #30
Tex2002ans
Wizard
Tex2002ans ought to be getting tired of karma fortunes by now.
 
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Thanks for the Dutch example, Toxaris.

I haven't ventured far outside the purview of English books... The first foreign-language book I did was a Canadian French book, which is where I first ran into those non-breaking spaces around quotation marks and before colons and semicolons:

Quote:
[...]

<p>Au cours de notre discussion, nous avons découvert une vérité très importante&nbsp;: la monnaie est une marchandise. Apprendre cette leçon simple est l’une des tâches les plus importantes qui soient. Bien souvent, on parle de la monnaie comme si c’était plus que cela – ou moins. Mais la monnaie n’est pas une unité de compte abstraite, différente d’un bien&nbsp;; ni un jeton inutile qui ne servirait qu’aux échanges&nbsp;; ce n’est pas une «&nbsp;créance sur la société&nbsp;»&nbsp;; ni une garantie ou un niveau de prix stable. C’est une simple marchandise. Elle diffère des autres biens parce qu’elle est recherchée principalement pour son rôle de moyen d’échange. Mais, sinon, c’est une marchandise – et, comme toutes les marchandises, il en existe un certain stock, elle est demandée par des gens désirant l’acheter, la conserver etc. Comme toutes les marchandises, son «&nbsp;prix&nbsp;»</p>

[...]
Made me want to pull my hair out!

I had to look up a Canadian style guide (which, OF COURSE, has slightly different rules from both normal US English and French typography).

Luckily, I was working from a fairly clean DOC, so most of the work was already done. Even though I can't read French, I could still spot inconsistencies in the patterns I recognized by looking at the code, and I used regex to fix 15 mistakes.
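The kind of regex cleanup described above might look like the following minimal Python sketch. It assumes the convention visible in the excerpt (`&nbsp;` before clause-ending colons and semicolons, and inside « »); other marks such as ! and ?, narrow no-break spaces (U+202F), and Canadian-specific rules are deliberately left out, so check them against whichever style guide applies. `fix_french_spacing` is a hypothetical name. The main pitfall it guards against is that `&nbsp;` itself ends in a semicolon.

```python
import re

NBSP = "&nbsp;"

# Split HTML into text runs, tags, and entities. With one capturing group,
# re.split puts the captured tags/entities at odd indices, so only the text
# runs (even indices) are edited; the ";" ending "&nbsp;" is never touched.
_TOKEN = re.compile(r"(<[^>]*>|&#?\w+;)")

_RULES = [
    # Optional space before a clause-ending ":" or ";" -> "&nbsp;:" (skips "http://").
    (re.compile(r"(?<=\S)[ \u00a0]*([:;])(?=\s|$)"), NBSP + r"\1"),
    # Opening guillemet plus optional space -> "«&nbsp;".
    (re.compile(r"«[ \u00a0]*(?=\S)"), "«" + NBSP),
    # Optional space before a closing guillemet -> "&nbsp;»".
    (re.compile(r"(?<=\S)[ \u00a0]*»"), NBSP + "»"),
]

def fix_french_spacing(html: str) -> str:
    parts = _TOKEN.split(html)
    for i in range(0, len(parts), 2):  # even indices are plain text
        for pattern, replacement in _RULES:
            parts[i] = pattern.sub(replacement, parts[i])
    return "".join(parts)

print(fix_french_spacing("<p>son « prix » : une vérité</p>"))
```

Text that already uses `&nbsp;` correctly comes back unchanged, so the pass can be rerun safely after manual fixes.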

Now just be happy that most of us work on nice, clean English books, and not those dirty French books full of spaces.

Side Note: It would be interesting to learn more about typography on the international scene... I know I learned about some interesting differences in math typography while going through LaTeX topics.

Last edited by Tex2002ans; 09-04-2014 at 07:14 AM.


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.