Quote:
Originally Posted by DiapDealer
[...] It just guesses what should be an opening quote or a closing quote or an apostrophe. It's guessing extremely well in my experience. Granted; I'm sure there's certain complex quotation situations (or extra spacing between words and quotation marks) where the algorithm may fail ... but I gave up worrying about it. I don't deal with anything that complex (quotationally speaking).  It handles nested quotes and continuation quotes (no closing quote for the previous para) just fine in my experience.
|
There was also this topic a few years back which discussed Smarten Punctuation breaking due to spaces before/after quotation marks:
https://www.mobileread.com/forums/sho...d.php?t=171920
That ALSO reminded me of another case where I have seen it break: when a closing quote sits right before/after an em or en dash. Again, I don't have any specific examples on hand, but I can recall it happening.
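Since I don't have real examples on hand, here is a rough sketch of how those trouble spots could be flagged for manual review before (or after) smartening. Everything in it is my own assumption: the `SUSPECT` pattern and the `find_suspects` helper are made-up names, not part of Smarten Punctuation or any plugin, and the rule of thumb (a straight quote touching a hyphen/en dash/em dash, or a straight quote with whitespace on both sides) is just a guess at the cases discussed above.

```python
import re

# Hypothetical pattern: straight quotes in spots where a smartening
# pass has little context to decide "opening" vs. "closing".
SUSPECT = re.compile(
    r"""
    [\u2013\u2014-]["']      # hyphen, en dash, or em dash right before a quote
    | ["'][\u2013\u2014-]    # quote right before a hyphen, en dash, or em dash
    | (?<=\s)["'](?=\s)      # quote with whitespace on both sides
    """,
    re.VERBOSE,
)


def find_suspects(text, context=15):
    """Return (offset, snippet) pairs worth checking by hand."""
    hits = []
    for m in SUSPECT.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        hits.append((m.start(), text[start:end]))
    return hits


if __name__ == "__main__":
    sample = 'He said, "Wait\u2014" and left. She said " what? " loudly.'
    for offset, snippet in find_suspects(sample):
        print(offset, repr(snippet))
```

It only reports locations; it does not try to fix anything. On a million-word OCR job that is probably the safer route, since the hits can be checked by hand.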
And I thought of another example while I was OCRing last night, where "quotations" just get MANGLED. I deal with a lot of equations in text as well, and there are many cases of "prime", "double prime", "triple prime", etc., such as x', y', m'', t'''. Again, I avoid the actual "prime" characters (′, ″, ‴) and stick with the dumb straight-apostrophe equivalent (because of font issues on certain devices).
In some cases, there are HUNDREDS of "primes" throughout the text, and running the Smartening Algorithms will also just completely mangle those (and mangle subsequent quotation marks).
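One possible workaround, sketched below, is to hide the primes behind a placeholder, run the smartening, then put them back. This is only an illustration under my own assumptions: `smarten` is whatever function does the smartening (passed in as a plain callable, I am not calling any real plugin API), `naive_smarten` is a crude stand-in for testing, and a "prime" is assumed to be one to three straight apostrophes right after a single-letter variable.

```python
import re

# Assumption: a prime is 1-3 straight apostrophes after a lone letter
# (x', m'', t'''), not followed by another letter or apostrophe.
PRIME_RUN = re.compile(r"(?<![A-Za-z])([A-Za-z])('{1,3})(?![A-Za-z'])")

# Private-use character, assumed not to occur in the real text.
PLACEHOLDER = "\uE000"


def protect_primes(text):
    """Swap each prime apostrophe for the placeholder."""
    return PRIME_RUN.sub(
        lambda m: m.group(1) + PLACEHOLDER * len(m.group(2)), text
    )


def restore_primes(text):
    """Turn placeholders back into straight apostrophes."""
    return text.replace(PLACEHOLDER, "'")


def smarten_keeping_primes(text, smarten):
    """Run any smartening callable while shielding the primes from it."""
    return restore_primes(smarten(protect_primes(text)))


def naive_smarten(text):
    # Stand-in only: curls every remaining apostrophe. A real smartener
    # would also handle opening/closing quotes.
    return text.replace("'", "\u2019")


if __name__ == "__main__":
    line = "So x', y', m'', t''' and don't."
    print(smarten_keeping_primes(line, naive_smarten))
```

The pattern is deliberately narrow, so it will miss things like x1' or primes after Greek letters, and a single quoted letter such as 'A' would be treated as a prime. It would need tuning per book.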
Quote:
Originally Posted by DiapDealer
You may appreciate the plugin's ability to consult a user-defined, custom list of words that start with apostrophes.
|
Sounds fantastic. Next time I have to run it, I will let you know. Currently, I have another large journal I am OCRing. This time, instead of a ~2 million word journal, it is just a lowly ~1.1 million words.
Quote:
Originally Posted by BetterRed
|
Yes yes, I believe that might have been the topic. I knew it was hiding there somewhere. Usually I am good at hunting down these older posts (or stumbling upon other posts, like the Tex2002ans + LaTeX one you mentioned!).
I really have to get around to organizing/categorizing older posts. So much good information just gets lost in the abyss!