MobileRead Forums

Perkin 08-20-2013 03:34 PM

Smart Quotes help (for Sigil plugin)
 
I'm working on a Python script which will hopefully be an improvement on smartypants (used by a plugin for Sigil), and would like some help finding cases that don't convert correctly.

The ones I don't think will ever be satisfactorily solved are words which start with an apostrophe, such as
'Twas brillig, and the slithy toves....
(Having said that, I could have a list of known 'words: 'tis, 'twas, 'cause, etc.)
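
The known-words idea could be sketched roughly as below. This is only an illustration, not the plugin's code: the word list and the name `protect_apostrophe_words` are hypothetical, and it assumes the exceptions are handled before the general quote conversion runs.

```python
import re

# Hypothetical exception list; a real one would be user-editable.
APOS_WORDS = ["tis", "twas", "cause", "em", "til"]

# A straight apostrophe that is not preceded by a letter or digit and is
# immediately followed by one of the listed words (matched as a whole word).
_APOS_RE = re.compile(
    r"(?<![A-Za-z0-9])'(?=(?:%s)\b)" % "|".join(APOS_WORDS),
    re.IGNORECASE,
)


def protect_apostrophe_words(text):
    """Replace the leading ' of listed words with a real apostrophe (U+2019)."""
    return _APOS_RE.sub("\u2019", text)
```

Running this first means the later opening/closing logic never sees those apostrophes, so it cannot mistake them for opening single quotes.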

If you've got any known flubs, please can you let me know, with a small example as well if possible, and what it should look like when done correctly.

Ones like
Code:

John said, "The man said 'aaaaa'
" 'bbbb'
" 'cccc'
" dddd
" and then ended the story."

WRONG:
Code:

John said, “The man said ‘aaaaa’
 ” ‘bbbb’
 “ ‘cccc’
 ” dddd
 ” and then ended the story.”


RIGHT:
Code:

John said, “The man said ‘aaaaa’
 “ ‘bbbb’
 “ ‘cccc’
 “ dddd
 “ and then ended the story.”

The code I've got at the moment handles all the cases I have correctly (including the above) - I just need further test cases.
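
One way to get the continuation-line case above right is a small pre-pass, sketched here under assumptions: the function name `open_leading_quotes` is made up, and it only deals with a double quote at the very start of a line, leaving every other quote for the main conversion.

```python
import re


def open_leading_quotes(text):
    """Turn a straight double quote at the start of a line into U+201C.

    Leading spaces or tabs are kept. Quotes elsewhere are left untouched.
    """
    return re.sub(
        r'(?m)^([ \t]*)"',
        lambda m: m.group(1) + "\u201c",
        text,
    )
```

The reasoning is simply that nothing can be closed at the start of a line, so a quote there has to be an opening one, whatever follows it.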

Thanks for any cases given.

DiapDealer 08-20-2013 04:06 PM

There's a weird situation I can't nail down 100% where smartypants reverses a closing quote (makes it an opening one). When it does happen, it seems to be near an emdash entity (or character). But that seems to be a bug, rather than a special typographic situation it doesn't handle.

I also thought of creating a user-editable list/dictionary of 'tis-type words that could be integrated into smarty (or another) script. :)

I think you've identified the two big "deal-breaker" scenarios where SmartyPants is concerned, though.

Perkin 08-20-2013 04:14 PM

I think I know what you mean, but using your smartypants plugin - it does them correctly -- although it removes two spaces, where they appear between the dashes and the quotes. I think when the space stays there, smartypants did an opening quote rather than a closing one.

Code:

<p>He said, "Go away -- "</p>

<p>He said, "Go away --"</p>

<p>He said, 'Go away -- '</p>

<p>He said, 'Go away --'</p>


Jellby 08-20-2013 04:57 PM

A dash can be inside or outside quotes. Is this handled correctly?

Code:

<p>"Blah blah"--he said, and continued--"blah, blah blah."</p>
<p>"Blah blah--" he said, and continued, "--blah, blah, blah."</p>

and you already know that the dashes could be spaced or not...

DiapDealer 08-20-2013 06:46 PM

Quote:

Originally Posted by Jellby (Post 2599253)
A dash can be inside or outside quotes. Is this handled correctly?

Code:

<p>"Blah blah"--he said, and continued--"blah, blah blah."</p>
<p>"Blah blah--" he said, and continued, "--blah, blah, blah."</p>

and you already know that the dashes could be spaced or not...

Smartypants seems to handle that situation quite well on its own--so far as I can tell.

In my wrapper script however, I do a little pre/post processing to achieve some personal goals that wouldn't be possible with smartypants alone (borrowing heavily from calibre). Those changes may not suit others, but they're pretty easily tweaked. For instance:

1) I preserve any html comments present. Smarty would butcher those double-dashes (calibre does the same thing).
2) I remove spaces that may occur on either side of double-dashes; simply because I find spaces before or after emdashes aesthetically unappealing when reading.
3) Smarty uses numeric entities for the quotation marks, emdashes and ellipses it creates. I've made arrangements to selectively convert those entities that Smarty creates to characters where it suits me.
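
The three steps above could look something like the following. It is a sketch under assumptions, not the actual wrapper script: the function names, the NUL-delimited placeholder and the list of entities are all made up for illustration, and the smartypants call itself would sit between `preprocess` and `postprocess`.

```python
import html
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)

# Assumed subset of numeric entities to turn back into characters.
ENTITIES = ("&#8220;", "&#8221;", "&#8216;", "&#8217;", "&#8212;", "&#8230;")


def preprocess(text):
    """Stash HTML comments, then drop spaces around double dashes."""
    comments = []

    def stash(match):
        comments.append(match.group(0))
        # Placeholder assumed never to occur in real book text.
        return "\x00%d\x00" % (len(comments) - 1)

    text = COMMENT_RE.sub(stash, text)
    text = re.sub(r"[ \t]*--[ \t]*", "--", text)
    return text, comments


def postprocess(text, comments):
    """Convert the chosen entities to characters and restore comments."""
    for entity in ENTITIES:
        text = text.replace(entity, html.unescape(entity))
    return re.sub(r"\x00(\d+)\x00", lambda m: comments[int(m.group(1))], text)
```

Stashing the comments first is what keeps their double dashes away from both the space removal and the dash conversion.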

I think Perkin's script is only going to be dealing with quotation marks, though. Which makes sense since "fixing" the double-dash and the "three consecutive periods" stuff is pretty trivial, really.

Perkin 08-20-2013 06:53 PM

Quote:

Originally Posted by Jellby (Post 2599253)
A dash can be inside or outside quotes. Is this handled correctly?

Code:

<p>"Blah blah"--he said, and continued--"blah, blah blah."</p>
<p>"Blah blah--" he said, and continued, "--blah, blah, blah."</p>

and you already know that the dashes could be spaced or not...

My code converts them to:
Code:

“Blah blah”–he said, and continued–“blah, blah blah.”
“Blah blah–” he said, and continued, “–blah, blah, blah.”

Ones I can't handle correctly are where there's a space both before and after the quote; one of them would always be wrong.
Code:

<p>"Blah blah-- " he said, and continued, " --blah, blah, blah."</p>

As it stands, both of the solitary quotes are converted to opening quotes, so the first would be wrong.

I'm working on correcting that.
Edit: Just solved that particular problem as well.
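
The post doesn't say how this was solved, so the following is only one possible heuristic, with a hypothetical name (`resolve_spaced_quotes`): use the neighbouring dash to decide. A quote that follows a dash and is itself followed by whitespace, a tag or the end of the text is treated as closing; a quote that follows whitespace (or a tag, or nothing) and leads into a dash is treated as opening. It covers double quotes only.

```python
import re

DASH = r"(?:--|[\u2013\u2014])"


def resolve_spaced_quotes(text):
    """Use an adjacent dash to pick the direction of an ambiguous quote."""
    # dash, optional spaces, quote, then space/tag/end: closing quote
    text = re.sub(
        r'(%s[ \t]*)"(?=\s|<|$)' % DASH,
        lambda m: m.group(1) + "\u201d",
        text,
    )
    # start/space/tag, quote, optional spaces, dash: opening quote
    text = re.sub(
        r'(?<![^\s>])"(?=[ \t]*%s)' % DASH,
        "\u201c",
        text,
    )
    return text
```

Quotes that don't touch a dash are left straight for the normal conversion to handle.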

mrmikel 08-20-2013 07:58 PM

Glad you solved the problem of quotes with spaces. This is very, very common in older works.

Perkin 08-22-2013 09:55 AM

Uploaded the plugin to DiapDealer's PlugIn thread. It's in post #19.

Edit:
It converts quotes/apostrophes and the mdash, ndash and ellipsis, and preserves html comments.
It also handles the words that begin with an apostrophe, taken from an apos_exceptions.txt file.

If you don't want it to do any of the (m/n)dash or ellipsis entities, you can comment out the following lines (add a # to the beginning of each line) in the smarten.py file:
30, 31, 32 (calculate extras for the entities)
42, (add pre tags to comments)
56, 57, 58 (convert the entities)
119 (remove the pre tags from comments)

Steadyhands 08-22-2013 04:21 PM

Here's a list of my saved searches for quote fixes. Some also change hyphens to mdashes. No text examples to go with them, sorry.

Quote:

11\Name=Quote Fix/Quote fix1
11\Find=\\-([\\\x2018|\\\x201c])</p>
11\Replace=\x2014\x201d</p>
12\Name=Quote Fix/Quote fix2
12\Find=\x201c</p>
12\Replace=\x201d</p>
13\Name=Quote Fix/Quote fix3
13\Find=\\. \\\x201d \\\x2018
13\Replace=. \x201c \x2018
14\Name=Quote Fix/Quote fix4
14\Find="<p class=\\\"calibre(\\d+)\\\">\\\x201d \\\x2018"
14\Replace="<p class=\"calibre\\1\">\x201c \x2018"
15\Name=Quote Fix/Quote fix5
15\Find=</i> (\\p{P})\\\x201d
15\Replace=</i>\\1\x201d
16\Name=Quote Fix/Quote fix6
16\Find="<p class=\\\"calibre(\\d+)\\\">\\\x201d"
16\Replace="<p class=\"calibre\\1\">\x201c"
17\Name=Quote Fix/Quote fix7
17\Find="([\\!|\\.|\\?\\\x2026|\\,]) \\\x201d</p>"
17\Replace=\\1\x201d</p>
18\Name=Quote Fix/Quote fix8
18\Find="<p class=\\\"calibre(\\d+)\\\">\\\x201c\\s(?!\x2018)"
18\Replace="<p class=\"calibre\\1\">\x201c"
19\Name=Quote Fix/Quote fix9
19\Find="\\, \\\x201d\\s(?=[a-z])"
19\Replace=",\x201d "
20\Name=Quote Fix/Quote fix10
20\Find="<p class=\\\"calibre(\\d+)\\\">\\\x201c\\- "
20\Replace="<p class=\"calibre\\1\">\x201c\x2014"
21\Name=Quote Fix/Quote fix11
21\Find=(-|\x2013)\x201d
21\Replace=\x2014\x201d
22\Name=Quote Fix/Quote fix12
22\Find="\x2026 \x201d(?=[A-Z])"
22\Replace=\x2026 \x201c

PS: took the easy way out and cut and pasted from the sigil_searches.ini.
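
For anyone who'd rather run these from a script, here is a rough Python transcription of fixes 2, 9, 11 and 12. It assumes the doubled backslashes in the list are just INI-file escaping, so the patterns below are an interpretation rather than an exact copy, and `apply_quote_fixes` is a made-up name.

```python
import re

# (pattern, replacement) pairs, applied in order.
QUOTE_FIXES = [
    # fix2: an opening quote right before </p> should be a closing one
    (r"\u201c</p>", "\u201d</p>"),
    # fix9: comma, space, closing quote before a lowercase word
    (r", \u201d\s(?=[a-z])", ",\u201d "),
    # fix11: hyphen or ndash before a closing quote becomes an mdash
    (r"(-|\u2013)\u201d", "\u2014\u201d"),
    # fix12: ellipsis, space, closing quote before a capital is an opener
    (r"\u2026 \u201d(?=[A-Z])", "\u2026 \u201c"),
]


def apply_quote_fixes(text):
    """Run each search-and-replace pair over the text in turn."""
    for pattern, replacement in QUOTE_FIXES:
        text = re.sub(pattern, replacement, text)
    return text
```

The other saved searches depend on calibre-style class names in the markup, so they'd need adjusting per book.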


All times are GMT -4.