02-22-2012, 03:53 PM | #1 |
Browser
Posts: 745
Karma: 578294
Join Date: Apr 2010
Location: Australia
Device: Kobo Touch, Kobo Aura HD
|
Creating smart quotes in Sigil?
Folks, I'm fairly new to using Sigil properly - I've just started trudging up the learning curve of coding - so pardon me for asking this. I thought it would have already been asked and answered in the past, but I couldn't find it with a search, so:
Is there a straightforward way to convert plain ASCII quotes to "smart" or "typographic" quotes? I can do a manual Find and Replace, searching for opening quotes, and then repeat it for closing quotes, and then apostrophes ... but is there a shorter route? Thanks for any help. |
02-22-2012, 04:11 PM | #2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
You might be able to do it with some good RegEx's... Personally I do it in Word before creating the ePUB.
|
02-22-2012, 04:39 PM | #3 |
Browser
Posts: 745
Karma: 578294
Join Date: Apr 2010
Location: Australia
Device: Kobo Touch, Kobo Aura HD
|
Yes, I do too, Toxaris, thanks. (Except I use WordPerfect - I dislike Word. And ePubs I'm creating from scratch I put through Jutoh, rather than Sigil.)
But that's not why I was asking. I also have some fairly shoddily-formatted commercial epubs that give me the irrits, which I want to tidy up. Most of what I need to do I've nutted out, but not the quotes. Anyone got an answer for this, please? |
02-22-2012, 05:47 PM | #4 |
Evangelist
Posts: 412
Karma: 546196
Join Date: Mar 2009
Location: UK canal boat
Device: sony prs505, prs650, kobo Glo HD liseuses
|
I'm incompetent in Regex, so I have a fairly laborious procedure, which gets done in Notepad++ after any necessary scanning/OCR processes and cleaning-up line breaks:
(I prefer double quotes for speech, single quotes for abbreviations, apostrophes etc.)
- insert <p> at start of first line;
- change all carriage-return/new-lines to </p>\r\n\r\n<p>;
- insert </p> at end of last line;
- change all <p>" to <p>“;
- change all "</p> to ”</p>;
- change all ^"space to ^”space (where ^ may be stop, comma, query or bang);
- change all ^space" to ^space“ (where ^ may be stop, comma, query, bang, colon or semi-colon);
- by now the number of instances of spacequote and quotespace should be sufficiently few to permit individual search/replace with double or single quotes as required - several passes may be required;
- run through, tracking down the last few instances of quotes, then do a mega replace of single quotes with ’ for the abbreviations;
- sort out the ndashes and ellipses.
Tedious, but it gets me there in the end - a typical SF book of 8 signatures will take me 3 to 6 hours to read, correct and edit, i.e. from OCR-produced text file through to Sigil-ready html. I find it pays to use named entities - it's particularly helpful when converting a text that has single quotes for direct speech into double-quoted speech marks.
I suspect there are various magic formulas in Regex which could do the job as well. If I can find a few spare brain cells one day, I may try going down that route.
Bottom line - no easy solution |
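For anyone who would rather script the unambiguous part of the procedure above, here is a rough Python sketch of the paragraph-wrapping and the four "safe" double-quote replacements. The function names are made up for illustration, and the choice of the named entities &ldquo; and &rdquo; is an assumption based on the remark about named entities; the leftover quotes would still need the manual passes described.

```python
import re

def wrap_paragraphs(text):
    """Wrap each non-empty line in <p>...</p>, separated by blank lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n\n".join(f"<p>{line}</p>" for line in lines)

def boundary_quotes(html):
    """Replace only the straight double quotes whose role is unambiguous."""
    html = html.replace('<p>"', "<p>&ldquo;")    # quote opening a paragraph
    html = html.replace('"</p>', "&rdquo;</p>")  # quote closing a paragraph
    # stop, comma, query or bang, then quote, then space: a closing quote
    html = re.sub(r'([.,?!])" ', r"\1&rdquo; ", html)
    # stop, comma, query, bang, colon or semi-colon, then space, then quote:
    # an opening quote
    html = re.sub(r'([.,?!:;]) "', r"\1 &ldquo;", html)
    return html

sample = '"Hello," she said. "Are you there?" he asked.\n"Yes."'
result = boundary_quotes(wrap_paragraphs(sample))
print(result)
```

Anything this leaves as a straight quote is exactly the residue the manual search/replace passes are meant to catch.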
02-22-2012, 06:14 PM | #5 | |
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Last edited by DiapDealer; 02-22-2012 at 06:18 PM. |
|
02-22-2012, 06:25 PM | #6 | |
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
|
Quote:
In regex this should be
Code:
(<p>|\s)"(\w)
Code:
\1“\2
Code:
(\S)"(\s|</p>)
I hope I got this regex right; I'm still on a learning curve as well. But in general it should work in most (maybe not all) cases. |
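To try the patterns above outside Sigil, a small Python sketch along these lines might do. The two search patterns and the first replacement are taken from the post; the replacement for the closing pattern (\1”\2) is not given there and is assumed here as the obvious counterpart. The function name is hypothetical.

```python
import re

OPENING = re.compile(r'(<p>|\s)"(\w)')    # search pattern from the post
CLOSING = re.compile(r'(\S)"(\s|</p>)')   # search pattern from the post

def smarten_double_quotes(html):
    """Apply the opening pattern first, then the closing pattern."""
    html = OPENING.sub(r'\1“\2', html)    # replacement from the post
    html = CLOSING.sub(r'\1”\2', html)    # assumed counterpart, not in the post
    return html

sample = '<p>"Stop," he said, "or else."</p>'
smartened = smarten_double_quotes(sample)
print(smartened)
```

One caveat: a tag with more than one attribute, such as <p class="a" id="b">, has a quote followed by a space, so the closing pattern would also hit it. That fits the "maybe not all cases" warning.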
|
02-22-2012, 07:44 PM | #7 | |
Resident Curmudgeon
Posts: 73,957
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
02-22-2012, 08:04 PM | #8 | |
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Then your code can be as simple as:
- Read in the html file
- Pass the html data to the smartyPants function
- Write the data it returns to a new file.
If you give me a little time, I might be able to post a pretty simple working sample (as long as you know your way around command-line stuff). |
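Those three steps could be sketched in Python roughly as follows. This is only an illustration: smarten_file and placeholder_smarten are hypothetical names, and the placeholder stands in for the real smartyPants function so that the sketch runs without smartypants.py present.

```python
import tempfile
from pathlib import Path

def smarten_file(in_path, out_path, smarten):
    """Read an HTML file, pass its text through `smarten`, write the result."""
    html = Path(in_path).read_text(encoding="utf-8")     # read in the html file
    result = smarten(html)                                # pass the data to the function
    Path(out_path).write_text(result, encoding="utf-8")  # write what it returns
    return result

def placeholder_smarten(html):
    """Hypothetical stand-in; the real work would be done by smartyPants."""
    return html.replace("...", "&#8230;")

# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    infile = Path(folder) / "infile.html"
    outfile = Path(folder) / "outfile.html"
    infile.write_text("<p>Wait...</p>", encoding="utf-8")
    smarten_file(infile, outfile, placeholder_smarten)
    demo_output = outfile.read_text(encoding="utf-8")
print(demo_output)
```

With smartypants.py importable, the idea would be to pass its smartyPants function in place of the placeholder.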
|
02-22-2012, 09:22 PM | #9 | |
Resident Curmudgeon
Posts: 73,957
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
02-22-2012, 10:48 PM | #10 |
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
OK. I'm assuming you already have Python installed and working.
First get the latest smartypants.py script from here (bottom of the page). Either copy and paste the text from the browser or right-click and "save link as." Just make sure you end up with a file named "smartypants.py". (Alternatively, you can install smartypants.py as a library module in your Python installation if you know how to do that.) I tend to use the smartypants.py script included in the calibre source.
Download and unzip the smarty-wrapper.py script from the zip-file I've attached to this post (the forum won't let me attach a .py file).
Now... put the (x)html file you want to "smarten" in a folder along with the smartypants.py file and the smarty-wrapper.py file (you can forget about the smartypants.py script if you've already installed it as a module).
From a command prompt... cd to the folder where all three files are located and issue the following command:
Code:
python smarty-wrapper.py infile.html outfile.html
You can read the extensive documentation that's included in the smartypants.py script in order to modify its behavior. Just know that the default action is to smarten quotes (double and single), change -- to an mdash, and to change three consecutive periods ... into an ellipsis …
Last edited by DiapDealer; 02-22-2012 at 10:53 PM. |
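To give a feel for the two non-quote defaults mentioned in the post above (-- to an mdash, three periods to an ellipsis), here is a small Python sketch. It is not smartypants' own code, just an illustration; the function name is hypothetical, and emitting the numeric entities &#8212; and &#8230; is an assumption. Tags and comments are skipped with a deliberately simple split on <...>.

```python
import re

TAG = re.compile(r'(<[^>]*>)')   # simplistic: anything from "<" to the next ">"

def dashes_and_ellipses(html):
    """Replace -- and ... in text only, leaving tags and comments alone."""
    parts = TAG.split(html)
    # With a capturing group, split() alternates text (even index) and tags (odd).
    for i in range(0, len(parts), 2):
        text = parts[i].replace('--', '&#8212;')
        parts[i] = text.replace('...', '&#8230;')
    return ''.join(parts)

cleaned = dashes_and_ellipses('<!-- note --><p>Wait... no--yes</p>')
print(cleaned)
```

The split on tags is what keeps the -- inside the HTML comment from being touched.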
02-22-2012, 11:18 PM | #11 |
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
|
I have some regex saved here which does a better job than smarty most of the time (or at least I've not run into bad cases). It needs some translation from GJSoft to PCRE - but I can do that a bit later if you are still looking for a way to do it.
Only thing I can think of is that it doesn't like multi-paragraph quotes. There's an easy way to find those tho. |
02-22-2012, 11:38 PM | #12 | |
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
If you get it ported to PCRE, I'd certainly like to see it. It's always preferable to stay within Sigil if I possibly can. But whatever it is, it's going to have to work in Code View (which means excluding the quotes around tag attributes), since Find & Replace in Book View is just not practical/advisable with Sigil. |
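One way to leave the quotes around tag attributes alone is a lookahead that accepts a quote only when the next angle bracket after it is "<" (or there is none). The sketch below tries the idea in Python; the pattern uses only constructs PCRE also has, but it is an illustration, not the regex set mentioned earlier, and the "Q" marker is there only to show which quotes match.

```python
import re

# Inside a tag, the next angle bracket after a quote is ">".
# In running text, it is "<" (or the end of the input).
TEXT_QUOTE = re.compile(r'"(?=[^<>]*(?:<|$))')

sample = '<p class="speech">"Hello," she said.</p>'
marked = TEXT_QUOTE.sub('Q', sample)   # Q marks the quotes that would be smartened
print(marked)
```

It assumes well-formed markup: a stray ">" in the text or a "<" inside an attribute value would throw it off.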
|
02-27-2012, 09:59 PM | #13 |
Resident Curmudgeon
Posts: 73,957
Karma: 128903250
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Thank you for smarty-wrapper.py.
|
02-28-2012, 12:21 PM | #14 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
I recently added a "Smarten Punctuation" option to the Modify ePub plugin for calibre. It runs calibre's smartypants code against the html files in the ePub (and does nothing else, if you so choose).
|
02-28-2012, 12:35 PM | #15 | |
Grand Sorcerer
Posts: 27,548
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
Out of curiosity, will that leave the new smartened punctuation as HTML entities, or does it change them to their Unicode counterparts like the normal calibre conversion option seems to?
EDIT: Never mind, I see it changes them to their Unicode equivalents (except for the ellipsis)... which is what I want anyway. Thanks again.
Last edited by DiapDealer; 02-28-2012 at 01:12 PM. |
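For output that does end up with entities (from a standalone smartypants run, say), converting just the typographic ones to Unicode might look like the sketch below. The table and function name are illustrative. It deliberately leaves &amp;, &lt; and the like alone, since unescaping those would break the markup.

```python
# Typographic entities only; structural ones such as &amp; are left as-is.
ENTITY_TO_CHAR = {
    "&#8216;": "\u2018", "&lsquo;": "\u2018",    # left single quote
    "&#8217;": "\u2019", "&rsquo;": "\u2019",    # right single quote / apostrophe
    "&#8220;": "\u201c", "&ldquo;": "\u201c",    # left double quote
    "&#8221;": "\u201d", "&rdquo;": "\u201d",    # right double quote
    "&#8211;": "\u2013", "&ndash;": "\u2013",    # en dash
    "&#8212;": "\u2014", "&mdash;": "\u2014",    # em dash
    "&#8230;": "\u2026", "&hellip;": "\u2026",   # ellipsis
}

def entities_to_unicode(html):
    """Swap each listed entity for its Unicode character."""
    for entity, char in ENTITY_TO_CHAR.items():
        html = html.replace(entity, char)
    return html

converted = entities_to_unicode("<p>&#8220;Tom &amp; Jerry&#8221;&hellip;</p>")
print(converted)
```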
|
|