
MobileRead Forums > E-Book Software > Sigil

02-22-2012, 03:53 PM #1
MacEachaidh
Browser
Posts: 745
Karma: 578294
Join Date: Apr 2010
Location: Australia
Device: Kobo Touch, Kobo Aura HD
Creating smart quotes in Sigil?

Folks, I'm fairly new to using Sigil properly - I've just started trudging up the learning curve of coding - so pardon me for asking this. I thought it would have already been asked and answered in the past, but I couldn't find it with a search, so:

Is there a straightforward way to convert plain ASCII quotes to "smart" or "typographic" quotes?

I can do a manual Find and Replace, searching for opening quotes, and then repeat it for closing quotes, and then apostrophes ... but is there a shorter route?

Thanks for any help.
02-22-2012, 04:11 PM #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
You might be able to do it with some good regexes. Personally, I do it in Word before creating the ePub.
02-22-2012, 04:39 PM #3
MacEachaidh
Browser
Yes, I do too, Toxaris, thanks. (Except I use WordPerfect; I dislike Word. And ePubs I'm creating from scratch I put through Jutoh rather than Sigil.)

But that's not why I was asking. I also have some fairly shoddily-formatted commercial epubs that give me the irrits, which I want to tidy up. Most of what I need to do I've nutted out, but not the quotes.

Anyone got an answer for this, please?
02-22-2012, 05:47 PM #4
alecE
Evangelist
Posts: 412
Karma: 546196
Join Date: Mar 2009
Location: UK canal boat
Device: sony prs505, prs650, kobo Glo HD liseuses
I'm incompetent at regex, so I have a fairly laborious procedure, which I do in Notepad++ after any necessary scanning/OCR and cleaning up of line breaks.
(I prefer double quotes for speech, and single quotes for abbreviations, apostrophes, etc.)
- insert <p> at the start of the first line;
- change all carriage-return/new-lines to </p>\r\n\r\n<p>;
- insert </p> at the end of the last line;
- change all <p>" to <p>&ldquo;
- change all "</p> to &rdquo;</p>;
- change all ^"space to ^&rdquo;space (where ^ stands for a full stop, comma, question mark or exclamation mark);
- change all ^space" to ^space&ldquo; (where ^ stands for a full stop, comma, question mark, exclamation mark, colon or semicolon);
- by now the remaining instances of space-quote and quote-space should be few enough to search/replace individually with double or single quotes as required; several passes may be needed;
- run through, tracking down the last few stray quotes, then do a mega-replace of single quotes with &rsquo; for the abbreviations;
- sort out the en dashes and ellipses.
Tedious, but it gets me there in the end: a typical SF book of 8 signatures takes me 3 to 6 hours to read, correct and edit, i.e. from OCR-produced text file through to Sigil-ready HTML.

I find it pays to use named entities; it's particularly helpful when converting a text that uses single quotes for direct speech into one with double quotes. I suspect there are various magic formulas in regex that could do the job as well. If I can find a few spare brain cells one day, I may try going down that route.
Bottom line: no easy solution.
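The mechanical passes in the procedure above can be sketched with Python's standard `re` module. This is a minimal sketch under stated assumptions, not alecE's actual tooling: it assumes plain text with one paragraph per line (line breaks already cleaned up), double quotes for speech, and therefore treats every single quote as an apostrophe. The function name `apply_quote_passes` is hypothetical, it joins paragraphs with `\n\n` rather than `\r\n\r\n`, and the manual review steps are left to a human, so any straight double quote it cannot classify is simply left in place.

```python
import re


def apply_quote_passes(text: str) -> str:
    """Mechanical quote passes from the Notepad++ procedure (hypothetical helper).

    Assumes one paragraph per line and double quotes for speech, so every
    single quote is treated as an apostrophe.
    """
    # Wrap each non-empty line in <p>...</p>, separated by a blank line.
    paragraphs = [line for line in text.splitlines() if line.strip()]
    html = "\n\n".join(f"<p>{line}</p>" for line in paragraphs)
    # A quote straight after <p> opens speech; one straight before </p> closes it.
    html = html.replace('<p>"', "<p>&ldquo;")
    html = html.replace('"</p>', "&rdquo;</p>")
    # Full stop, comma, question or exclamation mark + quote + space: closing quote.
    html = re.sub(r'([.,?!])" ', r"\1&rdquo; ", html)
    # Same marks (plus colon/semicolon) + space + quote: opening quote.
    html = re.sub(r'([.,?!:;]) "', r"\1 &ldquo;", html)
    # Final mega-replace: remaining single quotes become apostrophes.
    html = html.replace("'", "&rsquo;")
    # Any straight double quotes still left need manual review.
    return html


sample = '"Hello," she said. "It\'s late."'
print(apply_quote_passes(sample))
# <p>&ldquo;Hello,&rdquo; she said. &ldquo;It&rsquo;s late.&rdquo;</p>
```

Running the passes in this fixed order matters: the anchored `<p>"` and `"</p>` cases are handled first, so the punctuation-based patterns only have to deal with quotes in mid-paragraph.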
02-22-2012, 06:14 PM #5
DiapDealer
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
But that's not why I was asking. I also have some fairly shoddily-formatted commercial epubs that give me the irrits, which I want to tidy up. Most of what I need to do I've nutted out, but not the quotes.
Try SmartyPants (originally written in Perl and since ported to Python). It's what calibre uses. Nothing's going to be perfect for this sort of thing, but it does very well, and the areas where it fails are fairly easy to clean up afterwards. I run XHTML through it before ever opening it in Sigil. It outputs the smart punctuation as HTML entities, though.

Last edited by DiapDealer; 02-22-2012 at 06:18 PM.
02-22-2012, 06:25 PM #6
mmat1
Berti
Posts: 1,196
Karma: 4985964
Join Date: Jan 2012
Location: Zischebattem
Device: Acer Lumiread
Quote:
Originally Posted by MacEachaidh
Is there a straightforward way to convert plain ASCII quotes to "smart" or "typographic" quotes?
How would you describe the text around an opening quote? I'd guess there's some whitespace or a <p> before it and some letters after it.

In regex this should be
Code:
(<p>|\s)"(\w)
replaced with
Code:
\1&ldquo;\2
For closing quotes:
Code:
(\S)"(\s|</p>)
replaced with
Code:
\1&rdquo;\2

I hope I got these regexes right; I'm still on the learning curve as well. But in general they should work in most (maybe not all) cases.
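mmat1's patterns carry over directly to Python's `re` module, which treats these particular constructs the same way as the PCRE engine behind Sigil's Find & Replace (worth double-checking on your own files). In this sketch the closing-quote replacement is taken to be `\1&rdquo;\2`, the mirror image of the opening one; the function name `smarten_double_quotes` is hypothetical. The second example shows the main weak spot: a quote around an attribute value can be matched too.

```python
import re

# mmat1's patterns: an opening quote follows <p> or whitespace and precedes a
# word character; a closing quote follows a non-space character and precedes
# whitespace or </p>.
OPEN_QUOTE = re.compile(r'(<p>|\s)"(\w)')
CLOSE_QUOTE = re.compile(r'(\S)"(\s|</p>)')


def smarten_double_quotes(html: str) -> str:
    html = OPEN_QUOTE.sub(r"\1&ldquo;\2", html)
    # Closing replacement assumed to mirror the opening one.
    return CLOSE_QUOTE.sub(r"\1&rdquo;\2", html)


print(smarten_double_quotes('<p>He said "hello there" and left.</p>'))
# <p>He said &ldquo;hello there&rdquo; and left.</p>

# Caveat: quotes around attribute values can be matched as well.
print(smarten_double_quotes('<a href="note.html" id="n1">'))
# <a href="note.html&rdquo; id="n1">
```

Note that the `<p>` alternative only matches a bare `<p>` tag; a paragraph such as `<p class="x">"Hi` needs another pattern or a tag-aware approach.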
02-22-2012, 07:44 PM #7
JSWolf
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by DiapDealer
SmartyPants (originally written in Perl and ported to Python). It's what calibre uses. Nothing's going to be perfect for this sort of thing but it does very well. Areas where it fails are fairly easy to clean up afterwards. I run xhtml through it before ever opening in Sigil. It will create them as HTML entities though.
How do you use SmartyPants? Do I have to write Python code to use it?
02-22-2012, 08:04 PM #8
DiapDealer
Grand Sorcerer
Quote:
How do you use SmartyPants? Do I have to write Python code to use it?
At least a little bit. It's designed to be imported as a Python module.

Then your code can be as simple as:
- Read in the HTML file.
- Pass the HTML data to the smartyPants function.
- Write the data it returns to a new file.
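Those three steps might look like the following in Python. This is a minimal sketch, not the smarty-wrapper.py attached later in the thread: the helper name `smarten_file` is hypothetical, UTF-8 input is assumed, and the converting function is passed in as a parameter so the helper itself does not depend on any particular SmartyPants version.

```python
from pathlib import Path
from typing import Callable


def smarten_file(in_path: str, out_path: str, convert: Callable[[str], str]) -> None:
    """Read an (X)HTML file, run its text through `convert`, write a new file.

    `convert` is whatever educating function you use, e.g. smartyPants from
    smartypants.py; it is passed in rather than imported so this helper has
    no dependency of its own.
    """
    # Step 1: read in the HTML file (assumed UTF-8, as is usual for ePub XHTML).
    data = Path(in_path).read_text(encoding="utf-8")
    # Step 2: pass the HTML data to the converting function.
    result = convert(data)
    # Step 3: write the returned data to a new file, leaving the original intact.
    Path(out_path).write_text(result, encoding="utf-8")
```

With the old smartypants.py sitting next to your script, the call would be along the lines of `smarten_file("infile.html", "outfile.html", smartypants.smartyPants)` after `import smartypants`. As far as I know, newer PyPI releases of the package name the function `smartypants` instead, so check which version you have.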

If you give me a little time, I might be able to post a pretty simple working sample (as long as you know your way around command-line stuff).
02-22-2012, 09:22 PM #9
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by DiapDealer
At least a little bit. It's designed to be imported as a python module.

Then your code can be as simple as:
Read in the html file
Pass the html data to the smartyPants function
Write the data it returns to a new file.

If you give me a little time, I might be able to post a pretty simple working sample (as long as you know your way around command-line stuff).
Thanks. I'm very familiar with the command line. I've been using a command line since before a GUI was a thought in Xerox's mind.
02-22-2012, 10:48 PM #10
DiapDealer
Grand Sorcerer
OK. I'm assuming you already have Python installed and working.

First get the latest smartypants.py script from here (bottom of the page). Either copy and paste the text from the browser or right-click and "save link as." Just make sure you end up with a file named "smartypants.py". (Alternatively, you can install smartypants.py as a library module in your Python installation if you know how to do that.) I tend to use the smartypants.py script included in the calibre source.

Download and unzip the smarty-wrapper.py script from the zip-file I've attached to this post (the forum won't let me attach a .py file).

Now... put the (x)html file you want to "smarten" in a folder along with the smartypants.py file and the smarty-wrapper.py file (you can forget about the smartypants.py script if you've already installed it as a module).

From a command prompt... cd to the folder where all three files are located and issue the following command:
Code:
python smarty-wrapper.py infile.html outfile.html
Of course make sure to change infile.html to whatever the name of the file you're trying to modify really is. And change outfile.html to whatever you want your resulting new file to be called.

You can read the extensive documentation included in the smartypants.py script to modify its behavior. Just know that the default action is to smarten quotes (double and single), change -- to an em dash, and change three consecutive periods (...) into an ellipsis (…).
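For anyone who would rather handle the two non-quote defaults with plain find-and-replace, they can be approximated as below. This is a rough sketch, not SmartyPants' own logic: it uses Python's `re` to split off tags (so `--` inside HTML comments or attribute values is left alone), outputs named entities, and does not treat `---` or runs of four periods specially. The helper name `dashes_and_ellipses` is hypothetical, and text inside `<style>` or `<script>` elements would still be touched.

```python
import re

# Splitting with a capturing group keeps the tags: even-indexed pieces are
# text between tags, odd-indexed pieces are the tags themselves.
TAG = re.compile(r"(<[^>]*>)")


def dashes_and_ellipses(html: str) -> str:
    """Rough stand-in for two SmartyPants defaults, applied outside tags only."""
    pieces = TAG.split(html)
    for i in range(0, len(pieces), 2):
        text = pieces[i].replace("--", "&mdash;")  # double hyphen -> em dash
        pieces[i] = text.replace("...", "&hellip;")  # three periods -> ellipsis
    return "".join(pieces)


print(dashes_and_ellipses("<p>Wait -- what...</p><!-- keep -- this -->"))
# <p>Wait &mdash; what&hellip;</p><!-- keep -- this -->
```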
Attached file: smarty-wrapper.zip (525 bytes)

Last edited by DiapDealer; 02-22-2012 at 10:53 PM.
02-22-2012, 11:18 PM #11
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
I have some regex saved here which does a better job than smarty most of the time (or at least I've not run into bad cases with it). It needs some translation from JGsoft to PCRE, but I can do that a bit later if you are still looking for a way to do it.

The only problem I can think of is that it doesn't handle multi-paragraph quotes. There's an easy way to find those, though.
02-22-2012, 11:38 PM #12
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by Serpentine
I have some regex saved here which does a better job than smarty most of the time (or at least I've not run into bad cases). It needs some translation from GJSoft to PCRE - but I can do that a bit later if you are still looking for a way to do it.

Only thing I can think of is that it doesn't like multi-paragraph quotes. There's an easy way to find those tho.
Yes, situations where there isn't always a closing quote are almost always going to cause issues with any automated script. Apostrophes at the beginning of a word (like 'tis) are also problematic for smarty, since there's no programmatic way to tell them apart from an opening single quote. But overall, I've found it does a much better job (not to mention quicker and easier) with double/single quotes, em dashes and ellipses than any regex I've come across (or come up with).
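Because leading apostrophes are the classic failure, one way to clean up afterwards is to keep a short list of known elisions and turn any opening single quote in front of them back into an apostrophe. A minimal sketch: the word list is a hypothetical, non-exhaustive starting point, and because educated output may use named entities, decimal entities or literal Unicode characters, all three forms of the left single quote are handled, each mapped to its matching right single quote.

```python
import re

# Hypothetical starter list of words that begin with an apostrophe; extend as needed.
ELISIONS = ["tis", "twas", "em", "cause", "til", "bout"]

# A left single quote (named entity, decimal entity or literal U+2018) directly
# before one of the listed words; the word itself is matched case-insensitively.
LEADING = re.compile(
    r"(&lsquo;|&#8216;|\u2018)(?=(?i:" + "|".join(ELISIONS) + r")\b)"
)
# Each form of the left single quote mapped to the matching right single quote.
RIGHT = {"&lsquo;": "&rsquo;", "&#8216;": "&#8217;", "\u2018": "\u2019"}


def fix_leading_apostrophes(html: str) -> str:
    return LEADING.sub(lambda m: RIGHT[m.group(1)], html)


print(fix_leading_apostrophes(
    "<p>&lsquo;Tis late, &#8216;twas ever thus. He said &lsquo;emphasis&rsquo;.</p>"
))
# <p>&rsquo;Tis late, &#8217;twas ever thus. He said &lsquo;emphasis&rsquo;.</p>
```

The `\b` after each word keeps genuine quotations such as 'emphasis' from being caught by the `em` entry.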

If you get it ported to PCRE, I'd certainly like to see it. It's always preferable to stay within Sigil if I possibly can. But whatever it is, it's going to have to work in Code View (which means excluding the quotes around tag attributes), since Find & Replace in Book View (BV) is just not practical or advisable with Sigil.
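One way to meet the "exclude tag attributes" requirement outside Sigil is to split the markup on tags and only educate the text between them. This is a minimal sketch, assuming well-formed (X)HTML with no unescaped `>` inside attribute values. The pairing rule (a quote after whitespace, an opening bracket or a block-level tag opens; anything else closes) is a deliberately simple heuristic, not SmartyPants' algorithm or Serpentine's regex. The names `educate_outside_tags`, `BLOCK` and `OPENERS` are hypothetical. It still gets leading apostrophes ('tis) wrong and does not skip `<style>`/`<script>` contents.

```python
import re

TAG = re.compile(r"(<[^>]*>)")
# Tags that start or end a block; a quote right after one is treated as opening.
BLOCK = re.compile(r"</?(?:p|div|h[1-6]|li|td|th|blockquote|br)\b", re.IGNORECASE)
OPENERS = "([{"  # characters after which a quote is assumed to open


def educate_outside_tags(html: str) -> str:
    """Curl straight quotes in text only; tags and their attributes are skipped."""
    pieces = TAG.split(html)
    prev = " "  # previous text character; start as if after whitespace
    for i, piece in enumerate(pieces):
        if i % 2:  # odd pieces are tags: copy them unchanged
            if BLOCK.match(piece):
                prev = " "
            continue
        out = []
        for ch in piece:
            if ch in "\"'":
                opening = prev.isspace() or prev in OPENERS
                if ch == '"':
                    out.append("&ldquo;" if opening else "&rdquo;")
                else:
                    out.append("&lsquo;" if opening else "&rsquo;")
            else:
                out.append(ch)
            prev = ch
        pieces[i] = "".join(out)
    return "".join(pieces)


sample = '<p class="x">"Hi," she said, "it\'s <i>mine</i>."</p><p>"Yes."</p>'
print(educate_outside_tags(sample))
# <p class="x">&ldquo;Hi,&rdquo; she said, &ldquo;it&rsquo;s <i>mine</i>.&rdquo;</p><p>&ldquo;Yes.&rdquo;</p>
```

Carrying `prev` across inline tags such as `<i>` is what lets the quote after `</i>.` close correctly, while block tags reset it so each paragraph starts fresh.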
02-27-2012, 09:59 PM #13
JSWolf
Resident Curmudgeon
Thank you for smarty-wrapper.py.
02-28-2012, 12:21 PM #14
kiwidude
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Not long ago I added an option to the Modify ePub plugin for calibre to do just "Smarten Punctuation", which runs calibre's SmartyPants code against the HTML files in the ePub (and does nothing else, if you so choose).
02-28-2012, 12:35 PM #15
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by kiwidude
In the Modify ePub plugin for calibre I not so long ago added the option to just do "Smarten Punctuation" which will just run calibre's smarty pants stuff against the html files in the ePub (and do nothing else if you so choose it).
Brilliant! I wasn't aware of that. Thanks!
Out of curiosity, will that leave the new smartened punctuation as HTML entities or does it change them to their unicode counterparts like the normal calibre conversion option seems to?

EDIT: Never mind, I see it changes them to their Unicode equivalents (except for the ellipsis), which is what I want anyway. Thanks again.
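When running smartypants.py directly, which leaves its output as HTML entities, a targeted post-pass can get the same Unicode result. The standard library's `html.unescape` looks like the obvious tool, but it also decodes `&lt;`, `&gt;` and `&amp;`, which would corrupt the markup, so this sketch converts only a whitelist of typographic entities. The whitelist (named and decimal forms of the curly quotes, dashes and ellipsis) and the function name `entities_to_unicode` are assumptions. Hexadecimal forms such as `&#x201C;` are not handled.

```python
import re

# Whitelisted typographic entities and the characters they stand for.
TYPOGRAPHIC = {
    "ldquo": "\u201c", "rdquo": "\u201d",
    "lsquo": "\u2018", "rsquo": "\u2019",
    "mdash": "\u2014", "ndash": "\u2013", "hellip": "\u2026",
}
# Decimal code points of the same characters, e.g. 8220 for &#8220;.
CODEPOINTS = {ord(ch) for ch in TYPOGRAPHIC.values()}

ENTITY = re.compile(r"&(?:(\w+)|#(\d+));")


def entities_to_unicode(html: str) -> str:
    def replace(m) -> str:
        name, number = m.group(1), m.group(2)
        if name in TYPOGRAPHIC:
            return TYPOGRAPHIC[name]
        if number is not None and int(number) in CODEPOINTS:
            return chr(int(number))
        return m.group(0)  # leave &lt;, &amp;, etc. untouched

    return ENTITY.sub(replace, html)


print(entities_to_unicode("<p>&ldquo;A &amp; B&#8221; &lt;x&gt; &#8230;</p>"))
```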

Last edited by DiapDealer; 02-28-2012 at 01:12 PM.


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.