Old 03-19-2015, 12:28 PM   #1
adv_dp_fan
Zealot
Posts: 103
Karma: 57138
Join Date: May 2010
Device: Sony 505, iPad 1 & 3, Galaxy Note 8.1
Converting UK punctuation to US

Trying to find a tool or utility to convert files from UK punctuation to US (single quote for dialog to double quote for example). Any help would be appreciated.
Old 03-20-2015, 08:04 AM   #2
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by adv_dp_fan
Trying to find a tool or utility to convert files from UK punctuation to US (single quote for dialog to double quote for example). Any help would be appreciated.
Both Sigil and Calibre Editor support Regular Expressions, which allow you to search for text enclosed by single or double quotes.

For example, you could use the following very simple expressions in Sigil and Calibre to replace single quotes with double quotes.

Find:‘(.*?)’
Replace:“\1”

(Make sure to select Regex from the Mode dropdown box.)
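For anyone who would rather try this outside an editor first, here is a minimal Python sketch of the same Find/Replace pair. The function name single_to_double is made up for illustration; the pattern is the one given above. The second call shows where the simple approach goes wrong as soon as an apostrophe appears inside the dialogue.

```python
import re

def single_to_double(text: str) -> str:
    # Same idea as Find: ‘(.*?)’  Replace: “\1”
    # The non-greedy group stops at the first ’ after an opening ‘.
    return re.sub(r"‘(.*?)’", r"“\1”", text)

print(single_to_double("‘Hello,’ she said."))
# “Hello,” she said.

print(single_to_double("‘Don’t go,’ she said."))
# “Don”t go,’ she said.   (the apostrophe was taken for the closing quote)
```

So the expression is fine for a quick pass, but the result needs checking wherever contractions or possessives sit inside quoted speech.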

Note that you'll probably also have to convert British spellings to American spellings using VarCon or a similar tool.
Old 03-20-2015, 11:38 AM   #3
adv_dp_fan
Zealot
My problem with a simple regex is things like possessives or other times when you have a ' in the middle of a line. That's why I was looking for a tool that did a little more analysis of the text before making changes. It isn't an easy thing to just convert them I know, just wondered if anyone knew of any tool that did so.
Old 03-20-2015, 12:43 PM   #4
RobertDDL
Whatever...
Posts: 197
Karma: 1114225
Join Date: Feb 2015
Location: Austria
Device: PocketBook InkPad 840, Touch HD 2
Quote:
Originally Posted by adv_dp_fan
My problem with a simple regex is things like possessives or other times when you have a ' in the middle of a line. That's why I was looking for a tool that did a little more analysis of the text before making changes. It isn't an easy thing to just convert them I know, just wondered if anyone knew of any tool that did so.
‘’Twas Zeus’ will, wasn’t it?’

I think that any ’ followed by a letter is an apostrophe, and any ’ preceded by a letter is an apostrophe, too? Because if it were a closing quote, it would be preceded by a punctuation mark? If I'm right, then you can use regular expressions to find ’[a-zA-Z] and [a-zA-Z]’ and replace the ’ in each match with a placeholder that doesn't appear anywhere in the text, then replace single quotes with double quotes (if the opening and closing ones occur in equal numbers, that's a good sign), and finally restore the apostrophes. And then check whether this really worked.

Hm, you may still have to deal with nested quotes... replace opening and closing double quotes with something else before you do the above, and replace them with single ones as the final step. Never entirely trust any automated process, though...
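The steps above can be sketched in Python roughly as follows. This is only an illustration of the placeholder idea, not a finished tool: the function name uk_to_us and the three private-use code points used as placeholders are assumptions, and the code takes for granted that those code points do not occur in the book.

```python
import re

# Hypothetical placeholders (Unicode private-use area), assumed absent from the text.
APOS = "\uE000"
LDQ, RDQ = "\uE001", "\uE002"

def uk_to_us(text: str) -> str:
    # 1. Protect every ’ that touches a letter on either side (treated as an apostrophe).
    text = re.sub(r"’(?=[a-zA-Z])|(?<=[a-zA-Z])’", APOS, text)
    # 2. Park existing (nested) double quotes so they are not confused with the new ones.
    text = text.replace("“", LDQ).replace("”", RDQ)
    # 3. Promote the remaining single quotes to double quotes.
    text = text.replace("‘", "“").replace("’", "”")
    # 4. Demote the parked double quotes to single quotes.
    text = text.replace(LDQ, "‘").replace(RDQ, "’")
    # 5. Restore the protected apostrophes.
    return text.replace(APOS, "’")

print(uk_to_us("‘’Twas Zeus’ will, wasn’t it?’"))
# “’Twas Zeus’ will, wasn’t it?”
print(uk_to_us("‘He said “no,” didn’t he?’"))
# “He said ‘no,’ didn’t he?”
```

Comparing the number of “ and ” in the result is a cheap sanity check, but it cannot replace reading through the output.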
Old 03-20-2015, 01:39 PM   #5
Jellby
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by RobertDDL
‘’Twas Zeus’ will, wasn’t it?’

I think that any ’ followed by a letter is an apostrophe, and any ’ preceded by a letter is an apostrophe, too? Because if it were a closing quote, it would be preceded by a punctuation mark?
I'm afraid it's not so, you can have ‘single’ words or phrases between quotes, not to mention mistakes where the punctuation may be missing. It may be possible to detect these cases by keeping track of whether a quote has been opened or not, though.
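A rough Python sketch of what "keeping track of whether a quote has been opened" could look like. The function name and the exact rule are assumptions made for illustration: a ’ is treated as a closing quote only when a quote is currently open and the next character is not a letter.

```python
def convert_with_state(text: str) -> str:
    out = []
    is_open = False  # True while we are inside a ‘...’ pair
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "‘":
            out.append("“")
            is_open = True
        elif ch == "’" and is_open and not nxt.isalpha():
            # Inside an open quote and not followed by a letter: assume it closes the quote.
            out.append("”")
            is_open = False
        else:
            # Everything else, including ’ outside a quote, is left untouched.
            out.append(ch)
    return "".join(out)

print(convert_with_state("a ‘single’ word, isn’t it"))
# a “single” word, isn’t it
print(convert_with_state("‘Don’t go,’ she said."))
# “Don’t go,” she said.
```

This handles ‘single’ words, but it still guesses wrong for a word-final apostrophe inside dialogue: in ‘Zeus’ will,’ the ’ after Zeus would be taken as the closing quote.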
Old 03-21-2015, 07:04 AM   #6
RobertDDL
Whatever...
Quote:
Originally Posted by Jellby
I'm afraid it's not so, you can have ‘single’ words or phrases between quotes, not to mention mistakes where the punctuation may be missing.
Mistakes, yes, but you are also right about ‘single’ words, I hadn't been thinking properly. Also titles can be put in quotes, like:

A faint type of both characters may be found in the Surinam Yarico of Captain John Gabriel Stedman, whose ‘Narrative of a Five Years’ Expedition’ appeared in 1796.

So, I guess, it cannot be done without manual checking. Am I right that an apostrophe at the end of a word will only appear after an s? If that's true, then it should be possible to search for [sS]’[ .,;:!?—] (there shouldn't be too many hits) and replace the ones that are closing quotes with an unambiguous placeholder -- and for the rest, the rule "if followed or preceded by a letter it's an apostrophe" applies?

Does this work now?
Old 03-21-2015, 11:09 AM   #7
Jellby
frumious Bandersnatch
Quote:
Originally Posted by RobertDDL
Am I right that an apostrophe at the end of a word will only appear after an s?
Depends on who's speakin'. For some books it will be a lil' complicated, particularly if they include phrases in languages other than English (un po' complicato in Italian, for instance).

(Not to mention that occasionally you can find the apostrophe after some s-sound which is not written with s, like -x or -ce.)

Last edited by Jellby; 03-21-2015 at 11:12 AM.
Old 03-21-2015, 11:46 AM   #8
BetterRed
null operator (he/him)
Posts: 20,575
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Jellby
Depends on who's speakin'. For some books it will be a lil' complicated, particularly if they include phrases in languages other than English (un po' complicato in Italian, for instance).

(Not to mention that occasionally you can find the apostrophe after some s-sound which is not written with s, like -x or -ce.)
You may also find a lone ranger apostrophe at the end of words ending in 'z' and 'x' -- as in "...and in the kibbutz' fields grew cucumbers and tomatoes" or "The gearbox' selectors were broken."

Probably more often seen in older non-US/Canada publications.

BR
Old 03-21-2015, 01:58 PM   #9
RobertDDL
Whatever...
Yes, sorry, you're both right of course.

It will have to be [a-zA-Z]’[ .,;:!?—] then, which, depending on the text, may mean having to look at quite a number of ’ characters.
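To get a feel for how many of these there are before committing to a manual pass, something like the following Python sketch could list each ambiguous ’ with a bit of context. The helper name candidates and the width parameter are invented for this example; the pattern is the one above.

```python
import re

# A letter, then ’, then a space or punctuation: could be an apostrophe or a closing quote.
AMBIGUOUS = re.compile(r"[a-zA-Z]’[ .,;:!?—]")

def candidates(text: str, width: int = 15) -> list:
    """Return (offset of the ’, surrounding snippet) for every ambiguous hit."""
    hits = []
    for m in AMBIGUOUS.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        hits.append((m.start() + 1, text[start:end]))
    return hits

sample = "‘It was Zeus’ will,’ he said. ‘Isn’t it?’"
for offset, snippet in candidates(sample):
    print(offset, repr(snippet))
```

In the sample only the ’ after Zeus is reported; the closing quotes after the comma and the question mark, and the apostrophe in Isn’t, do not match the pattern.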
Old 03-21-2015, 08:47 PM   #10
BetterRed
null operator (he/him)
Quote:
Originally Posted by RobertDDL
Yes, sorry, you're both right of course.

It will have to be [a-zA-Z]’[ .,;:!?—] then, which, depending on the text, may mean having to look at quite a number of ’ characters.
Yep, which raises the question: why bother?

In more than a few non-US/Can publications I've found that real apostrophes are used where they ought to be used (contractions and possessives) and single quotes are used around dialogue and quotes. If that's true, then maybe just convert the quotes and leave the apostrophes alone to fulfill their designated purpose.

<sa-joke>We'll have a shortage of closing single quotes if they keep getting used as apostrophes, just as there's a shortage of semicolons since Algol and its successors started using them as full stops.</sa-joke>

BR
Old 03-22-2015, 03:34 AM   #11
Jellby
frumious Bandersnatch
Quote:
Originally Posted by BetterRed
I've found that real apostrophes are used where they ought to be used (contractions and possessives) and single quotes are used around dialogue and quotes.
What do you mean? When done properly (curly), the apostrophe and the closing single quote are exactly the same character, there's no automatic way to tell which is which, that's the root of the problem. Do you mean that apostrophes are straight and quotes are curly? I would replace that at once, it's so typewriterish...

When I code ebooks, I sometimes use &rsquo; for the quote and &#8217; for the apostrophe. These are just synonyms for the same U+2019 character, but at least they are easy to tell in the code.
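If a source file really was coded that way, the conversion becomes a plain string replacement on the markup, because the apostrophes never appear as &rsquo;. A small Python sketch, with two assumptions that go beyond what is described above: opening quotes are written as &lsquo;, and the file has not been through a tool that rewrites entities as literal characters (which would erase the distinction).

```python
def convert_marked_source(html: str) -> str:
    # Only the named entities are touched; &#8217; (the apostrophe in this
    # convention) is left exactly as it is.
    return html.replace("&lsquo;", "&ldquo;").replace("&rsquo;", "&rdquo;")

print(convert_marked_source("&lsquo;Don&#8217;t go,&rsquo; she said."))
# &ldquo;Don&#8217;t go,&rdquo; she said.
```

Nested double quotes would still need the swap via placeholders discussed earlier in the thread.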
Old 03-22-2015, 05:09 AM   #12
RobertDDL
Whatever...
Just read this today:

This Is How ‘Interstellar’’s Co-Writer Wanted the Movie to End

Unless I'm mistaken, the first ’ is the closing quote, and the second one the apostrophe... (and in OCR'd text, ’’ can also stand for mis-recognized closing double quotes when you have nested quotes...)

And, in headings or in verses, [a-zA-Z]’$ can be either an apostrophe or a closing quote...

I think that using the same symbol for apostrophes and closing quotes really hasn't been the best idea the British ever had.

I'm not an expert on Unicode and stay with Windows 1252 whenever I can, but even Unicode doesn't really help to clear up the mess, it seems:
http://www.unicode.org/L2/L2007/07241-mirroring.txt

So... a bit of playing with regular expressions, and then we're back to good old proofreading/editing...
Old 03-22-2015, 05:58 AM   #13
BetterRed
null operator (he/him)
Quote:
Originally Posted by Jellby
What do you mean? When done properly (curly), the apostrophe and the closing single quote are exactly the same character, there's no automatic way to tell which is which, that's the root of the problem. Do you mean that apostrophes are straight and quotes are curly? I would replace that at once, it's so typewriterish...

When I code ebooks, I sometimes use &rsquo; for the quote and &#8217; for the apostrophe. These are just synonyms for the same U+2019 character, but at least they are easy to tell in the code.
Jellby - I pay little attention to the code, I read the text

It could be that the 'apostrophes' I sometimes see are primes as in feet, inches, minutes and seconds marks. Pretty sure they're not straight quotes, I think I see them most in public domain official & legal texts from courts and tribunals etc, and media transcripts - maybe they come from the transcription technology/services they use.

I don't write the stuff, or republish it, I just download it, skim read it and file it.

But I think I have also seen them in commercial books from UK - would be factual books not fictional.

BR
Old 03-23-2015, 12:43 AM   #14
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by BetterRed
Pretty sure they're not straight quotes, I think I see them most in public domain official & legal texts from courts and tribunals etc, and media transcripts - maybe they come from the transcription technology/services they use.
I don't understand some source files where I see grave ` + acute ´ accents being used instead of the actual curly single quotes: ‘ ’.

And typically when I run across these source files, it is just one set of quotes that is completely wrong. For example, all right single quotes are acutes, while all left single quotes are the dumb (straight) version.

Code:
This is an 'example´ of the 'mess´ I am 'talking´ about.
It is really a head scratcher.
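A possible cleanup pass for files like that, sketched in Python. The function name and the rule for straight quotes are assumptions for illustration: a straight ' at the start of a word is taken as an opening quote, and every other straight ' as a closing quote or apostrophe.

```python
import re

def normalize_quotes(text: str) -> str:
    # Grave and acute accents standing in for quotes become real curly quotes.
    text = text.replace("`", "‘").replace("´", "’")
    # A straight ' at the start of the text or after whitespace, followed by a
    # word character, is treated as an opening quote.
    text = re.sub(r"(?<!\S)'(?=\w)", "‘", text)
    # Any straight ' left over is treated as a closing quote / apostrophe.
    return text.replace("'", "’")

print(normalize_quotes("This is an 'example´ of the 'mess´ I am 'talking´ about."))
# This is an ‘example’ of the ‘mess’ I am ‘talking’ about.
```

Word-initial apostrophes (as in 'Twas) would be turned into opening quotes by this rule, so the result still needs a look.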

Last edited by Tex2002ans; 03-23-2015 at 12:49 AM.
Old 03-23-2015, 04:19 AM   #15
Jellby
frumious Bandersnatch
If it's the opposite, it could be a LaTeX source, or a text file influenced by that:

Code:
This is an `example' of how LaTeX reads `single' and ``double'' quotes
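Turning such LaTeX-style quotes into the Unicode characters is mostly a matter of replacing the doubled forms before the single ones. A minimal Python sketch (the function name is made up, and it assumes the text contains no backticks or straight apostrophes used for anything other than quotes and apostrophes):

```python
def latex_quotes_to_unicode(text: str) -> str:
    # Order matters: the two-character forms must be handled before the single ones.
    for src, dst in (("``", "“"), ("''", "”"), ("`", "‘"), ("'", "’")):
        text = text.replace(src, dst)
    return text

print(latex_quotes_to_unicode(
    "This is an `example' of how LaTeX reads `single' and ``double'' quotes"
))
# This is an ‘example’ of how LaTeX reads ‘single’ and “double” quotes
```

A possessive immediately followed by a closing single quote (two ' in a row) would be misread as a closing double quote, so that case needs checking by hand.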