05-16-2009, 01:06 PM | #16 |
Resident Curmudgeon
Posts: 73,983
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
What happens if there is a missing quote?
|
05-16-2009, 02:57 PM | #17 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
The script should also identify apostrophes, in words like 'em, 'tis, and other transcriptions of spoken language (much too often one finds an opening quote in those cases instead). I also try to properly mark single quotes and apostrophes, so that I could convert single quotes into double quotes without fear of ruining apostrophes.
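To make the apostrophe pass concrete, here is a minimal Python sketch of the regexp half of the job. It is not the script discussed in this thread: the word list `ELIDED` and the function name `mark_apostrophes` are assumptions for illustration. It converts only the two unambiguous cases (a quote between two letters, and a quote in front of a known elided word) and leaves everything else straight for manual review.

```python
import re

# Hypothetical list of elided words; extend it for the text at hand.
ELIDED = ("em", "tis", "twas", "til", "cause")

# A straight quote at the start of a word, followed by a known elided word.
ELISION = re.compile(r"(?<!\w)'(?=(?:%s)\b)" % "|".join(ELIDED), re.IGNORECASE)
# A straight quote between two word characters (don't, it's).
INNER = re.compile(r"(?<=\w)'(?=\w)")

APOSTROPHE = "\u2019"  # right single quotation mark, used as the apostrophe


def mark_apostrophes(text):
    """Replace unambiguous apostrophes; leave every other straight quote alone."""
    text = ELISION.sub(APOSTROPHE, text)
    return INNER.sub(APOSTROPHE, text)


print(mark_apostrophes("'Don't come callin' 'em so late'"))
```

A trailing apostrophe such as the one in callin' is deliberately left untouched, because a regexp alone cannot tell it from a closing single quote; that is where the manual search and replace comes in.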
I do this with a mix of regexp and manual search and replace (each occurrence with the right character, which I map to hotkeys so that it's relatively easy to run along the text). This also helps locate possible missing quotes, and at the end there's always the reading phase, to confirm everything's right. If you are going the LaTeX way, you should also check the spacing after full stops and question/exclamation marks. By convention LaTeX puts a wider space after those, which you'd have to suppress (with a "\ ", i.e. a backslash followed by a space, after the sign) in abbreviations or other not-end-of-sentence cases, and force (with a \@ before the sign) when they come after a capital letter. I did that for my Lewis Carroll PDFs, and it's time consuming, but the result looks great (to me). |
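The LaTeX spacing pass can be partly automated. In standard LaTeX a backslash followed by a space (`\ `) after the period gives a normal inter-word space, and `\@` before the period restores the sentence-ending space after a capital letter. The Python sketch below is only a first pass under stated assumptions: the abbreviation list is hypothetical, and "two or more capitals before the punctuation" is a heuristic chosen here so that initials such as "P. G." are left alone. The output still needs proofreading.

```python
import re

# Hypothetical abbreviation list; a real text needs its own.
ABBREVIATIONS = ("Mr", "Mrs", "Dr", "St")

# Abbreviation + period + space: the space should be a normal inter-word space.
ABBREV = re.compile(r"\b(%s)\. (?=\S)" % "|".join(ABBREVIATIONS))
# Sentence punctuation after at least two capitals (NASA., USA?), before a space.
AFTER_CAPITALS = re.compile(r"(?<=[A-Z]{2})([.?!])(?= )")


def fix_latex_spacing(text):
    """Insert '\\ ' after known abbreviations and '\\@' before punctuation after capitals."""
    text = ABBREV.sub(lambda m: m.group(1) + ".\\ ", text)
    return AFTER_CAPITALS.sub(r"\\@\1", text)


print(fix_latex_spacing("Mr. Wooster joined NASA. Then he left."))
```

Anything the two patterns miss (abbreviations not in the list, a sentence ending in a single capital letter) still has to be caught by hand.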
|
05-16-2009, 03:35 PM | #18 | |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
Nothing - it doesn't get converted at all.
Quote:
|
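Because an unmatched quote is simply left unconverted, any straight quote that survives a conversion pass marks a spot worth checking by hand. A minimal Python sketch of that check follows, assuming the text is available as a list of HTML lines; the function name `leftover_straight_quotes` is made up for this example.

```python
import re

TAG = re.compile(r"<[^>]*>")


def leftover_straight_quotes(lines):
    """Return 1-based numbers of lines that still contain ' or " outside tags."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        visible = TAG.sub("", line)  # ignore quotes inside attributes
        if '"' in visible or "'" in visible:
            suspects.append(number)
    return suspects


sample = [
    '<p class="x">\u201cFine,\u201d he said.</p>',
    '<p>"Where is it? he asked.</p>',
]
print(leftover_straight_quotes(sample))
```

Apostrophes that have not been converted yet are reported as well, so this is most useful as the last step before the reading phase.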
|
05-17-2009, 05:51 AM | #19 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
I know it looks like I'm just trying to find the most difficult case, but I have actually found this in some of the Wodehouse books I've made, so it's a real case. |
|
05-17-2009, 06:04 AM | #20 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
|
|
05-17-2009, 06:17 AM | #21 |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I would, but it's not vim-regexp... would you care explaining it?
What would it do with: Code:
'Don't come callin' 'em so late' Code:
‘Don&#8217;t come callin&#8217; &#8217;em so late’ |
05-17-2009, 06:28 AM | #22 | ||
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
Quote:
Quote:
Code:
<p>'Don't come callin' 'em so late.'</p> Code:
<p>'Don't come callin' 'em so late,' he said angrily.</p> Code:
<p>{left_quote}Don't come callin' 'em so late.{right_quote}</p> |
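The regexp these examples refer to did not survive in this copy of the thread, so the following Python sketch is a stand-in and not pepak's expression. It assumes one rule only: a quote right after `<p>` opens, and a quote right after sentence punctuation and before whitespace or `</p>` closes. The placeholder names `{left_quote}` and `{right_quote}` are taken from the example above; `mark_outer_quotes` is a made-up name.

```python
import re

# A single quote immediately after the opening paragraph tag.
OPENING = re.compile(r"(<p>)'")
# A single quote after . , ! or ? and before whitespace or the closing tag.
CLOSING = re.compile(r"(?<=[.,!?])'(?=\s|</p>)")


def mark_outer_quotes(html):
    """Mark paragraph-level quotes with placeholders; apostrophes stay untouched."""
    html = OPENING.sub(r"\1{left_quote}", html)
    return CLOSING.sub("{right_quote}", html)


print(mark_outer_quotes("<p>'Don't come callin' 'em so late.'</p>"))
print(mark_outer_quotes("<p>'Don't come callin' 'em so late,' he said angrily.</p>"))
```

Quotes that open in the middle of a paragraph are not handled by this sketch; they need a further rule or a manual pass.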
||
05-17-2009, 06:37 AM | #23 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Ensuring properly nested single and double quotes when the source is not consistent would be harder, though. |
|
05-17-2009, 06:56 AM | #24 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
@pepak re: CSS
I think I need to study up on it. I'm using a sort of bastardized HTML mix. I've been using clips (macros) in NoteTab for a long time now, so I re-checked what I'm doing. I actually use both <a name="chapter_ChapterNumber"> and <h3 id="chapter_ChapterNumber" class="chapter" align="center"> in my files to mark a chapter. A combination of overkill and ignorance. I think it comes from haphazardly learning it as I needed it, and from working towards the obsolete REB1100. (Did you know that if you want a cover to appear on the first page of an REB1100 ebook, you must wrap it in a <center> tag? Otherwise, invisible!)
Providing the CSS with the markup allows someone to simply change the CSS file to suit their needs. Awesome. Currently, without a new reader, I can't think of a way to write the code in such a way as to ensure forward-compatibility. Another excuse to buy a new reader! And you've inspired me to consider rewriting my macros to include such things as <em class="psionic">. Which just reads cool.
As for <span>, I'm still a little confused; I get the <div> styles thing -- open and close a style on what is otherwise a normal something-or-other, only distinguished by its class. And I get that the sections thing makes sense for, say, auto-searching the structure of a document, and offering an outline or some-such. But SPANs? Are we talking something that parallels <h> and <p>? Open a <span> on something and close it so that you can apply a sub-style there too? i.e.: <span class="dialogue"> or <span class="paragraph">? Or is it lesser? For instance: <span class="italic">? Or am I missing something completely? Which is entirely likely.
m a r |
05-17-2009, 07:03 AM | #25 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
<span> is a generic in-line container; it's like <em>, <code> or <strong> but without any default meaning. |
|
05-17-2009, 08:02 AM | #26 | |||||
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
Quote:
Quote:
Quote:
Quote:
Quote:
E.g. Code:
<div class="block"> <p>first paragraph</p> <p>second paragraph</p> <h2>header</h2> <p>more paragraphs</p> </div> Code:
<p>this is a <span style="text-decoration: underline;">sentence</span> where I want the word "sentence" underlined, which I can't do with U-tag in XHTML because it is deprecated.</p> |
|||||
05-17-2009, 08:15 AM | #27 | |||
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
Thanks Jellby! Guess I was a little vague, there.
BTW, I just tried pepak's regex. Worked pretty well, finding left and right single-quotes (apostrophes, literally -- I just replaced the rsquo's from the example), when someone is quoting something inside dialogue. (ie: "He said 'xylophone,' did he?" asked Boojum.) Let me easily switch to lsquo and rsquo. Only had one false positive. There was a positive there too, but some short distance preceding it was an 'em and it was lumped into the positive, ie: Quote:
But I was working in an HTML document, so I modified it slightly afterward: Quote:
Worked awesomely, had only one similar false positive (that contained two positives) and beat my prior search regex: Quote:
But I did the modified run after initially running his regex. Anyone see a reason why it might not work straight-up? I'm having trouble imagining a sentence that would false positive because of ; and &... m a r Last edited by rogue_ronin; 05-17-2009 at 08:18 AM. |
|||
05-17-2009, 09:05 AM | #28 | ||||
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
Quote:
Quote:
Quote:
Quote:
I approach your problem from the other side - I always insert a space between apostrophe/single-quote and quote/double-quote. Not only does it make my regexp work just fine, it is far nicer visually, too. Later on, when all quotes are converted, you can either remove the space or (better yet) convert it to a non-breaking space. |
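The two steps described here (separate adjacent quotes before conversion, then protect the gap afterwards) can be sketched in Python as follows. This is an illustration under assumptions, not the workflow's actual code: the function names are made up, and the second step assumes the quotes have already been converted to the Unicode curly characters.

```python
import re

NBSP = "\u00a0"

# A plain space sitting between a curly single quote and a curly double quote.
CURLY_GAP = re.compile(
    "(?<=[\u2018\u2019]) (?=[\u201c\u201d])|(?<=[\u201c\u201d]) (?=[\u2018\u2019])"
)


def separate_adjacent_quotes(text):
    """Step 1 (before conversion): put a space between touching ' and "."""
    return text.replace("'\"", "' \"").replace("\"'", "\" '")


def protect_gaps(text):
    """Step 2 (after conversion): turn that space into a non-breaking space."""
    return CURLY_GAP.sub(NBSP, text)


print(separate_adjacent_quotes("\"He said 'xylophone,'\" asked Boojum."))
```

Using a non-breaking space keeps the closing single and double quotes from being split across a line break.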
||||
05-17-2009, 09:09 AM | #29 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
Anyway, I have more useful regexps prepared for my documents. At the moment their description is written in Czech only, but maybe it would be useful for others if I translated the post into English? Also, many (well, some :-)) of the regexps will be recognizable to the trained eye right away - here (about in the middle of the page). They are geared towards fixing errors made by FineReader 9.
|
05-17-2009, 12:31 PM | #30 | ||
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
You sure did mention that. But I wasn't paying enough attention...
I was thinking about starting a regex thread -- but you should do it. I may have a couple to contribute, after seeing yours. For instance: Quote:
Quote:
Is ' well-supported now? I'm of the mind that you should quote and apostrophe, etc. with either the entity-name tags (in HTML) or with the ASCII/Unicode character (in text) and not mix them up. But I cannot find that in real life much. Since a blanket search-and-replace of ' with ’ does visually improve a text, I understand why it happens.
As for the '{space}" layout you mention, I do try to change things to that -- but the texts I find are not always so neat. Therefore, I have to do it the hard way sometimes. Your regex was a little difficult to use on one text I did this afternoon: it used ’.” ’?” ’!” at the end of sentences and both ‘ ’ and “ ” Unicode characters. A simple search/replace on individual characters probably would have been smarter -- and in fact I had to do that at the end. Then switch the rsquo and the punctuation. I've been rushing a bit to complete a goal, so I'm not taking enough time to figure it out ahead. (A set of 56 short stories [some are novellas] by a single author.)
But it might not be the regex -- I'm running Win2k in a virtual machine to support NoteTab, and who knows what that can lead to. At one point I was getting only part of what I would copy to the clipboard. (Restarted, of course.)
It's funny -- someone went to a lot of trouble to use curly quotes in this text, but did no work on mdashes vs. hyphens, ellipses, or to clean up blockquotes, or even spell-check thoroughly. Quite a haphazard use of italics, too. Weird.
m a r |
||
Tags |
conversion, typography |
|