#16 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
|
#17 | |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
Quote:
I even tried the variant. |
|
#18 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Mystery solved.
Tex2002ans, your regexes don't account for classes, how could you! Fixed: Quote:
|
|
#19 |
Well trained by Cats
Posts: 31,071
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Another trailing or leading hyphen gotcha is the em dash substitute (--).
IMHO, replace those with an em dash as step 0. I would love an S&R with an additional alternate replace for the case where step 2 hits a true hyphenated word split between lines, so you could choose between the Normal Replace (merge lines with no space) and the Alternate Replace (just merge). That would save 2 passes. The other (third) choice would still be Find (Skip). |
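For anyone scripting that "step 0" outside the editor, here is a minimal Python sketch. The function name `fix_emdash_substitutes` and the sample text are made up for illustration, and the lookarounds are an assumption about what you would want to skip (longer runs of hyphens and HTML comment delimiters), not something taken from the thread.

```python
import re

# A "--" standing in for an em dash. The lookarounds skip longer runs of
# hyphens ("---", "----") and the delimiters of HTML comments (<!-- ... -->).
DOUBLE_HYPHEN = re.compile(r'(?<![-!])--(?![->])')


def fix_emdash_substitutes(text: str) -> str:
    """Replace each stand-alone "--" with a real em dash (U+2014)."""
    return DOUBLE_HYPHEN.sub('\u2014', text)


sample = 'He paused--then spoke. <!-- keep me --> A rule: ----'
fixed = fix_emdash_substitutes(sample)
print(fixed)
```

Running this before any hyphen-merging pass means the later regexes never see a line that ends in "--" and mistake it for a split word.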
#20 | |
Well trained by Cats
Posts: 31,071
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
version Code:
</span></p>\s+<p class="normal"><span class="override1">
The span case is very fragile (and destructive) if there are additional mid-paragraph spans. Use extra care: work on a single section until it is valid, and save frequently, every time the Preview shows no errors. Do not rely on auto-fix solutions. |
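To make the fragility concrete, here is a small Python sketch that applies the span pattern above to a made-up two-paragraph snippet. The sample HTML and the replacement string (a single space) are assumptions for illustration; the thread does not show the replace value that was actually used.

```python
import re

# The span-variant search pattern quoted above.
SPAN_MERGE = re.compile(r'</span></p>\s+<p class="normal"><span class="override1">')

# Hypothetical input: the first paragraph ends with an italic span,
# not with the "override1" span the pattern silently assumes.
html = (
    '<p class="normal"><span class="override1">First part</span> and '
    '<span class="italic">emphasis</span></p>\n'
    '<p class="normal"><span class="override1">continues here.</span></p>'
)

# Assumed replacement: join the two paragraphs with one space.
merged = SPAN_MERGE.sub(' ', html)
print(merged)
```

The tags still balance, so a well-formedness check passes, but the italic span now swallows "continues here." as well. That is the kind of silent damage that only shows up when you check each replacement in the Preview.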
|
#21 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Difference is, classes are expected by default, spans are not.
The generic solution caters to common cases, which this is. Spans, not so much. |
#22 | |||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Next time, I will have to be even more specific. (Typically, I color code all the sections of the Regex too!) And yes, they have to be run #1, then #2 (or its variant, depending on if you want to do hyphenation fixes now or later), then #3. I have a lot more Regex I recommend after that, although it might get a little too technical in here. (And it does take forever to write these things.)
Side Note: I convert a ton of non-fiction economics books from PDF -> EPUB, and deal with cleaning up a lot of crap. I use those regex to mostly piece together lines/paragraphs that broke across pages, or were OCRed incorrectly. Quote:
I personally wouldn't recommend the one that handles every <p class="">, because who knows what a given calibre# is associated with ("calibre2" could be your typical paragraph, but "calibre3" could be a blockquote (extra margin on the left), "calibre4" could be right alignment, "calibre5" could be small font, etc.).
Example: That "all classes" Regex would break in these cases. Instead of using a <blockquote> tag, the book might have used something along these lines:
Code:
<p>This is a quote from Tex2002ans</p>
<p class="blockquote1">This is a sample blockquote sentence.</p>
<p class="blockquote2">This is some more sentences.</p>
<p class="blockquote2">And this is the end.</p>
<p>Continue with the story.</p>
Code:
<p class="poem">This is a poem,</p>
<p class="poem2">that is written by Tex.</p>
<p class="poem">This is a poem,</p>
<p class="poem2">that will break the Regex.</p>
Quote:
Even though I trust Regex #2 and Regex #3 with my life, I still have them in my Sigil's Saved Searches under the heading, "One at a Time". Last edited by Tex2002ans; 12-23-2014 at 02:40 PM. |
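The poem example can be tried out with a short Python sketch. The exact regexes from earlier in the thread are not reproduced here, so both patterns below (`ANY_CLASS` and `ONE_CLASS`) are hypothetical stand-ins: one merges across any paragraph class, the other only across plain <p> or a single named class ("calibre2" is just a placeholder).

```python
import re

poem = (
    '<p class="poem">This is a poem,</p>\n'
    '<p class="poem2">that is written by Tex.</p>\n'
    '<p class="poem">This is a poem,</p>\n'
    '<p class="poem2">that will break the Regex.</p>'
)

# Hypothetical "all classes" merge: join two paragraphs whenever the first
# ends in a lowercase letter or a comma, whatever class the next one has.
ANY_CLASS = re.compile(r'([a-z,])</p>\s+<p(?: class="[^"]*")?>')

# Hypothetical restricted merge: only plain <p> or one known body-text class.
ONE_CLASS = re.compile(r'([a-z,])</p>\s+<p(?: class="calibre2")?>')

merged_any = ANY_CLASS.sub(r'\1 ', poem)
merged_one = ONE_CLASS.sub(r'\1 ', poem)

print(merged_any)  # poem lines get glued together
print(merged_one)  # poem is left alone
```

With the class-agnostic pattern the four poem lines collapse into two paragraphs, while the restricted pattern leaves the poem untouched. Which trade-off is right depends on how clean the book's classes are, which is why stepping through matches one at a time is still the safer route.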
|||
#23 |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
Thank you all!
The books did use "< span >" quite a bit (should anyone else have a similar issue). As for my case, between this thread, experimenting with the "Edit Book" function and some fancy google-foo by my husband, the issue is solved. |
#24 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
And that is a rationale for double-checking each one, not for writing a regex that doesn't handle lots of stuff. Alternatively, you can always do it your way... assuming you add another step for clearing up the classes. FWIW, I agree that my first step would be to clean up the styles, tossing out everything that wasn't very deliberate. Last edited by eschwartz; 12-23-2014 at 08:25 PM. |
|
#25 | |
null operator (he/him)
Posts: 21,738
Karma: 30237526
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Are 'borrowed' words ever returned? I know of none. Linguists are as bad as comp scientists at butchering my language. I guessed the regex aficionado troops would turn up to do battle. Hubbies are handy too. BR Last edited by BetterRed; 12-23-2014 at 06:59 PM. |
|
#26 | |
Resident Curmudgeon
Posts: 79,786
Karma: 146391129
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
My suggestion is to contact the shop you bought it from and try to get your money back. It's a major botch job. |
|
#27 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
|
|
#28 | |
null operator (he/him)
Posts: 21,738
Karma: 30237526
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Jon - People who say it cannot be done should not interrupt those who are doing have already done it. George Bernard Shaw BR Last edited by BetterRed; 12-23-2014 at 08:00 PM. |
|
#29 |
Resident Curmudgeon
Posts: 79,786
Karma: 146391129
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
But there is no guarantee that it's done correctly. There's no guarantee that all the lines are in all of the correct paragraphs. No matter how much it looks OK, it very well might not be OK.
|
#30 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
* -- Being a perfectionist, I did my next reread with the book and working version of the ebook side-by-side and fixed it anyway. |
|