04-08-2013, 01:32 PM | #1 |
Member
Posts: 14
Karma: 10
Join Date: Apr 2013
Device: Android Smartphone
Regex search and replace
I have a file I am trying to clean up. It was converted from PDF to EPUB, and many unnecessary paragraph breaks were inserted.
I stink at regular expression syntax for replace. To find the errant code, I have had success with this:
Code:
</p>\n\n <p class="calibre2">[a-z]
An example would be this:
Code:
.....the</p> <p class="calibre2">cat....
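The search pattern above matches only that exact layout: two newlines and a single space between the tags. Below is a minimal sketch comparing it with a `\s*` variant, which accepts any run of whitespace (including none). Python's `re` module stands in for the editor's regex engine, and the sample markup is invented for illustration.

```python
import re

# Invented sample markup; only the class name "calibre2" comes from the post.
html = (
    '<p class="calibre2">.....the</p>\n\n'
    ' <p class="calibre2">cat sat on the mat.</p>\n\n'
    ' <p class="calibre2">A new paragraph ends here and</p>'
    '<p class="calibre2">continues on the next page.</p>'
)

# The pattern from the post: </p>, two newlines, a space, the next
# opening tag, then a lowercase letter.
exact = re.compile(r'</p>\n\n <p class="calibre2">[a-z]')

# \s* accepts any whitespace between the tags, so breaks laid out
# differently (here: no whitespace at all) are found too.
loose = re.compile(r'</p>\s*<p class="calibre2">[a-z]')

for m in loose.finditer(html):
    # Print a little context on each side of every suspect break.
    start, end = max(m.start() - 10, 0), min(m.end() + 10, len(html))
    print(repr(html[start:end]))
```

Neither pattern touches the break before "A new paragraph", because the uppercase letter suggests a real paragraph start.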
04-08-2013, 02:02 PM | #2 |
A Hairy Wizard
Posts: 3,059
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
There are several examples of good regexes for joining lines like this. Here is a very basic/limited example that might help:
Search:
([a-z])</p>\s*<p class="calibre2">([a-z])
Replace:
\1 \2
(note the space between \1 and \2)
But I would recommend doing a quick search of the forum for those example regexes (try the first sticky)... they have some great stuff!
Cheers,
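To see what the capture groups and the `\1 \2` replacement do, here is a minimal Python sketch. It assumes Python's `re` behaves like the editor's engine for these constructs (character classes, `\s*`, groups, `\1`-style backreferences), uses `[a-z]` in both character classes, and `join_broken_paragraphs` is a hypothetical helper name.

```python
import re

# Group 1: the lowercase letter before </p>.
# Group 2: the lowercase letter that starts the next paragraph.
pattern = re.compile(r'([a-z])</p>\s*<p class="calibre2">([a-z])')

def join_broken_paragraphs(html: str) -> str:
    # \1 and \2 put the captured letters back with one space between them,
    # so the </p> ... <p> markup in between disappears.
    return pattern.sub(r'\1 \2', html)

sample = '<p class="calibre2">.....the</p>\n\n <p class="calibre2">cat sat.</p>'
print(join_broken_paragraphs(sample))
# prints: <p class="calibre2">.....the cat sat.</p>
```

The letters have to be captured and written back because the match consumes them; replacing with a bare space would delete the last letter of one word and the first letter of the next.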
04-08-2013, 03:10 PM | #3 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
I would not fully trust any of them. Use them to find the breaks and hit Find/Replace one match at a time to get rid of them, because not every break is an error: there are places where a picture sits at a page break, and there are genuine ends of paragraphs, while other breaks really should be joined.
Whatever regex you choose, try it on perhaps 20 pages, one match at a time with Find, and see whether you like what it does. If it works for you, then go for it.
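The one-match-at-a-time review can be sketched in Python by handing `re.sub` a function instead of a replacement string: the function sees each match in order and decides whether to join it. This is a minimal sketch, not how any particular editor implements Find/Replace; `review_and_join` and its `approve` callback are hypothetical names, and the pattern is the lowercase-to-lowercase join rule from this thread.

```python
import re
from typing import Callable

# Join rule from this thread, with [a-z] in both character classes.
pattern = re.compile(r'([a-z])</p>\s*<p class="calibre2">([a-z])')

def review_and_join(html: str, approve: Callable[[str], bool], context: int = 20) -> str:
    """Visit each match left to right; join it only if approve() returns True."""

    def decide(m: re.Match) -> str:
        # Give the reviewer some text on either side of the break.
        snippet = html[max(m.start() - context, 0):m.end() + context]
        if approve(snippet):
            return m.group(1) + " " + m.group(2)
        return m.group(0)  # rejected: keep the original markup unchanged

    return pattern.sub(decide, html)
```

To review interactively at a console, something like `review_and_join(text, lambda s: input(s + "\nJoin? [y/N] ").lower() == "y")` would prompt once per match.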
04-12-2013, 07:46 AM | #4 | |
Member
Posts: 14
Karma: 10
Join Date: Apr 2013
Device: Android Smartphone
Thanks all for the advice.
mrmikel, you are right. I never trusted extensive search and replace, even before dipping my toe into regex. For this reason, I have moved my EPUB editing out of Sigil: I export all the files to a mirror directory structure on my system and edit them with text editors that have better search-and-replace features than the engine in Sigil.
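With the files exported to a directory, a dry run that only counts suspect breaks per file can show where the work is before anything is changed. A minimal Python sketch, assuming the book text lives in `.xhtml`/`.html`/`.htm` files somewhere under the export root; `count_breaks` is a hypothetical name and the pattern is the join rule from this thread.

```python
import re
from pathlib import Path

# Join rule from this thread; adjust the class name to match your book.
pattern = re.compile(r'([a-z])</p>\s*<p class="calibre2">([a-z])')

# File types assumed to hold the book's text after export.
TEXT_SUFFIXES = {".xhtml", ".html", ".htm"}

def count_breaks(root: str) -> dict[str, int]:
    """Dry run: count suspect breaks per file without changing anything."""
    counts: dict[str, int] = {}
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix.lower() in TEXT_SUFFIXES:
            n = len(pattern.findall(path.read_text(encoding="utf-8")))
            if n:
                # Report paths relative to the export root for readability.
                counts[path.relative_to(base).as_posix()] = n
    return counts
```

Files with unexpectedly high counts are good candidates for the slow, one-match-at-a-time treatment.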
04-12-2013, 08:59 AM | #5 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
Did you know you can right-click a file in the Book Browser and open it in an external program, such as Notepad++ or a graphics editor, while still remaining in Sigil?
I find this gives the best of both worlds. As for search and replace, no matter what program you use, you still have to be thoughtful, because I am always finding some odd situation I didn't think of. With documents that come out of PDFs, there are sometimes errors no search/replace can handle, like spaces in odd places in the middle of words. And there is no hope when the split-off piece is a letter like "a" that can stand on its own as a word, because a spell check won't flag it. (sigh)
04-12-2013, 09:14 AM | #6 |
Well trained by Cats
Posts: 29,662
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
There is no substitute for careful (and sometimes tedious) hand editing. Breaks next to periods and ellipses are especially tricky, for example:
Mr. Smith
and so it went . . . . . . The world ended with a whimper.
A.B. C. Company
Monsters Inc.
All you can do is automate the easy ones, then get to work on the rest.
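One way to "automate the easy ones" is to apply only the lowercase-to-lowercase join, which never fires after a period, and then list the remaining breaks that sit after abbreviations, initials, or spaced ellipses for hand editing. A minimal Python sketch; the abbreviation list is a hypothetical, deliberately incomplete example, `triage` is a hypothetical name, and the `[A-Z]\.` alternative will also flag some real sentence ends, which is acceptable for a review list.

```python
import re

# The lowercase-to-lowercase join rule used in this thread.
join = re.compile(r'([a-z])</p>\s*<p class="calibre2">([a-z])')

# Hypothetical, incomplete list of "period that may not end a paragraph" cases:
# common abbreviations, single initials, and spaced ellipses.
suspicious = re.compile(
    r'(?:Mr\.|Mrs\.|Dr\.|Inc\.|[A-Z]\.|\. \. \.)</p>\s*<p class="calibre2">'
)

def triage(html: str) -> tuple[str, list[str]]:
    """Apply the safe joins, then list the breaks that still need a human."""
    joined = join.sub(r'\1 \2', html)
    flagged = [m.group(0) for m in suspicious.finditer(joined)]
    return joined, flagged
```

Breaks like "Mr.</p><p>Smith" pass through the join untouched (a period precedes the break and an uppercase letter follows it), so they end up in the flagged list instead of being silently merged or silently skipped.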
04-12-2013, 02:34 PM | #7 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
I answered this question just last month (and again a few weeks ago). These are the regexes I use to rejoin paragraphs that were broken apart in PDF conversions:
https://www.mobileread.com/forums/sho...89#post2446589