Old 04-08-2013, 01:32 PM   #1
dwlamb
Regex search and replace

I have a file I am trying to clean up. It was converted from PDF to EPUB, and the conversion inserted many paragraph breaks where none belong.

I stink at the regular expression syntax for the replace side. To find the errant code, I have had success with this:
Code:
</p>\n\n <p class="calibre2">[a-z]
For the replace string, I merely want to substitute a single space followed by whatever the [a-z] component of the search string matched.

An example would be this:
Code:
.....the</p>

  <p class="calibre2">cat....
I want the replacement to result in "...the cat..."
Old 04-08-2013, 02:02 PM   #2
Turtle91
There are several examples of good regexes for joining lines like this. Here is a very basic/limited example that might help:

Search: ([a-z])</p>\s*<p class="calibre2">([a-z])
Replace: \1 \2 (space between \1 and \2)

But I would recommend doing a quick search of the forum for those example regexes (try the first sticky)....they have some great stuff!
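
For anyone who wants to try the same idea outside Sigil, here is a minimal Python sketch of a search/replace of this shape, with [a-z] on both sides of the break. The function name and the sample string are made up for illustration; only the pattern and the \1 \2 replacement come from this thread.

```python
import re

# A lowercase letter, the stray paragraph break, then another lowercase letter.
BREAK = re.compile(r'([a-z])</p>\s*<p class="calibre2">([a-z])')

def join_broken_paragraphs(html: str) -> str:
    # \1 and \2 put the two captured letters back with one space between them.
    return BREAK.sub(r'\1 \2', html)

# Hypothetical sample modelled on the example in the first post.
sample = '<p class="calibre2">...the</p>\n\n  <p class="calibre2">cat...</p>'
print(join_broken_paragraphs(sample))
```

It only joins where a lowercase letter sits on both sides of the break, so paragraphs that end with punctuation or start with a capital are left alone.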

Cheers,
Old 04-08-2013, 03:10 PM   #3
mrmikel
I would not fully trust any of these regexes. Use them to find the breaks, then hit find/replace repeatedly to fix them one by one. Some breaks fall where a picture sits at a page boundary, some are genuine ends of paragraphs, and others should be joined together.

Whatever regex you select, try it for perhaps 20 pages, one match at a time with Find, and see if you like where it does its thing. If it works for you, then go for it.
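
If the text is being edited outside Sigil, the same "look before you replace" habit can be sketched in Python. This assumes the [a-z] pattern discussed earlier in the thread; the function name and the context width are hypothetical choices for illustration. It only lists candidates and changes nothing.

```python
import re

# Same shape of pattern as earlier in the thread (an assumption, adjust to your file).
BREAK = re.compile(r'([a-z])</p>\s*<p class="calibre2">([a-z])')

def preview_matches(html: str, width: int = 30) -> list[str]:
    """Return one 'before | after' context string per candidate join."""
    previews = []
    for m in BREAK.finditer(html):
        # Text leading up to and including the last letter before the break.
        before = html[max(0, m.start() - width):m.start() + 1]
        # Text starting at the first letter after the break.
        after = html[m.end() - 1:m.end() - 1 + width]
        previews.append(f"{before} | {after}")
    return previews

sample = 'the</p>\n\n<p class="calibre2">cat sat.</p>\n<p class="calibre2">Next one.</p>'
for line in preview_matches(sample, width=10):
    print(line)
```

Reading through the list first shows where the regex would act before any replacement is committed.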
Old 04-12-2013, 07:46 AM   #4
dwlamb
Thanks all for the advice.

mrmikel, you are right. I never trusted extensive search and replace, even before dipping my toe into regex.

For this reason, I have moved editing my epubs out of Sigil. I export all the files to a mirror directory structure on my system and edit using text editors with better search and replace features than the engine provided in Sigil.
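
As a rough sketch of that export step: an EPUB is a ZIP container, so Python's standard zipfile module can unpack it into a mirror directory. The function name and the paths in the usage note are hypothetical, and this is not necessarily how the export was done here.

```python
import zipfile
from pathlib import Path

def unpack_epub(epub_path, out_dir):
    """Extract every file in the EPUB into out_dir, keeping its folder layout."""
    out = Path(out_dir)
    with zipfile.ZipFile(epub_path) as book:
        book.extractall(out)
    # Return the extracted files as relative paths, for a quick sanity check.
    return sorted(p.relative_to(out).as_posix() for p in out.rglob('*') if p.is_file())
```

A call such as unpack_epub("book.epub", "book_src") would leave the XHTML files ready for any text editor. Repacking needs more care, since the mimetype file has to be the first entry in the archive and stored uncompressed.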

Old 04-12-2013, 08:59 AM   #5
mrmikel
Did you know you can right-click a given file in the Book Browser and open it in an external program, like Notepad++ or a graphics editor, while still remaining in Sigil?

I find this gives the best of both worlds.

As far as search and replace goes, no matter what program you use, you still have to be thoughtful, because I am always finding one particular odd situation I didn't think of. When editing documents that came out of PDFs, I sometimes hit errors no s/r can handle, like spaces in odd places in the middle of words. And there is no hope if the word starts with a letter like "a" that can stand by itself and pass a spell check. (sigh)
Old 04-12-2013, 09:14 AM   #6
theducks
There is no substitute for careful (and sometimes, tedious) hand editing.

Mr.
Smith

and so it went . . .
. . . The world ended with a whimper.

A.B.
C. Company

Monsters
Inc.

All you can do is automate the easy ones
then get to work on the rest
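
One way to picture that split, as a hedged Python sketch: treat a break as "easy" only when the text before it ends in a lowercase letter or a comma and the text after it starts with a lowercase letter, and send everything else to a hand-review list. The rule, the function name and the sample text are assumptions for illustration, not something tested on real books.

```python
import re

# Capture the character on each side of a paragraph break.
PARA_BREAK = re.compile(r'(.)</p>\s*<p class="calibre2">(.)')

def classify_breaks(html: str):
    """Split breaks into (easy, review) lists of (last_char, first_char) pairs."""
    easy, review = [], []
    for m in PARA_BREAK.finditer(html):
        last, first = m.group(1), m.group(2)
        if (last.islower() or last == ',') and first.islower():
            easy.append((last, first))    # almost certainly a broken sentence
        else:
            review.append((last, first))  # "Mr." / "Smith", "Monsters" / "Inc.", etc.
    return easy, review

sample = ('<p class="calibre2">Mr.</p>\n<p class="calibre2">Smith went to the</p>\n'
          '<p class="calibre2">cat.</p>\n<p class="calibre2">Monsters</p>\n'
          '<p class="calibre2">Inc.</p>')
easy, review = classify_breaks(sample)
print(len(easy), "easy,", len(review), "for hand review")
```

Cases like "Mr." followed by "Smith" land in the review list, which is where the tedious hand editing still happens.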
Old 04-12-2013, 02:34 PM   #7
Tex2002ans
I just answered this question last month (and again a few weeks ago). These are the regexes I use to rejoin paragraphs that were broken during PDF conversion:

http://www.mobileread.com/forums/sho...89#post2446589
All times are GMT -4.