Old 02-15-2011, 10:40 AM   #1
bfollowell
Fanatic
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
Help with regular expression search/replace

When converting ebooks from any format for use on my Kindle, I always convert to epub first and do any editing or cleanup in Sigil before the final conversion to mobi. A common problem I run into when converting files from one format or another to epub is left or right quotes with no space between them and the preceding or following character. An extra space is easy to find but any other character is not so easy to find.

I'm thinking there should be a way to search for these occurrences with a regular expression but I'm not familiar enough with them to come up with one that works. I've tried and haven't had much luck so far.

Would anyone out there more familiar with regular expressions be able to assist me? Basically, I want to be able to find any string where anycharacterexceptspace/“ or ”/anycharacterexceptspace and be able to replace it with anycharacterexceptspace/ “ or ” /anycharacterexceptspace.

Any ideas?

Thanks.

- Byron
Old 02-15-2011, 10:49 AM   #2
theducks
Well trained by Cats
Posts: 29,779
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by bfollowell View Post

Would anyone out there more familiar with regular expressions be able to assist me? Basically, I want to be able to find any string where anycharacterexceptspace/“ or ”/anycharacterexceptspace and be able to replace it with anycharacterexceptspace/ “ or ” /anycharacterexceptspace.

Any ideas?

Thanks.

- Byron
My 'cheat sheet' says \S (capital S) matches any character except white space.
Old 02-16-2011, 03:34 PM   #3
Ahmad Samir
Zealot
Posts: 114
Karma: 5246
Join Date: Jul 2010
Device: none
IINM, \S will match '<' from </p> at the end of each paragraph.

I think \w should work, it matches any alphanumeric character (plus _ ).

So:
Find: ”([\w.,?!])
Replace with: ” \1

i.e. find ” followed by a word character OR . OR , OR ? OR !


And:
Find: ([\w.,?!])“
Replace with: \1 “
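
For anyone who wants to try these two patterns outside Sigil, here is a minimal sketch using Python's standard `re` module. The function name `space_curly_quotes` and the sample sentence are made up for illustration, and Sigil's own regex engine may differ in edge cases.

```python
import re

def space_curly_quotes(text):
    """Apply the two find/replace pairs from this post, in order."""
    # A closing quote followed directly by a word character or . , ? !
    # gets a space inserted after it.
    text = re.sub(r"”([\w.,?!])", r"” \1", text)
    # A word character or . , ? ! followed directly by an opening quote
    # gets a space inserted before the quote.
    text = re.sub(r"([\w.,?!])“", r"\1 “", text)
    return text

sample = "<p>He said,“Stop!”and left.</p>"
fixed = space_curly_quotes(sample)
print(fixed)
```

Because neither `<` nor `>` is in the character class, the `<p>` and `</p>` tags are left alone, which is the point of preferring `\w` over `\S` here.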
Old 02-20-2011, 02:47 AM   #4
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
my fix - just looks for A-Z or for a-z as needed
no space following quotes
find "([A-Z])
replace " \1

vary the above as needed. To ensure I search for the right sort of quote I copy / paste a quote mark from the code view into the find box
Old 02-20-2011, 04:10 AM   #5
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Just a question, why would you want a space between a quote mark and the text? You have to be careful that your xhtml tags are not changed as well.
Old 02-20-2011, 04:17 AM   #6
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
i think because "this" is "correct" grammar but"this"is"wrong".
Old 02-20-2011, 05:59 AM   #7
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Ah, I see. But you have to be careful you don't get things like:

Then he said: " What is this? "

or

<p class=" stylish " >

That is one reason why I use smart/curly quotes. Another is that I really like those quotes and feel that straight quotes have a different meaning.
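
The tag problem (the second example above) can be sidestepped by running the replacement only on the text between tags. The sketch below is one way to do that in Python; `fix_outside_tags` is a hypothetical helper, and it assumes no `>` appears inside attribute values. It does nothing about the first problem: with straight quotes a regex still cannot tell an opening quote from a closing one.

```python
import re

def fix_outside_tags(html, pattern, replacement):
    """Run a regex replace on text between tags, leaving <...> untouched."""
    # The capturing group makes re.split keep the tags in the result list.
    parts = re.split(r"(<[^>]*>)", html)
    for i, part in enumerate(parts):
        if not part.startswith("<"):  # text node, not a tag
            parts[i] = re.sub(pattern, replacement, part)
    return "".join(parts)

sample = '<p class="Stylish">He said "stop"Then he left.</p>'

# Naive replace: also hits the quote inside the class attribute.
naive = re.sub(r'"([A-Z])', r'" \1', sample)

# Tag-aware replace: only the text node is changed.
safe = fix_outside_tags(sample, r'"([A-Z])', r'" \1')

print(naive)
print(safe)
```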
Old 02-20-2011, 06:12 AM   #8
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Note that Ahmad Samir used curly quotes in his expressions.
Old 02-20-2011, 06:16 AM   #9
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by Jellby View Post
Note that Ahmad Samir used curly quotes in his expressions.
yes, the solutions given only work if open quote looks different to / can be distinguished from closing quote

and quotes within quotes is a whole new ball game!
Old 06-20-2013, 05:53 AM   #10
Funslinger
Member
Posts: 10
Karma: 10
Join Date: Jun 2012
Device: Kobo Touch
Quote:
Originally Posted by cybmole View Post
yes, the solutions given only work if open quote looks different to / can be distinguished from closing quote

and quotes within quotes is a whole new ball game!
This situation can be handled fairly easily using recursion to match opening and closing quotes.

I don't know if the regular expression engine in the text editor TextMate is the same as the one in Sigil. But the following regular expression will find a string consisting of an entire html element in TextMate.

<\?xml[^>]+>|<!DOCTYPE(?:[^\]]*]>|[^>]*>)|<[^/ >]+[^>]*/>|<(?<tagname>[^/ >]+)[^>]*>(?<!/>)(?<html>[^<]|<[^/ >]+[^>]*/>|<(?<tagname>[^/ >]+)[^>]*>(?<!/>)\g<html>*</\k<tagname+0>>)*</\k<tagname+0>>


example: take the following string of text.

<p>This is an <i>example</i> paragraph.</p><p>This is a second paragraph.</p>

If the cursor is at the beginning of the text, the regular expression will match <p>This is an <i>example</i> paragraph.</p>. If the cursor is after the first < and not after the second <, it will match <i>example</i>. If the cursor is after the second <, it will match <p>This is a second paragraph.</p>

In other words, it matches the first opening html tag encountered with its appropriate closing tag. But it will only work on properly formatted html. For example, in this improperly formatted html string

<p>This is the first paragraph<p>This is the second paragraph</p>

it will not match the first paragraph because the first closing tag </p> is missing.

The regular expression can handle tags that close themselves like <p/> or <div/> or <link href="my.css" type="text/css" rel="stylesheet"/> or <a name="chap4" id="chap4"/>.
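
Python's standard `re` module has no recursion, so the expression above cannot be used there as written. As a rough equivalent, the sketch below swaps in a different technique: a simple tag regex plus an explicit stack. The name `first_element` is made up for illustration, and the tag pattern assumes attribute values never contain `>`.

```python
import re

# Groups: closing slash (if any), tag name, self-closing slash (if any).
TAG = re.compile(r"<(/?)([^\s/>]+)[^>]*?(/?)>")

def first_element(html, start=0):
    """Return the first complete element at or after `start`, or None."""
    stack = []      # names of elements that are currently open
    begin = None    # offset of the opening tag being matched
    for m in TAG.finditer(html, start):
        closing, name, self_closing = m.groups()
        if name[0] in "?!":
            continue            # skip <?xml ...?>, <!DOCTYPE ...>, comments
        if closing:
            if begin is None:
                continue        # stray close before any opening tag
            if stack[-1] != name:
                return None     # improperly nested markup
            stack.pop()
        else:
            if begin is None:
                begin = m.start()
            if not self_closing:
                stack.append(name)
        if not stack:
            return html[begin:m.end()]
    return None                 # ran out of input with tags still open

sample = ("<p>This is an <i>example</i> paragraph.</p>"
          "<p>This is a second paragraph.</p>")
print(first_element(sample))
print(first_element(sample, 1))
print(first_element(sample, sample.index("<i>") + 1))
```

The three calls mirror the three cursor positions described above. Unlike a regex search, this sketch returns `None` for the improperly formatted example instead of moving on to a later element.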

Last edited by Funslinger; 06-20-2013 at 05:58 AM.
Old 06-20-2013, 05:18 PM   #11
signum
Zealot
Posts: 119
Karma: 64428
Join Date: Aug 2011
Device: none
This will do it:
[^ ]“
where the quote mark is a left curly quote. The key is that a leading caret inside square brackets means "anything but", so we have "match anything but a space, followed by a left curly quote", just as the OP asked.
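
To replace rather than just find, the character before the quote needs a capturing group so it can be put back. The Python sketch below shows that, along with one caveat: `[^ ]` also matches the `>` that ends a tag. The stricter class `[^\s>]` is an assumed refinement, not something proposed in this thread, and the sample text is made up.

```python
import re

text = "<p>“Hello,”she said.“Goodbye.”</p>"

# [^ ] matches any character except a space, including the '>' of <p>.
loose = re.sub(r"([^ ])“", r"\1 “", text)

# Assumed refinement: also exclude other whitespace and '>'.
strict = re.sub(r"([^\s>])“", r"\1 “", text)

print(loose)
print(strict)
```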
Old 06-20-2013, 05:28 PM   #12
JSWolf
Resident Curmudgeon
Posts: 73,887
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Let's say the ePub has a line such as...

This will be a <span class="smallcaps">TEST</span>. This is a <span class="smallcaps">TEST</span>. This is no longer a <span class="smallcaps">TEST</span>.

Notice we have three spans. What I want to do is select each span individually. Can this be done? I want to take the contents of span #1 and span #3 and make them lowercase and leave span #2 alone.
Old 06-20-2013, 07:36 PM   #13
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
if all the spans are on the same line you could use the one below. if there are more than 3 spans it captures the first 3, then the next 3. if there are 5 it captures only the first 3.

i think i'd recommend not doing this with regex though.

Code:
find:
(?<=<span class="smallcaps">)([^<\n]+)(</span>[^<\n]*)(<span class="smallcaps">)([^<\n]+)(</span>[^<\n]*)(<span class="smallcaps">)([^<]+)(?=</span>)

replace:
\1\2\3\4\5\6\7
where you'd perform operations on \1, \4 and \7. so given

Code:
This will be a <span class="smallcaps">TEST</span>. This is a <span class="smallcaps">TEST</span>. This is no longer a <span class="smallcaps">TEST</span>.

Code:
first\2\3second\5\6third

This will be a <span class="smallcaps">first</span>. This is a <span class="smallcaps">second</span>. This is no longer a <span class="smallcaps">third</span>.
i'd initially grouped the first <span class="smallcaps"> in the lookbehind and then inserted it like \2\3\1\4\5\1\6 but apparently sigil didn't like reusing the backreference. possibly a bug?

--edit

also the ([^<\n]+) is strange to me in that i had to include the \n so that it didn't match across lines. not sure why this is, though.
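
Outside Sigil, the "perform operations on \1, \4 and \7" step is easier with a replacement callback that counts matches. The sketch below is one way to do it in Python; `lowercase_spans` and its `which` parameter are made-up names, and the pattern assumes the spans contain no nested tags.

```python
import re

SPAN = re.compile(r'(<span class="smallcaps">)([^<]+)(</span>)')

def lowercase_spans(line, which=(1, 3)):
    """Lowercase the contents of the numbered smallcaps spans (1-based)."""
    count = 0

    def repl(m):
        nonlocal count
        count += 1
        inner = m.group(2).lower() if count in which else m.group(2)
        return m.group(1) + inner + m.group(3)

    return SPAN.sub(repl, line)

line = ('This will be a <span class="smallcaps">TEST</span>. '
        'This is a <span class="smallcaps">TEST</span>. '
        'This is no longer a <span class="smallcaps">TEST</span>.')
result = lowercase_spans(line)
print(result)
```

On the `\n` question: a negated class such as `[^<]` matches every character that is not `<`, and that includes newlines, so `\n` has to be excluded explicitly to keep a match on one line.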

Last edited by mzmm; 06-20-2013 at 07:41 PM.