Old 11-18-2014, 02:39 PM   #436
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by signum View Post
Yeah, it works, sorta. But it matches all paragraphs with quotation marks, whether or not they are "valid", i.e. matched or balanced. This was first reported by the OP in msg #428, above.
Well, my regex was not meant to find paragraphs where the only quotation marks were at the end. In fact, I specifically precluded the idea. You need a second regex for that.

But they should correctly have found #1 and #3, and I cannot see why they would find, for example,
Code:
<p>"Hello."</p>
So how exactly was my regex matching valid paragraphs, I wonder? This bit should make it impossible (unless there was random extra space at the end or some such?):

Code:
[^"”]</p>
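A quick way to check that claim is to run the trailing fragment on its own. The sketch below uses Python's `re` module rather than Sigil's PCRE engine (an assumption made only so the check can run outside Sigil), and `ends_without_quote` is a hypothetical helper name. It tests only the fragment shown above, not the full regex from the earlier post.

```python
import re

# The trailing fragment quoted above: one character that is neither a straight
# nor a curly closing quote, immediately followed by </p>.
TAIL = re.compile(r'[^"”]</p>')


def ends_without_quote(paragraph: str) -> bool:
    """Return True if the fragment matches anywhere in the given markup."""
    return TAIL.search(paragraph) is not None


print(ends_without_quote('<p>"Hello."</p>'))   # False: the quote sits right before </p>
print(ends_without_quote('<p>"Hello." </p>'))  # True: a stray space precedes </p>
```

So a paragraph that ends in a quotation mark can only slip through if something (such as whitespace) sits between the quote and the closing tag.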
Old 11-18-2014, 03:01 PM   #437
eschwartz
Ex-Helpdesk Junkie
Quote:
Originally Posted by Buchstabensalat View Post
Hi,

Thank you.

Using Sigil's search (regex mode), your first string finds every single word in the text, and the second string matches every uppercase word in my text, not only those that meet my search criteria (uppercase letter(s) within a word). Is there a possibility to match only those words with uppercase letters inside the word?

Thanks again

Regards,

Buchstabensalat
I would use the following:

Find:
Code:
\b(\p{L})((?:\p{Ll}*\p{Lu})+\p{Ll}*)\b
Replace:
Code:
\1\L\2\E
NOTE: This uses Unicode character properties (\p{L}, \p{Ll}, \p{Lu}) rather than ASCII ranges, on recommendation. Also, I have specifically saved the current case value of the first letter of the word, rather than assuming all words are lowercase.

Other than that, the only real difference from Doitsu's regex is that I specifically looked for uppercase letters inside the match, rather than applying an equalizer on all words -- this has the same result, but ignores matches where we wouldn't end up changing anything.
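For anyone who wants to try the same idea outside Sigil, here is a rough Python sketch. Python's standard `re` module supports neither `\p{...}` nor the `\L...\E` case operators, so this swaps in a different technique: `[^\W\d_]` stands in for `\p{L}`, and a replacement callback does the lowercasing. The function name `fix_inner_caps` is made up for the example.

```python
import re

# Stand-in for \p{L}+ between word boundaries: word characters minus
# digits and underscore (Unicode-aware for str patterns in Python 3).
WORD = re.compile(r"\b[^\W\d_]+\b")


def fix_inner_caps(text: str) -> str:
    """Lowercase everything after the first letter of a word, but only in
    words that contain an uppercase letter after the first position."""

    def repl(match: re.Match) -> str:
        word = match.group(0)
        head, tail = word[:1], word[1:]
        # Leave the word alone unless the tail really has an uppercase letter,
        # mirroring the "ignore matches we wouldn't change" idea above.
        if any(ch.isupper() for ch in tail):
            return head + tail.lower()
        return word

    return WORD.sub(repl, text)


print(fix_inner_caps("HeLLo world WOrd ÉCOLE"))  # Hello world Word École
```

As in the Find/Replace pair above, the first letter keeps whatever case it already had.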
Old 11-18-2014, 03:05 PM   #438
eschwartz
Ex-Helpdesk Junkie
On an almost-related note, Kovid Goyal has just (as of 6 hours ago) begun work on function mode in calibre's editor!
Old 11-19-2014, 02:00 PM   #439
Buchstabensalat
Member
Posts: 12
Karma: 79192
Join Date: Nov 2014
Device: Kindle
Quote:
Originally Posted by eschwartz View Post
I would use the following:

Find:
Code:
\b(\p{L})((?:\p{Ll}*\p{Lu})+\p{Ll}*)\b
Replace:
Code:
\1\L\2\E
...
works fine
Old 11-19-2014, 02:24 PM   #440
eschwartz
Ex-Helpdesk Junkie
Quote:
Originally Posted by Buchstabensalat View Post
works fine
You're welcome.
Old 12-04-2014, 06:21 AM   #441
rubeus
Banned
Posts: 272
Karma: 1224588
Join Date: Sep 2014
Device: Sony PRS 650
Hi,

I'm currently working on combining paragraphs. I have a bunch of regexes which are working fine.

But for this:
Code:
...he went on,</i></p>
<p><i> shaking his head...
I'm unable to find a solution. Italic is just an example; it could be a <b>, <span>, <img>, or something completely different.

What I can do, of course, is something like this:

,</i></p>\s+<p><i>

and replace it with

,

and have one for italic, one for bold (or even combine them with |), but is it possible to make this a little more general?

Thx

rubeus
Old 12-04-2014, 03:34 PM   #442
eschwartz
Ex-Helpdesk Junkie
Simple, a backreference.

Capture the first used tag, and match it later -- yes, backreferences work inside the regex, not just for replacing.

Instead of:
Code:
,</i></p>\s+<p><i>
Use:
Code:
,</(i|b|others)></p>\s+<p><\1>
Actually, you might want to think of matching tags with attributes as well.

Code:
,</(i|b|others)></p>\s+<p( .+)?><\1>
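To see the backreference at work, here is a small Python sketch of the same join. It is only an illustration under a few assumptions: the tag list is cut down to `i` and `b`, the function name `join_split_paragraphs` is invented, and the attribute part is written as `(?: [^>]*)?` instead of `( .+)?`, because a greedy `.+` can run past the end of the `<p ...>` tag when several tags share a line.

```python
import re

# </TAG></p>, whitespace, then <p ...><TAG> where \1 must be the same tag name
# that was captured when the previous paragraph closed.
JOIN = re.compile(r",</(i|b)></p>\s+<p(?: [^>]*)?><\1>")


def join_split_paragraphs(html: str) -> str:
    """Merge a paragraph that was split in the middle of an inline tag."""
    return JOIN.sub(",", html)


sample = "...he went on,</i></p>\n<p><i> shaking his head..."
print(join_split_paragraphs(sample))  # ...he went on, shaking his head...
```

Because `\1` has to repeat the captured tag name, a paragraph that closes with `</i>` and reopens with `<b>` is left untouched.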
Old 12-05-2014, 07:32 AM   #443
rubeus
Banned
Quote:
Originally Posted by eschwartz View Post
Code:
,</(i|b|others)></p>\s+<p><\1>
Thanks, and I will read this chapter

Quote:
Originally Posted by eschwartz View Post
Actually, you might want to think of matching tags with attributes as well.

Code:
,</(i|b|others)></p>\s+<p( .+)?><\1>
To be honest, I'm doing that, but was too lazy to write this down in my example
Old 12-05-2014, 09:00 AM   #444
eschwartz
Ex-Helpdesk Junkie
Quote:
Originally Posted by rubeus View Post
Thanks, and I will read this chapter
Any time.

Quote:
To be honest, I'm doing that, but was too lazy to write this down in my example
Old 12-29-2014, 04:05 AM   #445
Steadyhands
Connoisseur
Posts: 57
Karma: 10
Join Date: Dec 2011
Device: Samsung Tablet
Have I over-complicated this?

My aim was to come up with one regex to find ‘s and <space>‘s that followed any Unicode text and could be followed by either a space or punctuation, then replace the curly quote and remove the extra space.
Find
Code:
((?<=([\p{Ll}]))‘s(?=([\p{P}|\s]))|(?<=([\p{Ll}]))\s‘s(?=([\p{P}|\s])))
Replace
Code:
’s

fooè ‘s foo foo‘s foo fo's foo ‘s foo ’s dog ‘scooter’
Old 01-02-2015, 01:59 PM   #446
eschwartz
Ex-Helpdesk Junkie
Code:
(?<=\p{Ll})\s?‘s(?=\p{P}|\s)
Mainly, just make the space optional which cuts the regex in half. You can also remove some of the groupings you used.
Old 01-02-2015, 02:36 PM   #447
Steadyhands
Connoisseur
Quote:
Originally Posted by eschwartz View Post
Code:
(?<=\p{Ll})\s?‘s(?=\p{P}|\s)
Mainly, just make the space optional which cuts the regex in half. You can also remove some of the groupings you used.
Thanks,

Changing it a little more I've started looking for things other than the s, allowing for uppercase and full stops.
Code:
(?<=\p{Ll}|\p{Lu}|\.)\s?‘(d|m|s|t|ll|re|ve)(?=\p{P}|\s)
Old 01-02-2015, 03:24 PM   #448
eschwartz
Ex-Helpdesk Junkie
Code:
(?<=\p{L}|\.)\s?‘(d|m|s|t|ll|re|ve)(?=\p{P}|\s)
Just look for letters, no need to look for "uppercase_letters OR lowercase_letters"

Do you anticipate any of these happening after a full stop? Perhaps that should be fixed... I can hear there being an uppercase word, e.g. someone screaming.
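For testing this outside Sigil, the regex can be approximated in Python. The standard `re` module has no `\p{...}` classes, so the sketch below substitutes `[^\W\d_]` for `\p{L}` and `[^\w\s]` for `\p{P}` (the latter is slightly broader, since it also accepts symbols). The name `fix_contractions` is hypothetical. Both lookbehind branches are one character wide, which Python requires.

```python
import re

FIX = re.compile(
    r"(?<=[^\W\d_]|\.)"        # preceded by a letter or a full stop
    r"\s?‘"                    # optional stray space, then the wrong (opening) quote
    r"(d|m|s|t|ll|re|ve)"      # the contraction or possessive ending
    r"(?=[^\w\s]|\s)"          # followed by punctuation or whitespace
)


def fix_contractions(text: str) -> str:
    """Replace the opening quote with a closing one and drop the stray space."""
    return FIX.sub(r"’\1", text)


print(fix_contractions("we ‘ll see, A.B.C. ‘s dog, the ‘scooter’"))
# we’ll see, A.B.C.’s dog, the ‘scooter’
```

Note that the lookahead needs a following character, so an ending at the very end of the text is not matched by this version.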

Last edited by eschwartz; 01-02-2015 at 03:26 PM.
Old 01-02-2015, 03:55 PM   #449
Steadyhands
Connoisseur
Thanks again. The full stops are there for abbreviations, e.g. A.B.C. ‘s
Old 01-02-2015, 04:27 PM   #450
eschwartz
Ex-Helpdesk Junkie
Oh, OK -- makes sense now.

And you're welcome.