MobileRead Forums

phossler 01-02-2013 11:37 PM

Struggling with RegEx
 
Not having ANY luck getting a RegEx to work right

Need 1. Break 'joined' words with a single UC letter inside:

lowerCase --> lower+Case ( the + is really a space char)
LowerCase --> Lower+Case ( the + is really a space char)

Need 2. Superscript ordinal suffixes

89th --> 89<sup>th</sup>
1st --> 1<sup>st</sup>

Thanks

Paul

Danger 01-03-2013 01:51 AM

You could try the following...

Need 1:
Find: ([a-z])([A-Z])
Replace: \1+\2
Where + is really a space

Need 2:
Find: ([0-9])([a-z])([a-z])+
Replace: \1<sup>\2\3</sup>+
Where + is really a space
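
For anyone who wants to try these two patterns outside Sigil first, here is a minimal sketch using Python's re module. That is an assumption: Sigil has its own regex engine, and the function names are made up for illustration. The "+" from the post is written as a literal space.

```python
import re

def split_joined_words(text: str) -> str:
    # Find: ([a-z])([A-Z])   Replace: \1 \2   (space between the two groups)
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", text)

def superscript_suffix(text: str) -> str:
    # Find: ([0-9])([a-z])([a-z])<space>   Replace: \1<sup>\2\3</sup><space>
    return re.sub(r"([0-9])([a-z])([a-z]) ", r"\1<sup>\2\3</sup> ", text)

print(split_joined_words("lowerCase LowerCase JohnSmith"))
# lower Case Lower Case John Smith
print(superscript_suffix("the 89th time and the 1st try "))
# the 89<sup>th</sup> time and the 1<sup>st</sup> try
```

Note that the second pattern only fires when a space follows the two letters, so an ordinal directly before a comma or full stop is left alone.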

WS64 01-03-2013 01:54 PM

Quote:

Originally Posted by Danger (Post 2365112)
Need 2:
Find: ([0-9])([a-z])([a-z])+
Replace: \1<sup>\2\3</sup>+
Where + is really a space

That might be a bit dangerous since there could be other cases which should not be changed (like 50mph).

I use
find: (\s[0-9]+)(st|nd|rd|th)\s
replace: \1<sup>\2</sup>\s

Did I miss any (st|nd|rd|th) in my find statement? (not my native language)
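
On the question: st, nd, rd and th are the four suffixes English ordinals use. A minimal sketch of the same idea in Python's re module follows (an assumption, since Sigil uses its own regex engine; the function name is hypothetical). Python rejects \s in a replacement string, so this version captures the trailing whitespace as a third group and puts it back. I have not checked how Sigil itself treats \s in the Replace box.

```python
import re

# Same find pattern as above, with the trailing whitespace captured as group 3
ORDINAL = re.compile(r"(\s[0-9]+)(st|nd|rd|th)(\s)")

def superscript_ordinals(text: str) -> str:
    # Put the captured whitespace back instead of writing \s in the replacement
    return ORDINAL.sub(r"\1<sup>\2</sup>\3", text)

print(superscript_ordinals("On the 1st and 22nd we drove 50mph until the 3rd day."))
# On the 1<sup>st</sup> and 22<sup>nd</sup> we drove 50mph until the 3<sup>rd</sup> day.
```

Because the pattern consumes the whitespace on both sides, two ordinals separated by a single space (" 1st 2nd ") would only have the first one changed in a single pass.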

Jellby 01-03-2013 02:13 PM

According to Wikipedia, the use of superscripts for ordinals in English should be avoided nowadays.

phossler 01-03-2013 02:57 PM

@Jellby --

Quote:

According to Wikipedia, the use of superscripts for ordinals in English should be avoided nowadays.
Thanks for the link

Quote:

The 16th edition of The Chicago Manual of Style states: "The letters in ordinal numbers should not appear as superscripts (e.g., 122nd not 122<sup>nd</sup>)", as do the Bluebook[1] and style guides by the Council of Science Editors,[2] Microsoft,[3] and Yahoo!.[4] Two problems are that superscripts are used "most often in citations" and are "tiny and hard to read".[1] Some word processors format ordinal indicators as superscripts by default (e.g. Microsoft Word[5]). Style guide author Jack Lynch (Rutgers) recommends turning off automatic superscripting of ordinals in MS Word, because "no professionally printed books use superscripts."[6]
Who can argue with that? :book2:

I'll also turn it off in my MS Word


I will keep the RegEx and probably use it as a starting point for other things.

However, the most frustrating editing task right now is 'Need #1': inserting a space in text like "JohnSmith" (s/b "John Smith") and "missingLink" (s/b "missing Link"). I have no idea how so many words got joined. :smack:

@Danger --

I'll try the suggestion tonight

Paul

Danger 01-03-2013 04:25 PM

Quote:

Originally Posted by WS64 (Post 2365829)
That might be a bit dangerous since there could be other cases which should not be changed (like 50mph).

While that might be dangerous, that search will only find exactly 2 lowercase letters that follow a digit and have a space trailing them. So 50mph would not be picked up, as it doesn't fit the criteria. Not saying it's foolproof though :chinscratch:
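
A quick way to see which strings fit those criteria is to run the find pattern over a few samples. This is only a sketch in Python's re module (an assumption, not Sigil's engine), and the sample strings and helper name are invented for illustration.

```python
import re

# Digit, exactly two lowercase letters, then a literal space
PATTERN = re.compile(r"([0-9])([a-z])([a-z]) ")

def is_picked_up(sample: str) -> bool:
    # True when the find pattern matches anywhere in the sample
    return PATTERN.search(sample) is not None

for sample in ["50mph ", "89th ", "5km ", "20lb bag"]:
    print(repr(sample), is_picked_up(sample))
# '50mph ' False
# '89th ' True
# '5km ' True
# '20lb bag' True
```

So three-letter units such as mph are safe, but two-letter units such as km or lb would still get superscripted.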

phossler 01-03-2013 11:29 PM

@Danger -- RegEx works very well

I was overthinking it

Simpler is better

Paul

Serpentine 01-04-2013 02:42 PM

When in doubt : RegexBuddy

Really can't recommend it enough. Tho it might be closed and Windows-centric, it's got no equal. (OK, there's a pretty good similar tk tool, but it's very tricky for most to use.)

WS64 01-05-2013 05:50 AM

[removed, I was reading the "+" wrong]

Hitch 01-06-2013 04:52 AM

Quote:

Originally Posted by Serpentine (Post 2367505)
When in doubt : RegexBuddy

Really can't recommend it enough. Tho it might be closed and Windows-centric, it's got no equal. (OK, there's a pretty good similar tk tool, but it's very tricky for most to use.)

+1 !

Hitch

ElMiko 01-13-2013 03:00 AM

You may want to be careful with the lowercase-uppercase separator, as it will match (and modify) body text such as "McDonalds" and HTML code such as "preserveAspectRatio" and "viewBox".

I use something like:

Code:

(?<!Mac|Mc)(?<=\p{Ll})\p{Lu}(?!spect|atio|ox[=])
to exclude false positives, but even then I correct each instance individually (i.e., not in bulk), just to be sure.

You could add parentheses around "\p{Lu}" and set the replace value as "[blank space]\1", but again, I would still recommend cycling through each instance individually.
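
For testing outside Sigil, the guarded pattern can be sketched in Python, with two adaptations that are assumptions on my part: the standard re module has no \p{Ll} or \p{Lu}, so ASCII classes stand in, and it requires fixed-width lookbehinds, so (?<!Mac|Mc) is split into two. The function name is hypothetical.

```python
import re

# Adapted pattern: uppercase letter is captured so the replacement can reuse it
SPLIT = re.compile(r"(?<!Mac)(?<!Mc)(?<=[a-z])([A-Z])(?!spect|atio|ox=)")

def split_guarded(text: str) -> str:
    # Replace value is "[blank space]\1", as suggested above
    return SPLIT.sub(r" \1", text)

print(split_guarded("JohnSmith ate at McDonalds"))
# John Smith ate at McDonalds
print(split_guarded('<svg preserveAspectRatio="none" viewBox="0 0 10 10">'))
# <svg preserveAspectRatio="none" viewBox="0 0 10 10">
```

The ASCII classes will miss accented letters that \p{Ll} and \p{Lu} would catch, and the exclusion lists only cover the names and attributes spelled out in them, which is another reason to step through matches one at a time.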

