MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Sigil (https://www.mobileread.com/forums/forumdisplay.php?f=203)
-   -   Regex examples (https://www.mobileread.com/forums/showthread.php?t=167971)

Jellby 07-04-2012 04:59 AM

An ePub is a zip, so just extract the file you want to modify, change it with vim (or your preferred editor), and zip it back.
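
The extract, edit, and re-zip round trip can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module, not something posted in the thread; the function name `replace_in_epub` and the file names in the usage comment are hypothetical. It keeps the original entry order and stores the `mimetype` entry uncompressed, as the ePub container format expects.

```python
import zipfile


def replace_in_epub(src_path, dst_path, member, edit):
    """Copy the ePub at src_path to dst_path, passing one member through edit().

    edit() receives the member's text (decoded as UTF-8) and returns new text.
    """
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        # infolist() preserves the original order, so "mimetype" stays first
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                data = edit(data.decode("utf-8")).encode("utf-8")
            # "mimetype" must be stored without compression
            if info.filename == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            dst.writestr(info.filename, data, compress_type=method)


# Hypothetical usage (paths and member name are made up):
# replace_in_epub("book.epub", "book-fixed.epub", "OEBPS/ch01.xhtml",
#                 lambda text: text.replace("teh", "the"))
```

Writing to a second file rather than overwriting the original leaves a backup in case the edit goes wrong.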

mrjoeyman 07-04-2012 09:02 PM

Doh! I should have thunk of that! Thanks!! By the way I got to the end of my first tutorial with Vim. I can now say I performed my first "yank and put". Pretty neat editor. I have tried to make the ¬ character in Vim but it doesn't work as it does in this message. (alt-170). Can you tell me how to make it in Vim?

update: well it works as you said, sweet! I just copied the "¬" but I still would like to learn how to make it in Vim. Thanks.

Jellby 07-05-2012 05:26 AM

I use ¬ because I can easily input it with my keyboard layout (Spanish): AltGr+6. Use whatever symbol you can find on your keyboard that's not used elsewhere: #, ~, @...

signum 07-07-2012 01:05 AM

Quote:

Originally Posted by mrjoeyman (Post 2137262)
I have tried to make the ¬ character in Vim but it doesn't work as it does in this message. (alt-170). Can you tell me how to make it in Vim?

update: well it works as you said, sweet! I just copied the "¬" but I still would like to learn how to make it in Vim. Thanks.

Assuming you are in insert (or append) mode in vim:

<ctrl-v>uac<esc>

In human language, this means: hold down the "ctrl" key and press v, release both, type uac, then tap the "esc" key.

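The "ctrl-v" says that a character entered by its code follows, "u" means the code is a Unicode code point in hex, "ac" is the hex code for the "not" symbol, and "esc" ends the sequence (and, being handled as a normal keystroke afterwards, also leaves insert mode).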

Having said all that, it is a lot easier to just use some other seldom-used character that appears on your keyboard, such as "@", instead of the "not" character.
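
As a quick check of the two codes mentioned above, here is a small Python sketch. It assumes that the Alt+170 keypad code is interpreted in code page 437, which is an assumption about the poster's Windows setup and not something stated in the thread.

```python
# U+00AC is the code point typed as "ac" after <ctrl-v>u
not_sign = chr(0xAC)

# Assumption: Alt+170 is looked up in code page 437, where byte 170 is the "not" sign
alt_170 = bytes([170]).decode("cp437")

# Both routes lead to the same character
print(hex(ord(not_sign)), hex(ord(alt_170)))  # prints: 0xac 0xac
```

So 170 (decimal, code page 437) and ac (hex, Unicode) are simply two different numbers for the same symbol.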

mrjoeyman 07-07-2012 10:49 AM

Thanks, you are right, simple is better :)

Danger 08-06-2012 11:22 AM

First, thanks for everyone's help here. While I haven't posted for help, the answers to other people's problems have helped me as well when I had similar questions. However, I have a question that I don't see an answer to.

I am trying to remove a start and end div tag. These span an entire chapter.
Code:

<body>
  <div class="story" id="part-27">
...
  </div>
</body>

I've tried:
FIND
Code:

<div class="story" id="part-\d+">(.*?)</div>
&
Code:

<div class="story" id="part-\d+">(.*)</div>
and a few other variations, but Sigil always returns a zero count. I'm wondering what I am doing wrong. This isn't the first time I've run into this problem. Before, I've just worked around it by working with much smaller bits, but I'd like to know just what it is I am doing wrong, because as far as I can tell that should work. I'm using Sigil 0.5.902.

EDIT:
OK, it seems that the regex was fine; it just doesn't work in 0.5.902 but does work in 0.5.3, which I don't like using much for finding/replacing because over half the time I get left with a literal \1 instead of the actual text. Then of course I have to UNDO, FIND, REPLACE for each one. That's easy enough when it's a large block of text, but not so easy when it's a word or sentence, forcing me to do another FIND for any \1< instances. A REPLACE ALL is just a nightmare if you don't have a backup.

Pablo 08-06-2012 11:49 AM

Quote:

Originally Posted by Danger (Post 2175925)
Code:

<div class="story" id="part-\d+">(.*?)</div>

Try this:

Code:

<div class="story" id="part-[0-9]+">(.*?)</div>

Danger 08-06-2012 12:29 PM

Quote:

Originally Posted by Pablo (Post 2175947)
Try this:

Code:

<div class="story" id="part-[0-9]+">(.*?)</div>

I checked it on a backup (I'd already cleaned up the code on my working copy), but I still get "no matches found" in v0.5.902. But thanks, that was one variation I hadn't tried yet.

paulfiera 08-06-2012 12:51 PM

Help with regex and chapters
 
I have a book I'm fixing where the chapters are named:

Code:

<p class="calibre4">1</p>
<p class="calibre4">2</p>

...and so on.

How can I replace these occurrences with, for instance,
Code:

<h3>Chapter 1</h3>
<h3>Chapter 2</h3>

...and so on?

I've tried all the combinations I know of but can't seem to get it done.

Many thanks !
Paul

Doitsu 08-06-2012 01:55 PM

I suck at regular expressions, but this should work in Sigil 0.5.3:

Find: <p class="calibre4">(\d+)</p>
Replace: <h3>Chapter \1</h3>
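
The same find/replace pair can be tried outside Sigil. This is a minimal sketch with Python's `re` module, which is a different regex engine from Sigil's but treats `\d+`, capture groups, and `\1` the same way for this pattern; the sample markup is the one from the question.

```python
import re

html = '<p class="calibre4">1</p>\n<p class="calibre4">2</p>'

# (\d+) captures the chapter number; \1 puts it back in the replacement
pattern = r'<p class="calibre4">(\d+)</p>'
replacement = r'<h3>Chapter \1</h3>'

result = re.sub(pattern, replacement, html)
print(result)
# <h3>Chapter 1</h3>
# <h3>Chapter 2</h3>
```

Note that the pattern would also convert any other paragraph of class calibre4 that contains only digits, so it is worth checking the matches before using Replace All.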

Timur 08-06-2012 01:56 PM

@Danger: In most regex flavors the dot (.) does not match newline characters by default. Your case requires the dot to match newlines. In Sigil, either select Regex Dotall from the mode listbox (the beta version does not have that mode, IIRC), or put (?s) in front of your find pattern. Example:

Code:

(?s)<div class="story" id="part-\d+">(.*)</div>
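
To see the effect of (?s) outside Sigil, here is a minimal sketch using Python's `re` module, which supports the same inline flag. The sample chapter is a shortened, made-up stand-in for the markup in the question.

```python
import re

chapter = (
    '<body>\n'
    '  <div class="story" id="part-27">\n'
    '<p>text</p>\n'
    '  </div>\n'
    '</body>'
)
pattern = r'<div class="story" id="part-\d+">(.*)</div>'

# Without (?s) the dot stops at the first newline, so nothing matches
without_flag = re.search(pattern, chapter)
print(without_flag)  # None

# With (?s) the dot also matches newlines and the whole div is found
with_flag = re.search('(?s)' + pattern, chapter)
print(with_flag is not None)  # True

# Replacing the match with \1 drops the wrapper tags and keeps the content
unwrapped = re.sub('(?s)' + pattern, r'\1', chapter)
```

With the greedy (.*) the match runs to the last `</div>` in the file, which is what is wanted when the div wraps the whole chapter; the lazy (.*?) would stop at the first `</div>` instead.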

paulfiera 08-06-2012 01:59 PM

Quote:

Originally Posted by Doitsu (Post 2176057)
I suck at regular expressions, but this should work in Sigil 0.5.3:

Find: <p class="calibre4">(\d+)</p>
Replace: <h3>Chapter \1</h3>

Many thanks, Doitsu.

It definitely worked ! :thumbsup:

Danger 08-06-2012 02:04 PM

Quote:

Originally Posted by Timur (Post 2176059)
@Danger: In most regex flavors the dot (.) does not match newline characters by default. Your case requires the dot to match newlines. In Sigil, either select Regex Dotall from the mode listbox (the beta version does not have that mode, IIRC), or put (?s) in front of your find pattern. Example:

Code:

(?s)<div class="story" id="part-\d+">(.*)</div>

Awesome, I knew it wasn't matching newlines but couldn't figure out how to get it to do so. Thank you Timur, that works great.

theducks 08-06-2012 10:52 PM

Quote:

Originally Posted by Danger (Post 2175925)
I am trying to remove a start and end div tag. These span an entire chapter.

I've tried:
FIND
Code:

<div class="story" id="part-\d+">(.*?)</div>

You need to tell it that the text is multiline:

Code:

(?sm)<div class="story" id="part-\d+">(.*?)</div>
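
The two letters in (?sm) do different jobs: s lets the dot match newlines, while m only changes where ^ and $ can match. A minimal Python sketch (its `re` module uses the same inline flags; the sample text is made up) suggests that for this particular pattern the s flag is the one doing the work:

```python
import re

text = '<div class="story" id="part-27">\nline one\nline two\n</div>'
pattern = r'<div class="story" id="part-\d+">(.*?)</div>'

only_m = re.search('(?m)' + pattern, text)   # dot still stops at newlines
only_s = re.search('(?s)' + pattern, text)   # dot crosses newlines
both = re.search('(?sm)' + pattern, text)    # same result as (?s) here

print(only_m is None, only_s is not None, both is not None)  # True True True

# What m does change: ^ and $ can match at every line, not just at the ends
per_line = re.findall(r'(?m)^line \w+$', text)
whole_text_only = re.findall(r'^line \w+$', text)
print(per_line)         # ['line one', 'line two']
print(whole_text_only)  # []
```

Adding m is harmless for a pattern without ^ or $, so (?sm) and (?s) behave the same in this case.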

Gunnerp245 08-11-2012 09:46 AM

I would like to change the capitalization of a particular phrase across a book, e.g. chapter one to Chapter One. I can detect the instances using (\D+) (\D+) and know the replacement would be \1 \2, but not how to change the capitalization.
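
The thread does not give an answer for doing this inside Sigil. Outside Sigil, one possible approach is to compute the replacement with a function instead of a fixed \1 \2 string. The following is a minimal sketch in Python; the sample sentence and the narrower pattern (the word "chapter" followed by one lowercase word) are assumptions made for illustration, not the poster's (\D+) (\D+).

```python
import re

text = "In chapter one the hero leaves; chapter two follows him."

# Hypothetical pattern: "chapter" followed by a single lowercase word
pattern = r'\bchapter ([a-z]+)\b'

# The replacement is a function: it receives each match and returns
# the matched text with the first letter of every word capitalized
result = re.sub(pattern, lambda m: m.group(0).title(), text)
print(result)
# In Chapter One the hero leaves; Chapter Two follows him.
```

This pattern would also capitalize phrases such as "chapter of", so a real run would need a tighter pattern, for example an explicit list of number words.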


All times are GMT -4.