#1 |
Member
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
|
question about search and replace 0.4.904
Hello
.* is not expanding to multiple lines. So how do I search over multiple lines? What happened to minimum search? |
#2 |
Well trained by Cats
Posts: 30,891
Karma: 59840954
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
#3 |
Member
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
|
What if I have 6 or 7 lines?
|
#4 |
Member
Posts: 18
Karma: 10
Join Date: Dec 2010
Location: Geelong, Australia
Device: Kobo
|
Try putting (?s) at the start of your pattern.
For example, use a Find pattern like this, putting the part you want to keep inside the brackets, i.e. (.*?):
(?s)<head>.*?<title>(.*?)</title>.*?</head> |
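A minimal sketch of what the (?s) prefix changes, using Python's `re` module as a stand-in for Sigil's PCRE engine (an assumption: both accept the inline (?s) flag, but they are different engines). The sample markup is made up for illustration.

```python
import re

# Hypothetical sample: a <head> block whose <title> sits on its own line.
html = """<head>
<meta name="generator" content="example" />
<title>Dracula's Guest</title>
</head>"""

pattern_plain = r"<head>.*?<title>(.*?)</title>.*?</head>"
pattern_dotall = r"(?s)" + pattern_plain

# Without (?s), '.' cannot cross the newline after <head>, so nothing matches.
no_match = re.search(pattern_plain, html)

# With (?s), '.' also matches newlines, so the pattern spans all four lines.
match = re.search(pattern_dotall, html)
title = match.group(1)  # the text captured by the (.*?) inside <title>

print(no_match)
print(title)
```

The same pattern works however many lines sit between the tags, because the lazy .*? stretches only as far as it has to.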
#5 |
Well trained by Cats
Posts: 30,891
Karma: 59840954
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Keep on going.
At every line end (EOL) there should be a \s+. I replace all whitespace in the literal portion of the pattern with \s+:
<p>Foo\s+Bar\s+on(.+)\s+
The nice thing about Sigil is that you get to test your pattern and see what it finds without doing a replace. |
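A small sketch of the \s+ approach, again using Python's `re` as a stand-in for Sigil's engine (an assumption). The sample text is hypothetical; it puts a line break between "Foo" and "Bar" and several spaces before "on".

```python
import re

# Hypothetical text: the phrase is wrapped across two lines.
text = "<p>Foo\nBar   on a hill</p>\n"

# Literal spaces only match single space characters, so this fails on the line break.
literal = r"<p>Foo Bar on(.+)"

# \s+ matches any run of whitespace, including the newline and the repeated spaces.
flexible = r"<p>Foo\s+Bar\s+on(.+)\s+"

literal_match = re.search(literal, text)
flexible_match = re.search(flexible, text)
captured = flexible_match.group(1)  # rest of the line after "on"

print(literal_match)
print(repr(captured))
```

Because \s+ does not care whether the gap is a space, a tab, or a line end, this style needs no (?s) flag as long as the wildcard parts stay on one line.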
#6 |
Member
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
|
(?s)<head>.*?<title>(.*?)</title>.*?</head>
The above works really well. I'm not familiar with ?s. Is that specific to Sigil? Thanks. |
#7 |
Member
Posts: 18
Karma: 10
Join Date: Dec 2010
Location: Geelong, Australia
Device: Kobo
|
I'm not sure exactly where I learnt about it, but it is used by PCRE, which is the regex engine now used by Sigil.
Have a look at this link for background info that might help explain a bit about PCRE: http://en.wikipedia.org/wiki/Perl_Co...ar_Expressions
It took me about 10 weeks to start to get my head around regex, but once I started to get it I found that I could cut up to 80% off my conversion times (I take web-based yarns and convert them to epub for later re-reading). Look for regex tutorials on the web and keep butting your head against it, and eventually it will get through. |
#8 |
Sigil developer
Posts: 1,274
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
Also take a look at the very good regex intro in the Calibre forum - https://www.mobileread.com/forums/sho...d.php?t=118569 - which also links to the more extensive, although Python-specific, regex details at http://docs.python.org/library/re.html
The use of (?s) tells the expression to let '.' match everything including newlines - by default '.' matches everything except newlines. Since ? is used for different things in regex it can be a bit confusing: after an opening bracket, as in (?, it means the next character(s) are flags for special handling, and there are lots of them; ? after a normal character means 0 or 1 occurrences; and ? after * or + or ? means non-greedy/minimal matching. No wonder regex is confusing.
I have a feeling there will be more and more questions about regex. Maybe we need a sticky at the top (even to link to the Calibre post?) or a quick help guide link in Sigil itself with examples. I guess the trouble is that everyone has their own way of using regex, and any examples can quickly become overwhelming. |
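The three roles of ? described above can be seen side by side in a short sketch. It uses Python's `re` module (the one documented at the link above) rather than Sigil's PCRE engine, on the assumption that the two agree on these basics. The sample strings are made up.

```python
import re

# 1. '?' after a normal character: that character may appear 0 or 1 times.
optional = re.findall(r"colou?r", "color colour")

# 2. '?' after '*' or '+': non-greedy (minimal) matching.
greedy = re.search(r"<.*>", "<b>bold</b>").group(0)   # grabs as much as possible
lazy = re.search(r"<.*?>", "<b>bold</b>").group(0)    # stops at the first '>'

# 3. '(?' followed by flag letters: here (?s) lets '.' match newlines too.
unflagged = re.search(r"a.*b", "a\nb")
flagged = re.search(r"(?s)a.*b", "a\nb")

print(optional)
print(greedy, lazy)
print(unflagged, flagged.group(0) == "a\nb")
```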
#9 |
Sigil developer
Posts: 1,274
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
Since I haven't used (?s) before, I was testing it out in 0.4.904. The head example above works fine, but when I modified it, it didn't always work. It looks like it should, but it doesn't - so I'm interested in what I'm doing wrong, whether this is what it's supposed to do, or whether there is a code issue.
Below is a short extract of a Gutenberg book picked at random. Start Sigil with an empty book and go to Code View. Delete all the lines in the window. Paste the text below into the window. Put your cursor at the top and hit Ctrl-F to bring up Find & Replace. Set Regex in Current File. Enter the Find expression (meaningless text split across lines):
Now put your cursor on line 28 - just before the line with Stoker in it - and press the Find button again. This time it says the search term is not found.
In fact, it appears this is even simpler to demonstrate. Put your cursor on, say, line 31 at the start of 'class', then do a Find for ".*Release" (without quotes) - search term is not found. Now put your cursor at the very start of line 31 (or on line 30) and repeat the Find - this time the line up to and including Release is highlighted. In 0.4.2, Find correctly highlights the subset of the line up to and including the word even if the cursor is not at the start of the line. Searching for Release.* correctly highlights the word and the rest of the line.
It's also easily checked if you just open Sigil with an empty file, change to Code View and search for .*encoding with the cursor at the start of the line or a couple of characters in, but since I had already entered the test book I figured I'd leave it in the post.
Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="HTML Tidy for Linux (vers 7 December 2008), see www.w3.org" name="generator" />
<title>The Project Gutenberg eBook of Dracula's Guest, by Bram Stoker</title>
<meta content="Project Gutenberg EPUB-Maker v0.02 by Marcello Perathoner <webmaster@gutenberg.org>" name="generator" />
<link href="../Styles/0.css" rel="stylesheet" type="text/css" />
<link href="../Styles/1.css" rel="stylesheet" type="text/css" />
<link href="../Styles/pgepub.css" rel="stylesheet" type="text/css" />
</head>
<body>
<h1 id="pgepubid00000">The Project Gutenberg eBook, Dracula's Guest, by Bram Stoker</h1>
<div class="pgmonospaced pgheader">
<br />
This eBook is for the use of anyone anywhere at no cost and with<br />
almost no restrictions whatsoever. You may copy it, give it away or<br />
re-use it under the terms of the Project Gutenberg License included<br />
with this eBook or online at <a>www.gutenberg.org</a><br />
</div>
<p class="noindent">Title: Dracula's Guest</p>
<p class="noindent">Author: Bram Stoker</p>
<p class="noindent">Release Date: November 20, 2003 [eBook #10150]<br />
[Most recently updated: November 7, 2006]</p>
<p class="noindent">Language: English</p>
<p class="noindent">Character set encoding: ISO-8859-1</p>
<p class="noindent">***START OF THE PROJECT GUTENBERG EBOOK DRACULA'S GUEST***</p>
</body>
</html> |
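For comparison, here is a rough sketch of the behaviour described for 0.4.2, where a search that starts mid-line still finds `.*Release`. It uses Python's `re` with an explicit start offset standing in for the cursor position. This is only an illustration of what a typical engine does, not Sigil's code, and the `cursor` variable is a hypothetical stand-in for the editor cursor.

```python
import re

# Two lines taken from the extract above.
release_line = '<p class="noindent">Release Date: November 20, 2003 [eBook #10150]<br />'
text = '<p class="noindent">Author: Bram Stoker</p>\n' + release_line + "\n"

pattern = re.compile(r".*Release")

# Cursor at the very start of the Release line.
line_start = text.index(release_line)
from_line_start = pattern.search(text, line_start).group(0)

# Cursor a few characters in, at the start of 'class' on the same line.
cursor = text.index("class", line_start)
from_mid_line = pattern.search(text, cursor).group(0)

print(from_line_start)
print(from_mid_line)
```

In both cases the match runs from the starting offset up to and including "Release", which is the result reported for 0.4.2.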
#10 |
Member
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
|
Thank you for the explanation of ?s. Regex itself is like a programming language. It takes a while to learn it.
|