01-04-2012, 06:34 PM   #1
congngo
Member
 
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
question about search and replace 0.4.904

Hello

.* is not matching across multiple lines. So how do I search over multiple lines? What happened to minimal matching?
01-04-2012, 07:21 PM   #2
theducks
Well trained by Cats
 
Posts: 29,798
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by congngo
Hello

.* is not matching across multiple lines. So how do I search over multiple lines? What happened to minimal matching?
\s+ will carry on past the line end, because \s matches newlines as well as spaces and tabs:
line1pattern\s+line2pattern
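The difference between . and \s at a line end can be checked outside Sigil. A minimal sketch using Python's re module (not PCRE, but it treats . and \s the same way here); the two-line sample string is made up:

```python
import re

# Two lines of text, as they might appear in Code View.
text = "<p>Foo\nBar</p>"

# By default '.' does not match a newline, so this finds nothing.
dot_match = re.search(r"Foo.*Bar", text)

# \s matches any whitespace, including the newline ending line 1.
ws_match = re.search(r"Foo\s+Bar", text)

print(dot_match)                 # None
print(repr(ws_match.group(0)))   # 'Foo\nBar'
```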
01-04-2012, 08:21 PM   #3
congngo
Member
 
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
What if I have 6 or 7 lines?
01-04-2012, 08:54 PM   #4
BartB
Member
 
Posts: 18
Karma: 10
Join Date: Dec 2010
Location: Geelong, Australia
Device: Kobo
Try putting (?s) at the start of your pattern.

e.g.

Find like this, putting whatever you want to keep inside brackets, i.e. (.*?):
(?s)<head>.*?<title>(.*?)</title>.*?</head>
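To see what (?s) changes, here is a small sketch in Python's re module, which accepts the same inline flag. The cut-down head section is made up for illustration:

```python
import re

# A shortened, made-up <head> section that spans several lines.
html = """<head>
  <meta name="generator" content="example" />
  <title>Dracula's Guest</title>
</head>"""

pattern = r"(?s)<head>.*?<title>(.*?)</title>.*?</head>"
without_flag = r"<head>.*?<title>(.*?)</title>.*?</head>"

# With (?s), '.' also matches newlines, so one match can cover the whole block.
m = re.search(pattern, html)
print(m.group(1))  # Dracula's Guest

# Without the flag there is no match: <head> and <title> are on different lines.
print(re.search(without_flag, html))  # None

# The bracketed group is "what to keep": \1 in the replacement reinserts it.
print(re.sub(pattern, r"\1", html))  # Dracula's Guest
```

In Python's re.sub, \1 stands for the text captured by the first pair of brackets, so the whole head block collapses to just the title.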
01-04-2012, 09:05 PM   #5
theducks
Well trained by Cats
 
Posts: 29,798
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by congngo
What if I have 6 or 7 lines?
Keep on going: at every line end (EOL) there should be a \s+.
I replace all whitespace in the literal portion of the pattern with \s+, e.g.
<p>Foo\s+Bar\s+on(.+)\s+
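The "replace every space in the literal part with \s+" habit can be automated. A minimal sketch in Python, where whitespace_tolerant is a hypothetical helper name, not anything built into Sigil; re.escape is used so that characters like . or ( in the literal text are not treated as regex syntax:

```python
import re

def whitespace_tolerant(literal: str) -> str:
    # Hypothetical helper: escape each word of a literal snippet and join the
    # words with \s+, so a single space in the snippet can match spaces, tabs
    # or a line break in the book's source.
    return r"\s+".join(re.escape(word) for word in literal.split())

pattern = whitespace_tolerant("<p>Foo Bar on")

one_line = "<p>Foo Bar on the same line</p>"
wrapped = "<p>Foo\n    Bar\non the next line</p>"

# Both layouts match the same generated pattern.
print(bool(re.search(pattern, one_line)))  # True
print(bool(re.search(pattern, wrapped)))   # True
```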

The nice thing about Sigil is that you can test your pattern and see what it finds without doing a replace.
01-05-2012, 12:02 AM   #6
congngo
Member
 
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
(?s)<head>.*?<title>(.*?)</title>.*?</head>

The above works really well. I'm not familiar with (?s). Is that specific to Sigil?

Thanks
01-05-2012, 12:23 AM   #7
BartB
Member
 
Posts: 18
Karma: 10
Join Date: Dec 2010
Location: Geelong, Australia
Device: Kobo
I'm not sure exactly where I learnt about it, but it is used by PCRE, which is the regex engine Sigil now uses.

Have a look at this link for background info that might help explain a bit about PCRE.

http://en.wikipedia.org/wiki/Perl_Co...ar_Expressions

It took me about 10 weeks to start getting my head around regex, but once I did, I found that I could cut up to 80% off my conversion times (I take web-based yarns and convert them to epub for later re-reading).

Look for regex tutorials on the web and keep butting your head against it and eventually it will get through.
01-05-2012, 06:19 AM   #8
meme
Sigil developer
 
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
Also take a look at the very good regex intro in the Calibre forum (https://www.mobileread.com/forums/sho...d.php?t=118569), which also links to the more extensive, although Python-specific, regex details at http://docs.python.org/library/re.html

The use of (?s) tells the expression to let '.' match everything, including newlines; by default '.' matches everything except newlines. Since ? is used for different things in regex, it can be a bit confusing:
  • after an opening bracket, (? means what follows is special handling, such as the s flag, and there are lots of these;
  • after a normal character, ? means 0 or 1 occurrences of it;
  • after *, + or ?, it means non-greedy (minimal) matching.
No wonder regex is confusing!
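The three uses of ? can be compared side by side in a short Python sketch. Python's re syntax agrees with PCRE on these three points; the sample strings are made up:

```python
import re

text = "color colour\nend"

# 1. '(?' starts special handling: the (?s) flag lets '.' match a newline.
print(re.search(r"(?s)colour.end", text) is not None)  # True
print(re.search(r"colour.end", text) is not None)      # False

# 2. '?' after an ordinary character makes it optional (0 or 1 occurrences).
print(re.findall(r"colou?r", text))  # ['color', 'colour']

# 3. '?' after *, + or ? switches to non-greedy (minimal) matching.
html = "<b>one</b> and <b>two</b>"
print(re.findall(r"<b>.*</b>", html))   # ['<b>one</b> and <b>two</b>']
print(re.findall(r"<b>.*?</b>", html))  # ['<b>one</b>', '<b>two</b>']
```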

I have a feeling there will be more and more questions about regex. Maybe we need a sticky at the top (even to link to the Calibre post?) or a quick help guide link in Sigil itself with examples. I guess the trouble is that everyone has their own way of using regex, and any examples can quickly become overwhelming.
01-05-2012, 07:05 AM   #9
meme
Sigil developer
 
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
Since I haven't used (?s) before, I was testing it out in 0.4.904. The head example above works fine, but when I modified it, it didn't always work. It looks like it should, but it doesn't, so I'd like to know whether I'm doing something wrong, whether this is what it's supposed to do, or whether there is a code issue.

Below is a short extract of a Gutenberg book picked at random. Start Sigil with an empty book and go to Code View. Delete all the lines in the window, then paste the text below into it.

Put your cursor at the top and press Ctrl-F to bring up Find & Replace. Choose Regex and Current File. Enter the Find expression (a meaningless test match that spans several lines):
  • (?s)Stoker.*?Release
Press the Find button; it correctly highlights from Stoker on line 9 to Release on line 31.

Now put your cursor on line 28, just before the line with Stoker in it, and press the Find button again. This time it says the search term is not found.
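For comparison, this is what a forward search from the cursor would be expected to find. A minimal sketch using Python's re module (not Sigil's own code or PCRE): the cursor is simulated as a character offset passed as the pos argument of Pattern.search, and the text is a shortened stand-in for the sample page:

```python
import re

# Shortened stand-in for the sample page: 'Stoker' appears twice.
text = (
    "  <title>Dracula's Guest, by Bram Stoker</title>\n"
    '  <p class="noindent">Language: English</p>\n'
    "\n"
    '  <p class="noindent">Author: Bram Stoker</p>\n'
    "\n"
    '  <p class="noindent">Release Date: November 20, 2003</p>\n'
)

pattern = re.compile(r"(?s)Stoker.*?Release")

# Cursor at the top: the match starts at the first 'Stoker' (in the <title>).
first = pattern.search(text)

# Cursor on the blank line just before the 'Author' line.
cursor = text.index("\n\n") + 1
second = pattern.search(text, cursor)

print(first.start() == text.index("Stoker"))    # True
print(second.start() == text.rindex("Stoker"))  # True
print(second.group(0).endswith("Release"))      # True
```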

In fact, it appears this is even simpler to demonstrate.

Put your cursor on, say, line 31 at the start of 'class', then do a Find for ".*Release" (without quotes): the search term is not found. Now put your cursor at the very start of line 31 (or on line 30) and repeat the Find; this time the line up to and including Release is highlighted. In 0.4.2, Find correctly highlights the part of the line from the cursor up to and including the word, even if the cursor is not at the start of the line. Searching for Release.* correctly highlights the word and the rest of the line.
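The behaviour described for 0.4.2 can be reproduced with Python's re module, again using the pos argument of Pattern.search to stand in for the cursor. This is only a sketch of the expected result, not Sigil's code:

```python
import re

line = '  <p class="noindent">Release Date: November 20, 2003 [eBook #10150]<br />'
pattern = re.compile(r".*Release")

# Cursor at the very start of the line: the match runs up to 'Release'.
print(pattern.search(line, 0).group(0))
# prints:   <p class="noindent">Release

# Cursor just before 'class': the match simply starts at the cursor.
cursor = line.index("class")
print(pattern.search(line, cursor).group(0))
# prints: class="noindent">Release
```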

It's also easy to check by opening Sigil with an empty file, switching to Code View and searching for .*encoding with the cursor at the start of the line or a couple of characters in. But since I had already entered the test book, I figured I'd leave it in the post.


Code:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta content="HTML Tidy for Linux (vers 7 December 2008), see www.w3.org" name="generator" />

  <title>The Project Gutenberg eBook of Dracula's Guest, by Bram Stoker</title>
  <meta content="Project Gutenberg EPUB-Maker v0.02 by Marcello Perathoner &lt;webmaster@gutenberg.org&gt;" name="generator" />
  <link href="../Styles/0.css" rel="stylesheet" type="text/css" />
  <link href="../Styles/1.css" rel="stylesheet" type="text/css" />
  <link href="../Styles/pgepub.css" rel="stylesheet" type="text/css" />
</head>

<body>
  <h1 id="pgepubid00000">The Project Gutenberg eBook, Dracula's Guest, by Bram Stoker</h1>

  <div class="pgmonospaced pgheader">
    <br />
    This eBook is for the use of anyone anywhere at no cost and with<br />
    almost no restrictions whatsoever. You may copy it, give it away or<br />
    re-use it under the terms of the Project Gutenberg License included<br />
    with this eBook or online at <a>www.gutenberg.org</a><br />
  </div>

  <p class="noindent">Title: Dracula's Guest</p>

  <p class="noindent">Author: Bram Stoker</p>

  <p class="noindent">Release Date: November 20, 2003 [eBook #10150]<br />
  [Most recently updated: November 7, 2006]</p>

  <p class="noindent">Language: English</p>

  <p class="noindent">Character set encoding: ISO-8859-1</p>

  <p class="noindent">***START OF THE PROJECT GUTENBERG EBOOK DRACULA'S GUEST***</p>

</body>
</html>
01-06-2012, 02:25 AM   #10
congngo
Member
 
Posts: 19
Karma: 10
Join Date: Apr 2011
Device: kindle dx
Thank you for the explanation of (?s). Regex itself is like a programming language; it takes a while to learn.

Similar Threads
Thread Thread Starter Forum Replies Last Post
Search and Replace Question MacEvansCB Conversion 1 12-10-2011 02:19 PM
calibre search & replace question Kelby ePub 1 09-29-2011 01:14 PM
Search & Replace question - something not right curiosity Library Management 21 06-15-2011 11:33 AM
Search/Replace Question seagull Sigil 22 03-21-2011 01:30 PM
search and replace - drops blanks in replace ? cybmole Conversion 10 03-13-2011 03:07 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.