Old 01-22-2014, 03:49 PM   #1
tarisea
Searching NOT

For the life of me I can't figure out how to do these two searches.

1. How would I search for </p> where there is NOT a . before it?
The first line would come up but the second one wouldn't.

<p class="calibre11">But that didn’t mean he wanted to turn out</p>
<p class="calibre11"> the lights and go to sleep either.</p>


2. Same problem, different version. How would I search for </p> where it is followed by a character [a-z] instead of a <?


<p class="calibre11">But that didn’t mean he wanted to turn out</p>
the lights and go to sleep either.</p>


Any help would be greatly appreciated.

TTRS
Old 01-22-2014, 04:13 PM   #2
theducks
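Find: ([a-z,"])</p>\s+<p class="calibre\d+">([a-z])
This looks for a lowercase letter, comma, or straight quote right before the </p>, and a lowercase letter at the start of the next paragraph.

Replace: \1 \2

This will miss continuations that start with a capital (e.g. Martha) or with a quote.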
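
For anyone who wants to try this outside Sigil, here is a minimal Python sketch of the same find and replace. It assumes the class attribute is quoted as in the posted markup, and the function name merge_broken is only illustrative. Note that the second sample paragraph is used here without the leading space after the opening tag, because this pattern expects the lowercase letter immediately after the `>`.

```python
import re

# Find pattern from the post, with the class attribute quoted as in the sample markup.
BROKEN = re.compile(r'([a-z,"])</p>\s+<p class="calibre\d+">([a-z])')

def merge_broken(html: str) -> str:
    # \1 and \2 put back the characters on either side of the break;
    # the space between them joins the two halves of the sentence.
    return BROKEN.sub(r"\1 \2", html)

sample = (
    '<p class="calibre11">But that didn\u2019t mean he wanted to turn out</p>\n'
    '<p class="calibre11">the lights and go to sleep either.</p>'
)
merged = merge_broken(sample)
print(merged)

# A continuation that starts with a capital is left alone, as the post warns.
capital = '<p class="calibre11">He turned to</p>\n<p class="calibre11">Martha and smiled.</p>'
untouched = merge_broken(capital)
```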

Last edited by theducks; 01-22-2014 at 04:17 PM. Reason: full search added
Old 01-22-2014, 04:24 PM   #3
DiapDealer
1) Regex mode.
Code:
[^.]</p>
Although you're probably going to want something like:
Code:
[^.?!"']</p>
if you're looking for broken paragraphs.

Number two seems like a really odd search. Your code's in a lot of trouble if you have naked text immediately following a closing paragraph tag. But:
Code:
</p>[a-z]
should find it, if for any reason there is.

Or if it should literally be anything OTHER than the beginning of a new tag... then:
Code:
</p>[^<]
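
A quick way to see what these searches hit is to run them over the two samples from the question. This is only a sketch using Python's re module, which may not behave identically to Sigil's regex engine in every detail; the variable names are made up for illustration, and the text_after pattern is a hypothetical stricter variant that is not from the thread.

```python
import re

# First sample from the question: a sentence split across two <p> elements.
split_sample = (
    '<p class="calibre11">But that didn\u2019t mean he wanted to turn out</p>\n'
    '<p class="calibre11"> the lights and go to sleep either.</p>'
)
# Second sample: bare text following a closing </p>.
naked_sample = (
    '<p class="calibre11">But that didn\u2019t mean he wanted to turn out</p>\n'
    'the lights and go to sleep either.</p>'
)

# Search 1: </p> preceded by something other than end punctuation or a quote.
no_stop = re.compile(r"""[^.?!"']</p>""")
# Search 2: </p> followed by a lowercase letter, or by anything that is not "<".
lower_after = re.compile(r"</p>[a-z]")
not_a_tag = re.compile(r"</p>[^<]")
# Hypothetical stricter variant (not from the thread): skip whitespace first,
# then require a character that is neither "<" nor whitespace.
text_after = re.compile(r"</p>\s*[^<\s]")

hits_no_stop = no_stop.findall(split_sample)      # only the unfinished first line
hits_lower = lower_after.findall(naked_sample)    # empty: a line break sits in between
hits_not_tag = not_a_tag.findall(split_sample)    # the line break itself counts as "not <"
hits_text = text_after.findall(naked_sample)      # finds the naked text on the next line
hits_text_clean = text_after.findall(split_sample)
```

In this sketch the line break after the tag matters: `</p>[a-z]` finds nothing when the text starts on the next line, while `</p>[^<]` matches every closing tag that is followed by a line break, including well-formed ones.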
Old 01-22-2014, 04:38 PM   #4
Doitsu
theducks's suggestion is, of course, perfectly fine. However, if you want to merge sentences, you might find this extended expression helpful; it's the one I use for that:

Find: ([[:lower:]],*;*:*)</p>\s+<p[^>]*>\s*([[:lower:]])
Replace: \1 \2

This expression will search for:
- a single lower-case character followed by zero or more commas, semi-colons or colons (in that order),
- followed by </p>,
- followed by one or more whitespace characters (including line breaks),
- followed by an opening <p> tag (with optional attributes),
- followed by zero or more whitespace characters and a single lower-case character.
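
The same expression can be tried in Python as a rough sketch. Python's re module does not support POSIX classes such as [[:lower:]], so [a-z] stands in for it here; that is an assumption of this sketch and it only covers unaccented ASCII letters. The name merge_sentences is illustrative.

```python
import re

# [a-z] is an ASCII stand-in for [[:lower:]], which Python's re does not support.
MERGE = re.compile(r"([a-z],*;*:*)</p>\s+<p[^>]*>\s*([a-z])")

def merge_sentences(html: str) -> str:
    # Group 1 keeps the last letter plus any trailing comma, semi-colon or colon;
    # group 2 keeps the first letter of the continuation.
    return MERGE.sub(r"\1 \2", html)

# Sample from the question, including the stray space after the second opening tag.
sample = (
    '<p class="calibre11">But that didn\u2019t mean he wanted to turn out</p>\n'
    '<p class="calibre11"> the lights and go to sleep either.</p>'
)
merged = merge_sentences(sample)

# A break after a comma is joined as well, and the comma is kept.
comma_sample = "<p>This is a sample,</p>\n<p>of a paragraph.</p>"
merged_comma = merge_sentences(comma_sample)
```

Because of the `<p[^>]*>\s*` part, this version handles any attributes on the second paragraph and the leading space in the original sample.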
Old 01-22-2014, 04:54 PM   #5
Tex2002ans
These are the three regexes I use.

Regex #1: This catches all hyphenation at the end of paragraphs. I replace these on a case-by-case basis, just to make sure the combined word does not actually require the hyphen.

Code:
-</p>\s+<p>
Replace:

Code:
(NOTHING, just a complete blank)
Sample:

Code:
<p>This is a paragraph which has a hyphen-</p>
<p>ated paragraph.</p>
Regex #2: This catches every paragraph that ends in a character that is NOT a '>', '”' (right double quote), '?', '!', '.':

Code:
([^>”\?\!\.])</p>\s+<p>
Replace (make sure there is a SPACE afterwards):

Code:
\1
Sample:

Code:
<p>This is a sample,</p>
<p>of a paragraph that will</p>
<p>be caught by the regex above.</p>
Regex #3: This usually catches all paragraphs which were not combined by the above two, but begin with a lowercase letter (usually these should be combined, or there was a blockquote beforehand, or something odd in the text).

Code:
<p>[a-z]
Sample:

Code:
<p>In 2014, Tex gave an informative sample:</p>
<blockquote><p>Here is a quote.</p></blockquote>
<p>here is more information.</p>
These three tackle nearly all broken paragraphs in my experience.

Other oddities such as semi-colons or colons will be pointed out by Regex #2, and those can be fixed on a case-by-case basis. (Sometimes they should be combined, sometimes they should not).
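
As a rough illustration of how the three searches work together, here is a small Python sketch that runs them in order over the samples above. It replaces every hit automatically, which is a simplification: in practice each hit would be reviewed by hand as described. The function name join_paragraphs is made up for this example.

```python
import re

HYPHEN_BREAK = re.compile(r"-</p>\s+<p>")            # Regex #1
NO_END_PUNCT = re.compile(r"([^>”?!.])</p>\s+<p>")   # Regex #2
LOWER_START = re.compile(r"<p>[a-z]")                # Regex #3

def join_paragraphs(html: str) -> tuple[str, list[str]]:
    # Step 1: drop the hyphen and the paragraph break (replace with nothing).
    html = HYPHEN_BREAK.sub("", html)
    # Step 2: keep the last character and join with a single space.
    html = NO_END_PUNCT.sub(r"\1 ", html)
    # Step 3: report paragraphs that still start with a lowercase letter.
    leftovers = LOWER_START.findall(html)
    return html, leftovers

sample = (
    "<p>This is a paragraph which has a hyphen-</p>\n"
    "<p>ated paragraph.</p>\n"
    "<p>This is a sample,</p>\n"
    "<p>of a paragraph that will</p>\n"
    "<p>be caught by the regex above.</p>"
)
joined, leftovers = join_paragraphs(sample)
print(joined)
```

The order matters in this sketch: a trailing hyphen is also a character that Regex #2 accepts, so running #2 first would join the halves as "hyphen- ated" instead of removing the hyphen.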

I keep these three in my Sigil Saved Searches (Tools - Saved Searches). More info on how to use Saved Searches can be found here:

http://web.sigil.googlecode.com/git/..._searches.html