#1
Evangelist
Posts: 490
Karma: 1665031
Join Date: Nov 2010
Location: Vancouver Island, Nanaimo
Device: K2 (retired), Kobo Touch (passed to the wife), KGlo, Galaxy TabPro
Find this NOT that
I'm trying to do a search, but a narrow one. I've converted some PDFs to ePub, but some paragraphs are broken up: one paragraph ends with half a sentence and the next paragraph continues the sentence.
I want to search for any 2 characters followed by </p>, but not find .</p>, ?</p>, !</p>, ."</p>, ?"</p> or !"</p>, as those should be proper sentence enders. Right now I have [^.,^\?,^\!][a-z,A-Z,”,\,, ,+]</p> and it seems to work, but is there a simpler way of doing this?
#2
Zealot
Posts: 121
Karma: 5070
Join Date: Dec 2010
Device: none
I'm doing this with
Code:
s: ([a-zA-Z])</p>\s*<p>
r: \1##
o: minimal matching
Now I'm looking for a lowercase letter, followed by ##, followed by an uppercase letter. This is a sure sign of two separate words.
Code:
s: ([a-z])##([A-Z])
r: \1_\2
o: minimal matching, match case
(the underscore represents a blank)
Code:
s: ([a-zA-Z])##([a-zA-Z])
r: \1_\2
o: minimal matching, match
At the end, only the ## that split a word remain, and I substitute them with
Code:
s: ([a-zA-Z])##([a-zA-Z])
r: \1\2
o: minimal matching
Code:
s: ([,?!])</p>\s*<p>
r: \1_
o: minimal matching
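For anyone who would rather script this than click through the editor, here is a rough Python sketch of the passes above. The function name `rejoin` is made up for illustration, the underscore is written as a real space, and the pass that needs a manual look at each hit is left out, so treat it as a starting point rather than a drop-in replacement.

```python
import re

def rejoin(html: str) -> str:
    """Hypothetical helper that chains the search/replace passes from this post."""
    # Pass 1: mark every paragraph break that directly follows a letter with ##.
    html = re.sub(r"([a-zA-Z])</p>\s*<p>", r"\1##", html)
    # Pass 2: lowercase letter, ##, uppercase letter -> two separate words.
    html = re.sub(r"([a-z])##([A-Z])", r"\1 \2", html)
    # (The interactive pass is skipped here; it needs a human to decide.)
    # Final letter pass: any ## still between two letters is treated as a split word.
    html = re.sub(r"([a-zA-Z])##([a-zA-Z])", r"\1\2", html)
    # Punctuation pass: a break after , ? or ! becomes a single space.
    html = re.sub(r"([,?!])</p>\s*<p>", r"\1 ", html)
    return html

print(rejoin("<p>I had splitted sen</p> <p>tences like this one</p>"))
```

Because the manual pass is skipped, every leftover ## is glued together without a space, so check the result before saving.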
#3
Junior Member
Posts: 8
Karma: 10
Join Date: Jun 2010
Device: none
When I'm joining split sentences like that, I search for a lowercase letter after the <p> tags...
search: </p>\s*<p>([a-z])
replace: _\1
(Note the space before \1, shown here as an underscore.)
Hope that helps.
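The same search and replace can be tried outside the editor. This is a small Python sketch, with the function name `join_lowercase_continuations` invented for the example and a plain space used in the replacement:

```python
import re

def join_lowercase_continuations(html: str) -> str:
    """Join a paragraph to the next one when the next starts with a lowercase letter."""
    # </p>, optional whitespace, <p>, then a lowercase letter:
    # replace the tags with a single space and keep the letter.
    return re.sub(r"</p>\s*<p>([a-z])", r" \1", html)

print(join_lowercase_continuations("<p>one ends with half</p>\n<p>a sentence.</p>"))
```

A paragraph that starts with an uppercase letter is left alone, so real paragraph breaks survive, but so do splits that happen to land before a capitalised word.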
#4
Zealot
Posts: 121
Karma: 5070
Join Date: Dec 2010
Device: none
Code:
<p>I had splitted sen</p> <p>tences like this one</p>
#5
Evangelist
Posts: 490
Karma: 1665031
Join Date: Nov 2010
Location: Vancouver Island, Nanaimo
Device: K2 (retired), Kobo Touch (passed to the wife), KGlo, Galaxy TabPro
Quote:
Basically I was trying to catch that all in one go. Looking at other examples, I think I had too many commas separating things that don't need it.
[^.^\?^\!][a-zA-Z”\,\?\!+]</p>
Should find ?</p> but not ?”</p>, or z”</p> but not z.”</p>
Just tested this with the following (BOLD found, ITALICS skipped):
<p>x?</p>
<p>x.”</p>
<p>x?</p>
<p>X!</p>
<p>x!”</p>
<p>x,</p>
<p>x,”</p>
<p>x”</p>
EDIT: The above can be simplified further:
[^.?!][a-zA-Z”,?!]</p>
So basically we now have: if any of the 3 characters in [^.?!] appears before any of these characters [a-zA-Z”,?!] (specifically the ”) and </p>, then that find is skipped.
Last edited by Danger; 12-27-2010 at 03:57 PM.
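To see what the simplified pattern does with the test lines, here is a quick Python check. It assumes Python's `re` module treats these character classes the same way the editor's regex engine does, which should hold for something this simple; the helper name `is_found` is made up.

```python
import re

# Simplified pattern from the EDIT: one character that is not . ? or !,
# then one of a-z, A-Z, ”, comma, ? or !, then the closing tag.
PATTERN = re.compile(r"[^.?!][a-zA-Z”,?!]</p>")

SAMPLES = [
    "<p>x?</p>",
    "<p>x.”</p>",
    "<p>X!</p>",
    "<p>x!”</p>",
    "<p>x,</p>",
    "<p>x,”</p>",
    "<p>x”</p>",
]

def is_found(line: str) -> bool:
    """True when the pattern matches somewhere in the line."""
    return PATTERN.search(line) is not None

for sample in SAMPLES:
    print("found  " if is_found(sample) else "skipped", sample)
```

Inside square brackets the characters . ? and ! are already literal, which is why the backslashes and the repeated ^ can go.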
#6
Evangelist
Posts: 490
Karma: 1665031
Join Date: Nov 2010
Location: Vancouver Island, Nanaimo
Device: K2 (retired), Kobo Touch (passed to the wife), KGlo, Galaxy TabPro
Working on a book this morning, I found that after stripping out all the class, style and useless spans/divs I was once again left with broken-up sentences like this:
<p>this is</p> <p>part of a</p> <p>paragraph.</p> <p> </p>
So I came up with:
FIND: ([a-z,’”.?!-])</p>\n\n\s\s<p>([a-z,A-Z“-])
REPLACE: \1 \2
\n = new line
\s = white space
All the <p> </p> are ignored, and then I just strip them out once all the paragraphs are back together. Interesting info here
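Here is the FIND/REPLACE above as a small Python sketch. The sample string assumes the paragraphs in the file are separated by exactly two newlines and two spaces, which is what the \n\n\s\s part expects; the name `join_fragments` is invented for the example.

```python
import re

# The FIND pattern from this post: a closing character, </p>, two newlines,
# two whitespace characters, <p>, then an opening character.
JOIN = re.compile(r"([a-z,’”.?!-])</p>\n\n\s\s<p>([a-z,A-Z“-])")

def join_fragments(html: str) -> str:
    """Replace each matched paragraph break with a single space."""
    return JOIN.sub(r"\1 \2", html)

sample = "<p>this is</p>\n\n  <p>part of a</p>\n\n  <p>paragraph.</p>\n\n  <p> </p>"
print(join_fragments(sample))
```

The last break is left in place because the paragraph after it starts with a space, which is not in the second character class. If the whitespace between paragraphs varies in your file, swapping \n\n\s\s for \s* is one way to loosen it.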