10-20-2010, 12:22 PM | #1 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
line feed search
I'm working with an HTML file from Project Gutenberg and am trying to get rid of many annoying line feeds using Sigil. I need to delete the 2 line feeds between "</p>" and "<span". So I use wildcard search with minimal matching, and search for
"</p>\r\r<span" or "</p>\n\n<span" or even "</p>\r\n<span" or the reverse. But Sigil finds nothing. I tried the same search in Notepad++ with the same results. Can anyone clue me in to my mistakes? |
10-20-2010, 12:36 PM | #2 |
Well trained by Cats
Posts: 29,891
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Try </p>\s
|
10-20-2010, 01:22 PM | #3 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
Theducks,
Thanks for the suggestion, but that doesn't work either. (In my post I mistyped </p> as <\p>, but I was not doing that in my search.)
Bob
Last edited by bobcdy; 10-20-2010 at 01:27 PM. |
10-20-2010, 01:40 PM | #4 | |
Well trained by Cats
Posts: 29,891
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
And this is a regex search (Minimal ticked). But try \s instead of the \r or \n stuff; it works in Code View (CV) here. |
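To illustrate why \s is the safer choice, here is a minimal sketch in Python. It uses Python's `re` module, not the regex engine in Sigil or Notepad++, and the sample strings are made up for illustration. It assumes the file may have been saved with Windows (CRLF) line endings, which is one possible reason a literal \n\n search finds nothing: the two line feeds are separated by a carriage return.

```python
import re

# Hypothetical samples: the same markup saved with Windows (CRLF)
# and Unix (LF) line endings.
crlf_text = '<p>One</p>\r\n\r\n<span class="page">12</span>'
lf_text = '<p>One</p>\n\n<span class="page">12</span>'

# A literal "\n\n" only matches when the two line feeds are adjacent.
literal = re.compile(r'</p>\n\n<span')

# "\s+" matches any run of spaces, tabs, carriage returns and line feeds.
flexible = re.compile(r'</p>\s+<span')


def join_tags(text):
    """Remove all whitespace between </p> and the following <span."""
    return flexible.sub('</p><span', text)


print(literal.search(crlf_text))   # no match on the CRLF sample
print(flexible.search(crlf_text))  # matches regardless of line-ending style
print(join_tags(crlf_text))
```

The point of the sketch is only that `</p>\s+<span` does not care which line-ending convention the file uses, whereas a pattern built from explicit \r and \n characters has to guess it exactly.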
|
10-20-2010, 01:49 PM | #5 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
Nothing seems to work for Sigil. I tried \n in Notepad++ and it works there for a single \n, but if I try \n\n it will NOT find two consecutive line feeds. Neither \s nor \r seemed to work with Notepad++.
Bob |
10-20-2010, 01:53 PM | #6 | |
Well trained by Cats
Posts: 29,891
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
so it repeats. Can you legally post a sample file (chapter) here? |
|
10-20-2010, 02:08 PM | #7 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
For Sigil, a regex search works with \s+, but then the "</p>" and the "<span" aren't allowed, and I know almost nothing about using regex. I've tried learning it, but Sigil's regex always seems different from the "lessons" I try from the internet. I even tried RegexBuddy, but Sigil didn't like those expressions either. I've never encountered such obscurity and difficulty with any other learning experience as with regex!
Bob |
10-20-2010, 02:18 PM | #8 | |
Well trained by Cats
Posts: 29,891
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
They work just fine here. I use them all the time. As for regex, this guy wrote the stuff that finally got through my thick skull and got the light to come on: https://www.mobileread.com/forums/showthread.php?t=99258 |
|
10-20-2010, 02:22 PM | #9 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
I always use the code view, never the book view. I'll check out the URL for regex, but I've tried so many and NONE consistently work with Sigil.
Bob |
10-20-2010, 04:06 PM | #10 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
Theducks
For some reason, the </p>\s+ now works with Sigil regex. Thanks for the patient help!
But now my problem is that Sigil always creates numerous new errors by inserting meaningless "corrections" into my partially completed HTML when it is imported into Sigil. It looks like I can't use Sigil at all for the HTML document, but must complete it correctly before opening it in Sigil. Thus I need to use Notepad++ or equivalent, but Notepad++ won't find </p>\s+ or \n+ or \r+. It would be nice if there were a way to turn off the "corrections" in Sigil. Sigh....
Bob |
10-20-2010, 04:29 PM | #11 | |
Well trained by Cats
Posts: 29,891
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
That means that everything must be "enclosed" in the proper tags (a good example is Sigil's blank starter page; all the basics are there). Work from the middle out. By that, I mean always place the opening and closing tags together (sort of like when you use some of the advanced features in this forum), so Sigil will never have to guess where the closing tag belongs. (Spans and divs can really turn into a mess if you leave one unbalanced.) |
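As a rough illustration of the "balanced tags" advice, here is a small Python sketch that reports unclosed or unexpected tags in a fragment before it is opened in an editor that auto-corrects markup. It is not how Sigil checks or repairs files; the class name `BalanceChecker` and the function `check_balance` are made up for this example, and it only uses the standard library's `html.parser`.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
        'input', 'link', 'meta', 'source', 'track', 'wbr'}


class BalanceChecker(HTMLParser):
    """Track open tags on a stack and note any closing-tag mismatches."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line number) for each tag still open
        self.problems = []   # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            self.problems.append(
                f'line {self.getpos()[0]}: unexpected </{tag}>')


def check_balance(markup):
    """Return a list of problems; an empty list means the tags balance."""
    checker = BalanceChecker()
    checker.feed(markup)
    checker.close()
    leftovers = [f'line {line}: <{tag}> never closed'
                 for tag, line in checker.stack]
    return checker.problems + leftovers


print(check_balance('<p><span class="page">12</span></p>'))
print(check_balance('<div><span>x</div>'))
```

This is a simple stack check, so one stray tag can produce more than one message; it is meant for spotting where to look, not for repairing the file.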
|
10-20-2010, 05:02 PM | #12 |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
|
10-20-2010, 09:58 PM | #13 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
Valloric and Theducks
It will be a big improvement when turning off corrections in Sigil is implemented! And it isn't just unclosed code. For example, in the file I had a separate line
<span class="page">jkl;asdfa</span>
This was where I was trying to get rid of all the extra line feeds with regex. Sigil insisted on adding <p> and </p> around this and all similar lines, and every once in a while it likes to add <p> </p> and other additions.
I finally went back to Word 2003 for the HTML editing. If I chop off the first few lines of the HTML code, Word will accept the file and I can use wildcards to help make corrections and changes; then I save it as a UTF-8 text file. After restoring the chopped-off HTML, I can open it in Sigil and polish it a bit. My simple mind does OK with Word wildcards, but I have never been successful trying to work with regex or with other programs' wildcards.
Bob |
10-20-2010, 10:29 PM | #14 | |
Well trained by Cats
Posts: 29,891
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
I remove <p> </p> all the time. You are missing something (that is invisible to the eye), like a space somewhere else, or you added something. BTW, you might need to escape the ; (as \;) or other special regex chars if they appear in the string. Use Find until you get it right, then replace a bunch one at a time before using "all". |
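The two tips above (escape special characters, and test with Find before replacing everything) can be sketched in Python. This uses Python's `re` module rather than Sigil's own engine, and the sample strings are invented for illustration, so treat it as an analogy for the workflow rather than a recipe for Sigil's dialog.

```python
import re

# Hypothetical literal text to find; "(", ")" and "?" are regex metacharacters.
needle = '<p>Is this (really) the end?</p>'
text = 'before <p>Is this (really) the end?</p> after'

# Unescaped, the parentheses form a group and "?" becomes a quantifier,
# so the pattern no longer describes the literal text.
unescaped = re.search(needle, text)

# re.escape() backslash-escapes the special characters for us.
escaped = re.search(re.escape(needle), text)

# "Find first, then replace a few, then replace all":
empty_para = re.compile(r'<p>\s*</p>')
sample = '<p>a</p><p> </p><p>b</p><p></p>'

hits = empty_para.findall(sample)                    # 1. inspect what matches
one_at_a_time = empty_para.sub('', sample, count=1)  # 2. replace only the first hit
all_removed = empty_para.sub('', sample)             # 3. replace everything

print(unescaped, escaped)
print(hits)
print(one_at_a_time)
print(all_removed)
```

Checking the list of hits before replacing is the scripted equivalent of pressing Find repeatedly: if something unexpected shows up in the list, the pattern needs tightening before a bulk replace.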
|
10-21-2010, 12:40 AM | #15 |
Fanatic
Posts: 527
Karma: 1048576
Join Date: May 2009
Device: bebook; prs-950; nook simple touch; HTC Jetstream tablet
|
I'll eventually have to bite the bullet and really really try to learn regex well enough to use it in Sigil. A problem, though, is that the regex flavors of Notepad++ and Sigil are different and I use both extensively. It's very confusing to me...
Bob |