Old 02-01-2013, 08:26 AM   #1
Contre-jour
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2012
Device: Kindle
Regex Help: Find page number & Replace+Remove 2x Line Breaks in Sigil



Hi everyone!

I went through the forum posts on using Regex in Sigil to find and replace characters in an ebook and I tried the methods suggested resulting in abject failure. Please help.

I have an e-book with 400+ page numbers appearing like this:

Code:
<p>I want to keep this text</p>

<p>100</p>

<p>I want to keep this text</p>
Note that there are empty lines above and below the page number line (100, in this example).
I need to remove the page number line and the empty lines above and below it.

I did this in Sigil (Mode: Regex):

Find: <p>([0-9]+)</p>
Replace: /1 ------> (space - slash - one)

It found the page number but only removed the <p></p> tags plus the first 2 digits, leaving the last digit in between intact. (In this example, "0"). It also did not remove the empty lines before and after it.

Please could someone help to correct my code so that I will end up with this:

Code:
<p>I want to keep this text</p>
<p>I want to keep this text</p>
<p>I want to keep this text</p>
Thanks in advance!!

Contre-jour
Old 02-01-2013, 08:38 AM   #2
mzmm
Groupie
Posts: 162
Karma: 86115
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
generally there's no reason to remove the blank lines in html, apart from just aesthetics.

if you're just trying to remove the <p>number</p> then this should work:

Code:
find:

<p>\d+</p>
with the replace field empty. also, if you're using back references, which you shouldn't need to here, you should be using \ instead of /. as in, \1, not /1
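For anyone who wants to try the pattern outside Sigil, here is a minimal sketch in Python. It assumes Python's `re` module handles `<p>\d+</p>` the same way Sigil's regex engine does for this simple pattern; the sample text and the `strip_page_numbers` name are made up for illustration.

```python
import re

# Hypothetical sample modelled on the snippet in the first post.
SAMPLE = (
    "<p>I want to keep this text</p>\n"
    "\n"
    "<p>100</p>\n"
    "\n"
    "<p>I want to keep this text</p>\n"
)

# A paragraph that contains nothing but digits.
PAGE_NUMBER = re.compile(r"<p>\d+</p>")


def strip_page_numbers(html: str) -> str:
    """Delete digit-only paragraphs (the equivalent of an empty Replace field)."""
    return PAGE_NUMBER.sub("", html)


if __name__ == "__main__":
    print(strip_page_numbers(SAMPLE))
```

Paragraphs that mix digits with other text, such as `<p>Chapter 100</p>`, do not match and are left alone.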
Old 02-01-2013, 08:54 AM   #3
DiapDealer
Grand Sorcerer
Posts: 9,243
Karma: 42056120
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
As mzmm mentioned, the blank line between paragraphs in code view is mostly irrelevant. If there's a blank line between paragraphs of the rendered html that you want to eliminate, then that's a styling/css issue. Removing the blank line in code view won't affect the rendered text at all (and Tidy/Pretty Print will just put the blank line back unless you have it turned completely off).
Old 02-01-2013, 08:56 AM   #4
Turtle91
Guru
Posts: 669
Karma: 3807234
Join Date: Dec 2012
Location: Shannon, Ireland today
Device: iPhone 5/iPad 1&2/Surface Pro/Kindle PW
Are the empty lines hard coded? It doesn't look like it in your example, but if they are...

Is there a <p><br /></p> or a <p>&nbsp;</p> or something like that?

If there is, you could use "\s*" between the groups to match any whitespace between them. Something along these lines (assuming the blank lines are hard coded as "<p>&nbsp;</p>"):

find: <p>&nbsp;</p>\s*<p>\d+</p>\s*<p>&nbsp;</p>
replace: {nothing - empty}

That will find a blank line before, the line with the number, and a blank line after.
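A rough Python sketch of the same idea, for testing outside Sigil. It assumes the spacer paragraphs really are hard coded as `<p>&nbsp;</p>` and that Python's `re` behaves like Sigil's engine for this pattern; the sample text and the function name are hypothetical.

```python
import re

# Hypothetical sample with hard-coded spacer paragraphs around the page number.
SAMPLE = (
    "<p>I want to keep this text</p>\n"
    "<p>&nbsp;</p>\n"
    "<p>100</p>\n"
    "<p>&nbsp;</p>\n"
    "<p>I want to keep this text</p>\n"
)

# Spacer, page number, spacer; \s* absorbs the line breaks between them.
PADDED_PAGE_NUMBER = re.compile(
    r"<p>&nbsp;</p>\s*<p>\d+</p>\s*<p>&nbsp;</p>"
)


def strip_padded_page_numbers(html: str) -> str:
    """Remove a page-number paragraph together with the spacers around it."""
    return PADDED_PAGE_NUMBER.sub("", html)


if __name__ == "__main__":
    print(strip_padded_page_numbers(SAMPLE))
```

A spacer paragraph that is not next to a page number does not match, so intentional spacing elsewhere in the book is left in place.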

Cheers!
Old 02-01-2013, 09:34 AM   #5
Danger
Evangelist
Posts: 486
Karma: 1665031
Join Date: Nov 2010
Location: Vancouver Island, Nanaimo
Device: K2 (retired), Kobo Touch (passed to the wife), KGlo, Galaxy TabPro
In addition to what the others have said, once you remove the <p>page#</p>, Sigil will remove the extra blank line when you click save. You won't end up with 2 blank lines if you just remove the page number line, so there is no need to try to remove them with find/replace. And of course blank lines in Code View do not show up when reading.

If however you want to remove the spaces between paragraphs that you see when reading then you need to set the paragraph margins in your CSS sheet:

p {
margin-top: 0;
margin-bottom: 0;
}

The above will affect ALL <p> tags, so if you need spacing in a few places (scene changes) you need to add a scene change class. I use:

.scenechange {
margin-top: 0.25em;
margin-bottom: 0.25em;
}

and then:

<p class="scenechange">&nbsp;</p>

Last edited by Danger; 02-01-2013 at 09:41 AM.
Old 02-01-2013, 10:29 AM   #6
mzmm
Groupie
Posts: 162
Karma: 86115
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
it just occurred to me that if you're cleaning up an epub that's been generated by Pages that you might end up seeing the blank spaces in the html being rendered in the reader.

i come across
Code:
* {white-space: pre;}
(or something similar) in the css in these epubs periodically.
Old 02-01-2013, 10:35 AM   #7
Contre-jour
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2012
Device: Kindle
Solved!

While waiting for a reply, I dug deep, played around with the code and got it:

Find: <p>[0-9]+</p>
Replace: (nothing, left empty)

In the code view, it looks like there are line breaks in between my paragraphs, but in the book view those lines are not visible, so I didn't have to put in anything to remove line breaks such as \n.

Not sure why the other posts were going on and on about \1 and all that. It confused me.

I apologise if I wasted your time. These may look easy peasy to many but it is a struggle for me without any programming knowledge. Thanks!
Old 02-01-2013, 10:41 AM   #8
mzmm
Groupie
Posts: 162
Karma: 86115
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
glad you worked it out.

\1 would be for reinserting (re-placing) a group that you've captured in the find field.
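a small sketch of the difference, written in Python rather than Sigil. the sample string and variable names are made up, and this only shows how Python's `re` treats the replacement text; Sigil's result for /1 may differ, as the first post describes.

```python
import re

# Hypothetical one-line sample; the parentheses make the digits group 1.
SAMPLE = "<p>100</p>"
PATTERN = re.compile(r"<p>([0-9]+)</p>")

# "\1" in the replacement re-inserts whatever group 1 captured,
# so the tags go away and the digits stay.
kept_digits = PATTERN.sub(r"\1", SAMPLE)

# "/1" is not a backreference; Python inserts it as literal text.
literal_slash = PATTERN.sub("/1", SAMPLE)

# An empty replacement removes the whole match, digits included.
removed = PATTERN.sub("", SAMPLE)

if __name__ == "__main__":
    print(repr(kept_digits), repr(literal_slash), repr(removed))
```

so a backreference is only useful when part of the match should survive; to delete the whole paragraph, an empty replace field is enough.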

great reference here: http://www.regular-expressions.info/brackets.html

happy epub-ing
Old 02-01-2013, 10:45 AM   #9
DiapDealer
Grand Sorcerer
Posts: 9,243
Karma: 42056120
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Contre-jour View Post
Not sure why the other posts were going on and on about \1 and all that. It confused me.
The other posts were "going on and on" about it because your original post included:
Quote:
Originally Posted by Contre-jour View Post
Replace: /1 ------> (space - slash - one)
So naturally, everyone wanted you to know that /1 was syntactically incorrect. It should have been \1.
Old 02-01-2013, 10:47 AM   #10
theducks
Grand Sorcerer
Posts: 14,834
Karma: 5654321
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by DiapDealer View Post
The other posts were "going on and on" about it because your original post included:

So naturally, everyone wanted you to know that /1 was syntactically incorrect. It should have been \1.
FlightCrew would have found it as 'character data not allowed....'

Tags
line breaks, regex, regular expressions, sigil
