MobileRead Forums


Contre-jour 02-01-2013 09:26 AM

Regex Help: Find page number & Replace+Remove 2x Line Breaks in Sigil
 

Hi everyone!

I went through the forum posts on using Regex in Sigil to find and replace characters in an ebook and tried the suggested methods, which resulted in abject failure. Please help.

I have an e-book with 400+ page numbers appearing like this:

Code:

<p>I want to keep this text</p>

<p>100</p>

<p>I want to keep this text</p>

Note that there are empty lines above and below the page number line (100, in this example).
I need to remove the page number line and the empty lines above and below it.

I did this in Sigil (Mode: Regex):

Find: <p>([0-9]+)</p>
Replace: /1 ------> (space - slash - one)

It found the page number but only removed the <p></p> tags plus the first 2 digits, leaving the last digit in between intact. (In this example, "0"). It also did not remove the empty lines before and after it.

Please could someone help to correct my code so that I will end up with this:

Code:

<p>I want to keep this text</p>
<p>I want to keep this text</p>
<p>I want to keep this text</p>

Thanks in advance!! :blink:

Contre-jour

mzmm 02-01-2013 09:38 AM

generally there's no reason to remove the blank lines in html, apart from just aesthetics.

if you're just trying to remove the <p>number</p> then this should work:

Code:

find:

<p>\d+</p>

with the replace field empty. also, if you're using back references, which you shouldn't need to here, you should be using \ instead of /. as in, \1, not /1
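For anyone who wants to try the pattern outside Sigil, here is a minimal sketch using Python's re module. The sample string is made up to mirror the snippet in the first post, and Sigil's regex engine is not Python's, so treat this as an illustration of the pattern rather than of Sigil itself. For a pattern this simple the two should behave the same.

```python
import re

# Hypothetical sample mirroring the snippet from the first post.
html = (
    "<p>I want to keep this text</p>\n"
    "\n"
    "<p>100</p>\n"
    "\n"
    "<p>I want to keep this text</p>\n"
)

# Find: <p>\d+</p>    Replace: (empty)
# \d+ matches one or more digits, so only paragraphs that contain
# nothing but a number are removed.
cleaned = re.sub(r"<p>\d+</p>", "", html)

print(cleaned)
```

The blank lines around the removed paragraph are still in the source afterwards, which is harmless for the rendered book.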

DiapDealer 02-01-2013 09:54 AM

As mzmm mentioned, the blank line between paragraphs in code view is mostly irrelevant. If there's a blank line between paragraphs of the rendered html that you want to eliminate, then that's a styling/css issue. Removing the blank line in code view won't affect the rendered text at all (and Tidy/Pretty Print will just put the blank line back unless you have it turned completely off).

Turtle91 02-01-2013 09:56 AM

Are the empty lines hard coded? It doesn't look like it in your example, but if they are...

Is there a <p><br /></p> or a <p>&nbsp;</p> or something like that??

If there is, you could use a "\s*" between the groups to match any whitespace between them. Something along these lines (assuming the blank lines are hard coded as "<p>&nbsp;</p>"):

find: <p>&nbsp;</p>\s*<p>\d+</p>\s*<p>&nbsp;</p>
replace: {nothing - empty}

That will find a blank line before, the line with the number, and a blank line after.
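A rough sketch of that find/replace in Python's re module, for testing away from Sigil. The sample markup is hypothetical and assumes the spacer paragraphs really are "<p>&nbsp;</p>"; adjust the pattern if your file uses something else.

```python
import re

# Hypothetical sample: a page number wrapped in hard-coded spacer paragraphs.
html = (
    "<p>Keep me</p>\n"
    "<p>&nbsp;</p>\n"
    "<p>100</p>\n"
    "<p>&nbsp;</p>\n"
    "<p>Keep me too</p>\n"
)

# \s* matches any whitespace (including line breaks) between the paragraphs.
pattern = re.compile(r"<p>&nbsp;</p>\s*<p>\d+</p>\s*<p>&nbsp;</p>")

# Replace with nothing, as in the empty replace field.
cleaned = pattern.sub("", html)

print(cleaned)
```

Only a number that has a spacer paragraph on both sides is matched, so a bare "<p>100</p>" elsewhere would be left alone.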

Cheers!

Danger 02-01-2013 10:34 AM

As an addition to what the others have said, once you remove the <p>page#</p> Sigil will remove the extra blank line when you click save. You won't end up with 2 blank lines if you just remove the page number line, so there is no need to try to remove them with find/replace. And of course blank lines in Code View do not show up when reading.

If however you want to remove the spaces between paragraphs that you see when reading then you need to set the paragraph margins in your CSS sheet:

p {
margin-top: 0;
margin-bottom: 0;
}

The above will affect ALL <p> tags, so if you need spacing around a few paragraphs (scene changes) you need to add a scene change class. I use:

.scenechange {
margin-top: 0.25em;
margin-bottom: 0.25em;
}

and then:

<p class="scenechange">&nbsp;</p>

mzmm 02-01-2013 11:29 AM

it just occurred to me that if you're cleaning up an epub that's been generated by Pages, you might end up seeing the blank lines in the html being rendered in the reader.

i come across
Code:

* {white-space: pre;}
(or something similar) in the css in these epubs periodically.

Contre-jour 02-01-2013 11:35 AM

Solved!
 
While waiting for a reply, I dug deep, played around with the code and got it :2thumbsup

Find: <p>[0-9]+</p>
Replace: (nothing, leave the field empty)

In the code view, it looks like there are line breaks between my paragraphs, but in the book view those lines are not visible, so I didn't have to put in anything such as \n to remove line breaks.
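If someone does want the source itself tidied, one possible approach is to swallow the whitespace on both sides of the page number and put back a single line break. This is a sketch in Python's re module with a made-up sample, not something taken from Sigil, and it is purely cosmetic since the rendered book does not change.

```python
import re

# Hypothetical sample mirroring the first post: blank lines around the number.
html = (
    "<p>I want to keep this text</p>\n"
    "\n"
    "<p>100</p>\n"
    "\n"
    "<p>I want to keep this text</p>\n"
)

# \s* on each side consumes the surrounding blank lines;
# the replacement restores exactly one line break.
cleaned = re.sub(r"\s*<p>\d+</p>\s*", "\n", html)

print(cleaned)
```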

Not sure why the other posts were going on and on about \1 and all that. It confused me.

I apologise if I wasted your time. These may look easy peasy to many but it is a struggle for me without any programming knowledge.:cool: Thanks!

mzmm 02-01-2013 11:41 AM

glad you worked it out.

\1 would be for reinserting (re-placing) a group that you've captured in the find field.

great reference here: http://www.regular-expressions.info/brackets.html
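A small sketch of the difference, using Python's re module rather than Sigil. The "[page ...]" replacement text is invented purely for illustration.

```python
import re

html = "<p>100</p>"

# The parentheses capture the digits as group 1;
# \1 in the replacement puts them back.
with_backref = re.sub(r"<p>([0-9]+)</p>", r"[page \1]", html)

# /1 is not a backreference, so it is inserted as literal text.
literal = re.sub(r"<p>([0-9]+)</p>", " /1", html)

print(with_backref)
print(repr(literal))
```

With an empty replacement there is nothing to reinsert, so the capture group is not needed at all.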

happy epub-ing

DiapDealer 02-01-2013 11:45 AM

Quote:

Originally Posted by Contre-jour (Post 2406623)
Not sure why the other posts were going on and on about \1 and all that. It confused me.

The other posts were "going on and on" about it because your original post included:
Quote:

Originally Posted by Contre-jour (Post 2406623)
Replace: /1 ------> (space - slash - one)

So naturally, everyone wanted you to know that /1 was syntactically incorrect. It should have been \1.

theducks 02-01-2013 11:47 AM

Quote:

Originally Posted by DiapDealer (Post 2406639)
The other posts were "going on and on" about it because your original post included:

So naturally, everyone wanted you to know that /1 was syntactically incorrect. It should have been \1.

FlightCrew would have found it :p as 'character data not allowed....'


All times are GMT -4.