MobileRead Forums > E-Book Software > Calibre > Conversion

Old 09-04-2021, 10:34 PM   #1
flyash
pdf regex question - regex that wraps to a new line

I'm trying to eliminate chapter titles that show up as headers in a PDF.

The pdf text looks something like this:

Blah blah blah blah Blah blah blah blah <br>
Blah blah blah blah Blah blah blah blah blah blah <br>
Blah blah blah blah Blah blah blah blah <br>
Blah blah blah blah Blah blah blah blah blah <br>
<hr/>
<a id="p55"></a>Some Chapter Title<br>
55<br>

blah blah blah blah <br>
Blah blah blah blah blah.<br>

Using the following regex, I'm able to select this text:
<a id="p55"></a>Some Chapter Title<br>

regex: <a id="p[0-9]*"></a>[A-Z][^<]*<br>

But what I really want to match is the same text as above AND the page number on the next line:
<a id="p55"></a>Some Chapter Title<br>
55<br>

I want this not only to get rid of the page numbers: the regex above also sometimes captures actual sentences of the book. Those sentences are never followed by a page number; page numbers only follow the chapter title headers, in this particular sequence.
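To illustrate the false positives described above, here is a minimal Python sketch. It assumes Python's `re` module, which calibre's documentation says its search and replace is built on. The sample text is modelled on the excerpt in this post; the `p56` line is an invented stand-in for ordinary body text that happens to follow an anchor.

```python
import re

# Hypothetical sample modelled on the excerpt in the post. The p56 line is an
# invented example of a body sentence that is NOT followed by a page number.
html = (
    'Blah blah blah blah <br>\n'
    '<hr/>\n'
    '<a id="p55"></a>Some Chapter Title<br>\n'
    '55<br>\n'
    '\n'
    'blah blah blah blah <br>\n'
    '<hr/>\n'
    '<a id="p56"></a>An actual sentence of the book<br>\n'
    'continues here without a page number.<br>\n'
)

# The header-only regex from the post: anchor, capitalised text, <br>.
header_only = re.compile(r'<a id="p[0-9]*"></a>[A-Z][^<]*<br>')

matches = header_only.findall(html)
for m in matches:
    print(m)  # prints the chapter header AND the body sentence
```

Both anchored lines match, which is why the page number on the following line is needed to tell the real headers apart.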

The problem is that the regex won't continue onto the next line. If I try:
regex: <a id="p[0-9]*"></a>[A-Z][^<]*<br>[0-9]*

I get zero matches.

Any ideas?
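A likely cause can be sketched in Python, assuming the converted text contains a literal line break between the first `<br>` and the page number. In Python's `re`, the pattern exactly as posted would still match the header line alone, because `[0-9]*` is allowed to match nothing. The stricter variant below (`[0-9]+<br>`) is an assumption about what was actually tried, and it fails because the character after `<br>` is a newline, not a digit.

```python
import re

# The two lines of interest, joined by the line break that the PDF
# conversion leaves between them (assumed here to be "\n").
snippet = '<a id="p55"></a>Some Chapter Title<br>\n55<br>'

# repr() makes the otherwise invisible line break visible as \n.
print(repr(snippet))

# Pattern as posted: [0-9]* may match zero digits, so in Python it still
# matches, but only up to the first <br>; the page number is not included.
posted = re.compile(r'<a id="p[0-9]*"></a>[A-Z][^<]*<br>[0-9]*')
posted_match = posted.search(snippet).group(0)
print(posted_match)

# Stricter variant (an assumption, not from the post): requiring the digits
# and the second <br> right after the first <br> finds nothing, because
# nothing in the pattern consumes the newline.
strict = re.compile(r'<a id="p[0-9]*"></a>[A-Z][^<]*<br>[0-9]+<br>')
strict_match = strict.search(snippet)
print(strict_match)
```

Either way, the page number is never captured until the pattern accounts for the line-break characters themselves.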
Old 09-05-2021, 09:00 AM   #2
flyash
Figured it out. The line break between the two lines has to be matched explicitly, which [\r\n]* does.

regex: <a id="p[0-9]*"></a>[^<]*<br>[\r\n]*[0-9]*<br>

Will match:
<a id="p55"></a>Some Chapter Title<br>
55<br>
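As a rough check of this regex, the following Python sketch applies it with `re.sub` to a sample modelled on the first post. It assumes Python's `re` module; the sample text, including the invented `p56` body line, and the variable names are hypothetical.

```python
import re

# Hypothetical sample modelled on the excerpt in the first post. The p56 line
# is an invented body sentence that is not followed by a page number.
html = (
    'Blah blah blah blah <br>\n'
    '<hr/>\n'
    '<a id="p55"></a>Some Chapter Title<br>\n'
    '55<br>\n'
    '\n'
    'blah blah blah blah <br>\n'
    '<hr/>\n'
    '<a id="p56"></a>An actual sentence of the book<br>\n'
    'continues here without a page number.<br>\n'
)

# The regex from this post: [\r\n]* consumes the line break between the
# header's <br> and the page number on the next line.
header_and_page = re.compile(
    r'<a id="p[0-9]*"></a>[^<]*<br>[\r\n]*[0-9]*<br>'
)

found = header_and_page.findall(html)

# Replace each header + page number pair with nothing.
cleaned = header_and_page.sub('', html)
print(cleaned)
```

Only the chapter header and its page number are removed; the body sentence after the `p56` anchor stays. One possible tightening, not part of the original post, is `[0-9]+` in place of the final `[0-9]*`, so that an anchored line followed directly by an empty `<br>` line cannot match.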