Old 02-25-2023, 06:47 AM   #1
HenryHutton
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2023
Device: Kobo Nia (Not so happy, so far) [Formerly Kindle (7th gen)]
[Regex] How to remove a whole final section from a blog post?

I used "Fetch news" to download some posts from an author's blog that I would like to have on my e-reader as an ebook.

Now I am trying to use the Editor to cut the last section of each post, which contains links and information I don't need in the final ebook.

All posts end with a signature, a motto, which is:
Code:
<p class="calibre10"><span>[Il mondo è bello, siamo noi ad esser ciechi]</span></p>
So my aim is to get an expression that matches from this signature (as a (group) to feed the "replace" field) down to
Code:
</body>

</html>
(ideally as a second (group)), so as to trim out all the links and unneeded info.

Well, so far I haven't achieved much...

My BEST guess was...
Code:
(\[Il mondo è bello, siamo noi ad esser ciechi\])*</body>\w+</html>
but of course it doesn't work.

Any other functions/tricks that would achieve the same output are welcome!

I am running short of time, which is why I am asking for hints instead of reading and learning more (or editing them all out manually).

I attach one of the fetched HTML files.

The blog is reachable here, for the record:
http://www.salvatorebrizzi.com/
Attached Files
File Type: zip index_u10.html.zip (7.4 KB, 132 views)
Old 02-25-2023, 07:29 AM   #2
theducks
Well trained by Cats
Posts: 31,021
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
try replacing the \w+ with \s+ after </body>
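
A quick way to see the difference is to try both character classes on the tail of the page. This is only a sketch using Python's re module, on the assumption that its syntax is close enough to what the calibre editor accepts; the `tail` string is copied from the snippet in the first post, and the variable names are made up for illustration.

```python
import re

# Tail of a fetched page: a blank line separates the two closing tags,
# as in the snippet quoted in the first post.
tail = "</body>\n\n</html>"

# \w matches letters, digits and underscore, so it cannot span the newlines.
with_w = re.search(r"</body>\w+</html>", tail)

# \s matches whitespace, including newlines, so the two tags are found.
with_s = re.search(r"</body>\s+</html>", tail)

print(with_w)               # None
print(with_s is not None)   # True
```

This only repairs the end of the expression; the part before </body> still has to span the unwanted links.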
Old 02-25-2023, 08:30 AM   #3
lomkiri
Groupie
Posts: 167
Karma: 1497966
Join Date: Jul 2021
Device: N/A
Quote:
Originally Posted by theducks
try replacing the \w+ with \s+ after </body>
To strip out the whole footer, I would rather use this search/replace, don't you think?
Code:
search:
\s<p class="calibre10"><span>\[Il mondo è bello, siamo noi ad esser ciechi\].*</body>

replace:
</div>\n\n  </div>\n\n</body>
@Henry:
"Dot all" must be checked, so that . also matches line breaks.
(The cursor must be at the top of the file or, at least, before the part that will be removed.)
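
The effect of "dot all" can be checked outside the editor with a rough Python equivalent of the search/replace above. The `page` string is a made-up page tail, not the attached file, and re.DOTALL stands in for the editor's "dot all" checkbox; treat it as a sketch under those assumptions.

```python
import re

# Made-up page tail for illustration; the attached file may differ.
page = (
    '<div>\n  <div>\n'
    '<p class="calibre10">Post text.</p>\n'
    '<p class="calibre10"><span>[Il mondo è bello, siamo noi ad esser ciechi]</span></p>\n'
    '<p class="calibre10"><a href="#">unwanted link</a></p>\n'
    '</div>\n\n  </div>\n\n</body>\n\n</html>'
)

# Same search and replace strings as in the post above.
search = (r'\s<p class="calibre10"><span>'
          r'\[Il mondo è bello, siamo noi ad esser ciechi\].*</body>')
replace = r'</div>\n\n  </div>\n\n</body>'

# Without DOTALL, "." stops at the end of the line, so nothing matches.
no_dotall = re.search(search, page)

# With DOTALL, ".*" runs across line breaks up to </body>.
cleaned = re.sub(search, replace, page, flags=re.DOTALL)

print(no_dotall)   # None
print(cleaned)
```

Because the search starts at the signature paragraph, the motto itself is removed along with the links, while the two opening <div> tags stay balanced by the two </div> in the replacement.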

No group is necessary (unless you want to put </body> in a group), since you're not reusing anything from the matched text.
The two </div> in the replace field are necessary; without them the code would be unbalanced and the book check (F7) would fail.
* alone is not enough to "select everything": it is only a quantifier that repeats whatever precedes it. You need .* or .*? to select everything (greedy or non-greedy, respectively).
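
The last point can be illustrated with a short Python sketch. The `sample` string and the shortened `[motto]` marker are invented for the example, and Python's re module is assumed to behave like the editor's regex engine for these simple patterns.

```python
import re

# Invented sample with two </body> markers to make greedy vs lazy visible.
sample = "[motto] links </body> more </body>"

# A bare * repeats the group before it (here zero times); it does not
# skip over " links ", so only the closing tag itself is matched.
star_only = re.search(r"(\[motto\])*</body>", sample).group(0)

# .* is greedy: it runs to the LAST </body> it can reach.
greedy = re.search(r"\[motto\].*</body>", sample).group(0)

# .*? is non-greedy: it stops at the FIRST </body>.
lazy = re.search(r"\[motto\].*?</body>", sample).group(0)

print(star_only)  # </body>
print(greedy)     # [motto] links </body> more </body>
print(lazy)       # [motto] links </body>
```

A well-formed HTML file has a single </body>, so for the pages in this thread either form should select the same text.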

Last edited by lomkiri; 02-25-2023 at 08:55 AM.

Tags
edit books, news fetch, regex function

