#1 |
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2023
Device: Kobo Nia (Not so happy, so far) [Formerly Kindle (7th gen)]
|
[Regex] How to remove a whole final section from a blog post?
I "news fetched" some posts from a blog of an author I would like to have on my e-reader as an ebook.
Now I am trying to use the Editor to cut out the last section of each post, which contains links and information I don't need in the final ebook. All posts end with a signature, a motto, which is:
Code:
<p class="calibre10"><span>[Il mondo è bello, siamo noi ad esser ciechi]</span></p>
Each file then ends with:
Code:
</body>
</html>
Well, so far I haven't achieved much. My best attempt:
Code:
(\[Il mondo è bello, siamo noi ad esser ciechi\])*</body>\w+</html>
Any other functions/tricks that would achieve the same output are welcome!
I am running short of time, which is why I am asking for hints instead of reading and learning more (or editing them all out manually).
I attach one of the fetched HTML files. The blog is reachable here, for the record: http://www.salvatorebrizzi.com/ |
#2 |
Well trained by Cats
Posts: 31,021
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Try replacing the \w+ with \s+ after </body>. \w only matches word characters, while what sits between </body> and </html> is whitespace.
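A quick way to see the difference outside the Editor is a small Python check. This is only a sketch: it uses Python's re module as a stand-in for the Editor's search, and the `tail` string is a made-up file ending, not the attached file.

```python
import re

# Hypothetical tail of a fetched post; the real files may differ.
tail = "</body>\n</html>"

# \w matches letters, digits and underscore, so it cannot match the newline.
with_w = re.search(r"</body>\w+</html>", tail)

# \s matches whitespace, including the newline between the two tags.
with_s = re.search(r"</body>\s+</html>", tail)

print(with_w)                  # no match object
print(with_s is not None)      # a match was found
```

If the two tags could ever sit directly next to each other, \s* would be the safer choice than \s+.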
|
#3 |
Groupie
Posts: 167
Karma: 1497966
Join Date: Jul 2021
Device: N/A
|
To strip out the whole footer, I would rather use this search/replace, don't you think?
Code:
search: \s<p class="calibre10"><span>\[Il mondo è bello, siamo noi ad esser ciechi\].*</body>
replace: </div>\n\n </div>\n\n</body>
"Dot all" must be checked (and the cursor must be at the top of the file or, at least, before the part that will be removed).
No group is necessary (unless you want to put </body> in a group), since you're not reusing anything from the matched expression.
The two </div> in the replace field are necessary; without them the code would be unbalanced and the book check (F7) would fail.
* alone is not enough to "select everything"; it is only a quantifier. You need .* or .*? to select everything (greedy or non-greedy, respectively).
Last edited by lomkiri; 02-25-2023 at 08:55 AM. |
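The same search/replace can be tried in plain Python before running it on the book. This is a sketch under assumptions: Python's re module stands in for the Editor's search, re.DOTALL plays the role of the "dot all" checkbox, and the `html` string is an invented miniature of a post, not the real markup.

```python
import re

# Hypothetical miniature of one fetched post; the real markup may differ.
html = (
    '<body><div><div>\n'
    '<p class="calibre10">Post text.</p>\n'
    '<p class="calibre10"><span>[Il mondo è bello, siamo noi ad esser ciechi]</span></p>\n'
    '<p class="calibre10">Links and other footer material.</p>\n'
    '</div>\n</div>\n</body>\n</html>'
)

# Search and replace expressions as given in the post above.
search = r'\s<p class="calibre10"><span>\[Il mondo è bello, siamo noi ad esser ciechi\].*</body>'
replace = '</div>\n\n </div>\n\n</body>'

# With DOTALL, "." also matches newlines, so ".*" can run from the motto to </body>.
cleaned = re.sub(search, replace, html, flags=re.DOTALL)

# Without DOTALL, ".*" stops at the end of the motto line and nothing matches.
without_dotall = re.sub(search, replace, html)

print(cleaned)
print(without_dotall == html)
```

Note that the pattern removes the motto paragraph itself along with everything after it. With a single </body> per file, .* and .*? end at the same place, so either works here.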