#1 |
Enthusiast
Posts: 38
Karma: 1158
Join Date: Jun 2021
Device: Kindle Paperwhite (PW1 & PW3)
|
Automate Ad Page removal by filename
I've found that a bunch of the books I'm adding contain a distributor ad at the end of the book.
I ran into another distributor that uses the same signup.xhtml filename, but the page itself is completely different. In any case, I always want to remove the page. So far I do it manually with [right-click] > [delete], but since the page name is always the same, there should be an easier way to add it to my automation list. I currently run an automation list on just about every book I add, with things like
I know the default tools I can add include DeleteUnusedMedia, DeleteUnusedStyles & RemoveNCXGuideFromEpub3, & there are also MendHTML & MendPrettifyHTML. Would it be possible to modify one of those to do it? I assume not, since they won't even change books that have the files stupidly named .html instead of .xhtml, but maybe it's possible. There's also the option to call plugins; maybe there's a plugin that can delete a page by name, even if that's not the intent of the plugin? My hope is that once I get this to work I'll put it in my automation list & follow it with the unused-media removal, which should get rid of the images on the ad pages as well. But even if it can't do that, I'll consider it a success if I can delete the page.
#2 |
Fanatic
Posts: 585
Karma: 1952146
Join Date: Jan 2017
Location: Poland
Device: Kindle (Key3, PW2, PW3), Nook (ST, GLP), Kobo Touch, Tolino Vision 2
|
Use a tiny plugin that will remove such a single file.
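For readers wondering what such a plugin looks like, here is a minimal sketch, not the attached plugin itself. It assumes the page is named signup.xhtml (the `TARGET_NAME` constant is a placeholder) and relies on the `text_iter()` and `deletefile()` methods of the book container that Sigil passes to `run()`. A real plugin also needs its plugin.xml manifest next to this file.

```python
# Sketch only: TARGET_NAME is an assumed file name, adjust to taste.
TARGET_NAME = "signup.xhtml"


def run(bk):
    """Entry point that Sigil calls with its book container object."""
    # Collect the matches first so nothing is deleted while iterating.
    doomed = [
        (html_id, href)
        for html_id, href in bk.text_iter()
        if href.rsplit("/", 1)[-1] == TARGET_NAME
    ]
    for html_id, href in doomed:
        print("INFO: Removing {}".format(href))
        bk.deletefile(html_id)
    return 0
```

The match is on the base name only, so the page is found regardless of which folder the epub keeps its text files in.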
|
#3 |
Enthusiast
Posts: 38
Karma: 1158
Join Date: Jun 2021
Device: Kindle Paperwhite (PW1 & PW3)
|
Thanks a bunch. I looked but didn't find it. I'm surprised it doesn't have a preferences file to change what gets removed, but I guess enough people dislike the signup page specifically that it warranted a tool just for it, lol.
|
#4 |
Sigil Developer
Posts: 6,892
Karma: 4526138
Join Date: Nov 2009
Device: many
|
I think BeckyEbook just wrote that for you (and others) to see how a simple plugin can be easily created and used.
|
#5 | |
Enthusiast
Posts: 38
Karma: 1158
Join Date: Jun 2021
Device: Kindle Paperwhite (PW1 & PW3)
|
Quote:
For me, starting from scratch is a no-go. I've never enjoyed programming; I can get by, but I don't know where to start. Editing & adjusting I can do. Like giving ImgShrinker more options: I had something to start with & I could use that to make it do what I wanted that wasn't originally part of the options. I was hoping for something like that, a plugin that removed something as part of its thing, so that I could extract the parts I needed & modify them to my desires, then get rid of anything extra I don't need.
Like, I modified the plugin she provided (even more thanks if you actually created it) to delete any page that contains the line "Sign up for our mailing list", so it still works if a page has a different name, & to work whether the name is signup.xhtml or Signup.xhtml or SignUp.xhtml or SIGNUP.xhtml or SIGNUP.XHTML. I plan to tweak it further to work for .html files as well, & possibly add a preferences JSON file with a list of page names to automatically remove, plus a list of lines that should get a page removed if it contains them.
I'm also trying to work out a way to do something I was already told on here wasn't possible: remove blank pages. So far I've yet to succeed, but with this I feel like I'm a lot closer than I've ever been before. All that I can figure out so long as I have a starting point. But without a starting point I'm pretty much useless for things like that.
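The modifications described above could be sketched roughly as follows. This is only an illustration under assumptions: the preference keys `names` and `phrases` and the helpers `load_prefs` and `should_remove` are hypothetical and not taken from the actual modified plugin. Only `text_iter()`, `readfile()` and `deletefile()` come from Sigil's book container.

```python
import json

# Hypothetical preferences; the key names are assumptions for illustration.
DEFAULT_PREFS = {
    "names": ["signup"],
    "phrases": ["Sign up for our mailing list"],
}


def load_prefs(path):
    """Read a JSON preferences file, falling back to the defaults."""
    try:
        with open(path, encoding="utf-8") as fh:
            return {**DEFAULT_PREFS, **json.load(fh)}
    except (OSError, ValueError, TypeError):
        return dict(DEFAULT_PREFS)


def should_remove(href, text, prefs=DEFAULT_PREFS):
    """True if the file name or the page content matches the preferences."""
    # Lower-casing covers signup.xhtml, SignUp.xhtml, SIGNUP.XHTML and so on.
    base = href.rsplit("/", 1)[-1].lower()
    stem, dot, ext = base.rpartition(".")
    names = [name.lower() for name in prefs["names"]]
    if dot and ext in ("xhtml", "html") and stem in names:
        return True
    return any(phrase in text for phrase in prefs["phrases"])


def run(bk):
    # Decide first, delete afterwards, so iteration is never disturbed.
    doomed = [
        (html_id, href)
        for html_id, href in bk.text_iter()
        if should_remove(href, bk.readfile(html_id))
    ]
    for html_id, href in doomed:
        print("INFO: Removing {}".format(href))
        bk.deletefile(html_id)
    return 0
```

Here `run()` sticks to the built-in defaults because where a preferences file would live is left open; `load_prefs` only shows how a user-supplied list could be merged in.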
|
#6 |
Fanatic
Posts: 585
Karma: 1952146
Join Date: Jan 2017
Location: Poland
Device: Kindle (Key3, PW2, PW3), Nook (ST, GLP), Kobo Touch, Tolino Vision 2
|
This few-line script was supposed to be something like that – a starting point for your own extensions. I could only hope you’d be involved enough to develop it into something perfectly suited to your needs.
If you can automate repetitive tasks, that's what Sigil scripts are for.
#7 | |
Grand Sorcerer
Posts: 5,519
Karma: 22718641
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Spoiler:
Please note that the plugin will crash Sigil if you use it with an epub that only contains blank pages. |
|
#8 |
Sigil Developer
Posts: 6,892
Karma: 4526138
Join Date: Nov 2009
Device: many
|
It should not crash Sigil. Sigil's plugin runner should make sure that at least one file remains at all times. The plugin runner code is explicitly supposed to detect and prevent that.
I will look at it. The pluginrunner process can fail, but nothing the plugin does should cause Sigil itself to actually crash in an unrecoverable way.
Update: The PluginRunner has a bug:
Code:

    // don't allow changes to proceed if they will remove the very last xhtml/html file
    if (m_xhtml_net_change < 0) {
        QList<Resource *> htmlresources = m_book->GetFolderKeeper()->GetResourceListByType(Resource::HTMLResourceType);
        if (htmlresources.count() + m_xhtml_net_change < 0) {
            Utility::DisplayStdErrorDialog(tr("Error: Plugin Tried to Remove the Last XHTML file .. aborting changes"));
            ui.statusLbl->setText(tr("Status: No Changes Made"));
            m_result = "failed";
            return;
        }
    }

The inner comparison should use <= so that a result of zero remaining files is also rejected:
Code:

    if (htmlresources.count() + m_xhtml_net_change <= 0) {

Last edited by KevinH; 02-15-2023 at 06:26 PM.
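To see why the comparison matters, the two versions of the check can be mirrored in a few lines of Python. This is only a sketch of the arithmetic, assuming the net change is the number of xhtml files added minus the number removed. The function names are made up for illustration and are not part of Sigil.

```python
def blocks_removal_buggy(html_count, net_change):
    # Mirrors the original check: trips only when the total drops below zero.
    return net_change < 0 and html_count + net_change < 0


def blocks_removal_fixed(html_count, net_change):
    # Mirrors the corrected check: also trips when the total reaches zero.
    return net_change < 0 and html_count + net_change <= 0
```

With 3 xhtml files and a plugin that deletes all 3, the net change is -3 and the sum is exactly 0. The original comparison lets that through, while the corrected one rejects it, which is the case of an epub containing only blank pages.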
#9 |
Sigil Developer
Posts: 6,892
Karma: 4526138
Join Date: Nov 2009
Device: many
|
A fix for that bug was just pushed to master and will appear in the next release. This change will prevent a plugin from crashing Sigil by deleting the last xhtml file.
|
#10 | |
Enthusiast
Posts: 38
Karma: 1158
Join Date: Jun 2021
Device: Kindle Paperwhite (PW1 & PW3)
|
Quote:
I mean, it does work, but it works too well... It removes the blank pages, but it also removes non-blank pages, particularly pages that have only an image on them. In my case those image-only pages are exactly what I'm trying to get, & getting them is what sometimes ends up creating a blank page.
I have an automate that works perfectly for some books, particularly the ones where I regularly get each volume of a couple of series that all come from the same publisher. It finds the images that are inline, which sometimes don't display ideally, especially after I run ImgShrinker on them, adds a split marker, then splits at the split marker. Having these images on their own page is generally how I want my books. I'll run the automate on other publishers' stuff & check the results; most of the time it works well, with only a few books having images that are supposed to be part of a page getting split as well. But half the time the books have some, but not all, images already on their own page, in which case they end up with a second, blank page. Certain publishers always separate the full-page images, so I have to keep a second automate for those that skips that part, but I'd prefer to have one that I can use more universally. That's my main goal for the "remove-blank-pages" thing.
Unfortunately, this one ends up putting all the images on their own page, then removing all the blank pages as well as all the pages with just images. Thanks for the attempt, but anyone looking into this should be aware that it will remove pages with just an image, like most cover pages, as well as the actual blank ones. When I have time I will try again to make it work, & hopefully this will make that easier.
|
#11 |
Resident Curmudgeon
Posts: 70,457
Karma: 117888887
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
There's no way to remove all ad pages by filename because the filenames are not all the same. It's a lot easier (and safer) to do the removal by hand.
|
#12 |
Sigil Developer
Posts: 6,892
Karma: 4526138
Join Date: Nov 2009
Device: many
|
Yes, that is what the code Doitsu wrote does. If you do not want to delete pages with img tags and/or svg tags, then you need to modify that code to check for those tags before deleting.
Something like the following modification to Doitsu's code could be used to detect img and svg tags and prevent deletion if either is found:
Code:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import sys
    from sigil_bs4 import BeautifulSoup

    def run(bk):
        for html_id, href in bk.text_iter():
            html = bk.readfile(html_id)
            soup = BeautifulSoup(html, 'html.parser')
            body_text = soup.body.text.strip()
            node = soup.body
            # treat a page as blank if it has (almost) no visible text
            if body_text == '' or len(body_text) <= 6:
                # keep pages that hold an image even though they have no text
                if not node.img and not node.svg:
                    print('INFO: Removing {}... '.format(href))
                    bk.deletefile(html_id)
        print('\nPlease click OK to close the Plugin Runner window.')
        return 0

    def main():
        print("I reached main when I should not have\n")
        return -1

    if __name__ == "__main__":
        sys.exit(main())

Last edited by KevinH; 03-07-2023 at 07:04 AM.
#13 | |
Enthusiast
Posts: 38
Karma: 1158
Join Date: Jun 2021
Device: Kindle Paperwhite (PW1 & PW3)
|
Quote:
It isn't "a lot easier" to do it manually. I add at least 2 books a month that have the same page that needs to be deleted. Sometimes I start a series that already has 20 volumes & get all the books in bulk at the same time. Manually going in & deleting the pages is tedious & unnecessary, as I now have, from this thread, an automation step that removes them automatically for me. As far as "safer" goes, well, yeah, but I'm fairly confident that I'm not going to find a book with a signup page that I DON'T want to delete, & if I accidentally delete one, I always have the original I can use, or I can just re-download a fresh copy from my account. I understand the risk. It's a known risk, taken willingly, & I'm okay with that.
|
#14 | |||
Bookmaker & Cat Slave
Posts: 10,996
Karma: 152280763
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
Quote:
Quote:
You could take several of these clips and modify them and be done with it, but at this point you've probably put more time into trying to figure it out than you would have spent click-select-deleting 200 of them. :-) I know the feeling; Regex hasn't always been my buddy, either.
Hitch
|||
#15 |
Resident Curmudgeon
Posts: 70,457
Karma: 117888887
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
The thing is, eBooks with this file titled signup have other files I'll want to delete. So it's just easier to delete all the files you don't want by hand (IMHO).
|
Tags |
automation, delete file in epub, removal |