Old 03-07-2023, 01:15 AM   #10
LostOnTheLine
Connoisseur
Posts: 72
Karma: 800000
Join Date: Jun 2021
Device: Kindle Paperwhite (PW1|PW3|PW4), Kindle Voyage
Quote:
Originally Posted by Doitsu
That requires only minimal code changes to Becky's plugin:
Spoiler:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys  # needed for sys.exit() at the bottom of the script
from sigil_bs4 import BeautifulSoup

def run(bk):
    for html_id, href in bk.text_iter():
        html = bk.readfile(html_id)
        soup = BeautifulSoup(html, 'html.parser')
        body_text = soup.body.text.strip()
        if body_text == '' or len(body_text) <= 6:
            print('INFO: Removing {}... '.format(href))
            bk.deletefile(html_id)
    print('\nPlease click OK to close the Plugin Runner window.')
    return 0

def main():
    print("I reached main when I should not have\n")
    return -1

if __name__ == "__main__":
    sys.exit(main())

Please note that the plugin will crash Sigil if you use it with an epub that only contains blank pages.
This actually doesn't work...
I mean it does work, but it works too well...

It removes the blank pages, but it also removes non-blank pages, particularly pages that have only an image on them. In my case those image-only pages are exactly what I'm trying to get, & creating them is what sometimes leaves a blank page behind.

I have an automate that works perfectly for some books, particularly a couple of series where I regularly get each volume, as they all come from the same publisher. It finds the inline images, which sometimes don't display ideally, especially after I run ImgShrinker on them, adds a split marker at each one, & then splits at the split markers.
Having these images on their own page is generally how I want my books. I'll run the automate on other publishers' stuff & check the results; most of the time it works well, with only a few images that are supposed to be part of a page getting split off as well. But half the time the books already have some, but not all, of the images on their own page, in which case those end up with a 2nd, blank page.
Certain publishers always separate the full-page images, so I have to keep a 2nd automate for those that skips that step, but I'd prefer to have 1 that I can use more universally. That's my main goal for the "remove-blank-pages" thing. Unfortunately, with this plugin the automate puts all the images on their own pages, then removes all the pages that are blank as well as all those with just images.

Thanks for the attempt, but anyone looking into this should be aware that it will remove pages with just an image, like most cover pages, as well as the actual blank ones. An image-only page has no body text, so it passes the same "blank" check.

When I have time I will try again to make it work, & hopefully this will make that easier.
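
For anyone who wants to pick this up in the meantime, here is a rough sketch of the kind of check I think is needed: only treat a page as blank if it has next to no body text AND no image element. It uses Python's standard-library HTMLParser instead of sigil_bs4 so it can be tried outside Sigil. The names is_blank_page, IMAGE_TAGS & max_text_len are my own, the list of tags is just my assumption about what counts as an image page, & the 6-character limit is carried over from the quoted code.

```python
from html.parser import HTMLParser

# Assumption: any of these tags inside <body> means the page shows a picture.
# "svg" and "image" are included for covers that wrap the picture in SVG.
IMAGE_TAGS = {"img", "svg", "image"}


class _PageScanner(HTMLParser):
    """Collects the text inside <body> and notes whether an image tag appears."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.has_image = False
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; self-closing tags also land here.
        if tag == "body":
            self.in_body = True
        elif self.in_body and tag in IMAGE_TAGS:
            self.has_image = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.text_parts.append(data)


def is_blank_page(html, max_text_len=6):
    """Return True only if the page has almost no text and no image."""
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()  # flush any text still buffered by the parser
    text = "".join(scanner.text_parts).strip()
    return not scanner.has_image and len(text) <= max_text_len
```

In the quoted plugin the idea would be to replace the `if body_text == '' ...` test with `if is_blank_page(html):`, assuming bk.readfile hands the page back as a string. I have not run this inside Sigil, so treat it as a starting point only.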