MobileRead Forums - View Single Post

Purple Lady · 12-30-2011, 07:15 PM

@Starson17, for GoComics I would like to remove the entire line with the comic name, date, and author, as well as the line that says "This article was downloaded by calibre from...". I originally tried

Code:

remove_tags = [dict(name='h1')]

to remove the first line, but then the comic wasn't retrieved at all, lol. So instead I removed the entire h1 in preprocess_html, after the data had been extracted from it, by adding the `for h1 in soup.findAll('h1')` loop at the end of the method:

Code:

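    def preprocess_html(self, soup):
        comic_date = ''  # default, in case the page has no <title>
        if soup.title:
            title_string = soup.title.string.strip()
            _cd = title_string.split(',', 1)[1]
            comic_date = ' '.join(_cd.split(' ', 4)[0:-1])
        if soup.h1 and soup.h1.span:
            artist = soup.h1.span.string
            soup.h1.span.string.replaceWith(comic_date + artist)
        feature_item = soup.find('p', attrs={'class': 'feature_item'})  # not used in this excerpt
        # Remove every <h1> (comic name, date and author line) now that its data has been read
        for h1 in soup.findAll('h1'):
            h1.extract()
        return soup  # preprocess_html must return the processed soup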
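To check what the date-parsing lines in preprocess_html actually produce, here is the same string logic pulled out into a plain Python function and run on a sample title. The helper name `extract_comic_date` is hypothetical (it is not part of calibre), and the title format `"<Comic> Comic Strip, <Month> <day>, <year> on GoComics.com"` is an assumption for illustration only; the real GoComics `<title>` may differ.

```python
def extract_comic_date(title_string):
    """Same logic as the title handling in preprocess_html (hypothetical helper).

    title_string is the <title> text after .strip().
    """
    # Keep everything after the first comma, e.g. " December 30, 2011 on GoComics.com"
    _cd = title_string.split(',', 1)[1]
    # Split on single spaces into at most 5 pieces, then drop the last piece
    # (whatever follows the date).
    return ' '.join(_cd.split(' ', 4)[0:-1])


# Hypothetical title; the real GoComics format is an assumption here.
title = "Garfield Comic Strip, December 30, 2011 on GoComics.com"
print(repr(extract_comic_date(title)))  # prints ' December 30, 2011'
```

The result keeps a leading space: the text after the first comma starts with a space, so the first piece of the split is an empty string. The slice also assumes the date is exactly three space-separated words, and a title with no comma would raise `IndexError` at `[1]`.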

I cannot figure out how to get rid of the line that says "This article was downloaded by calibre from...". Can you help?

I need the comic to be as large as possible so I can read it, but there is one more problem: when I turn my Sony 950 to landscape mode, the page is split into two columns. Is this a problem with the Sony, or does the recipe do this? I noticed that my news feed is also shown in two columns in landscape, but a book stays in one column.

Last edited by Purple Lady; 12-30-2011 at 07:18 PM.