I'm updating the Creative Bloq recipe because I've just found that a few articles continue onto a second page, and the recipe currently doesn't pull those second pages. This is the modified recipe, which doesn't work yet.
__license__ = 'GPL v3'
__copyright__ = '2014, Bonni Salles - post in forum for help'

'''
Creative Bloq (formerly .net magazine)
'''

from calibre.web.feeds.news import BasicNewsRecipe


class CreativeBloq(BasicNewsRecipe):
    title = u'Creative Bloq (formerly .net magazine)'
    __author__ = 'Bonni Salles'
    oldest_article = 7
    publication_type = 'blog'
    max_articles_per_feed = 100
    description = 'Web design and tutorials from Creative Bloq (part of .net magazine and others)'
    publisher = 'Creative Bloq'
    category = 'internet, web design'
    language = 'en'
    encoding = 'utf-8'
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    # Note: with auto_cleanup enabled, calibre runs its readability cleanup
    # on the raw HTML before preprocess_html is called, so the pager markup
    # may already be stripped by the time append_page sees the page.
    auto_cleanup = True

    # Presently this downloads the whole blog from the main feed. To limit it
    # to specific sections, comment out the main feed and uncomment the
    # sections you want below.
    feeds = [
        (u'Creative Bloq', u'http://www.creativebloq.com/feed/'),
        # (u'3D', u'http://www.creativebloq.com/feed/3d'),
        # (u'Adobe', u'http://www.creativebloq.com/feed/adobe'),
        # (u'Animation', u'http://www.creativebloq.com/feed/animation'),
        # (u'Apple', u'http://www.creativebloq.com/feed/apple'),
        # (u'Branding', u'http://www.creativebloq.com/feed/branding'),
        # (u'Graphic Design', u'http://www.creativebloq.com/feed/graphic-design'),
        # (u'Illustration', u'http://www.creativebloq.com/feed/illustration'),
        # (u'News', u'http://www.creativebloq.com/feed/news'),
        # (u'Opinion', u'http://www.creativebloq.com/feed/opinion'),
        # (u'Tutorials', u'http://www.creativebloq.com/feed/tutorial'),
        # (u'Typography', u'http://www.creativebloq.com/feed/typography'),
        # (u'Video', u'http://www.creativebloq.com/feed/video'),
        # (u'web design', u'http://www.creativebloq.com/feed/web-design'),
    ]
    def append_page(self, soup, appendtag, position, surl):
        # The pager only appears on articles that run to more than one page.
        pager = soup.find('li', attrs={'class': 'pager-current first'})
        if pager:
            # The pager block (div class="item-list") appears twice on the
            # page, so findAll returns two 'pager-next' items; both point at
            # the same URL, so the first one found is enough.
            nextpages = soup.findAll('li', attrs={'class': 'pager-next'})
            if nextpages:
                # The href lives on the <a> inside the <li>, not on the <li>.
                nextpage = nextpages[0].find('a', href=True)
                if nextpage is not None and nextpage['href'] != surl:
                    nexturl = nextpage['href']
                    if nexturl.startswith('/'):
                        # Handle a relative pager link, just in case.
                        nexturl = 'http://www.creativebloq.com' + nexturl
                    soup2 = self.index_to_soup(nexturl)
                    # Pull the article text from the second page, not the
                    # pager itself. 'node-body' is a guess at the class of
                    # the div holding the article body - adjust it to match
                    # the actual page source.
                    texttag = soup2.find('div', attrs={'class': 'node-body'})
                    for it in texttag.findAll(style=True):
                        del it['style']
                    newpos = len(texttag.contents)
                    self.append_page(soup2, texttag, newpos, nexturl)
                    texttag.extract()
                    pager.extract()
                    appendtag.insert(position, texttag)
    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3, '')
        # Remove the leftover pager marker from the first page.
        pager = soup.find('li', attrs={'class': 'pager-current first'})
        if pager:
            pager.extract()
        return self.adeify_images(soup)
This is the HTML from one of the articles that shows how it links to the second page. One thing to note is that there are two spots that have div class="item-list" as their lead.
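To show what the recipe is matching, here's a quick standalone check of the pager parsing that can run outside calibre. The markup in SAMPLE_HTML is a hypothetical Drupal-style pager built from the class names the recipe searches for (not the actual Creative Bloq source), and the check shows why the href has to be read from the <a> inside the <li>:

from bs4 import BeautifulSoup

# SAMPLE_HTML is hypothetical Drupal-style pager markup based on the class
# names the recipe searches for - it is NOT the actual Creative Bloq source.
SAMPLE_HTML = '''
<div class="item-list">
  <ul class="pager">
    <li class="pager-current first">1</li>
    <li class="pager-next"><a href="/some-article/2">next</a></li>
  </ul>
</div>
'''

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
li = soup.find('li', attrs={'class': 'pager-next'})
print(li.get('href'))  # None - the <li> itself has no href attribute
print(li.a['href'])    # /some-article/2 - the link lives on the <a>

(Inside a recipe calibre supplies its own BeautifulSoup, but bs4 behaves the same way for this check.)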
I think I have the sort of code that's needed, but if anyone has an easier way to pull the second page for the few articles that have one, please let me know. Just to note, plain recursion doesn't work: it follows a lot more links and creates a very big EPUB. I've already tried it.
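One other thought on the "easier way" front: plain recursion pulled far too much, but calibre's match_regexps might rein it in so that only continuation-page links get followed. Something like this sketch, where the URL pattern is only a guess at how the second pages are numbered and would need checking against real article URLs:

    recursions = 1
    # Only follow links that look like a numbered continuation page. This
    # regexp is a guess at the URL pattern and needs checking against real
    # article URLs before relying on it.
    match_regexps = [r'creativebloq\.com/.+/\d+$']

It could still grab too much if other links on the page happen to match the pattern, so the append_page route may well be the safer one.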