MobileRead Forums - View Single Post - Recipes - Re-usable code

Starson17 · 11-09-2011, 10:01 AM · #17

Multiple Page Sites

This is not my code, but there have been many requests for code to handle sites where each article is split across multiple pages, with a button at the bottom of each page that leads to the next one. The code below, from Darko Miletic's built-in recipe for Adventure Gamers, is typical of what is used in this situation.

You may want to look at the source for an article at Adventure Gamers with Firebug or an equivalent tool. The append_page code looks for the "next page" button, follows the link it points to ("nexturl"), finds the article text on that next page, and inserts that text beneath the article text already collected. It calls itself recursively for each following page and stops at the last page, which is identified by having no "next page" button.

The append_page code is then called from preprocess_html, which also removes the leftover pager toolbar before returning the soup.

Code:

    INDEX = u'http://www.adventuregamers.com'

    def append_page(self, soup, appendtag, position):
        # Look for the "next page" button; the last page has none.
        pager = soup.find('div', attrs={'class': 'toolbar_fat_next'})
        if pager:
            nexturl = self.INDEX + pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class': 'bodytext'})
            newpos = len(texttag.contents)
            # Recurse first so later pages are pulled into this page's text.
            self.append_page(soup2, texttag, newpos)
            texttag.extract()
            appendtag.insert(position, texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        # Remove the pager toolbar left over on the first page.
        pager = soup.find('div', attrs={'class': 'toolbar_fat'})
        if pager:
            pager.extract()
        return self.adeify_images(soup)
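If you want to see the recursion work without running calibre, here is a rough standalone sketch of the same idea. It is not part of the recipe: the PAGES dict, the /article/N URLs, and the helpers fetch, find_div and merge_article are all made up for illustration. It uses the standard library's xml.etree.ElementTree in place of BeautifulSoup, and it appends each page's text at the end of the target tag instead of inserting at a numeric position. Only the class names (toolbar_fat_next, toolbar_fat, bodytext) are taken from the recipe.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-page article. A dict of well-formed pages stands in
# for the real site, so nothing is fetched over the network.
PAGES = {
    "/article/1": (
        "<html><body>"
        "<div class='bodytext'><p>Part one.</p></div>"
        "<div class='toolbar_fat'><div class='toolbar_fat_next'>"
        "<a href='/article/2'>Next</a></div></div>"
        "</body></html>"
    ),
    "/article/2": (
        "<html><body>"
        "<div class='bodytext'><p>Part two.</p></div>"
        "<div class='toolbar_fat'><div class='toolbar_fat_next'>"
        "<a href='/article/3'>Next</a></div></div>"
        "</body></html>"
    ),
    # Last page: no "next page" button, so the recursion stops here.
    "/article/3": (
        "<html><body>"
        "<div class='bodytext'><p>Part three.</p></div>"
        "</body></html>"
    ),
}


def fetch(url):
    """Stand-in for the recipe's index_to_soup: parse a page from PAGES."""
    return ET.fromstring(PAGES[url])


def find_div(root, css_class):
    """Return the first <div> whose class equals css_class, or None."""
    for div in root.iter("div"):
        if div.get("class") == css_class:
            return div
    return None


def append_page(root, appendtag):
    """Pull the text of every following page into appendtag."""
    pager = find_div(root, "toolbar_fat_next")
    if pager is None:
        return  # no "next page" button: this is the last page
    next_root = fetch(pager.find("a").get("href"))
    texttag = find_div(next_root, "bodytext")
    if texttag is None:
        return  # next page has no article text; nothing to append
    # Recurse first so pages after the next one end up inside its text.
    append_page(next_root, texttag)
    appendtag.append(texttag)


def merge_article(first_url):
    """Merge all pages of an article and return its paragraphs in order."""
    root = fetch(first_url)
    body = root.find("body")
    append_page(root, body)
    # Drop the pager toolbar that is left over on the first page.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "div" and child.get("class") == "toolbar_fat":
                parent.remove(child)
    return [p.text for p in body.iter("p")]


if __name__ == "__main__":
    print(merge_article("/article/1"))
```

The None check on texttag is an addition of this sketch. The recipe code above assumes every following page has a bodytext div and would raise an error on a page that lacks one.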