08-17-2012, 08:13 AM | #16 |
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
|
|
08-17-2012, 09:00 AM | #17 |
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Got it. Thanks.
|
08-18-2012, 02:02 AM | #18 |
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Does anyone know of a recipe that uses both index parsing (as opposed to RSS) and multi-page fetching?
|
08-18-2012, 02:33 AM | #19 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Index parsing has no bearing on multipage. Whatever method you use to create the index does not affect multipage in any way.
|
08-18-2012, 02:38 AM | #20 |
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Thank you. That at least keeps me working on this method. How do I tell whether the pager is found or not? (With or without the code for multi-page fetching, the log and the output file look exactly the same.)
|
08-18-2012, 02:41 AM | #21 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use print statements in your recipe.
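A minimal sketch of that kind of diagnostic, written as standalone Python 3 with only the standard library so it can be tried outside calibre. The names `PagerFinder` and `report_pager` and the `pager_id` parameter are made up for illustration and are not part of calibre. The idea is to print something in both branches (found and not found) along with the page title, so the output also shows which page is actually being inspected.

```python
from html.parser import HTMLParser


class PagerFinder(HTMLParser):
    """Records whether a tag with the given id appears, plus the page <title>."""

    def __init__(self, pager_id):
        super().__init__()
        self.pager_id = pager_id  # hypothetical: the id of the site's pager element
        self.found = False
        self.title = ''
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        if dict(attrs).get('id') == self.pager_id:
            self.found = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def report_pager(html, pager_id, log=print):
    """Log whether the pager is present, and on which page, then return the result."""
    finder = PagerFinder(pager_id)
    finder.feed(html)
    finder.close()
    status = 'found' if finder.found else 'NOT found'
    log('pager %r %s on page titled %r' % (pager_id, status, finder.title.strip()))
    return finder.found
```

Inside a recipe the same pattern would be a print (or log) call in both the `if pager:` branch and an `else:` branch, so that silence can no longer be mistaken for "the code never ran".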
|
08-18-2012, 02:44 AM | #22 |
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
|
08-18-2012, 02:57 AM | #23 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use the source, Luke: http://bazaar.launchpad.net/~kovid/c.../feeds/news.py
In particular, look at the is_link_wanted() function.
|
08-18-2012, 03:11 AM | #24 | |
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Quote:
Speaking of 'print', will something like this do?
Code:
def append_page(self, soup, appendtag, position):
    pager = ...
    if pager:
        self.log('Found pager')
        ...
Could the "soup" used in index parsing and the "soup" passed to append_page be getting confused, so that it looks for the pager in the index page rather than in the article page?
Last edited by Steven630; 08-22-2012 at 06:48 AM. |
|
08-21-2012, 11:07 AM | #25 |
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Help!
|
08-21-2012, 01:57 PM | #26 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Try with these changes:
Code:
def append_page(self, soup, appendtag, position, surl):
    # The pager div marks an article that continues on another page.
    pager = soup.find('div', attrs={'id':'pages'})
    if pager:
        nextpages = soup.findAll('a', attrs={'class':'a1'})
        nextpage = nextpages[1]
        # Stop when the "next" link points back at the page we are on.
        if nextpage and (nextpage['href'] != surl):
            nexturl = nextpage['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':'content_left_5'})
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            # Recurse first so later pages are appended to this page's text.
            self.append_page(soup2,texttag,newpos,nexturl)
            texttag.extract()
            pager.extract()
            appendtag.insert(position,texttag)

def preprocess_html(self, soup):
    self.append_page(soup, soup.body, 3, '')
    # Remove the pager left over on the first page.
    pager = soup.find('div', attrs={'id':'pages'})
    if pager:
        pager.extract()
    return self.adeify_images(soup)
Also verify the pager tag I search for and the texttag selector; you can experiment with those accordingly. |
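For experimenting with the link-following logic outside calibre, here is a rough standalone sketch in Python 3 using only the standard library. It is not the recipe code above and not calibre's API: `NextLinkFinder`, `collect_pages`, the `fetch` callable and `max_pages` are all hypothetical names. It also swaps in a different stopping rule. Instead of taking the second `a1` link and comparing it with the current URL, it follows the first pager link that has not been visited yet, which avoids loops and an IndexError when fewer than two links exist. It assumes the pager links are absolute URLs and reuses the `a1` class from the post above.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Collects href values of <a> tags that carry a given class."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and self.link_class in classes and attrs.get('href'):
            self.hrefs.append(attrs['href'])


def collect_pages(first_url, fetch, link_class='a1', max_pages=50):
    """Return the page URLs reached by following pager links, in order.

    fetch is any callable mapping a URL to its HTML text (an assumption
    made for this sketch; a recipe would use its own fetching instead).
    """
    seen = []
    url = first_url
    while url and url not in seen and len(seen) < max_pages:
        seen.append(url)
        finder = NextLinkFinder(link_class)
        finder.feed(fetch(url))
        finder.close()
        # Follow the first pager link we have not visited; stop if there is none.
        url = next((h for h in finder.hrefs if h not in seen), None)
    return seen
```

Printing the returned list for one known multi-page article is a quick way to check that the pager selector matches before wiring it into `append_page`.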
08-21-2012, 09:06 PM | #27 | |
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Quote:
UPDATE: Problem solved thanks to kiklop74. Also many thanks to lrui (who also spent a lot of time looking into the issue) and kovidgoyal. Last edited by Steven630; 08-22-2012 at 06:55 AM. |
|
08-21-2012, 10:04 PM | #28 |
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
|
Post your recipe.
|
|