|
|
#16 |
|
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
|
|
|
|
|
|
|
#17 |
|
Groupie
Posts: 180
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Got it. Thanks.
|
|
|
|
|
|
#18 |
|
Groupie
Posts: 180
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Does anyone know of a recipe that uses both index parsing (as opposed to RSS) and multi-page fetching?
|
|
|
|
|
|
#19 |
|
creator of calibre
Posts: 45,610
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Index parsing has no bearing on multi-page fetching. The method you use to create the index does not affect multi-page handling in any way.
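To illustrate the point, here is a minimal sketch of a recipe that builds its index with parse_index() and inspects each article page separately in preprocess_html(). The URL, the 'headline' selector, and the class name are hypothetical; the 'pages' id is borrowed from later in this thread. The import fallback is only there so the sketch can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be read and imported outside calibre.
    BasicNewsRecipe = object


class IndexAndMultiPage(BasicNewsRecipe):
    title = 'Example: index parsing plus multi-page'  # hypothetical
    INDEX = 'http://example.com/issue'                # hypothetical URL

    def parse_index(self):
        # Runs once. This soup is the index page only.
        soup = self.index_to_soup(self.INDEX)
        articles = []
        # 'headline' is an assumed class name, adjust it to the real site.
        for a in soup.findAll('a', attrs={'class': 'headline'}):
            articles.append({'title': self.tag_to_string(a), 'url': a['href']})
        return [('Articles', articles)]

    def preprocess_html(self, soup):
        # Runs once per downloaded article. This soup is the article page,
        # a different object from the soup used in parse_index().
        pager = soup.find('div', attrs={'id': 'pages'})
        print('pager found' if pager else 'no pager on this page')
        return soup
```

Any multi-page logic would hang off preprocess_html(), so it behaves the same whether the article list came from parse_index() or from RSS feeds.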
|
|
|
|
|
|
#20 |
|
Groupie
Posts: 180
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Thank you. That at least keeps me working on the method. How can I tell whether the pager is found or not? (With or without the code for multi-page fetching, the log and the file produced look exactly the same.)
|
|
|
|
|
|
#21 |
|
creator of calibre
Posts: 45,610
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use print statements in your recipe.
|
|
|
|
|
|
#22 |
|
Groupie
Posts: 180
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
|
|
|
|
|
|
#23 |
|
creator of calibre
Posts: 45,610
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use the source, Luke: http://bazaar.launchpad.net/~kovid/c.../feeds/news.py. In particular, look at the is_link_wanted() function.
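For readers who do not want to dig through news.py, a rough sketch of overriding is_link_wanted() follows. It assumes the hook takes (url, tag) and is consulted when recursions is greater than zero, which is my reading of the source rather than something stated in this thread. The '?page=N' URL scheme and the class name are hypothetical.

```python
import re

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be read and imported outside calibre.
    BasicNewsRecipe = object


class FollowPagerLinks(BasicNewsRecipe):
    title = 'Example: is_link_wanted'  # hypothetical
    recursions = 1  # follow links found on each article page, one level deep

    # Assumed URL scheme: continuation pages end in ?page=N or &page=N.
    PAGE_LINK = re.compile(r'[?&]page=\d+$')

    def is_link_wanted(self, url, tag):
        # Return True only for links that look like continuation pages.
        wanted = bool(self.PAGE_LINK.search(url))
        print('is_link_wanted:', url, '->', wanted)
        return wanted
```

The print call doubles as a quick check that the hook is being reached at all.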
|
|
|
|
|
|
#24 | |
|
Groupie
Posts: 180
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Quote:
Speaking of 'print', will something like this do?
Code:
def append_page(self, soup, appendtag, position):
    pager = ...
    if pager:
        self.log('Found pager')
        ...
Could the "soup" in index parsing and the "soup" in append_page be getting confused, so that it looks for the pager in the index page rather than the article page?
Last edited by Steven630; 08-22-2012 at 07:48 AM.
|
|
|
|
|
|
#25 |
|
Groupie
Posts: 180
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Help!
|
|
|
|
|
|
#26 |
|
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Try with these changes:
Code:
def append_page(self, soup, appendtag, position, surl):
    pager = soup.find('div', attrs={'id':'pages'})
    if pager:
        nextpages = soup.findAll('a', attrs={'class':'a1'})
        nextpage = nextpages[1]
        if nextpage and (nextpage['href'] != surl):
            nexturl = nextpage['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':'content_left_5'})
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            self.append_page(soup2,texttag,newpos,nexturl)
            texttag.extract()
            pager.extract()
            appendtag.insert(position,texttag)

def preprocess_html(self, soup):
    self.append_page(soup, soup.body, 3, '')
    pager = soup.find('div', attrs={'id':'pages'})
    if pager:
        pager.extract()
    return self.adeify_images(soup)
Also verify the pager tag I search for and the texttag; you can experiment with those as needed.
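The surl argument is what stops the recursion: on the last page, the second pager link points back at the page being processed, so no further fetch happens. A library-free sketch of just that control flow, using a hypothetical in-memory site in place of index_to_soup():

```python
# Hypothetical in-memory "site": url -> (article text, href of the second pager link)
PAGES = {
    '/a?page=1': ('Part one.', '/a?page=2'),
    '/a?page=2': ('Part two.', '/a?page=3'),
    '/a?page=3': ('Part three.', '/a?page=3'),  # last page links to itself
}


def collect(page, surl):
    """Return the text of `page` followed by the text of every later page.

    `surl` is the URL of `page`, or '' for the first call where it is unknown,
    mirroring the fourth argument of append_page() above.
    """
    text, next_href = page
    parts = [text]
    # Same guard as `nextpage['href'] != surl` in the recipe code.
    if next_href and next_href != surl:
        parts.extend(collect(PAGES[next_href], next_href))
    return parts


if __name__ == '__main__':
    print(collect(PAGES['/a?page=1'], ''))
```

If the real site's last page links somewhere other than itself, this guard would not fire and a different stop condition would be needed.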
|
|
|
|
|
#27 | |
|
Groupie
Posts: 180
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
|
Quote:
(And I feel like such an idiot after so many false starts involving a silly mistake by me.)
UPDATE: Problem solved thanks to kiklop74. Also many thanks to lrui (who also spent a lot of time looking into the issue) and kovidgoyal.
Last edited by Steven630; 08-22-2012 at 07:55 AM.
|
|
|
|
|
|
#28 |
|
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
|
Post your recipe.
|
|
|
|