| 
			
			 | 
		#16 | 
| 
			
			
			
			 Enthusiast 
			
			Posts: 49 
				Karma: 475062 
				Join Date: Aug 2012 
				
				
				
				Device: nook simple touch 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#17 | 
| 
			
			
			
			 Groupie 
			
			Posts: 180 
				Karma: 10 
				Join Date: May 2012 
				
				
				
				Device: Kindle Paperwhite2 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Got it. Thanks.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#18 | 
| 
			
			
			
			 Groupie 
			
			Posts: 180 
				Karma: 10 
				Join Date: May 2012 
				
				
				
				Device: Kindle Paperwhite2 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Does anyone know of a recipe that uses both index parsing (as opposed to RSS) and multi-page fetching?
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#19 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Index parsing has no bearing on multi-page fetching. The method you use to create the index does not affect multi-page handling in any way.
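
To illustrate the point above, here is a minimal sketch of a recipe that builds its index with `parse_index()` and handles article pages in `preprocess_html()`. The URL, the `headline` class name and the recipe title are hypothetical and only there for illustration; the `try`/`except` stand-in class exists solely so the sketch can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Minimal stand-in so the sketch can be loaded outside calibre.
    class BasicNewsRecipe(object):
        pass


class IndexAndMultipage(BasicNewsRecipe):
    title = 'Index + multi-page sketch'
    INDEX_URL = 'http://example.com/issue'  # hypothetical URL

    def parse_index(self):
        # This soup is the INDEX page, fetched explicitly by the recipe.
        soup = self.index_to_soup(self.INDEX_URL)
        articles = []
        # 'headline' is a made-up class name for the article links.
        for a in soup.findAll('a', attrs={'class': 'headline'}):
            articles.append({'title': self.tag_to_string(a), 'url': a['href']})
        return [('Articles', articles)]

    def preprocess_html(self, soup):
        # This soup is a downloaded ARTICLE page handed in by calibre,
        # so multi-page handling belongs here, whatever built the index.
        self.log('preprocess_html called for an article page')
        return soup
```

The two methods never share a soup object: the one in `parse_index()` comes from the recipe's own `index_to_soup()` call, while the one in `preprocess_html()` is the article page that was just downloaded.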
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#20 | 
| 
			
			
			
			 Groupie 
			
			Posts: 180 
				Karma: 10 
				Join Date: May 2012 
				
				
				
				Device: Kindle Paperwhite2 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Thank you. That at least keeps me working on this approach. But how can I tell whether the pager is found or not? (With or without the code for multi-page fetching, the log and the output file look exactly the same.)
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#21 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Use print statements in your recipe.
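
As a rough illustration of that advice, the helper below logs whether a pager element was found and returns it. This is only a sketch: the function name `report_pager` is made up, and the `div` with `id="pages"` is the selector that appears in code posted later in this thread, so it may need adjusting for another site.

```python
def report_pager(log, soup, where):
    """Log whether the pager div is present in soup and return it (or None)."""
    # Selector taken from the code later in this thread; adapt as needed.
    pager = soup.find('div', attrs={'id': 'pages'})
    state = 'found' if pager is not None else 'not found'
    log('%s: pager %s' % (where, state))
    return pager
```

Inside a recipe method this could be called as `pager = report_pager(self.log, soup, 'append_page')`. Running the recipe from the command line, for example `ebook-convert myrecipe.recipe .epub --test -vv` (with `myrecipe.recipe` standing in for your own file), should make the messages visible in the output.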
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#22 | 
| 
			
			
			
			 Groupie 
			
			Posts: 180 
				Karma: 10 
				Join Date: May 2012 
				
				
				
				Device: Kindle Paperwhite2 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#23 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Use the source, Luke: see http://bazaar.launchpad.net/~kovid/c.../feeds/news.py and, in particular, the is_link_wanted() function.
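
For readers who have not opened the source, a recipe can override `is_link_wanted(url, tag)` and set `recursions` so that calibre follows selected links found on article pages. The sketch below is an assumption about how that might be used for pager links: the `'page='` URL pattern and the recipe title are invented, and the stand-in base class is only there so the code loads outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Minimal stand-in so the sketch can be loaded outside calibre.
    class BasicNewsRecipe(object):
        pass


class FollowPagerLinks(BasicNewsRecipe):
    title = 'Follow pager links sketch'
    # Follow links one level deep from each article page.
    recursions = 1

    def is_link_wanted(self, url, tag):
        # Follow only links that look like further pages of the article.
        # The 'page=' pattern is a made-up example; adapt it to the site.
        return 'page=' in url
```

Note that this is a different technique from appending the extra pages into one document with an `append_page` helper: followed links are downloaded as linked pages rather than merged into the first page.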
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#24 | |
| 
			
			
			
			 Groupie 
			
			Posts: 180 
				Karma: 10 
				Join Date: May 2012 
				
				
				
				Device: Kindle Paperwhite2 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
Speaking of 'print', will something like this do?
Code:
    def append_page(self, soup, appendtag, position):
        pager = ...
        if pager:
            self.log('Found pager')
        ...
Could it be that the "soup" in index parsing and the "soup" in append_page are being confused, so that it looks for the pager on the index page rather than on the article page?
Last edited by Steven630; 08-22-2012 at 07:48 AM. | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#25 | 
| 
			
			
			
			 Groupie 
			
			Posts: 180 
				Karma: 10 
				Join Date: May 2012 
				
				
				
				Device: Kindle Paperwhite2 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Help!
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#26 | 
| 
			
			
			
			 Guru 
			
			Posts: 800 
				Karma: 194644 
				Join Date: Dec 2007 
				Location: Argentina 
				
				
				Device: Kindle Voyage 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Try these changes: 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Code: 
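    def append_page(self, soup, appendtag, position, surl):
        # Look for the pager on the current article page.
        pager = soup.find('div', attrs={'id':'pages'})
        if pager:
            nextpages = soup.findAll('a', attrs={'class':'a1'})
            # The second 'a1' link is taken to be the "next page" link.
            nextpage = nextpages[1] if len(nextpages) > 1 else None
            # Stop when the link points back at the page we are already on.
            if nextpage and (nextpage['href'] != surl):
                nexturl = nextpage['href']
                soup2 = self.index_to_soup(nexturl)
                texttag = soup2.find('div', attrs={'class':'content_left_5'})
                if texttag is None:
                    return
                # Drop inline styles from the fetched text.
                for it in texttag.findAll(style=True):
                    del it['style']
                newpos = len(texttag.contents)
                # Recurse first so later pages are appended to this page's text.
                self.append_page(soup2, texttag, newpos, nexturl)
                texttag.extract()
                pager.extract()
                appendtag.insert(position, texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3, '')
        # Remove any pager left on the first page.
        pager = soup.find('div', attrs={'id':'pages'})
        if pager:
            pager.extract()
        return self.adeify_images(soup)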
Also verify the pager tag and the texttag I search for; you can experiment with those as needed.  | 
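
The recursion in `append_page` can be hard to follow, so here is a small stand-alone model of the same idea that runs without calibre. It is only a sketch: the `PAGES` dictionary and the `collect_pages` function are invented for illustration, and it assumes, like the code above, that the "next" link on the last page points back at that page.

```python
# Hypothetical site: each URL maps to (paragraphs, next_url). On the last
# page the "next" link is assumed to point back at the page itself.
PAGES = {
    '/story?p=1': (['p1-a', 'p1-b'], '/story?p=2'),
    '/story?p=2': (['p2-a'], '/story?p=3'),
    '/story?p=3': (['p3-a', 'p3-b'], '/story?p=3'),
}


def collect_pages(url, pages, seen=None):
    """Return the paragraphs of url followed by those of all later pages."""
    seen = set() if seen is None else seen
    seen.add(url)
    paragraphs, next_url = pages[url]
    merged = list(paragraphs)
    # Stop when there is no next link, it points at the current page,
    # it is unknown, or it was already visited (guards against loops).
    if next_url and next_url != url and next_url in pages and next_url not in seen:
        merged.extend(collect_pages(next_url, pages, seen))
    return merged


if __name__ == '__main__':
    print(collect_pages('/story?p=1', PAGES))
```

The `seen` set is an extra safeguard that the recipe code does not have; it keeps the recursion from running forever if two pager links point at each other.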
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#27 | |
| 
			
			
			
			 Groupie 
			
			Posts: 180 
				Karma: 10 
				Join Date: May 2012 
				
				
				
				Device: Kindle Paperwhite2 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
(And I feel like such an idiot after so many false starts involving a silly mistake by me.)
UPDATE: Problem solved thanks to kiklop74. Also many thanks to lrui (who also spent a lot of time looking into the issue) and kovidgoyal.
Last edited by Steven630; 08-22-2012 at 07:55 AM.  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#28 | 
| 
			
			
			
			 Enthusiast 
			
			Posts: 49 
				Karma: 475062 
				Join Date: Aug 2012 
				
				
				
				Device: nook simple touch 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Post your recipe.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 