Old 08-17-2012, 08:13 AM   #16
lrui
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
Quote:
Originally Posted by Steven630 View Post
What exactly is the name of the recipe? grep append?

Sorry, let me explain in Chinese: do a global search. I use EmEditor's "Find in Files" with the keyword "appen".
Old 08-17-2012, 09:00 AM   #17
Steven630
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
Got it. Thanks.
Old 08-18-2012, 02:02 AM   #18
Steven630
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
Does anyone know of a recipe that uses both index parsing (as opposed to RSS) and multi-page fetching?
Old 08-18-2012, 02:33 AM   #19
kovidgoyal
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
index parsing has no bearing on multipage. What method you use to create the index does not affect multipage in any way.
Old 08-18-2012, 02:38 AM   #20
Steven630
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
Quote:
Originally Posted by kovidgoyal View Post
index parsing has no bearing on multipage. What method you use to create the index does not affect multipage in any way.
Thank you. That at least keeps me working on the method. Then how do I know whether the pager is found or not? (With or without the code for multi-page fetching, the log and the file produced look exactly the same.)
Old 08-18-2012, 02:41 AM   #21
kovidgoyal
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use print statements in your recipe.
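One way to act on this advice, as a minimal stand-alone sketch: the helper below reports the page title and whether a pager element was found, which also shows which page (index or article) is actually being inspected. It uses only the standard library so it runs outside calibre; `PagerProbe`, `report_pager` and the sample HTML are illustrative assumptions, and the `pages` id is borrowed from the code posted elsewhere in this thread. Inside a recipe the same messages could be emitted with `print` or `self.log`.

```python
from html.parser import HTMLParser


class PagerProbe(HTMLParser):
    """Collects the <title> text and notes whether a pager <div> exists."""

    def __init__(self, pager_id):
        super().__init__()
        self.pager_id = pager_id
        self.pager_found = False
        self.title = ''
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'div' and dict(attrs).get('id') == self.pager_id:
            self.pager_found = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def report_pager(html, pager_id='pages', log=print):
    """Log which page is being inspected and whether its pager was found."""
    probe = PagerProbe(pager_id)
    probe.feed(html)
    probe.close()
    log('Inspecting page: %r' % probe.title.strip())
    log('Found pager' if probe.pager_found else 'No pager found')
    return probe.pager_found


# Hypothetical sample pages: an article with a pager and an index without one.
article = ('<html><head><title>Article</title></head><body>'
           '<div id="pages"><a class="a1" href="/p2">next</a></div>'
           '</body></html>')
index = ('<html><head><title>Index</title></head><body>'
         '<ul><li>item</li></ul></body></html>')

report_pager(article)
report_pager(index)
```

If the title logged is that of the index page rather than an article, the pager search is running against the wrong document.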
Old 08-18-2012, 02:44 AM   #22
Steven630
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
Quote:
Originally Posted by kovidgoyal View Post
Use print statements in your recipe.
Thanks. There's a "next page" button even on the last page. Is there any way I can let Calibre know that it's actually the last page? (For example, by comparing the contents to see if they are the same.)
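The content-comparison idea can be sketched without calibre. In this hypothetical example, `fetch` stands in for whatever downloads a page and returns its text together with its next-page link; following stops when a link has been seen before or when a page's text duplicates one already collected. `collect_pages`, `SITE` and the URLs are assumptions made up for illustration.

```python
import hashlib


def collect_pages(start_url, fetch, max_pages=50):
    """Follow next-page links until a URL or a page body repeats."""
    seen_urls = set()
    seen_digests = set()
    texts = []
    url = start_url
    while url and url not in seen_urls and len(texts) < max_pages:
        seen_urls.add(url)
        text, next_url = fetch(url)
        digest = hashlib.sha256(text.encode('utf-8')).hexdigest()
        if digest in seen_digests:
            break  # same content again: the previous page was the last one
        seen_digests.add(digest)
        texts.append(text)
        url = next_url
    return texts


# Fake three-page article whose last page links to itself.
SITE = {
    '/a?p=1': ('part one', '/a?p=2'),
    '/a?p=2': ('part two', '/a?p=3'),
    '/a?p=3': ('part three', '/a?p=3'),
}

print(collect_pages('/a?p=1', SITE.__getitem__))
```

The `max_pages` cap is a safety net in case neither check ever triggers.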
Old 08-18-2012, 02:57 AM   #23
kovidgoyal
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use the source, Luke: http://bazaar.launchpad.net/~kovid/c.../feeds/news.py in particular look at the is_link_wanted() function.
Old 08-18-2012, 03:11 AM   #24
Steven630
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
Quote:
Originally Posted by kovidgoyal View Post
Use the source, Luke: http://bazaar.launchpad.net/~kovid/c.../feeds/news.py in particular look at the is_link_wanted() function.
Thanks. But it seems that the next-page link on the last page cannot simply be filtered out, since it's identical to the previous links (it's just redundant).

Speaking of 'print', will something like this do?

Code:
    def append_page(self, soup, appendtag, position):
        pager = ...
        if pager:
           self.log('Found pager')
...
Still, it failed to find the pager in the first place.

Could it be possible that the "soup" in index-parsing and the "soup" in append_page are confused? (So it's looking for the pager in the index page rather than the article page)

Last edited by Steven630; 08-22-2012 at 06:48 AM.
Old 08-21-2012, 11:07 AM   #25
Steven630
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
Help!
Old 08-21-2012, 01:57 PM   #26
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Try with these changes:

Code:
    def append_page(self, soup, appendtag, position, surl):
        # surl is the URL of the page being processed; it stops the recursion
        # when the pager's "next" link points back at the current page.
        pager = soup.find('div', attrs={'id':'pages'})
        if pager:
            nextpages = soup.findAll('a', attrs={'class':'a1'})
            # The second 'a1' link is assumed to be the "next page" link.
            nextpage = nextpages[1] if len(nextpages) > 1 else None
            if nextpage and (nextpage['href'] != surl):
                nexturl = nextpage['href']
                soup2 = self.index_to_soup(nexturl)
                texttag = soup2.find('div', attrs={'class':'content_left_5'})
                for it in texttag.findAll(style=True):
                    del it['style']
                newpos = len(texttag.contents)
                # Recurse first so later pages are already attached to texttag.
                self.append_page(soup2,texttag,newpos,nexturl)
                texttag.extract()
                pager.extract()
                appendtag.insert(position,texttag)


    def preprocess_html(self, soup):
        # The first page's URL is not known here, so pass an empty string.
        self.append_page(soup, soup.body, 3, '')
        pager = soup.find('div', attrs={'id':'pages'})
        if pager:
            pager.extract()
        return self.adeify_images(soup)
Notice that I changed append_page to take a new parameter, which should be used to pass the current page URL. It is used later to check whether the URL of the page that called the method is the same as the one in the pager. If it is the same, the recursion stops.

Also verify the pager tag I search for and the texttag; you can experiment with those accordingly.
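The stopping rule described above can be tried in isolation. This is a stand-alone sketch, not the recipe itself: `PAGES` is a made-up site in which the last page's next link points to itself, and the hypothetical `append_pages` mirrors only the URL comparison, not the soup handling.

```python
# Hypothetical site: url -> (body text, href of the "next page" link).
PAGES = {
    '/story/1': ('first', '/story/2'),
    '/story/2': ('second', '/story/3'),
    '/story/3': ('third', '/story/3'),  # redundant link on the last page
}


def append_pages(page, surl=''):
    """Return the text of `page` followed by the text of all later pages.

    `page` is a (text, next_url) pair for the current page and `surl` is
    that page's own URL ('' when unknown, as on the first call). The
    recursion stops when the next link equals `surl`.
    """
    text, next_url = page
    if next_url and next_url != surl:
        return [text] + append_pages(PAGES[next_url], next_url)
    return [text]


print(append_pages(PAGES['/story/1']))
```

Because each recursive call receives the URL it was fetched from, the self-referencing link on the last page is the only one that fails the comparison.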
Old 08-21-2012, 09:06 PM   #27
Steven630
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
Quote:
Originally Posted by kiklop74 View Post
Try with these changes:
...

Notice that I changed append_page to contain new parameter. That should be used to pass the current page URL. You use that later to check if the URL of page who called the method is the same or not to the one in pager. If it is the same the recursion is stopped.

Also verify the pager tag I use for searching and texttag you can experiment with those accordingly.
I can't thank you enough. It works! (And I feel like such an idiot after so many false starts involving a silly mistake by me.)



UPDATE: Problem solved thanks to kiklop74.


Also many thanks to lrui (who also spent a lot of time looking into the issue) and kovidgoyal.

Last edited by Steven630; 08-22-2012 at 06:55 AM.
Old 08-21-2012, 10:04 PM   #28
lrui
Enthusiast
Posts: 49
Karma: 475062
Join Date: Aug 2012
Device: nook simple touch
Post your recipe.