#1
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2011
Device: kindle
Help! How to handle a multi-page topic
Hi,
Some RSS items point to a topic that is split across multiple pages. Can Calibre handle that? If so, how do I write the recipe? Thanks!
zxpan
#2
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Yes, Calibre can handle multipage sites. Search this forum for "multipage", search both the forum and the built-in recipes for "append_page", and look at the AdventureGamers recipe.
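The append_page approach used by those recipes roughly boils down to: fetch the next page, take its article body, append it to the first page, and repeat until a page has no next link. Here is a minimal model of that control flow in plain Python, assuming a hypothetical in-memory site (`PAGES` and `append_pages` are illustrative names, not calibre API):

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical in-memory site: URL -> (page body, URL of the next page or None).
# In a calibre recipe this lookup would be a real fetch via self.index_to_soup(url).
PAGES: Dict[str, Tuple[str, Optional[str]]] = {
    "/story-1.html": ("Part one.", "/story-2.html"),
    "/story-2.html": ("Part two.", "/story-3.html"),
    "/story-3.html": ("Part three.", None),
}


def append_pages(url: str, parts: List[str], seen: Optional[Set[str]] = None) -> List[str]:
    """Append this page's body, then recurse into the next page until none is left."""
    if seen is None:
        seen = set()
    if url in seen:  # stop if a "next" link ever points back to a page already read
        return parts
    seen.add(url)
    body, next_url = PAGES[url]
    parts.append(body)
    if next_url is not None:  # the last page has no next link, which ends the recursion
        append_pages(next_url, parts, seen)
    return parts


if __name__ == "__main__":
    print(" ".join(append_pages("/story-1.html", [])))  # Part one. Part two. Part three.
```

In a real recipe the bodies are soup tags inserted into the first page's soup rather than strings in a list, but the stopping rule is the same: no next link, no further fetch.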
#3
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2011
Device: kindle
Dear Starson17:
Thank you for the help. I have checked some posts about the append_page code, but I don't know how to write the code that fetches the hyperlink. The link on my page looks like this:

    <center>
      <table border="0" align="center">
        <tbody>
          <tr>
            <td>
              <a href="/GB/14562/15549575.html">
                <img src="/img/next_b.gif" border="0"/>
              </a>
            </td>
          </tr>
        </tbody>
      </table>
    </center>

Can you help me? Thanks.
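To see what the matching has to do, here is a standalone sketch using only Python's standard `html.parser` (not the BeautifulSoup API that calibre recipes work with). It looks for the `<img>` whose `src` is `/img/next_b.gif` and takes the href of the `<a>` that encloses it, instead of the first `<a>` on the page. In BeautifulSoup the equivalent is roughly `soup.find('img', src='/img/next_b.gif')` followed by `.findParent('a')`; check that against the BeautifulSoup version your calibre ships. `NextLinkFinder` and `find_next_href` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional

NEXT_IMG_SRC = "/img/next_b.gif"  # the "next page" button image from the quoted markup


class NextLinkFinder(HTMLParser):
    """Remembers the href of the first <a> that wraps <img src="/img/next_b.gif">."""

    def __init__(self) -> None:
        super().__init__()
        self._open_hrefs: List[Optional[str]] = []  # hrefs of the <a> tags currently open
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            self._open_hrefs.append(attr_map.get("href"))
        elif tag == "img" and attr_map.get("src") == NEXT_IMG_SRC:
            # The button image sits inside the link, so the innermost open <a> is the one we want.
            if self._open_hrefs and self.next_href is None:
                self.next_href = self._open_hrefs[-1]

    def handle_endtag(self, tag):
        if tag == "a" and self._open_hrefs:
            self._open_hrefs.pop()


def find_next_href(html: str) -> Optional[str]:
    """Return the next-page href, or None when the page has no next button (last page)."""
    finder = NextLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.next_href


if __name__ == "__main__":
    snippet = (
        '<center> <table border="0" align="center"> <tbody> <tr> <td>'
        ' <a href="/GB/14562/15549575.html"> <img src="/img/next_b.gif" border="0"/> </a>'
        ' </td> </tr> </tbody> </table> </center>'
    )
    print(find_next_href(snippet))  # /GB/14562/15549575.html
```

Returning None on the last page matters: that is the signal for the multi-page loop to stop.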
#4
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Code:
    pager = soup.find('a')
    if pager.img['src'] == "/img/next_b.gif":
        nexturl = self.INDEX + pager.a['href']

If you don't know what pager is, see the various recipes that use append_page. I hate posting code without testing it, so that part is up to you.
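Two details in the snippet above are worth checking when testing it. If `pager` is already the `<a>` tag, `pager.a` searches for another `<a>` nested inside it, which does not exist in this markup, so the href would be read as `pager['href']`. And as far as I know `INDEX` is not provided by `BasicNewsRecipe`; recipes that use it define it themselves as a class attribute holding the site root. A sketch of an alternative, assuming standard-library URL handling is acceptable in the recipe: resolve the href against the URL of the page it was found on (`absolute_next_url` is a hypothetical helper name).

```python
from urllib.parse import urljoin  # on Python 2 builds of calibre: from urlparse import urljoin

# Values taken from the thread: the article page and the href of its next-page link.
PAGE_URL = "http://politics.people.com.cn/GB/1024/15556053.html"
NEXT_HREF = "/GB/1024/15556054.html"


def absolute_next_url(page_url: str, href: str) -> str:
    """Resolve a possibly relative href against the page it was found on."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    print(absolute_next_url(PAGE_URL, NEXT_HREF))
    # http://politics.people.com.cn/GB/1024/15556054.html
```

Unlike string concatenation with a fixed root, `urljoin` also handles hrefs relative to the current directory and hrefs that are already absolute.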
#5
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2011
Device: kindle
Hi, Starson17:
Thanks, but I still need help. This is my code:

    class peoplenetrecipe(BasicNewsRecipe):
        title = '人民网'  # People's Daily Online
        __author__ = 'me'
        oldest_article = 3
        max_articles_per_feed = 25
        feeds = [
            ('china', 'http://www.people.com.cn/rss/politics.xml'),
            ('world', 'http://www.people.com.cn/rss/world.xml'),
            ('finance', 'http://www.people.com.cn/rss/finance.xml'),
            ('sport', 'http://www.people.com.cn/rss/sports.xml'),
        ]
        no_stylesheets = True
        # remove_javascript = True
        # encoding = 'UTF-8'
        keep_only_tags = [
            dict(name='div', attrs={'class':'c_l fl'}),
        ]
        remove_tags = [
            dict(name='div', attrs={'class':'tools'}),
            dict(name='div', attrs={'class':'box'}),
        ]
        remove_tags_after = [
            dict(name='div', attrs={'class':'show_text'}),
        ]

        def append_page(self, soup, appendtag, position):
            pager = soup.find('a')
            if pager.img['src'] == "/img/next_b.gif":
                nexturl = self.INDEX + pager.a['href']
                # pager = soup.find('a',attrs={'class':'nextPage greyButton'}) # here is pager
                # if pager:
                #     nexturl = self.INDEX + pager.a['href']
                soup2 = self.index_to_soup(nexturl)
                texttag = soup2.find('div', attrs={'class':'c_l fl'}) # here is text
                for it in texttag.findAll(style=True):
                    del it['style']
                newpos = len(texttag.contents)
                self.append_page(soup2,texttag,newpos)
                texttag.extract()
                appendtag.insert(position,texttag)

It doesn't seem to work. The page http://politics.people.com.cn/GB/1024/15556053.html is in Chinese; at the bottom there is a link to the next page, whose code is:

    <a href="/GB/1024/15556054.html"> <img src="/img/next_b.gif" border="0"/>

I don't know how to debug the recipe, so would you please help me check it? Thanks,
BR
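One thing to check first: the posted class defines `append_page` but nothing appears to call it; recipes that use this pattern typically call it from `preprocess_html`, passing the article soup. Before wiring that up, it can help to test the selectors outside calibre. Below is a minimal standalone sketch with the standard `html.parser`, assuming you paste the article's saved HTML into a string. It returns the text inside the first `<div>` whose classes include both `c_l` and `fl` (the block the recipe's `keep_only_tags` targets), counting nested `<div>`s so that an inner div such as `show_text` does not end the match early. `DivTextExtractor` and `extract_body_text` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List

WANTED_CLASSES = {"c_l", "fl"}  # class="c_l fl" from the recipe's keep_only_tags


class DivTextExtractor(HTMLParser):
    """Collects the text inside the first <div> whose class list contains c_l and fl."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0  # > 0 while inside the wanted div; counts nested divs
        self._done = False  # set once the wanted div has closed
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self._done:
            return
        if self._depth:  # a div nested inside the wanted one
            self._depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if WANTED_CLASSES <= classes:
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(html: str) -> str:
    """Return the wanted div's text, or an empty string if the selector matches nothing."""
    parser = DivTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

An empty result means the class selector does not match the page as served. For running the recipe itself, the calibre manual's recipe-development tips describe converting it with `ebook-convert` plus `--test` and `-vv`, which fetches only a few articles and prints verbose output.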
Thread | Thread Starter | Forum | Replies | Last Post |
Unutterably Silly Corrupt-A-Topic (anymore off-topic and it would be on-topic) | RWood | Lounge | 6227 | 08-18-2023 10:58 PM |
How to handle badly formed xml from web page? | kiwidude | Development | 6 | 02-19-2011 12:05 AM |
multi-page HTML with images to ePub or LRF | Nvidiot | Workshop | 19 | 07-13-2009 07:20 PM |
how to handle one book has multi-files ? | zhanglong | Calibre | 5 | 03-27-2009 11:47 PM |
converting multi-page HTML to Mobipocket | shinew | Calibre | 13 | 02-21-2009 01:33 PM |