Old 08-30-2011, 04:44 AM   #1
zhixiangpan
Junior Member
zhixiangpan began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Aug 2011
Device: kindle
Help! How to handle a multi-page topic

Hi,

Some RSS items refer to a topic that is split across multiple pages. Can Calibre handle that? If yes, how do I write the recipe?

thanks!

zxpan
Old 08-30-2011, 11:50 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by zhixiangpan View Post
Some RSS items refer to a topic that is split across multiple pages. Can Calibre handle that? If yes, how do I write the recipe?
Yes, Calibre can handle multipage sites. Search this forum for "multipage". Also search here and in the built-in recipes for "append_page", and see the AdventureGamers recipe.
Old 08-31-2011, 01:06 AM   #3
zhixiangpan
Dear Starson17:

Thank you for the help. I have checked some posts about the append_page code, but I don't know how to write the code that fetches the hyperlink. The link on my page looks like this:

<center>
  <table border="0" align="center">
    <tbody>
      <tr>
        <td>
          <a href="/GB/14562/15549575.html">
            <img src="/img/next_b.gif" border="0"/>
          </a>
        </td>
      </tr>
    </tbody>
  </table>
</center>

Can you help me?

Thanks
Old 08-31-2011, 02:55 PM   #4
Starson17
Quote:
Originally Posted by zhixiangpan View Post
Dear Starson17:

Thank you for the help. I have checked some posts about the append_page code, but I don't know how to write the code that fetches the hyperlink. The link on my page looks like this:

Code:
<center>
  <table border="0" align="center">
    <tbody>
      <tr>
        <td>
          <a href="/GB/14562/15549575.html">
            <img src="/img/next_b.gif" border="0"/>
          </a>
        </td>
      </tr>
    </tbody>
  </table>
</center>
Can you help me?
Without looking closely at your page, I can't be sure, but something like this may work:
Code:
        nexturl = None
        for pager in soup.findAll('a', href=True):
            # the "next page" link is the <a> that wraps the "next" button image
            if pager.img is not None and pager.img.get('src') == "/img/next_b.gif":
                nexturl = self.INDEX + pager['href']
                break
Go through the <a> tags, see if one has an <img> tag that points to the "next" image (whatever that is), and if so, grab the href from that <a> tag and append it to INDEX (the site root, which you have to define yourself in the recipe).

If you don't know what pager is, see the various recipes that use append_page.
I hate posting code without testing it, so that part is up to you.
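
The lookup described above can be tried outside calibre before it goes into a recipe. The following is only a minimal sketch: it swaps in Python's standard html.parser for calibre's BeautifulSoup so that it runs anywhere, and the names NextLinkFinder and find_next_url are made up for illustration. It assumes the "next" button is always the image /img/next_b.gif, as in the snippet quoted in this thread, and it resolves the href against the URL of the page it came from rather than against a fixed INDEX.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Remember the href of the first <a> that wraps the "next page" image."""

    def __init__(self, next_img_src):
        super().__init__()
        self.next_img_src = next_img_src
        self.open_hrefs = []   # hrefs of the <a> tags we are currently inside
        self.next_href = None  # set once the "next" image has been seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a':
            self.open_hrefs.append(attrs.get('href'))
        elif tag == 'img' and self.open_hrefs and self.next_href is None:
            if attrs.get('src') == self.next_img_src:
                self.next_href = self.open_hrefs[-1]

    def handle_endtag(self, tag):
        if tag == 'a' and self.open_hrefs:
            self.open_hrefs.pop()


def find_next_url(page_url, html, next_img_src='/img/next_b.gif'):
    """Return the absolute URL of the next page, or None if there is none."""
    finder = NextLinkFinder(next_img_src)
    finder.feed(html)
    if finder.next_href is None:
        return None
    # urljoin turns the site-relative href into a full URL on the same host.
    return urljoin(page_url, finder.next_href)


# Example with the URL and pager markup quoted in this thread.
page_url = 'http://politics.people.com.cn/GB/1024/15556053.html'
pager_html = ('<a href="/GB/1024/15556054.html">'
              '<img src="/img/next_b.gif" border="0"/></a>')
print(find_next_url(page_url, pager_html))
```

For the markup quoted in this thread the call prints http://politics.people.com.cn/GB/1024/15556054.html. Resolving against the page URL avoids hard-coding a host, which may matter if the different feeds turn out to be served from different hosts.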
Old 08-31-2011, 09:46 PM   #5
zhixiangpan
Hi Starson17,

Thanks, but I still need help.

This is my code:

class peoplenetrecipe(BasicNewsRecipe):
    title = '人民网'
    __author__ = 'me'
    oldest_article = 3
    max_articles_per_feed = 25

    feeds = [
        ('china', 'http://www.people.com.cn/rss/politics.xml'),
        ('world', 'http://www.people.com.cn/rss/world.xml'),
        ('finance', 'http://www.people.com.cn/rss/finance.xml'),
        ('sport', 'http://www.people.com.cn/rss/sports.xml'),
    ]

    no_stylesheets = True
    # remove_javascript = True
    # encoding = 'UTF-8'

    keep_only_tags = [
        dict(name='div', attrs={'class':'c_l fl'}),
    ]
    remove_tags = [
        dict(name='div', attrs={'class':'tools'}),
        dict(name='div', attrs={'class':'box'}),
    ]
    remove_tags_after = [
        dict(name='div', attrs={'class':'show_text'}),
    ]

    def append_page(self, soup, appendtag, position):

        pager = soup.find('a')
        if pager.img['src'] == "/img/next_b.gif":
            nexturl = self.INDEX + pager.a['href']

            # pager = soup.find('a',attrs={'class':'nextPage greyButton'}) # here is pager
            # if pager:
            #     nexturl = self.INDEX + pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'class':'c_l fl'}) # here is text
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            self.append_page(soup2,texttag,newpos)
            texttag.extract()
            appendtag.insert(position,texttag)

It doesn't seem to work. The page

http://politics.people.com.cn/GB/1024/15556053.html

is in Chinese. At the bottom there is a link to the next page; its code is

<a href="/GB/1024/15556054.html">
<img src="/img/next_b.gif" border="0"/>
</a>

I don't know how to debug the recipe, so would you please help check it?

Thanks

Best regards
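
For reference, here is one way the pieces discussed in this thread might fit together. It is an untested sketch under several assumptions, not a verified fix. BasicNewsRecipe does not provide INDEX, so the recipe has to define it; the value below assumes every page of an article is served from politics.people.com.cn, which is only known for the one URL quoted above, so the sketch keeps just the politics feed. append_page is not something calibre calls by itself, so it is called from preprocess_html. The names find_next_link, NEXT_IMG and MAX_PAGES are made up for illustration, and the guarded import exists only so the file can be loaded outside calibre.

```python
from urllib.parse import urljoin

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the file can be syntax-checked outside calibre; when
    # calibre runs the recipe, the real base class is used.
    BasicNewsRecipe = object


class PeopleNetRecipe(BasicNewsRecipe):
    title = '人民网'
    __author__ = 'me'
    oldest_article = 3
    max_articles_per_feed = 25
    no_stylesheets = True

    feeds = [('china', 'http://www.people.com.cn/rss/politics.xml')]
    keep_only_tags = [dict(name='div', attrs={'class': 'c_l fl'})]

    # Assumption: every page of an article is served from this host.
    INDEX = 'http://politics.people.com.cn'
    NEXT_IMG = '/img/next_b.gif'
    MAX_PAGES = 20  # hypothetical safety limit against pager loops

    def find_next_link(self, soup):
        """Return the <a> tag that wraps the "next" image, or None."""
        for link in soup.findAll('a', href=True):
            img = link.find('img')
            if img is not None and img.get('src') == self.NEXT_IMG:
                return link
        return None

    def append_page(self, soup, appendtag, depth=1):
        """Fetch the pages that follow `soup` and append their text to appendtag."""
        link = self.find_next_link(soup)
        if link is None or depth >= self.MAX_PAGES:
            return
        nexturl = urljoin(self.INDEX, link['href'])
        link.extract()  # drop the "next" button from the output
        soup2 = self.index_to_soup(nexturl)
        texttag = soup2.find('div', attrs={'class': 'c_l fl'})
        if texttag is None:
            return
        # Pull in the later pages first, then move this page's text across.
        self.append_page(soup2, texttag, depth + 1)
        texttag.extract()
        appendtag.insert(len(appendtag.contents), texttag)

    def preprocess_html(self, soup):
        # append_page runs only because it is called from this hook.
        article = soup.find('div', attrs={'class': 'c_l fl'})
        if article is not None:
            self.append_page(soup, article)
        return soup
```

The sketch leaves out remove_tags and remove_tags_after on purpose. As far as I recall, calibre applies those filters before it calls preprocess_html, so a filter that cuts everything after the show_text div could remove the pager link from the first page before append_page ever sees it; that is worth checking against the real page. For debugging, saving the recipe to a file (peoplenet.recipe is a made-up name) and running ebook-convert peoplenet.recipe .epub --test -vv downloads only a couple of articles and prints verbose output, including any traceback from the recipe.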
All times are GMT -4.

