12-29-2010, 09:35 PM   #1
mufc
How do you get Calibre to follow an article that spans more than one page?

Can Calibre follow an article that is spread over 2 or 3 pages, where you read the first page and must click "next" or "2" to continue? If so, how?
Thanks
12-30-2010, 07:46 AM   #2
Starson17
Quote:
Originally Posted by mufc View Post
Can Calibre follow an article that is spread over 2 or 3 pages, where you read the first page and must click "next" or "2" to continue? If so, how?
Thanks
Yes. Search this forum for "multipage" or "append_page" and see the AdventureGamers recipe.
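For anyone who has not looked at those recipes yet, the general idea is: fetch the first page, look for a "next" link, fetch that page, and keep going until there is no next link, joining the page bodies as you go. Here is a minimal sketch of that idea in plain Python. It is not calibre code: `collect_pages`, the `fetch` callable and the in-memory `PAGES` dict are hypothetical stand-ins, and it uses a simple loop for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A page is modelled as (body_text, next_url_or_None). In a real recipe both
# values would be pulled out of the downloaded HTML.
Page = Tuple[str, Optional[str]]


def collect_pages(first_url: str, fetch: Callable[[str], Page],
                  max_pages: int = 20) -> List[str]:
    """Follow next-page links from first_url and return the bodies in order."""
    bodies: List[str] = []
    seen = set()
    url: Optional[str] = first_url
    while url is not None and url not in seen and len(bodies) < max_pages:
        seen.add(url)           # guard against a pager that loops back
        body, url = fetch(url)  # url now points at the next page, if any
        bodies.append(body)
    return bodies


# Hypothetical three-page article held in memory instead of on the network.
PAGES: Dict[str, Page] = {
    "/story?page=0,0": ("page one", "/story?page=0,1"),
    "/story?page=0,1": ("page two", "/story?page=0,2"),
    "/story?page=0,2": ("page three", None),
}

if __name__ == "__main__":
    print(collect_pages("/story?page=0,0", PAGES.__getitem__))
```

The `seen` set and `max_pages` cap are only there so that a misbehaving pager cannot keep the download running forever.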
12-30-2010, 09:08 PM   #3
mufc
OK, I am doing something wrong

Here is the relevant HTML:
Spoiler:
<div class="article">
<div id="main_text">
<h1>How to check up on your cloud provider</h1>
<h2>Cloud providers won't let you audit their actual systems, but there are questions you can ask to decide your level of trust</h2>



<span class='print-link'></span><p>Potential cloud-services customers face a tough problem: How can they trust cloud providers enough to hire them when the providers refuse to reveal <a href="http://www.networkworld.com/news/2010/100710-google-cloud-security.html" target="_blank">important infrastructure details</a> for reasons of security and practicality?</p> <p>These providers say they can’t open their network architectures to customer scrutiny for fear the details will give potential attackers a blueprint for compromising security. They also say the time involved in answering each customer’s questions would be prohibitive.</p> <p><strong>[ Get the no-nonsense explanations and advice you need to take real advantage of cloud computing in InfoWorld editors' 21-page <a href="http://www.infoworld.com/d/cloud-computing/selecting-the-right-cloud-step-step-guide-692?isource=ifwelg_fssr">Cloud Computing Deep Dive PDF special report</a>. | Stay up on the cloud with InfoWorld's <a href="http://www.infoworld.com/newsletters/subscribe?showlist=infoworld_cloud_computing&amp;source=ifwelg_fssr">Cloud Computing Report newsletter</a>. 
]</strong></p><div id="edit-promo" style="padding: 5px; background: none no-repeat scroll center top #ffffff; position: relative; float: right; width: 336px; height: 200px; margin-bottom: 0pt; margin-top: 10px;"><img src="http://www.infoworld.com/sites/infoworld.com/files/media/image/Cloud-deep-dive-promo.jpg" alt="Cloud Computing Deep Dive" /><div id="mobile-deep-dive-button" style="position: relative; top: -36px; left: 14px;"><a href="http://www.infoworld.com/d/cloud-computing/selecting-the-right-cloud-step-step-guide-692?idglg=ifwsite_editinline&amp;source=ifwelg_news" target="_blank"><img src="http://www.infoworld.com/sites/infoworld.com/files/media/image/edit_promo-download_btn.gif" alt="" /></a></div></div> <p><a href="http://www.networkworld.com/topics/cloud-computing.html">(Cloud Computing Research Center)</a></p> <p>The bottom line, as one service provider put it earlier this year, is that customers will never get the level of transparency they want. "We won't let you audit to the degree that you would audit your own infrastructure," says Adam Swidler, a product marketing manager at Google, speaking about Google’s cloud services. "It's never going to be the same as auditing your own infrastructure. You'll have to extend some level of trust to third-party verification."</p> <p>While customers may not be able to walk through cloud providers’ data centers and grill their CISOs, they can submit probing questions whose answers may serve the purpose, says the Cloud Security Alliance, which has written a questionnaire businesses can adapt for their own purposes when trying to assess the suitability of cloud service providers.</p> <p>Called the <a href="http://poena:9992/Rhythmyx/psx_ceArticle/www.cloudsecurityalliance.org/cai" target="_blank">Consensus Assessments Initiative Questionnaire</a>, the document is a well-thought-out framework for assessing cloud security. “This question set is a simplified distillation of the issues, best practices, and control ... 
intended to help organizations build the necessary assessment processes for engaging with cloud providers,” the CSA says.</p> <p>Key questions to ask:</p>
</div>

<div class="pagination clearfix">

<div class="links"><div class="prevLink">&nbsp;</div><div class="nextLink"><a href="/d/cloud-computing/how-check-your-cloud-provider-712?page=0,1" class="active">next page ›</a>&nbsp;</div><div class="pages"><span class="pager-current">1</span><span class="pager-item"><a href="/d/cloud-computing/how-check-your-cloud-provider-712?page=0,1" title="Go to page 2" class="active">2</a></span></div><!--/.pages--></div>

</div>
</div>
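
The part of this markup that matters for pagination is the anchor inside `div.nextLink`, whose `href` is site-relative. Below is a rough sketch of pulling that link out and turning it into an absolute URL. It uses only the Python standard library, not calibre's BeautifulSoup. `NextLinkParser`, `next_page_url` and the base URL `http://www.infoworld.com` are assumptions for illustration. So is the guess that the last page has a `nextLink` div with no anchor in it, mirroring the empty `prevLink` on page 1.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Record the href of the first <a> found inside <div class="nextLink">."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # <div> nesting depth inside nextLink (0 = outside)
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1
            elif "nextLink" in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def next_page_url(html: str, base: str) -> Optional[str]:
    """Return the absolute URL of the next page, or None if there is none."""
    parser = NextLinkParser()
    parser.feed(html)
    return urljoin(base, parser.href) if parser.href else None
```

Returning `None` when the anchor is missing gives the caller a clean stop condition instead of an exception on the final page.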

Here is my recipe. It drops any article that has more than one page, so obviously I am doing something wrong. I followed the AdventureGamers recipe.
Spoiler:

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1289709253(BasicNewsRecipe):
    title = u'InfoWorld test'
    oldest_article = 7
    max_articles_per_feed = 100

    use_embedded_content = False
    no_stylesheets = True
    remove_javascript = True

    extra_css = '''
        h1{font-family:Georgia,serif; font-weight:bold;font-size:large;}
        h2{font-family:Georgia,serif; font-weight:normal;font-size:small;}
        p{font-family:Georgia,serif;font-size:small;}
        body{font-family:Georgia,serif;font-size:small;}
    '''

    remove_tags = [dict(name='div', attrs={'class':['']}),
                   dict(name='div', attrs={'id':['']}),
                   dict(name='img'),]

    # Keep only the article container from each downloaded page.
    keep_only_tags = [dict(name='div', attrs={'class':['article']})]

    feeds = [(u'News', u'http://www.infoworld.com/news/feed'),
             (u'Test Center', u'http://www.infoworld.com/testcenter/feed'),
             (u'Open Source', u'http://www.infoworld.com/taxonomy/term/3218/feed'),
             (u'Windows', u'http://www.infoworld.com/taxonomy/term/3213/feed')]

    def append_page(self, soup, appendtag, position):
        # Look for the "next page" link on the current page.
        pager = soup.find('div',attrs={'class':'nextLink'})
        if pager:
            nexturl = self.INDEX + pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'id':'main_text'})
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            # Recurse so that any further pages are appended to this one first.
            self.append_page(soup2,texttag,newpos)
            texttag.extract()
            appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        # Strip inline styles, then the promo box.
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll('div', attrs={'class':'edit-promo'}):
            item.extract()
        # Pull in the following pages, then drop the pager itself.
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div',attrs={'class':'pagination clearfix'})
        if pager:
            pager.extract()
        return self.adeify_images(soup)



Last edited by mufc; 12-31-2010 at 01:18 PM.
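
One thing that stands out in the recipe above: `append_page` builds the next URL from `self.INDEX`, but the class never defines `INDEX`, and as far as I know `BasicNewsRecipe` does not supply one. My guess, not verified against calibre itself, is that the resulting `AttributeError` makes the download fail for every article that has a next link. That would match multi-page articles disappearing. The stand-in classes below are hypothetical and only mimic the attribute lookup. `http://www.infoworld.com` is assumed from the feed URLs.

```python
class RecipeAsPosted:
    """Hypothetical stand-in for the posted recipe: no INDEX attribute."""

    def next_url(self, href):
        # Same expression as in the posted append_page.
        return self.INDEX + href


class RecipeWithIndex(RecipeAsPosted):
    """Same logic, with the site root defined (assumed from the feed URLs)."""

    INDEX = 'http://www.infoworld.com'


def try_next_url(recipe, href):
    """Return the URL, or a short error string if the attribute lookup fails."""
    try:
        return recipe.next_url(href)
    except AttributeError as err:
        return 'failed: %s' % type(err).__name__


if __name__ == '__main__':
    href = '/d/cloud-computing/how-check-your-cloud-provider-712?page=0,1'
    print(try_next_url(RecipeAsPosted(), href))
    print(try_next_url(RecipeWithIndex(), href))
```

If that is the cause, adding a line such as `INDEX = 'http://www.infoworld.com'` to the recipe class should let `append_page` build the URL. Separately, `edit-promo` appears in the HTML as an `id`, not a `class`, so the `findAll` on `class` would probably not match it.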