How do you get Calibre to follow a trail when an article spans more than one page?

mufc · 12-29-2010, 09:35 PM

Can Calibre follow an article when it is spread over two or three pages? You read the first page and must click "next" or "2" to continue. If so, how?
Thanks

Starson17 · 12-30-2010, 07:46 AM

Quote:

Originally Posted by mufc

Can Calibre follow an article when it is spread over two or three pages? You read the first page and must click "next" or "2" to continue. If so, how?
Thanks

Yes. Search this forum for "multipage" or "append_page" and see the AdventureGamers recipe.
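The `append_page` technique used in recipes like AdventureGamers comes down to a loop: fetch a page, look for its next-page link, fetch that page, and keep appending the article body until no link is left. Below is a minimal stdlib-only sketch of that loop, written iteratively rather than recursively. `collect_pages` and its three hook functions are hypothetical names for illustration, not calibre API; in a real recipe, calibre's `index_to_soup` plus BeautifulSoup lookups would play those roles.

```python
from urllib.parse import urljoin


def collect_pages(first_url, fetch, find_next_href, extract_body, max_pages=10):
    """Follow next-page links from first_url; return each page's body in order.

    The three callables are hypothetical hooks, not calibre API:
      fetch(url) -> html text
      find_next_href(html) -> href string, or None on the last page
      extract_body(html) -> the article text/markup to keep
    """
    bodies = []
    seen = set()
    url = first_url
    while url and url not in seen and len(bodies) < max_pages:
        seen.add(url)
        html = fetch(url)
        bodies.append(extract_body(html))
        href = find_next_href(html)
        # Next-page links are often relative, so resolve against the current page.
        url = urljoin(url, href) if href else None
    return bodies


if __name__ == '__main__':
    # Fake two-page article: each "page" is "body|next-href"; empty href = last page.
    pages = {
        'http://example.com/story': 'Part one|/story?page=2',
        'http://example.com/story?page=2': 'Part two|',
    }
    print(collect_pages(
        'http://example.com/story',
        fetch=pages.__getitem__,
        find_next_href=lambda html: html.split('|')[1] or None,
        extract_body=lambda html: html.split('|')[0],
    ))  # ['Part one', 'Part two']
```

The `seen` set and the `max_pages` cap are there because a site whose last page links back to itself (or to page 1) would otherwise loop forever.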

mufc · 12-30-2010, 09:08 PM

Here is the relevant HTML:

<div class="article">
<div id="main_text">
<h1>How to check up on your cloud provider</h1>
<h2>Cloud providers won't let you audit their actual systems, but there are questions you can ask to decide your level of trust</h2>

<span class='print-link'></span><p>Potential cloud-services customers face a tough problem: How can they trust cloud providers enough to hire them when the providers refuse to reveal <a href="http://www.networkworld.com/news/2010/100710-google-cloud-security.html" target="_blank">important infrastructure details</a> for reasons of security and practicality?</p> <p>These providers say they can’t open their network architectures to customer scrutiny for fear the details will give potential attackers a blueprint for compromising security. They also say the time involved in answering each customer’s questions would be prohibitive.</p> <p><strong>[ Get the no-nonsense explanations and advice you need to take real advantage of cloud computing in InfoWorld editors' 21-page <a href="http://www.infoworld.com/d/cloud-computing/selecting-the-right-cloud-step-step-guide-692?isource=ifwelg_fssr">Cloud Computing Deep Dive PDF special report</a>. | Stay up on the cloud with InfoWorld's <a href="http://www.infoworld.com/newsletters/subscribe?showlist=infoworld_cloud_computing&source=ifwelg_fssr">Cloud Computing Report newsletter</a>. 
]</strong></p><div id="edit-promo" style="padding: 5px; background: none no-repeat scroll center top #ffffff; position: relative; float: right; width: 336px; height: 200px; margin-bottom: 0pt; margin-top: 10px;"><img src="http://www.infoworld.com/sites/infoworld.com/files/media/image/Cloud-deep-dive-promo.jpg" alt="Cloud Computing Deep Dive" /><div id="mobile-deep-dive-button" style="position: relative; top: -36px; left: 14px;"><a href="http://www.infoworld.com/d/cloud-computing/selecting-the-right-cloud-step-step-guide-692?idglg=ifwsite_editinline&source=ifwelg_news" target="_blank"><img src="http://www.infoworld.com/sites/infoworld.com/files/media/image/edit_promo-download_btn.gif" alt="" /></a></div></div> <p><a href="http://www.networkworld.com/topics/cloud-computing.html">(Cloud Computing Research Center)</a></p> <p>The bottom line, as one service provider put it earlier this year, is that customers will never get the level of transparency they want. "We won't let you audit to the degree that you would audit your own infrastructure," says Adam Swidler, a product marketing manager at Google, speaking about Google’s cloud services. "It's never going to be the same as auditing your own infrastructure. You'll have to extend some level of trust to third-party verification."</p> <p>While customers may not be able to walk through cloud providers’ data centers and grill their CISOs, they can submit probing questions whose answers may serve the purpose, says the Cloud Security Alliance, which has written a questionnaire businesses can adapt for their own purposes when trying to assess the suitability of cloud service providers.</p> <p>Called the <a href="http://poena:9992/Rhythmyx/psx_ceArticle/www.cloudsecurityalliance.org/cai" target="_blank">Consensus Assessments Initiative Questionnaire</a>, the document is a well-thought-out framework for assessing cloud security. “This question set is a simplified distillation of the issues, best practices, and control ... 
intended to help organizations build the necessary assessment processes for engaging with cloud providers,” the CSA says.</p> <p>Key questions to ask:</p>
</div>

<div class="pagination clearfix">

<div class="links"><div class="prevLink"> </div><div class="nextLink"><a href="/d/cloud-computing/how-check-your-cloud-provider-712?page=0,1" class="active">next page ›</a> </div><div class="pages"><span class="pager-current">1</span><span class="pager-item"><a href="/d/cloud-computing/how-check-your-cloud-provider-712?page=0,1" title="Go to page 2" class="active">2</a></span></div></div>

</div>
</div>
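In this markup the next-page link sits inside `<div class="nextLink">`, and its `href` is relative (`/d/cloud-computing/how-check-your-cloud-provider-712?page=0,1`), so it has to be joined to the site root before it can be fetched. Below is a stdlib-only sketch that pulls that `href` out with `html.parser` and resolves it with `urljoin`. The base URL `http://www.infoworld.com/d/cloud-computing/how-check-your-cloud-provider-712` is an assumption about where this article lives, and `NextLinkFinder` and `next_page_url` are hypothetical helpers, not part of calibre.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> inside <div class="nextLink">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside the nextLink div
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1  # nested div inside nextLink
            elif 'nextLink' in (attrs.get('class') or '').split():
                self.depth = 1
        elif tag == 'a' and self.depth and self.href is None:
            self.href = attrs.get('href')

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1


def next_page_url(html, page_url):
    """Return the absolute next-page URL, or None if nextLink has no link."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None


if __name__ == '__main__':
    pagination = ('<div class="nextLink"><a href="/d/cloud-computing/'
                  'how-check-your-cloud-provider-712?page=0,1" class="active">'
                  'next page</a> </div>')
    base = 'http://www.infoworld.com/d/cloud-computing/how-check-your-cloud-provider-712'
    print(next_page_url(pagination, base))
    # http://www.infoworld.com/d/cloud-computing/how-check-your-cloud-provider-712?page=0,1
```

On a last page whose `nextLink` div is empty (the way the `prevLink` div is empty on page 1 here), `next_page_url` returns `None`, which is the stop signal a page-following loop needs.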

Here is my recipe. It drops every article that has more than one page, so obviously I am doing something wrong. I have followed the AdventureGamers recipe.

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1289709253(BasicNewsRecipe):
    title = u'InfoWorld test'
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content = False
    no_stylesheets = True
    remove_javascript = True
    extra_css = '''
        h1{font-family:Georgia,serif; font-weight:bold;font-size:large;}
        h2{font-family:Georgia,serif; font-weight:normal;font-size:small;}
        p{font-family:Georgia,serif;font-size:small;}
        body{font-family:Georgia,serif;font-size:small;}
        '''
    remove_tags = [dict(name='div', attrs={'class':['']}),
                   dict(name='div', attrs={'id':['']}),
                   dict(name='img'),]
    keep_only_tags = [dict(name='div', attrs={'class':['article']})]

    feeds = [(u'News', u'http://www.infoworld.com/news/feed'),
             (u'Test Center', u'http://www.infoworld.com/testcenter/feed'),
             (u'Open Source', u'http://www.infoworld.com/taxonomy/term/3218/feed'),
             (u'Windows', u'http://www.infoworld.com/taxonomy/term/3213/feed')]

    def append_page(self, soup, appendtag, position):
        pager = soup.find('div',attrs={'class':'nextLink'})
        if pager:
            nexturl = self.INDEX + pager.a['href']
            soup2 = self.index_to_soup(nexturl)
            texttag = soup2.find('div', attrs={'id':'main_text'})
            for it in texttag.findAll(style=True):
                del it['style']
            newpos = len(texttag.contents)
            self.append_page(soup2,texttag,newpos)
            texttag.extract()
            appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll('div', attrs={'class':'edit-promo'}):
            item.extract()
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div',attrs={'class':'pagination clearfix'})
        if pager:
            pager.extract()
        return self.adeify_images(soup)

Last edited by mufc; 12-31-2010 at 01:18 PM.
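Reading the recipe, the likeliest culprit is `nexturl = self.INDEX + pager.a['href']`: the class never defines `INDEX`. As far as I know `BasicNewsRecipe` does not supply one, and recipes that use it set it themselves. A single-page article presumably has no `nextLink` div, so `append_page` does nothing and the article survives; on a multi-page article the lookup raises `AttributeError`, and the article is presumably discarded, which matches the symptom. Adding `INDEX = u'http://www.infoworld.com'` to the class should address that. Two smaller things also look off: the `nextLink` div may be present but empty on the last page, which would make `pager.a` `None` (a guard such as `if pager and pager.a:` would cover it), and `edit-promo` is an `id` in the HTML, not a `class`, so `attrs={'id':'edit-promo'}` is probably what was meant. The two stand-in classes below are not calibre code; they only reproduce the URL-building step in plain Python to show the failure and a guarded version, and they swap string concatenation for `urljoin`, which also copes with absolute hrefs.

```python
from urllib.parse import urljoin


class BrokenRecipe:
    """Stand-in for the posted recipe's URL step: self.INDEX is never defined."""

    def next_url(self, href):
        return self.INDEX + href  # raises AttributeError: no INDEX anywhere


class FixedRecipe:
    """Same step with the site root defined once as a class attribute."""

    INDEX = 'http://www.infoworld.com'  # assumed site root for the relative hrefs

    def next_url(self, href):
        # An empty nextLink div (assumed on the last page) yields no href: stop.
        if not href:
            return None
        return urljoin(self.INDEX, href)


if __name__ == '__main__':
    href = '/d/cloud-computing/how-check-your-cloud-provider-712?page=0,1'
    try:
        BrokenRecipe().next_url(href)
    except AttributeError as exc:
        print('broken:', exc)
    print('fixed:', FixedRecipe().next_url(href))
```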
