MobileRead Forums > E-Book Software > Calibre > Recipes
07-27-2013, 08:36 PM   #1
Camper65
Problem getting print_version to pull the printer-friendly page

I'm working on fixing my InformationWeek recipe. It downloads the regular article pages, but for articles that span more than one page it only gets the first page. I set it up to pull the print version instead (which contains the full article), but it still isn't getting the print version.

Here is the recipe:

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class InformationWeek(BasicNewsRecipe):
    title = u'InformationWeek'
    oldest_article = 3
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content = False

    feeds = [
        (u'InformationWeek - Stories', u'http://www.informationweek.com/rss/pheedo/all_story_blog.xml?cid=RSSfeed_IWK_ALL'),
        (u'InformationWeek - News', u'http://www.informationweek.com/rss/pheedo/news.xml?cid=RSSfeed_IWK_News'),
        (u'InformationWeek - Personal Tech', u'http://www.informationweek.com/rss/pheedo/personaltech.xml?cid=RSSfeed_IWK_Personal_Tech'),
        (u'InformationWeek - Software', u'http://www.informationweek.com/rss/pheedo/software.xml?cid=RSSfeed_IWK_Software'),
        (u'InformationWeek - Hardware', u'http://www.informationweek.com/rss/pheedo/hardware.xml?cid=RSSfeed_IWK_Hardware'),
    ]

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy so articles can be removed while looping
            for article in feed.articles[:]:
                self.log('article.title is:', article.title)
                # Drop healthcare articles (case-sensitive match)
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds

    def print_version(self, url):
        # Keep everything before the last '?' (dropping the tracking query
        # string, e.g. ?cid=RSSfeed_IWK_ALL) and request the print page
        main, sep, unneeded = url.rpartition('?')
        return main + '?printer_friendly=this-page'
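The parse_feeds override above removes articles while looping over them, which is safe only because it iterates over the copy feed.articles[:]. Here is a minimal standalone sketch of that filter pattern; Article and Feed are hypothetical stand-ins (not calibre's classes) and the example.com URLs are made up:

```python
class Article(object):
    # Hypothetical stand-in: only the two attributes the filter reads
    def __init__(self, title, url):
        self.title = title
        self.url = url


class Feed(object):
    # Hypothetical stand-in holding a plain list of articles
    def __init__(self, articles):
        self.articles = articles


def drop_healthcare(feeds):
    for feed in feeds:
        # Loop over a copy; removing items from the list being iterated
        # would make the loop skip the element after each removal
        for article in feed.articles[:]:
            if 'healthcare' in article.title or 'healthcare' in article.url:
                feed.articles.remove(article)
    return feeds


feeds = drop_healthcare([Feed([
    Article('Gmail Tweaks', 'http://example.com/email/1'),
    Article('EHR news', 'http://example.com/healthcare/2'),
    Article('healthcare costs', 'http://example.com/healthcare/3'),
])])
print([a.title for a in feeds[0].articles])
```

As in the recipe, the match is case-sensitive, so a title that only contains "Healthcare" would be kept.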
Here is one of the original article URLs:

http://www.informationweek.com/socia...SSfeed_IWK_ALL

and here is the printer-friendly version URL:
http://www.informationweek.com/socia...ndly=this-page

The recipe currently strips the query string at the end of the URL (which varies depending on which feed the article comes from) and appends ?printer_friendly=this-page, but it still fails to download the printer version of the article.
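The URL rewrite described above can be checked outside calibre. This is a minimal sketch, assuming the printer-friendly page is always the article URL with its query string replaced by ?printer_friendly=this-page (true for the example pair above, not verified for every section); the helper name to_print_url is hypothetical. One pitfall: rpartition('?') returns ('', '', url) when the URL has no '?', so a URL without a query string (the big-data article in the job log later in this thread is one) would be reduced to just ?printer_friendly=this-page. partition('?')[0] returns the whole URL in that case.

```python
def to_print_url(url):
    """Return the assumed printer-friendly URL for an InformationWeek article.

    Assumption: the print page is the article URL with its query string
    replaced by ?printer_friendly=this-page.
    """
    # partition() splits at the FIRST '?'; if there is no '?', [0] is the
    # whole URL (rpartition('?')[0] would be '' instead)
    base = url.partition('?')[0]
    return base + '?printer_friendly=this-page'


if __name__ == '__main__':
    urls = [
        'http://www.informationweek.com/social-business/email/gmail-tweaks-5-tips-for-power-users/240159033?cid=RSSfeed_IWK_ALL',
        'http://www.informationweek.com/quickview/inside-intels-data-center-vision/3937?cid=RSSfeed_IWK_ALL&wc=4',
        'http://www.informationweek.com/big-data/news/big-data-analytics/big-data-ushers-in-virtuous-cycle-of-computing/240158963',
    ]
    for u in urls:
        print(to_print_url(u))

    # The edge case that rpartition() gets wrong: no '?' in the URL
    print(repr(urls[2].rpartition('?')[0]))  # prints ''
```

Inside the recipe, the same idea would be a one-line print_version body: return url.partition('?')[0] + '?printer_friendly=this-page'.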

Any ideas?
07-27-2013, 11:29 PM   #2
kovidgoyal
creator of calibre
url.rpartition('?')[0]
07-28-2013, 09:54 PM   #3
Camper65
Quote:
Originally Posted by kovidgoyal
url.rpartition('?')[0]
Tried that; here are the resulting job details.

After the cover and title pages, it only gives me a section title for the first section, followed by a title page for what should be the first article. That page contains nothing but

InformationWeek - Stories

InformationWeek

Code:
Fetch news from InformationWeek
Resolved conversion options
calibre version: 0.9.40
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': None,
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_download_recipe': False,
 'dont_split_on_page_breaks': True,
 'duplicate_links_in_toc': False,
 'embed_all_fonts': False,
 'embed_font_family': None,
 'enable_heuristics': False,
 'epub_flatten': False,
 'epub_inline_toc': False,
 'epub_toc_at_end': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': None,
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x034AEB70>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0,
 'linearize_tables': False,
 'lrf': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.KindleOutput object at 0x034AEEB0>,
 'page_breaks_before': None,
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': None,
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': '',
 'search_replace': None,
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': '',
 'sr1_search': '',
 'sr2_replace': '',
 'sr2_search': '',
 'sr3_replace': '',
 'sr3_search': '',
 'start_reading_at': None,
 'subset_embedded_fonts': False,
 'tags': None,
 'test': False,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'toc_title': None,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: Recipe Input running
Using custom recipe
Skipping article 'Largest-Ever' Data Breach Scheme Uncovered, Feds Say (Thu, 25 Jul, 2013 19:30) from feed InformationWeek - Stories as it is too old.
Skipping article 5 Wishes For Samsung Nexus 10 (Thu, 25 Jul, 2013 14:46) from feed InformationWeek - News as it is too old.
Skipping article Can Michael Dell's New Offer Sway Investors? (Thu, 25 Jul, 2013 14:05) from feed InformationWeek - News as it is too old.
Skipping article LinkedIn Sponsored Updates: 4 Things To Know (Thu, 25 Jul, 2013 13:58) from feed InformationWeek - News as it is too old.
Skipping article Rosetta Stone Moves Deeper Into Education Tech (Thu, 25 Jul, 2013 12:39) from feed InformationWeek - News as it is too old.
Skipping article 5 Wishes For Samsung Nexus 10 (Thu, 25 Jul, 2013 14:46) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Can Michael Dell's New Offer Sway Investors? (Thu, 25 Jul, 2013 14:05) from feed InformationWeek - Personal Tech as it is too old.
Skipping article LinkedIn Sponsored Updates: 4 Things To Know (Thu, 25 Jul, 2013 13:58) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Rosetta Stone Moves Deeper Into Education Tech (Thu, 25 Jul, 2013 12:39) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Android 4.3's Best Features (Thu, 25 Jul, 2013 11:42) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Apple, Samsung Lead Q2 Smartphone Sales (Wed, 24 Jul, 2013 11:30) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Facebook, LinkedIn Rank Like Airlines On User Satisfaction (Tue, 23 Jul, 2013 14:46) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Verizon Intros Trio Of Motorola Droids (Tue, 23 Jul, 2013 14:31) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Nokia Brings Big Screen To Lumia Line (Tue, 23 Jul, 2013 11:14) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Moto X: Motorola's Not-So-Bold Rebirth (Mon, 22 Jul, 2013 10:55) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Apple Tinkers With Larger iPhones, iPads (Mon, 22 Jul, 2013 09:50) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Smartphone Plans: 5 Ways To Avoid Trouble (Sat, 20 Jul, 2013 09:06) from feed InformationWeek - Personal Tech as it is too old.
Skipping article HTC Unveils Miniature One Smartphone (Thu, 18 Jul, 2013 11:50) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Google Second-Gen Nexus 7 Debut Tipped (Thu, 18 Jul, 2013 10:47) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Yahoo's Year In Mobile: Progress Report (Wed, 17 Jul, 2013 13:22) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Next iPhone May Have Bigger Screen (Wed, 17 Jul, 2013 11:30) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Aetna CarePass Combines Mobile Health App Data (Wed, 17 Jul, 2013 10:50) from feed InformationWeek - Personal Tech as it is too old.
Skipping article AT&T Challenges T-Mobile Uncarrier Strategy (Tue, 16 Jul, 2013 12:15) from feed InformationWeek - Personal Tech as it is too old.
Skipping article BlackBerry A10 Details Leak; Z10 Price Drops (Tue, 16 Jul, 2013 11:40) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Apple Investigating iPhone Death In China (Mon, 15 Jul, 2013 17:34) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Mobile App Update Bonanza (Sat, 13 Jul, 2013 09:06) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Windows Phone Scores More Key Apps (Fri, 12 Jul, 2013 11:20) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Samsung Challenges Apple 'Bounce-Back' Patent Verdict (Fri, 12 Jul, 2013 10:02) from feed InformationWeek - Personal Tech as it is too old.
Skipping article Google Nexus 7 Heats Up Mini-Tablet Battle (Thu, 25 Jul, 2013 12:18) from feed InformationWeek - Software as it is too old.
Skipping article Windows XP's End Of Life: Readers Respond (Thu, 25 Jul, 2013 09:06) from feed InformationWeek - Software as it is too old.
Skipping article Windows Phone Users Wait Impatiently For Updates (Thu, 25 Jul, 2013 09:06) from feed InformationWeek - Software as it is too old.
Skipping article Cloudera Brings Role-Based Security To Hadoop (Wed, 24 Jul, 2013 13:33) from feed InformationWeek - Software as it is too old.
Skipping article Network Solutions Knocked Down Again (Wed, 24 Jul, 2013 12:35) from feed InformationWeek - Software as it is too old.
Skipping article DataStax Stalks Oracle With Cassandra Upgrades (Wed, 24 Jul, 2013 10:22) from feed InformationWeek - Software as it is too old.
Skipping article Data Science Certification Program Emerges (Tue, 23 Jul, 2013 11:49) from feed InformationWeek - Software as it is too old.
Skipping article What Cisco Gains From Sourcefire (Tue, 23 Jul, 2013 11:07) from feed InformationWeek - Software as it is too old.
Skipping article Oracle Says Utilities Botch Smart Meter Data Analysis (Tue, 23 Jul, 2013 10:57) from feed InformationWeek - Software as it is too old.
Skipping article In-Q-Tel Backs Open Source Mapping Company (Tue, 23 Jul, 2013 10:26) from feed InformationWeek - Software as it is too old.
Skipping article 3 End Of Windows XP Upgrade Headaches (Tue, 23 Jul, 2013 09:58) from feed InformationWeek - Software as it is too old.
Skipping article Salesforce Improves Chatter Mobile Apps (Tue, 23 Jul, 2013 09:01) from feed InformationWeek - Software as it is too old.
Skipping article NASCAR Team Drives Dell Windows 8 Tablets (Mon, 22 Jul, 2013 15:23) from feed InformationWeek - Software as it is too old.
Skipping article Facebook Scores 100M Mobile Users In Emerging Markets (Mon, 22 Jul, 2013 13:05) from feed InformationWeek - Software as it is too old.
Skipping article Tableau Takes Data Visualization Online (Mon, 22 Jul, 2013 08:30) from feed InformationWeek - Software as it is too old.
Skipping article SAP Co-CEO Snabe Plans To Step Down (Mon, 22 Jul, 2013 08:25) from feed InformationWeek - Software as it is too old.
Skipping article Microsoft's Struggles Grow: 9 Key Points (Fri, 19 Jul, 2013 10:59) from feed InformationWeek - Software as it is too old.
Skipping article Intel's Next Hope: $300 Windows 8 Devices (Fri, 19 Jul, 2013 09:06) from feed InformationWeek - Software as it is too old.
Skipping article Google Chrome For iOS Promises Data Cost Savings (Thu, 18 Jul, 2013 15:36) from feed InformationWeek - Software as it is too old.
Skipping article SAP Sees Cloud Growth Accelerating (Thu, 18 Jul, 2013 12:35) from feed InformationWeek - Software as it is too old.
Skipping article 5 Wishes For Samsung Nexus 10 (Thu, 25 Jul, 2013 14:46) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Can Michael Dell's New Offer Sway Investors? (Thu, 25 Jul, 2013 14:05) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Google Nexus 7 Heats Up Mini-Tablet Battle (Thu, 25 Jul, 2013 12:18) from feed InforamtionWeek - Hardware as it is too old.
Skipping article 10 Hidden iPhone Tips, Tricks (Thu, 25 Jul, 2013 11:06) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Google Takes On Apple With Chromecast, Android 4.3 (Thu, 25 Jul, 2013 09:06) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Google Nexus 7: Small Tablet To Beat (Wed, 24 Jul, 2013 14:04) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Intel's Plan To Dominate Data Centers (Wed, 24 Jul, 2013 11:35) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Apple, Samsung Lead Q2 Smartphone Sales (Wed, 24 Jul, 2013 11:30) from feed InforamtionWeek - Hardware as it is too old.
Skipping article 10 Tablet Battery Tips: More Power (Wed, 24 Jul, 2013 11:06) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Apple Sets iPhone Sales Record For Quarter (Wed, 24 Jul, 2013 09:47) from feed InforamtionWeek - Hardware as it is too old.
Skipping article OpenStack Grows Up And China Notices (Wed, 24 Jul, 2013 09:06) from feed InforamtionWeek - Hardware as it is too old.
Skipping article CIO Profile: William H. Miller, Jr., Of Broadcom (Wed, 24 Jul, 2013 09:06) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Verizon Intros Trio Of Motorola Droids (Tue, 23 Jul, 2013 14:31) from feed InforamtionWeek - Hardware as it is too old.
Skipping article FBI To Overhaul Printer Network (Tue, 23 Jul, 2013 12:46) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Nokia Brings Big Screen To Lumia Line (Tue, 23 Jul, 2013 11:14) from feed InforamtionWeek - Hardware as it is too old.
Skipping article NASCAR Team Drives Dell Windows 8 Tablets (Mon, 22 Jul, 2013 15:23) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Facebook Scores 100M Mobile Users In Emerging Markets (Mon, 22 Jul, 2013 13:05) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Common Core Meets Aging Education Technology (Mon, 22 Jul, 2013 12:16) from feed InforamtionWeek - Hardware as it is too old.
Skipping article Moto X: Motorola's Not-So-Bold Rebirth (Mon, 22 Jul, 2013 10:55) from feed InforamtionWeek - Hardware as it is too old.
article.title is:  Google Nexus 7 Vs. iPad Mini: 6 Key Factors
article.title is:  Microsoft's Dilemma: Windows 8.1 May Not Be Enough
article.title is:  Gmail Tweaks: 5 Tips For Power Users
article.title is:  Inside Intel's Data Center Vision
article.title is:  NSA Utah Data Center Scrutiny: Off Target?
article.title is:  Oracle Retools Java For Internet Of Things
article.title is:  Feds Move To Open Source Databases Pressures Oracle
article.title is:  NASA Satellite Reveals New View Of Sun
article.title is:  Apple Dominates Consumer Brands Poll
article.title is:  Brave Tales From The SysAdmin Trenches
article.title is:  Google Nexus 7, Chromecast: Visual Tour
article.title is:  Hospitals Struggle With EHRs For Quality Reporting, AHA Says
article.title is:  Stanford University Network Hacked
article.title is:  Record-Setting Data Breach Highlights Corporate Security Risks
article.title is:  Samsung Leads Smartphone Market
article.title is:  IBM Mainframes Nipped, Tucked For Cloud Age
article.title is:  Social Business Needs Culture Of Open Leadership
article.title is:  Big Data Ushers In 'Virtuous Cycle Of Computing'
article.title is:  Lawmakers Grill Federal CIO On Data Center Figures
article.title is:  Take An Email Sabbatical: 5 Steps
article.title is:  5 Helpful Online Services From Uncle Sam
article.title is:  MS Looks To Move Windows 7 Users To IE 11
article.title is:  Is Scale-Out Storage A Must Have?
article.title is:  Cloud File Storage Fight: No Knockout Yet
article.title is:  Google Nexus 7 Vs. iPad Mini: 6 Key Factors
article.title is:  Microsoft's Dilemma: Windows 8.1 May Not Be Enough
article.title is:  Gmail Tweaks: 5 Tips For Power Users
article.title is:  Feds Move To Open Source Databases Pressures Oracle
article.title is:  NASA Satellite Reveals New View Of Sun
article.title is:  Apple Dominates Consumer Brands Poll
article.title is:  Brave Tales From The SysAdmin Trenches
article.title is:  Google Nexus 7, Chromecast: Visual Tour
article.title is:  Hospitals Struggle With EHRs For Quality Reporting, AHA Says
article.title is:  Stanford University Network Hacked
article.title is:  Record-Setting Data Breach Highlights Corporate Security Risks
article.title is:  Samsung Leads Smartphone Market
article.title is:  IBM Mainframes Nipped, Tucked For Cloud Age
article.title is:  Social Business Needs Culture Of Open Leadership
article.title is:  Big Data Ushers In 'Virtuous Cycle Of Computing'
article.title is:  Lawmakers Grill Federal CIO On Data Center Figures
article.title is:  Take An Email Sabbatical: 5 Steps
article.title is:  5 Helpful Online Services From Uncle Sam
article.title is:  MS Looks To Move Windows 7 Users To IE 11
article.title is:  Is Scale-Out Storage A Must Have?
article.title is:  Cloud File Storage Fight: No Knockout Yet
article.title is:  Apple Dominates Consumer Brands Poll
article.title is:  Samsung Leads Smartphone Market
article.title is:  Microsoft's Dilemma: Windows 8.1 May Not Be Enough
article.title is:  Gmail Tweaks: 5 Tips For Power Users
article.title is:  Feds Move To Open Source Databases Pressures Oracle
article.title is:  Google Nexus 7, Chromecast: Visual Tour
article.title is:  MS Looks To Move Windows 7 Users To IE 11
article.title is:  Google Nexus 7 Vs. iPad Mini: 6 Key Factors
article.title is:  Microsoft's Dilemma: Windows 8.1 May Not Be Enough
article.title is:  NSA Utah Data Center Scrutiny: Off Target?
article.title is:  Apple Dominates Consumer Brands Poll
article.title is:  Google Nexus 7, Chromecast: Visual Tour
article.title is:  IBM Mainframes Nipped, Tucked For Cloud Age
Removing duplicate article: Google Nexus 7 Vs. iPad Mini: 6 Key Factors from section: InformationWeek - News
Removing duplicate article: Microsoft's Dilemma: Windows 8.1 May Not Be Enough from section: InformationWeek - News
Removing duplicate article: Gmail Tweaks: 5 Tips For Power Users from section: InformationWeek - News
Removing duplicate article: Feds Move To Open Source Databases Pressures Oracle from section: InformationWeek - News
Removing duplicate article: NASA Satellite Reveals New View Of Sun from section: InformationWeek - News
Removing duplicate article: Apple Dominates Consumer Brands Poll from section: InformationWeek - News
Removing duplicate article: Brave Tales From The SysAdmin Trenches from section: InformationWeek - News
Removing duplicate article: Google Nexus 7, Chromecast: Visual Tour from section: InformationWeek - News
Removing duplicate article: Stanford University Network Hacked from section: InformationWeek - News
Removing duplicate article: Record-Setting Data Breach Highlights Corporate Security Risks from section: InformationWeek - News
Removing duplicate article: Samsung Leads Smartphone Market from section: InformationWeek - News
Removing duplicate article: IBM Mainframes Nipped, Tucked For Cloud Age from section: InformationWeek - News
Removing duplicate article: Social Business Needs Culture Of Open Leadership from section: InformationWeek - News
Removing duplicate article: Big Data Ushers In 'Virtuous Cycle Of Computing' from section: InformationWeek - News
Removing duplicate article: Big Data Ushers In 'Virtuous Cycle Of Computing' from section: InformationWeek - News
Removing duplicate article: Lawmakers Grill Federal CIO On Data Center Figures from section: InformationWeek - News
Removing duplicate article: Take An Email Sabbatical: 5 Steps from section: InformationWeek - News
Removing duplicate article: 5 Helpful Online Services From Uncle Sam from section: InformationWeek - News
Removing duplicate article: MS Looks To Move Windows 7 Users To IE 11 from section: InformationWeek - News
Removing duplicate article: Is Scale-Out Storage A Must Have? from section: InformationWeek - News
Removing duplicate article: Cloud File Storage Fight: No Knockout Yet from section: InformationWeek - News
Removing duplicate article: Apple Dominates Consumer Brands Poll from section: InformationWeek - Personal Tech
Removing duplicate article: Samsung Leads Smartphone Market from section: InformationWeek - Personal Tech
Removing duplicate article: Microsoft's Dilemma: Windows 8.1 May Not Be Enough from section: InformationWeek - Software
Removing duplicate article: Gmail Tweaks: 5 Tips For Power Users from section: InformationWeek - Software
Removing duplicate article: Feds Move To Open Source Databases Pressures Oracle from section: InformationWeek - Software
Removing duplicate article: Google Nexus 7, Chromecast: Visual Tour from section: InformationWeek - Software
Removing duplicate article: MS Looks To Move Windows 7 Users To IE 11 from section: InformationWeek - Software
Removing duplicate article: Google Nexus 7 Vs. iPad Mini: 6 Key Factors from section: InforamtionWeek - Hardware
Removing duplicate article: Microsoft's Dilemma: Windows 8.1 May Not Be Enough from section: InforamtionWeek - Hardware
Removing duplicate article: NSA Utah Data Center Scrutiny: Off Target? from section: InforamtionWeek - Hardware
Removing duplicate article: Apple Dominates Consumer Brands Poll from section: InforamtionWeek - Hardware
Removing duplicate article: Google Nexus 7, Chromecast: Visual Tour from section: InforamtionWeek - Hardware
Removing duplicate article: IBM Mainframes Nipped, Tucked For Cloud Age from section: InforamtionWeek - Hardware
Synthesizing mastheadImage
Failed to find print version for: http://www.informationweek.com/hardware/handheld/google-nexus-7-vs-ipad-mini-6-key-factor/240159052?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/windows/operating-systems/microsofts-dilemma-windows-81-may-not-be/240159040?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/social-business/email/gmail-tweaks-5-tips-for-power-users/240159033?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/quickview/inside-intels-data-center-vision/3937?cid=RSSfeed_IWK_ALL&wc=4
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/quickview/nsa-utah-data-center-scrutiny-off-target/3936?cid=RSSfeed_IWK_ALL&wc=4
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/quickview/oracle-retools-java-for-internet-of-thin/3935?cid=RSSfeed_IWK_ALL&wc=4
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/government/enterprise-applications/feds-move-to-open-source-databases-press/240159014?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/government/information-management/nasa-satellite-reveals-new-view-of-sun/240159017?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/hardware/handheld/apple-dominates-consumer-brands-poll/240159008?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/global-cio/interviews/brave-tales-from-the-sysadmin-trenches/240158967?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/hardware/handheld/google-nexus-7-chromecast-visual-tour/240158973?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/education/security/stanford-university-network-hacked/240158977?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/security/vulnerabilities/record-setting-data-breach-highlights-co/240158986?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/mobility/smart-phones/samsung-leads-smartphone-market/240159007?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/global-cio/interviews/ibm-mainframes-nipped-tucked-for-cloud-a/240158991?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/social-business/strategy/social-business-needs-culture-of-open-le/240158737?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/big-data/news/big-data-analytics/big-data-ushers-in-virtuous-cycle-of-computing/240158963
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: need more than 0 values to unpack

Failed to find print version for: http://www.informationweek.com/government/policy/lawmakers-grill-federal-cio-on-data-cent/240158975?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/global-cio/interviews/take-an-email-sabbatical-5-steps/240158957?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/government/information-management/5-helpful-online-services-from-uncle-sam/240158897?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/windows/microsoft-news/ms-looks-to-move-windows-7-users-to-ie-1/240158971?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/storage/systems/is-scale-out-storage-a-must-have/240158786?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Failed to find print version for: http://www.informationweek.com/storage/data-protection/cloud-file-storage-fight-no-knockout-yet/240158749?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack

Parsing all content...
Parsing index.html ...
Forcing index.html into XHTML namespace
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Reading TOC from NCX...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 3 items of level: div_1
Found 2 items of level: div_2
Found 2 items of level: p_2
Ignoring level p_2
div_1  left margin stats: Counter()
div_1  right margin stats: Counter()
div_2  left margin stats: Counter()
div_2  right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{u'index.html': u'index_u1.html'}
Rescaling image from 590x750 to 485x616 cover.jpg
Rescaling image from 600x60 to 501x50 mastheadImage.jpg
Splitting markup on page breaks and flow limits, if any...
	Looking for large trees in feed_0/index.html...
	No large trees found
	Looking for large trees in index_u1.html...
	No large trees found
This EPUB file has no Table of Contents. Creating a default TOC
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to G:\Users\Camper\AppData\Local\Temp\calibre_2ou67n\7fqtxh_recipe_out.epub

Here's the updated recipe:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed

class InformationWeek(BasicNewsRecipe):
    title          = u'InformationWeek'
    oldest_article = 3
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content   = False


    feeds          = [
                          (u'InformationWeek - Stories', u'http://www.informationweek.com/rss/pheedo/all_story_blog.xml?cid=RSSfeed_IWK_ALL'),
                          (u'InformationWeek - News', u'http://www.informationweek.com/rss/pheedo/news.xml?cid=RSSfeed_IWK_News'),
                          (u'InformationWeek - Personal Tech', u'http://www.informationweek.com/rss/pheedo/personaltech.xml?cid=RSSfeed_IWK_Personal_Tech'),
                          (u'InformationWeek - Software', u'http://www.informationweek.com/rss/pheedo/software.xml?cid=RSSfeed_IWK_Software'),
                          (u'InformationWeek - Hardware', u'http://www.informationweek.com/rss/pheedo/hardware.xml?cid=RSSfeed_IWK_Hardware')
                     ]

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                print 'article.title is: ', article.title
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds

    def print_version(self, url):
        main, sep, unneeded = url.rpartition('?')[0]  # [0] leaves just a string, so this unpacking raises ValueError
        return main + '?printer_friendly=this-page'

Last edited by Camper65; 07-28-2013 at 09:58 PM.
Old 07-28-2013, 10:39 PM   #4
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I didn't notice you were unpacking the result of rpartition anyway, in which case the [0] is not needed.
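As a quick illustration of the point above: `str.rpartition` returns a 3-tuple `(head, separator, tail)`, and indexing it with `[0]` leaves only the `head` string, so unpacking that string into three names raises the `ValueError: too many values to unpack` seen in the log. A minimal, calibre-free sketch of the two working forms next to the failing one (the three function names are made up for the comparison):

```python
def print_version_unpacked(url):
    # Unpack the whole 3-tuple and keep only the part before the last '?'
    main, sep, unneeded = url.rpartition('?')
    return main + '?printer_friendly=this-page'


def print_version_indexed(url):
    # Or skip the unpacking and index the tuple directly
    main = url.rpartition('?')[0]
    return main + '?printer_friendly=this-page'


def print_version_broken(url):
    # The form from the recipe: [0] yields a string, and unpacking a string
    # into three names fails unless it is exactly 3 characters long
    main, sep, unneeded = url.rpartition('?')[0]
    return main + '?printer_friendly=this-page'


if __name__ == '__main__':
    url = ('http://www.informationweek.com/storage/systems/'
           'is-scale-out-storage-a-must-have/240158786?cid=RSSfeed_IWK_ALL')
    print(print_version_unpacked(url))
    try:
        print_version_broken(url)
    except ValueError as exc:
        print('ValueError: %s' % exc)
```

Both working forms assume the feed URL contains a `?`; for a URL without one, `rpartition` returns `('', '', url)`, so `main` would come back empty.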
Old 07-28-2013, 11:17 PM   #5
Camper65
I tried going to the article through its Print link, and the page seems to reset itself to the main article. Is there a way to have calibre click the href="?printer_friendly=this-page" link and then download that page?
Old 07-29-2013, 03:35 AM   #6
kovidgoyal
If the print version uses JavaScript, then no. You can do it using the JavascriptRecipe class; see the builtin time.com recipe for an example of how to use it.

It might be easier to just use the normal pages and have calibre follow the links to the subsequent parts.
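Having calibre follow the subsequent parts is what the `recursions` and `match_regexps` attributes of `BasicNewsRecipe` are for. A rough, calibre-free sketch of the link filtering, assuming the pattern is tested against the absolute URL of each link after it has been resolved against the article URL; the `wanted_links` helper and the sample links are illustrative, modelled on the InformationWeek URLs in this thread:

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, which calibre used in 2013

# Same pattern as the match_regexps entry tried later in the thread
PAGE_LINK = re.compile(r'\?pgno=\d+$')


def wanted_links(article_url, hrefs):
    """Resolve each href against the article URL and keep the page links."""
    wanted = []
    for href in hrefs:
        absolute = urljoin(article_url, href)  # hrefs on the page are relative
        if PAGE_LINK.search(absolute) and absolute not in wanted:
            wanted.append(absolute)
    return wanted


if __name__ == '__main__':
    article = ('http://www.informationweek.com/security/privacy/'
               'nsa-vs-your-smartphone-5-facts/240161133?cid=RSSfeed_IWK_ALL')
    links = [
        '/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2',
        '?printer_friendly=this-page',
        '/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2',
    ]
    for url in wanted_links(article, links):
        print(url)
```

The trailing `$` in the pattern means a page link that carries extra query parameters after `pgno` would not be followed.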
Old 09-12-2013, 11:35 PM   #7
Camper65
Okay, I'm finally getting back to fixing this (I had to build a new tower because one of my notebooks kept crashing on me). I'm trying to have it pull multipage articles by searching for and following the pgno=# links. Here is the relevant part of a long article page:

Code:
<div class="article-pagination">
<strong>
	<a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2">
		Page 2:&nbsp;BlackBerry Isn't Immune	</a>
</strong>
<img hspace="0" height="5" border="0" width="10" vspace="0" src="http://i.cmpnet.com/infoweek/spacer.gif">
<br/>
<div class="controls">
	<strong>&nbsp;1&nbsp;|&nbsp;</strong><a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2">2</a>&nbsp;&nbsp;|<a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2"> Next Page »</a>&nbsp;</div>
</div>
</article>
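For reference, this is what the pagination block above yields when parsed. A minimal stdlib sketch (plain `HTMLParser`, no BeautifulSoup; the `PaginationLinks` class is purely illustrative) that collects the `href` of every `contentgating_article` link inside `div.article-pagination`. The snippet is a trimmed copy of the markup above, with the spacer image and the » character dropped. All three links point at the same relative `?pgno=2` URL, so whatever follows them has to resolve them against the site's host first:

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

SNIPPET = '''
<div class="article-pagination">
<strong>
  <a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2">
    Page 2:&nbsp;BlackBerry Isn't Immune</a>
</strong>
<br/>
<div class="controls">
  <strong>&nbsp;1&nbsp;|&nbsp;</strong><a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2">2</a>&nbsp;&nbsp;|<a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2"> Next Page</a>&nbsp;</div>
</div>
'''


class PaginationLinks(HTMLParser):
    """Collect hrefs of <a class="contentgating_article"> inside div.article-pagination."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.div_depth = 0  # > 0 while inside the pagination div (counts nested divs)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'div':
            if self.div_depth:
                self.div_depth += 1
            elif 'article-pagination' in classes:
                self.div_depth = 1
        elif tag == 'a' and self.div_depth and 'contentgating_article' in classes:
            self.hrefs.append(attrs.get('href'))

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_depth:
            self.div_depth -= 1


if __name__ == '__main__':
    parser = PaginationLinks()
    parser.feed(SNIPPET)
    parser.close()
    print(parser.hrefs)
```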
This is what I've come up with, but it still only gets page one of multipage articles.

Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed

class InformationWeek(BasicNewsRecipe):
    title          = u'InformationWeek'
    oldest_article = 3
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content   = False
    recursions = 1
    match_regexps = [r'\?pgno=\d+$']

    preprocess_regexps = [
        (re.compile(r'<!-- End SiteCatalyst code version: H.16 -->.*</body>', re.DOTALL), lambda match: '</body>')
    ]

    feeds          = [
                          (u'InformationWeek - Stories', u'http://www.informationweek.com/rss/pheedo/all_story_blog.xml?cid=RSSfeed_IWK_ALL'),
                          (u'InformationWeek - News', u'http://www.informationweek.com/rss/pheedo/news.xml?cid=RSSfeed_IWK_News'),
                          (u'InformationWeek - Personal Tech', u'http://www.informationweek.com/rss/pheedo/personaltech.xml?cid=RSSfeed_IWK_Personal_Tech'),
                          (u'InformationWeek - Software', u'http://www.informationweek.com/rss/pheedo/software.xml?cid=RSSfeed_IWK_Software'),
                          (u'InformationWeek - Hardware', u'http://www.informationweek.com/rss/pheedo/hardware.xml?cid=RSSfeed_IWK_Hardware')
                     ]

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                print 'article.title is: ', article.title
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds

    def append_page(self, soup, appendtag, position):
        pager = soup.find('div', attrs={'class':'article-pagination'})
        if pager:
          nextpage = soup.find('a', attrs={'class':'contentgating_article'})
          if nextpage:
              nexturl = nextpage['href']
              soup2 = self.index_to_soup(nexturl)
              texttag = soup2.find('div', attrs={'class':'article-v2'})
              for it in texttag.findAll(style=True):
                  del it['style']
              newpos = len(texttag.contents)
              self.append_page(soup2,texttag,newpos)
              texttag.extract()
              pager.extract()
              appendtag.insert(position,texttag)

    remove_tags_before = dict(name='article', id=lambda x: not x)
Can someone help me fix this? Thanks.
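A possible direction, sketched outside calibre so it can run on its own: as posted, nothing in the recipe calls `append_page` (builtin recipes that define such a helper typically call it from `preprocess_html`), the relative `href` would be passed straight to `index_to_soup`, and only the first `contentgating_article` link on each page is considered. The loop below shows just the page-following part with those points handled: each link is resolved to an absolute URL and only the link to the next page number is followed. `fetch`, `collect_pages` and `page_number` are hypothetical names; in a recipe, `fetch` would wrap `self.index_to_soup(url)` plus the `find` calls for the article body and pagination links, and the three-page site in the `__main__` block is made up.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

PGNO = re.compile(r'[?&]pgno=(\d+)')


def page_number(url):
    """Return N from a ...?pgno=N URL; a URL without pgno counts as page 1."""
    match = PGNO.search(url)
    return int(match.group(1)) if match else 1


def collect_pages(first_url, fetch, max_pages=20):
    """Follow "next page" links from first_url and return each page's body.

    fetch(url) must return (body, hrefs): the article text of that page and
    the raw href values found in its pagination block.
    """
    bodies = []
    url = first_url
    while url is not None and len(bodies) < max_pages:
        body, hrefs = fetch(url)
        bodies.append(body)
        # Only follow the link to page N+1, so links back to earlier pages
        # (or repeated links to the same page) cannot cause a loop.
        wanted = page_number(url) + 1
        next_url = None
        for href in hrefs:
            candidate = urljoin(url, href)  # the site's hrefs are relative
            if page_number(candidate) == wanted:
                next_url = candidate
                break
        url = next_url
    return bodies


if __name__ == '__main__':
    # Made-up three-page article, keyed by absolute URL
    base = ('http://www.informationweek.com/security/privacy/'
            'nsa-vs-your-smartphone-5-facts/240161133')
    path = '/security/privacy/nsa-vs-your-smartphone-5-facts/240161133'
    site = {
        base + '?cid=RSSfeed_IWK_ALL': ('page one', [path + '?pgno=2'] * 3),
        base + '?pgno=2': ('page two', [path + '?pgno=1', path + '?pgno=3']),
        base + '?pgno=3': ('page three', [path + '?pgno=1', path + '?pgno=2']),
    }
    print(collect_pages(base + '?cid=RSSfeed_IWK_ALL', lambda u: site[u]))
```

Capping the walk with `max_pages` is a safety net in case a site's pagination links ever misbehave; the collected bodies would still need to be appended to the first page's soup inside the recipe.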
Similar Threads
Thread Thread Starter Forum Replies Last Post
Can't get print_version to do anything MikeBlyth Recipes 12 01-17-2012 05:42 AM
How to use print_version to get the print page of Starson17 Recipes 0 06-15-2011 12:05 PM
Using print_version for custom news source sexymax15 Recipes 0 06-15-2011 10:53 AM
SacBee print_version syntax thczv Recipes 6 04-12-2011 09:38 AM
Using PubDate in print_version of custom news source mobilereader72 Calibre 4 05-30-2009 05:52 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.