07-27-2013, 08:36 PM | #1 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Apr 2011
Device: Kindle wifi; Dell 2in1
|
Problem getting print_version to be pulled
I'm working on fixing my InformationWeek recipe. It fetches the regular article pages (and for articles with more than one page, only the first page). I set it up to pull the print version instead, which contains the full article, but it still isn't getting the print version.
Here is the recipe:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed

class InformationWeek(BasicNewsRecipe):
    title = u'InformationWeek'
    oldest_article = 3
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content = False

    feeds = [
        (u'InformationWeek - Stories', u'http://www.informationweek.com/rss/pheedo/all_story_blog.xml?cid=RSSfeed_IWK_ALL'),
        (u'InformationWeek - News', u'http://www.informationweek.com/rss/pheedo/news.xml?cid=RSSfeed_IWK_News'),
        (u'InformationWeek - Personal Tech', u'http://www.informationweek.com/rss/pheedo/personaltech.xml?cid=RSSfeed_IWK_Personal_Tech'),
        (u'InformationWeek - Software', u'http://www.informationweek.com/rss/pheedo/software.xml?cid=RSSfeed_IWK_Software'),
        (u'InforamtionWeek - Hardware', u'http://www.informationweek.com/rss/pheedo/hardware.xml?cid=RSSfeed_IWK_Hardware')
    ]

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                print 'article.title is: ', article.title
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds

    def print_version(self, url):
        main, sep, unneeded = url.rpartition('?')
        return main + '?printer_friendly=this-page'

Here is an article URL as it comes from the feed
http://www.informationweek.com/socia...SSfeed_IWK_ALL
and here is the printer version URL
http://www.informationweek.com/socia...ndly=this-page
I presently have the recipe remove the last bit (which changes depending on which feed the article comes from) and put in ?printer_friendly=this-page, but it's still failing to download the printer version of the article. Any ideas? |
07-27-2013, 11:29 PM | #2 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
url.rpartition('?')[0]
|
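For readers following along, here is a minimal sketch of the URL rewrite as a standalone function, not tied to calibre. It uses str.partition rather than the rpartition call suggested above (a swapped-in choice, not what either poster wrote), because with rpartition a URL that has no '?' yields an empty first element. The function name printer_friendly_url is hypothetical, and the two sample URLs are taken from the job log posted later in this thread. Whether the site actually serves a printer-friendly page for that query string is a separate question.

```python
def printer_friendly_url(url):
    """Drop any query string and ask for the printer-friendly page.

    Hypothetical helper for illustration; in a recipe the same two lines
    could form the body of print_version(self, url).
    """
    # partition() splits at the first '?'. When there is no '?', the whole
    # URL comes back as element 0, so the base is never empty.
    base = url.partition('?')[0]
    return base + '?printer_friendly=this-page'


# Sample URLs copied from the job log in this thread.
with_query = ('http://www.informationweek.com/quickview/'
              'inside-intels-data-center-vision/3937?cid=RSSfeed_IWK_ALL&wc=4')
without_query = ('http://www.informationweek.com/big-data/news/big-data-analytics/'
                 'big-data-ushers-in-virtuous-cycle-of-computing/240158963')

print(printer_friendly_url(with_query))
print(printer_friendly_url(without_query))
```

Both calls return the article path followed by ?printer_friendly=this-page, including the feed entry that carries no query string at all.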
07-28-2013, 09:54 PM | #3 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Apr 2011
Device: Kindle wifi; Dell 2in1
|
Tried that; the resulting job details are below.
After the cover page, I only get a section title and title page for the first section, and then a title page for what should be the first article, with nothing on the page but "InformationWeek - Stories InformationWeek". Code:
Fetch news from InformationWeek Resolved conversion options calibre version: 0.9.40 {'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_download_recipe': False, 'dont_split_on_page_breaks': True, 'duplicate_links_in_toc': False, 'embed_all_fonts': False, 'embed_font_family': None, 'enable_heuristics': False, 'epub_flatten': False, 'epub_inline_toc': False, 'epub_toc_at_end': False, 'extra_css': None, 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'flow_size': 260, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x034AEB70>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'no_chapters_in_toc': False, 'no_default_epub_cover': False, 'no_inline_navbars': False, 'no_svg_cover': False, 'output_profile': <calibre.customize.profiles.KindleOutput object at 0x034AEEB0>, 'page_breaks_before': None, 'prefer_metadata_cover': False, 'preserve_cover_aspect_ratio': False, 'pretty_print': True, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': 
'', 'search_replace': None, 'series': None, 'series_index': None, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'start_reading_at': None, 'subset_embedded_fonts': False, 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'verbose': 2} InputFormatPlugin: Recipe Input running Using custom recipe Skipping article 'Largest-Ever' Data Breach Scheme Uncovered, Feds Say (Thu, 25 Jul, 2013 19:30) from feed InformationWeek - Stories as it is too old. Skipping article 5 Wishes For Samsung Nexus 10 (Thu, 25 Jul, 2013 14:46) from feed InformationWeek - News as it is too old. Skipping article Can Michael Dell's New Offer Sway Investors? (Thu, 25 Jul, 2013 14:05) from feed InformationWeek - News as it is too old. Skipping article LinkedIn Sponsored Updates: 4 Things To Know (Thu, 25 Jul, 2013 13:58) from feed InformationWeek - News as it is too old. Skipping article Rosetta Stone Moves Deeper Into Education Tech (Thu, 25 Jul, 2013 12:39) from feed InformationWeek - News as it is too old. Skipping article 5 Wishes For Samsung Nexus 10 (Thu, 25 Jul, 2013 14:46) from feed InformationWeek - Personal Tech as it is too old. Skipping article Can Michael Dell's New Offer Sway Investors? (Thu, 25 Jul, 2013 14:05) from feed InformationWeek - Personal Tech as it is too old. Skipping article LinkedIn Sponsored Updates: 4 Things To Know (Thu, 25 Jul, 2013 13:58) from feed InformationWeek - Personal Tech as it is too old. Skipping article Rosetta Stone Moves Deeper Into Education Tech (Thu, 25 Jul, 2013 12:39) from feed InformationWeek - Personal Tech as it is too old. Skipping article Android 4.3's Best Features (Thu, 25 Jul, 2013 11:42) from feed InformationWeek - Personal Tech as it is too old. 
Skipping article Apple, Samsung Lead Q2 Smartphone Sales (Wed, 24 Jul, 2013 11:30) from feed InformationWeek - Personal Tech as it is too old. Skipping article Facebook, LinkedIn Rank Like Airlines On User Satisfaction (Tue, 23 Jul, 2013 14:46) from feed InformationWeek - Personal Tech as it is too old. Skipping article Verizon Intros Trio Of Motorola Droids (Tue, 23 Jul, 2013 14:31) from feed InformationWeek - Personal Tech as it is too old. Skipping article Nokia Brings Big Screen To Lumia Line (Tue, 23 Jul, 2013 11:14) from feed InformationWeek - Personal Tech as it is too old. Skipping article Moto X: Motorola's Not-So-Bold Rebirth (Mon, 22 Jul, 2013 10:55) from feed InformationWeek - Personal Tech as it is too old. Skipping article Apple Tinkers With Larger iPhones, iPads (Mon, 22 Jul, 2013 09:50) from feed InformationWeek - Personal Tech as it is too old. Skipping article Smartphone Plans: 5 Ways To Avoid Trouble (Sat, 20 Jul, 2013 09:06) from feed InformationWeek - Personal Tech as it is too old. Skipping article HTC Unveils Miniature One Smartphone (Thu, 18 Jul, 2013 11:50) from feed InformationWeek - Personal Tech as it is too old. Skipping article Google Second-Gen Nexus 7 Debut Tipped (Thu, 18 Jul, 2013 10:47) from feed InformationWeek - Personal Tech as it is too old. Skipping article Yahoo's Year In Mobile: Progress Report (Wed, 17 Jul, 2013 13:22) from feed InformationWeek - Personal Tech as it is too old. Skipping article Next iPhone May Have Bigger Screen (Wed, 17 Jul, 2013 11:30) from feed InformationWeek - Personal Tech as it is too old. Skipping article Aetna CarePass Combines Mobile Health App Data (Wed, 17 Jul, 2013 10:50) from feed InformationWeek - Personal Tech as it is too old. Skipping article AT&T Challenges T-Mobile Uncarrier Strategy (Tue, 16 Jul, 2013 12:15) from feed InformationWeek - Personal Tech as it is too old. 
Skipping article BlackBerry A10 Details Leak; Z10 Price Drops (Tue, 16 Jul, 2013 11:40) from feed InformationWeek - Personal Tech as it is too old. Skipping article Apple Investigating iPhone Death In China (Mon, 15 Jul, 2013 17:34) from feed InformationWeek - Personal Tech as it is too old. Skipping article Mobile App Update Bonanza (Sat, 13 Jul, 2013 09:06) from feed InformationWeek - Personal Tech as it is too old. Skipping article Windows Phone Scores More Key Apps (Fri, 12 Jul, 2013 11:20) from feed InformationWeek - Personal Tech as it is too old. Skipping article Samsung Challenges Apple 'Bounce-Back' Patent Verdict (Fri, 12 Jul, 2013 10:02) from feed InformationWeek - Personal Tech as it is too old. Skipping article Google Nexus 7 Heats Up Mini-Tablet Battle (Thu, 25 Jul, 2013 12:18) from feed InformationWeek - Software as it is too old. Skipping article Windows XP's End Of Life: Readers Respond (Thu, 25 Jul, 2013 09:06) from feed InformationWeek - Software as it is too old. Skipping article Windows Phone Users Wait Impatiently For Updates (Thu, 25 Jul, 2013 09:06) from feed InformationWeek - Software as it is too old. Skipping article Cloudera Brings Role-Based Security To Hadoop (Wed, 24 Jul, 2013 13:33) from feed InformationWeek - Software as it is too old. Skipping article Network Solutions Knocked Down Again (Wed, 24 Jul, 2013 12:35) from feed InformationWeek - Software as it is too old. Skipping article DataStax Stalks Oracle With Cassandra Upgrades (Wed, 24 Jul, 2013 10:22) from feed InformationWeek - Software as it is too old. Skipping article Data Science Certification Program Emerges (Tue, 23 Jul, 2013 11:49) from feed InformationWeek - Software as it is too old. Skipping article What Cisco Gains From Sourcefire (Tue, 23 Jul, 2013 11:07) from feed InformationWeek - Software as it is too old. Skipping article Oracle Says Utilities Botch Smart Meter Data Analysis (Tue, 23 Jul, 2013 10:57) from feed InformationWeek - Software as it is too old. 
Skipping article In-Q-Tel Backs Open Source Mapping Company (Tue, 23 Jul, 2013 10:26) from feed InformationWeek - Software as it is too old. Skipping article 3 End Of Windows XP Upgrade Headaches (Tue, 23 Jul, 2013 09:58) from feed InformationWeek - Software as it is too old. Skipping article Salesforce Improves Chatter Mobile Apps (Tue, 23 Jul, 2013 09:01) from feed InformationWeek - Software as it is too old. Skipping article NASCAR Team Drives Dell Windows 8 Tablets (Mon, 22 Jul, 2013 15:23) from feed InformationWeek - Software as it is too old. Skipping article Facebook Scores 100M Mobile Users In Emerging Markets (Mon, 22 Jul, 2013 13:05) from feed InformationWeek - Software as it is too old. Skipping article Tableau Takes Data Visualization Online (Mon, 22 Jul, 2013 08:30) from feed InformationWeek - Software as it is too old. Skipping article SAP Co-CEO Snabe Plans To Step Down (Mon, 22 Jul, 2013 08:25) from feed InformationWeek - Software as it is too old. Skipping article Microsoft's Struggles Grow: 9 Key Points (Fri, 19 Jul, 2013 10:59) from feed InformationWeek - Software as it is too old. Skipping article Intel's Next Hope: $300 Windows 8 Devices (Fri, 19 Jul, 2013 09:06) from feed InformationWeek - Software as it is too old. Skipping article Google Chrome For iOS Promises Data Cost Savings (Thu, 18 Jul, 2013 15:36) from feed InformationWeek - Software as it is too old. Skipping article SAP Sees Cloud Growth Accelerating (Thu, 18 Jul, 2013 12:35) from feed InformationWeek - Software as it is too old. Skipping article 5 Wishes For Samsung Nexus 10 (Thu, 25 Jul, 2013 14:46) from feed InforamtionWeek - Hardware as it is too old. Skipping article Can Michael Dell's New Offer Sway Investors? (Thu, 25 Jul, 2013 14:05) from feed InforamtionWeek - Hardware as it is too old. Skipping article Google Nexus 7 Heats Up Mini-Tablet Battle (Thu, 25 Jul, 2013 12:18) from feed InforamtionWeek - Hardware as it is too old. 
Skipping article 10 Hidden iPhone Tips, Tricks (Thu, 25 Jul, 2013 11:06) from feed InforamtionWeek - Hardware as it is too old. Skipping article Google Takes On Apple With Chromecast, Android 4.3 (Thu, 25 Jul, 2013 09:06) from feed InforamtionWeek - Hardware as it is too old. Skipping article Google Nexus 7: Small Tablet To Beat (Wed, 24 Jul, 2013 14:04) from feed InforamtionWeek - Hardware as it is too old. Skipping article Intel's Plan To Dominate Data Centers (Wed, 24 Jul, 2013 11:35) from feed InforamtionWeek - Hardware as it is too old. Skipping article Apple, Samsung Lead Q2 Smartphone Sales (Wed, 24 Jul, 2013 11:30) from feed InforamtionWeek - Hardware as it is too old. Skipping article 10 Tablet Battery Tips: More Power (Wed, 24 Jul, 2013 11:06) from feed InforamtionWeek - Hardware as it is too old. Skipping article Apple Sets iPhone Sales Record For Quarter (Wed, 24 Jul, 2013 09:47) from feed InforamtionWeek - Hardware as it is too old. Skipping article OpenStack Grows Up And China Notices (Wed, 24 Jul, 2013 09:06) from feed InforamtionWeek - Hardware as it is too old. Skipping article CIO Profile: William H. Miller, Jr., Of Broadcom (Wed, 24 Jul, 2013 09:06) from feed InforamtionWeek - Hardware as it is too old. Skipping article Verizon Intros Trio Of Motorola Droids (Tue, 23 Jul, 2013 14:31) from feed InforamtionWeek - Hardware as it is too old. Skipping article FBI To Overhaul Printer Network (Tue, 23 Jul, 2013 12:46) from feed InforamtionWeek - Hardware as it is too old. Skipping article Nokia Brings Big Screen To Lumia Line (Tue, 23 Jul, 2013 11:14) from feed InforamtionWeek - Hardware as it is too old. Skipping article NASCAR Team Drives Dell Windows 8 Tablets (Mon, 22 Jul, 2013 15:23) from feed InforamtionWeek - Hardware as it is too old. Skipping article Facebook Scores 100M Mobile Users In Emerging Markets (Mon, 22 Jul, 2013 13:05) from feed InforamtionWeek - Hardware as it is too old. 
Skipping article Common Core Meets Aging Education Technology (Mon, 22 Jul, 2013 12:16) from feed InforamtionWeek - Hardware as it is too old. Skipping article Moto X: Motorola's Not-So-Bold Rebirth (Mon, 22 Jul, 2013 10:55) from feed InforamtionWeek - Hardware as it is too old. article.title is: Google Nexus 7 Vs. iPad Mini: 6 Key Factors article.title is: Microsoft's Dilemma: Windows 8.1 May Not Be Enough article.title is: Gmail Tweaks: 5 Tips For Power Users article.title is: Inside Intel's Data Center Vision article.title is: NSA Utah Data Center Scrutiny: Off Target? article.title is: Oracle Retools Java For Internet Of Things article.title is: Feds Move To Open Source Databases Pressures Oracle article.title is: NASA Satellite Reveals New View Of Sun article.title is: Apple Dominates Consumer Brands Poll article.title is: Brave Tales From The SysAdmin Trenches article.title is: Google Nexus 7, Chromecast: Visual Tour article.title is: Hospitals Struggle With EHRs For Quality Reporting, AHA Says article.title is: Stanford University Network Hacked article.title is: Record-Setting Data Breach Highlights Corporate Security Risks article.title is: Samsung Leads Smartphone Market article.title is: IBM Mainframes Nipped, Tucked For Cloud Age article.title is: Social Business Needs Culture Of Open Leadership article.title is: Big Data Ushers In 'Virtuous Cycle Of Computing' article.title is: Lawmakers Grill Federal CIO On Data Center Figures article.title is: Take An Email Sabbatical: 5 Steps article.title is: 5 Helpful Online Services From Uncle Sam article.title is: MS Looks To Move Windows 7 Users To IE 11 article.title is: Is Scale-Out Storage A Must Have? article.title is: Cloud File Storage Fight: No Knockout Yet article.title is: Google Nexus 7 Vs. 
iPad Mini: 6 Key Factors article.title is: Microsoft's Dilemma: Windows 8.1 May Not Be Enough article.title is: Gmail Tweaks: 5 Tips For Power Users article.title is: Feds Move To Open Source Databases Pressures Oracle article.title is: NASA Satellite Reveals New View Of Sun article.title is: Apple Dominates Consumer Brands Poll article.title is: Brave Tales From The SysAdmin Trenches article.title is: Google Nexus 7, Chromecast: Visual Tour article.title is: Hospitals Struggle With EHRs For Quality Reporting, AHA Says article.title is: Stanford University Network Hacked article.title is: Record-Setting Data Breach Highlights Corporate Security Risks article.title is: Samsung Leads Smartphone Market article.title is: IBM Mainframes Nipped, Tucked For Cloud Age article.title is: Social Business Needs Culture Of Open Leadership article.title is: Big Data Ushers In 'Virtuous Cycle Of Computing' article.title is: Lawmakers Grill Federal CIO On Data Center Figures article.title is: Take An Email Sabbatical: 5 Steps article.title is: 5 Helpful Online Services From Uncle Sam article.title is: MS Looks To Move Windows 7 Users To IE 11 article.title is: Is Scale-Out Storage A Must Have? article.title is: Cloud File Storage Fight: No Knockout Yet article.title is: Apple Dominates Consumer Brands Poll article.title is: Samsung Leads Smartphone Market article.title is: Microsoft's Dilemma: Windows 8.1 May Not Be Enough article.title is: Gmail Tweaks: 5 Tips For Power Users article.title is: Feds Move To Open Source Databases Pressures Oracle article.title is: Google Nexus 7, Chromecast: Visual Tour article.title is: MS Looks To Move Windows 7 Users To IE 11 article.title is: Google Nexus 7 Vs. iPad Mini: 6 Key Factors article.title is: Microsoft's Dilemma: Windows 8.1 May Not Be Enough article.title is: NSA Utah Data Center Scrutiny: Off Target? 
article.title is: Apple Dominates Consumer Brands Poll article.title is: Google Nexus 7, Chromecast: Visual Tour article.title is: IBM Mainframes Nipped, Tucked For Cloud Age Removing duplicate article: Google Nexus 7 Vs. iPad Mini: 6 Key Factors from section: InformationWeek - News Removing duplicate article: Microsoft's Dilemma: Windows 8.1 May Not Be Enough from section: InformationWeek - News Removing duplicate article: Gmail Tweaks: 5 Tips For Power Users from section: InformationWeek - News Removing duplicate article: Feds Move To Open Source Databases Pressures Oracle from section: InformationWeek - News Removing duplicate article: NASA Satellite Reveals New View Of Sun from section: InformationWeek - News Removing duplicate article: Apple Dominates Consumer Brands Poll from section: InformationWeek - News Removing duplicate article: Brave Tales From The SysAdmin Trenches from section: InformationWeek - News Removing duplicate article: Google Nexus 7, Chromecast: Visual Tour from section: InformationWeek - News Removing duplicate article: Stanford University Network Hacked from section: InformationWeek - News Removing duplicate article: Record-Setting Data Breach Highlights Corporate Security Risks from section: InformationWeek - News Removing duplicate article: Samsung Leads Smartphone Market from section: InformationWeek - News Removing duplicate article: IBM Mainframes Nipped, Tucked For Cloud Age from section: InformationWeek - News Removing duplicate article: Social Business Needs Culture Of Open Leadership from section: InformationWeek - News Removing duplicate article: Big Data Ushers In 'Virtuous Cycle Of Computing' from section: InformationWeek - News Removing duplicate article: Big Data Ushers In 'Virtuous Cycle Of Computing' from section: InformationWeek - News Removing duplicate article: Lawmakers Grill Federal CIO On Data Center Figures from section: InformationWeek - News Removing duplicate article: Take An Email Sabbatical: 5 Steps from 
section: InformationWeek - News Removing duplicate article: 5 Helpful Online Services From Uncle Sam from section: InformationWeek - News Removing duplicate article: MS Looks To Move Windows 7 Users To IE 11 from section: InformationWeek - News Removing duplicate article: Is Scale-Out Storage A Must Have? from section: InformationWeek - News Removing duplicate article: Cloud File Storage Fight: No Knockout Yet from section: InformationWeek - News Removing duplicate article: Apple Dominates Consumer Brands Poll from section: InformationWeek - Personal Tech Removing duplicate article: Samsung Leads Smartphone Market from section: InformationWeek - Personal Tech Removing duplicate article: Microsoft's Dilemma: Windows 8.1 May Not Be Enough from section: InformationWeek - Software Removing duplicate article: Gmail Tweaks: 5 Tips For Power Users from section: InformationWeek - Software Removing duplicate article: Feds Move To Open Source Databases Pressures Oracle from section: InformationWeek - Software Removing duplicate article: Google Nexus 7, Chromecast: Visual Tour from section: InformationWeek - Software Removing duplicate article: MS Looks To Move Windows 7 Users To IE 11 from section: InformationWeek - Software Removing duplicate article: Google Nexus 7 Vs. iPad Mini: 6 Key Factors from section: InforamtionWeek - Hardware Removing duplicate article: Microsoft's Dilemma: Windows 8.1 May Not Be Enough from section: InforamtionWeek - Hardware Removing duplicate article: NSA Utah Data Center Scrutiny: Off Target? 
from section: InforamtionWeek - Hardware
Removing duplicate article: Apple Dominates Consumer Brands Poll from section: InforamtionWeek - Hardware
Removing duplicate article: Google Nexus 7, Chromecast: Visual Tour from section: InforamtionWeek - Hardware
Removing duplicate article: IBM Mainframes Nipped, Tucked For Cloud Age from section: InforamtionWeek - Hardware
Synthesizing mastheadImage
Failed to find print version for: http://www.informationweek.com/hardware/handheld/google-nexus-7-vs-ipad-mini-6-key-factor/240159052?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/windows/operating-systems/microsofts-dilemma-windows-81-may-not-be/240159040?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/social-business/email/gmail-tweaks-5-tips-for-power-users/240159033?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/quickview/inside-intels-data-center-vision/3937?cid=RSSfeed_IWK_ALL&wc=4
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/quickview/nsa-utah-data-center-scrutiny-off-target/3936?cid=RSSfeed_IWK_ALL&wc=4
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/quickview/oracle-retools-java-for-internet-of-thin/3935?cid=RSSfeed_IWK_ALL&wc=4
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/government/enterprise-applications/feds-move-to-open-source-databases-press/240159014?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/government/information-management/nasa-satellite-reveals-new-view-of-sun/240159017?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/hardware/handheld/apple-dominates-consumer-brands-poll/240159008?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/global-cio/interviews/brave-tales-from-the-sysadmin-trenches/240158967?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/hardware/handheld/google-nexus-7-chromecast-visual-tour/240158973?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/education/security/stanford-university-network-hacked/240158977?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/security/vulnerabilities/record-setting-data-breach-highlights-co/240158986?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/mobility/smart-phones/samsung-leads-smartphone-market/240159007?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/global-cio/interviews/ibm-mainframes-nipped-tucked-for-cloud-a/240158991?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/social-business/strategy/social-business-needs-culture-of-open-le/240158737?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/big-data/news/big-data-analytics/big-data-ushers-in-virtuous-cycle-of-computing/240158963
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: need more than 0 values to unpack
Failed to find print version for: http://www.informationweek.com/government/policy/lawmakers-grill-federal-cio-on-data-cent/240158975?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/global-cio/interviews/take-an-email-sabbatical-5-steps/240158957?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/government/information-management/5-helpful-online-services-from-uncle-sam/240158897?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/windows/microsoft-news/ms-looks-to-move-windows-7-users-to-ie-1/240158971?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/storage/systems/is-scale-out-storage-a-must-have/240158786?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Failed to find print version for: http://www.informationweek.com/storage/data-protection/cloud-file-storage-fight-no-knockout-yet/240158749?cid=RSSfeed_IWK_ALL
Traceback (most recent call last):
  File "site-packages\calibre\web\feeds\news.py", line 1195, in build_index
  File "<string>", line 33, in print_version
ValueError: too many values to unpack
Parsing all content...
Parsing index.html ...
Forcing index.html into XHTML namespace
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Reading TOC from NCX...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 3 items of level: div_1
Found 2 items of level: div_2
Found 2 items of level: p_2
Ignoring level p_2
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{u'index.html': u'index_u1.html'}
Rescaling image from 590x750 to 485x616 cover.jpg
Rescaling image from 600x60 to 501x50 mastheadImage.jpg
Splitting markup on page breaks and flow limits, if any...
Looking for large trees in feed_0/index.html...
No large trees found
Looking for large trees in index_u1.html...
No large trees found
This EPUB file has no Table of Contents. Creating a default TOC
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to G:\Users\Camper\AppData\Local\Temp\calibre_2ou67n\7fqtxh_recipe_out.epub
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed

class InformationWeek(BasicNewsRecipe):
    title = u'InformationWeek'
    oldest_article = 3
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content = False

    feeds = [
        (u'InformationWeek - Stories', u'http://www.informationweek.com/rss/pheedo/all_story_blog.xml?cid=RSSfeed_IWK_ALL'),
        (u'InformationWeek - News', u'http://www.informationweek.com/rss/pheedo/news.xml?cid=RSSfeed_IWK_News'),
        (u'InformationWeek - Personal Tech', u'http://www.informationweek.com/rss/pheedo/personaltech.xml?cid=RSSfeed_IWK_Personal_Tech'),
        (u'InformationWeek - Software', u'http://www.informationweek.com/rss/pheedo/software.xml?cid=RSSfeed_IWK_Software'),
        (u'InforamtionWeek - Hardware', u'http://www.informationweek.com/rss/pheedo/hardware.xml?cid=RSSfeed_IWK_Hardware')
    ]

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            for article in feed.articles[:]:
                print 'article.title is: ', article.title
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds

    def print_version(self, url):
        main, sep, unneeded = url.rpartition('?')[0]
        return main + '?printer_friendly=this-page'

Last edited by Camper65; 07-28-2013 at 09:58 PM. |
07-28-2013, 10:39 PM | #4 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I didn't notice that you were already unpacking the result of rpartition; in that case the [0] is not needed.
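For reference, the ValueError in the log above comes from indexing before unpacking: url.rpartition('?') returns a 3-tuple, but adding [0] leaves only the first string, and a string of that length cannot be unpacked into three names. Below is a minimal sketch of the method without the index, written as a standalone function so it can be run outside calibre. The fallback for URLs that contain no '?' is an addition for illustration and is not in the posted recipe.

```python
def print_version(url):
    # rpartition returns a 3-tuple: (text before the last '?', '?', text after it)
    main, sep, _query = url.rpartition('?')
    if not sep:
        # No '?' found: rpartition puts the whole string in the last slot,
        # so fall back to the unmodified url (assumption, not in the original recipe)
        main = url
    return main + '?printer_friendly=this-page'


if __name__ == '__main__':
    print(print_version(
        'http://www.informationweek.com/storage/data-protection/'
        'cloud-file-storage-fight-no-knockout-yet/240158749?cid=RSSfeed_IWK_ALL'))
```

Inside the recipe this would be the body of print_version(self, url). Whether the site actually serves the print page at the resulting URL is a separate question.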
|
07-28-2013, 11:17 PM | #5 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Apr 2011
Device: Kindle wifi; Dell 2in1
|
I tried to go to the article using the Print link and it seems that it's resetting itself to the main article. Is there a way of having it click on the href="?printer_friendly=this-page" link so that it can then download that?
|
07-29-2013, 03:35 AM | #6 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If the print version uses JavaScript, then no, not with a normal recipe. You can do it with the JavascriptRecipe class; see the builtin time.com recipe for an example of using it.
It might be easier to just use the normal pages and have calibre follow the links to the subsequent parts. |
09-12-2013, 11:35 PM | #7 |
Enthusiast
Posts: 32
Karma: 10
Join Date: Apr 2011
Device: Kindle wifi; Dell 2in1
|
Okay, I'm finally getting back to fixing this (I had to build a new tower, and one of my notebooks started crashing constantly). I'm trying to have it pull multipage articles by finding and following the pgno=# links in the pagination block. This is just part of a long article page:
Code:
<div class="article-pagination">
  <strong>
    <a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2"> Page 2: BlackBerry Isn't Immune </a>
  </strong>
  <img hspace="0" height="5" border="0" width="10" vspace="0" src="http://i.cmpnet.com/infoweek/spacer.gif">
  <br/>
  <div class="controls">
    <strong> 1 | </strong><a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2">2</a> |<a class="contentgating_article" href="/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2"> Next Page »</a>
  </div>
</div>
</article>
Code:
import re  # needed for re.compile in preprocess_regexps

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.web.feeds import Feed

class InformationWeek(BasicNewsRecipe):
    title = u'InformationWeek'
    oldest_article = 3
    max_articles_per_feed = 150
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}
    remove_empty_feeds = True
    remove_javascript = True
    use_embedded_content = False
    # follow links one level deep, but only the ?pgno=N pagination links
    recursions = 1
    match_regexps = [r'\?pgno=\d+$']

    # drop everything between the analytics marker and the end of the body
    preprocess_regexps = [
        (re.compile(r'<!-- End SiteCatalyst code version: H.16 -->.*</body>', re.DOTALL), lambda match: '</body>')
    ]

    feeds = [
        (u'InformationWeek - Stories', u'http://www.informationweek.com/rss/pheedo/all_story_blog.xml?cid=RSSfeed_IWK_ALL'),
        (u'InformationWeek - News', u'http://www.informationweek.com/rss/pheedo/news.xml?cid=RSSfeed_IWK_News'),
        (u'InformationWeek - Personal Tech', u'http://www.informationweek.com/rss/pheedo/personaltech.xml?cid=RSSfeed_IWK_Personal_Tech'),
        (u'InformationWeek - Software', u'http://www.informationweek.com/rss/pheedo/software.xml?cid=RSSfeed_IWK_Software'),
        (u'InformationWeek - Hardware', u'http://www.informationweek.com/rss/pheedo/hardware.xml?cid=RSSfeed_IWK_Hardware')
    ]

    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # iterate over a copy so articles can be removed safely
            for article in feed.articles[:]:
                print 'article.title is: ', article.title
                if 'healthcare' in article.title or 'healthcare' in article.url:
                    feed.articles.remove(article)
        return feeds

    def append_page(self, soup, appendtag, position):
        pager = soup.find('div', attrs={'class':'article-pagination'})
        if pager:
            nextpage = soup.find('a', attrs={'class':'contentgating_article'})
            if nextpage:
                nexturl = nextpage['href']
                soup2 = self.index_to_soup(nexturl)
                texttag = soup2.find('div', attrs={'class':'article-v2'})
                for it in texttag.findAll(style=True):
                    del it['style']
                newpos = len(texttag.contents)
                self.append_page(soup2, texttag, newpos)
                texttag.extract()
                pager.extract()
                appendtag.insert(position, texttag)

    remove_tags_before = dict(name='article', id=lambda x:not x) |
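One thing that may be worth checking in the recipe above: the pagination hrefs in the HTML snippet are site-relative (/security/...?pgno=2), and as far as I know index_to_soup only fetches strings that look like full URLs, treating anything else as raw markup. The sketch below is a standalone helper, not calibre API. The name next_page_url is hypothetical, and it assumes the pattern is applied as a search against the href.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, which calibre used at the time

# Same pattern as match_regexps in the recipe
PGNO_RE = re.compile(r'\?pgno=\d+$')


def next_page_url(page_url, href):
    """Return the absolute URL for a pagination href, or None if href is not a pgno link.

    Hypothetical helper for illustration only.
    """
    if not PGNO_RE.search(href):
        return None
    # Resolve the site-relative href against the page it was found on
    return urljoin(page_url, href)


if __name__ == '__main__':
    page = ('http://www.informationweek.com/security/privacy/'
            'nsa-vs-your-smartphone-5-facts/240161133')
    href = '/security/privacy/nsa-vs-your-smartphone-5-facts/240161133?pgno=2'
    print(next_page_url(page, href))
```

In append_page the result would replace the bare nextpage['href'] before calling index_to_soup. Note also that nothing in the recipe as posted calls append_page; other recipes typically call it from preprocess_html.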
|