04-19-2011, 03:24 PM   #1
stanleti
Device: Sony PRS-505
Recipe gets headings but not full articles

I am trying to write a calibre recipe that retrieves the full issue of the Evangelical Missions Quarterly (EMQ). However, it downloads only the headings and a brief description of each article, not the full text. The recipe seems unable to access the subscriber content even though I have entered a correct username and password.

Recipe:


from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1303233128(BasicNewsRecipe):
    title = u'EMQ'
    oldest_article = 25
    max_articles_per_feed = 500
    feeds = [(u'EMQ', u'https://www.emisdirect.com/rss')]
    needs_subscription = True
    # Build each article from the content embedded in the feed
    use_embedded_content = True

    def get_browser(self):
        # Log in with the supplied credentials before anything is fetched
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('https://www.emisdirect.com/login')
            br.select_form(nr=0)
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br
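One possible cause, offered as a sketch rather than a confirmed fix: `use_embedded_content = True` tells calibre to build each article from the text embedded in the RSS feed instead of following the article link, so if the EMQ feed carries only summaries, summaries are all the e-book will contain. The variant below sets it to `False` so that calibre downloads each linked page with the logged-in browser. Assumptions: the class name `EMQFullText` is made up for illustration; the `try/except` stand-in class is not part of calibre and exists only so the file can be imported outside it; the login URL and form field names are copied unchanged from the recipe above and are assumed to match the site's login form. The parent call `BasicNewsRecipe.get_browser(self)` is the form used by current calibre releases; on 0.7.x the original call without `self` may be the right one.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Hypothetical stand-in, only so this file imports outside calibre.
    class BasicNewsRecipe(object):
        username = None
        password = None

        def get_browser(self, *args, **kwargs):
            raise NotImplementedError('run this recipe inside calibre')


class EMQFullText(BasicNewsRecipe):
    title = u'EMQ'
    oldest_article = 25
    max_articles_per_feed = 500
    feeds = [(u'EMQ', u'https://www.emisdirect.com/rss')]
    needs_subscription = True
    # False: ignore the feed's embedded text and download each linked page.
    use_embedded_content = False

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open('https://www.emisdirect.com/login')
            br.select_form(nr=0)  # assumes the login form is the first form
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br
```

If the full text still does not appear with this change, the login step itself (form index, field names, or cookies) would be the next thing to check.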


Here is the output of
Code:
ebook-convert emq.recipe .epub --test -vv --debug-pipeline debug --username xxxxxxxx --password xxxxxxxx
Resolved conversion options
calibre version: 0.7.56
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': u'debug',
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_download_recipe': False,
'dont_split_on_page_breaks': True,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0xa6a0e4c>,
'insert_blank_line': False,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.OutputProfile object at 0xa6a640c>,
'page_breaks_before': None,
'password': u'xxxxxxxx',
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'tags': None,
'test': True,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unwrap_lines': True,
'use_auto_toc': False,
'username': u'xxxxxxxx',
'verbose': 2}
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
1% Fetching feed EMQ...
1% Trying to download cover...
1% Generating masthead...
Synthesizing mastheadImage
1% Starting download [4 thread(s)]...
Downloading
Downloading
Fetching file:///tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_Ge0jDU_feeds2disk.html
Fetching file:///tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_CTUTJx_feeds2disk.html
Processing images...
Processing images...
Recursion limit reached. Skipping links in file:///tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_CTUTJx_feeds2disk.html
Recursion limit reached. Skipping links in file:///tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_Ge0jDU_feeds2disk.html
file:///tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_CTUTJx_feeds2disk.html saved to /tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_sPsBN5_plumber/feed_0/article_0/calibre_0.7.56_CTUTJx_feeds2disk.xhtml
Downloaded article: "What about My Singleness?" Encouraging the Single Person toward Ministry from http://www.emisdirect.com/emq/Issue-315/2520
17% Article downloaded: u'"What about My Singleness?" Encouraging the Single Person toward Ministry'
file:///tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_Ge0jDU_feeds2disk.html saved to /tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_sPsBN5_plumber/feed_0/article_1/calibre_0.7.56_Ge0jDU_feeds2disk.xhtml
Downloaded article: Korean Contextualization: A Brief Examination from http://www.emisdirect.com/emq/Issue-315/2529
34% Article downloaded: u'Korean Contextualization: A Brief Examination'
34% Feeds downloaded to /tmp/calibre_0.7.56_tmp_Y48ClJ/calibre_0.7.56_sPsBN5_plumber/index.html
34% Download finished
Input debug saved to: /home/user/debug/input
Parsing all content...
Parsing feed_0/article_1/index.html ...
Forcing feed_0/article_1/index.html into XHTML namespace
Parsing feed_0/index.html ...
Initial parse failed:
Traceback (most recent call last):
File "site-packages/calibre/ebooks/oeb/base.py", line 886, in first_pass
File "lxml.etree.pyx", line 2532, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48634)
File "parser.pxi", line 1545, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:72245)
File "parser.pxi", line 1417, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:71041)
File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67581)
File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
XMLSyntaxError: Opening and ending tag mismatch: br line 32 and div, line 33, column 7

Parsing file 'feed_0/index.html' as HTML
Forcing feed_0/index.html into XHTML namespace
Parsing feed_0/article_0/index.html ...
Forcing feed_0/article_0/index.html into XHTML namespace
Parsing index.html ...
Forcing index.html into XHTML namespace
Referenced file 'feed_1/index.html' not found
Reading TOC from NCX...
Parsed HTML written to: /home/user/debug/parsed
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Structured HTML written to: /home/user/debug/structure
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 9 items of level: div_1
Found 2 items of level: div_2
Found 4 items of level: p_2
Found 2 items of level: div_4
Ignoring level p_2
Ignoring level div_4
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Processed HTML written to: /home/user/debug/processed
Creating EPUB Output...
67% Creating EPUB Output
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{'feed_0/article_0/index.html': 'feed_0/article_0/index_u2.html',
'feed_0/index.html': 'feed_0/index_u1.html',
'index.html': 'index_u3.html'}
Looking for large trees in feed_0/article_1/index.html...
No large trees found
Looking for large trees in index_u3.html...
No large trees found
Looking for large trees in feed_0/article_0/index_u2.html...
No large trees found
Looking for large trees in feed_0/index_u1.html...
No large trees found
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to /home/user/emq.epub
Output saved to /home/user/emq.epub
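The log above can be checked mechanically for where each article body came from. As far as I can tell from the calibre source, feed-embedded content is written to a temporary `*_feeds2disk.html` file and fetched through a `file://` URL, which matches the `Fetching file:///tmp/...` entries here and suggests the article pages on emisdirect.com were not requested in this run. The helper below is a hypothetical sketch: the function name and the shortened sample paths are illustrative and not part of calibre.

```python
import re

# Matches temp files calibre appears to create for feed-embedded content.
EMBEDDED_RE = re.compile(r'file://\S+?_feeds2disk\.html')


def embedded_fetches(log_text):
    """Return the distinct file:// temp URLs named on 'Fetching' lines."""
    found = []
    for line in log_text.splitlines():
        if 'Fetching' not in line:
            continue
        # Non-greedy matching also separates URLs that two download
        # threads printed back to back on the same line.
        for url in EMBEDDED_RE.findall(line):
            if url not in found:
                found.append(url)
    return found


# Shortened, made-up paths in the same shape as the log lines.
sample = (
    "FetchingFetching file:///tmp/a/x_Ge0jDU_feeds2disk.html"
    "file:///tmp/a/x_CTUTJx_feeds2disk.html\n"
    "Processing images...\n"
)
print(embedded_fetches(sample))
```

An empty result on a later run, together with `Fetching http://...` entries for the article URLs, would suggest calibre is following the links instead of using the feed text.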


Thanks in advance for any suggestions.