02-09-2013, 05:21 AM   #10
surf
Member
Join Date: Feb 2013
Device: kindle
Quote:
Originally Posted by kovidgoyal
I don't have time to investigate parsing issues in feedparser, if it is indeed a parsing issue, but make sure you have set the oldest_article setting in your recipe correctly.
Hi Kovid, sorry to trouble you again.

I'm trying to narrow down the issue.

For the attached XML (very simple, containing 3 items), calibre fetches only the first item.

My recipe and the log are below.

Code:
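from calibre.web.feeds.news import BasicNewsRecipe


class GoogleReader(BasicNewsRecipe):

    title   = 'z - GR-pipe-WSJ'
    description = ''
    __author__ = 'Surf'

    # Oldest article to download, in days
    oldest_article = 365
    # Upper limit on the number of articles taken from each feed
    max_articles_per_feed = 400

    # Use the article content embedded in the feed itself
    use_embedded_content = True
    auto_cleanup = False

    # Local copy of the feed, read from disk
    feeds = [(u'GR-pipe-WSJ', 'file:///D:/GR-pipe-WSJ.xml')]
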
Fetch news from z - GR-NYT
Resolved conversion options
calibre version: 0.8.68
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_download_recipe': False,
'dont_split_on_page_breaks': True,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': None,
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x03C73810>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.GenericEink object at 0x03C73A10>,
'page_breaks_before': None,
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'search_replace': None,
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'start_reading_at': None,
'tags': None,
'test': False,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: Recipe Input running
Synthesizing mastheadImage
Downloading
Fetching file:C:\Users\LMH\AppData\Local\Temp\calibre_0.8.68_tmp_nrhgty\xj7edb_feeds2disk.html
WARNING: Encoding detection confidence 99%
Processing images...
Fetching http://g1.cn.nytimes.com/images/2010...icleInline.jpg

Recursion limit reached. Skipping links in file:C:\Users\LMH\AppData\Local\Temp\calibre_0.8.68_tmp_nrhgty\xj7edb_feeds2disk.html
file:C:\Users\LMH\AppData\Local\Temp\calibre_0.8.68_tmp_nrhgty\xj7edb_feeds2disk.html saved to C:\Users\LMH\AppData\Local\Temp\calibre_0.8.68_tmp_nrhgty\jcjhdy_plumber\feed_0\article_0\xj7edb_feeds2disk.xhtml
Downloaded article: 虚拟中产阶级的崛起 from http://cn.nytimes.com/tools/r.html?f...iedman%2F&cid=
Parsing all content...
Parsing index.html ...
Forcing index.html into XHTML namespace
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML

Parsing feed_0/article_0/index.html ...
Forcing feed_0/article_0/index.html into XHTML namespace
Referenced file u'feed_1/index.html' not found
Reading TOC from NCX...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 6 items of level: div_1
Found 2 items of level: div_2
Found 16 items of level: p_2
Found 1 items of level: div_4
Ignoring level p_2
Ignoring level div_4
div_1 left margin stats: Counter({u'': 1})
div_1 right margin stats: Counter({u'': 1})
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{u'feed_0/article_0/index.html': u'feed_0/article_0/index_u2.html',
u'feed_0/index.html': u'feed_0/index_u1.html'}
Rescaling image from 590x750 to 566x720 cover.jpg
Rescaling image from 600x60 to 566x56 mastheadImage.jpg
Splitting markup on page breaks and flow limits, if any...
Looking for large trees in feed_0/article_0/index_u2.html...
No large trees found
Looking for large trees in index.html...
No large trees found
Looking for large trees in feed_0/index_u1.html...
No large trees found
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to C:\Users\LMH\AppData\Local\Temp\calibre_0.8.68_tmp_nrhgty\j7m479_recipe_out.epub
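
As a sanity check on the feed file itself, independent of calibre and feedparser, something like the following could list the items and show which ones fall inside the oldest_article window. This is only a rough sketch: it assumes the file is plain RSS 2.0 with un-namespaced <item>, <title> and <pubDate> elements (an Atom feed would need different tag names), the function name summarize_rss is made up for illustration, and treating undated items as recent is my own assumption, not necessarily what calibre does.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def summarize_rss(xml_data, oldest_article_days=365, now=None):
    """Return a (title, published, is_recent) tuple for each <item> found.

    Hypothetical helper, not part of calibre. Assumes RSS 2.0 tag names.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=oldest_article_days)
    root = ET.fromstring(xml_data)
    rows = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        raw_date = item.findtext("pubDate")
        published = None
        if raw_date:
            try:
                # RSS 2.0 dates use the RFC 822 format
                published = parsedate_to_datetime(raw_date.strip())
            except (TypeError, ValueError):
                published = None  # unparseable date, treated as missing
            if published is not None and published.tzinfo is None:
                # No usable timezone in the date: assume UTC
                published = published.replace(tzinfo=timezone.utc)
        # Assumption of this sketch: items without a usable date count as recent
        is_recent = published is None or published >= cutoff
        rows.append((title, published, is_recent))
    return rows
```

Reading the file as bytes (for example open(r'D:\GR-pipe-WSJ.xml', 'rb').read()) and passing the result to summarize_rss lets the XML parser honour the encoding declared in the file. If this reports 3 items that are all recent while calibre still fetches only one, the feed file itself and the oldest_article setting are unlikely to be the cause.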
Attached Files
File Type: xml GR-NYT.xml (24.1 KB, 35 views)
File Type: epub GR-NYT (output).epub (137.6 KB, 29 views)