11-22-2012, 10:33 PM   #1
rouilj
Device: nook tablet
Science News recipe for calibre producing epub without content

I filed the original report as a bug against calibre at:
https://bugs.launchpad.net/calibre/+bug/1082233
and the calibre author suggested I post here.

Here are the details:

I am running calibre 0.9.6 under Windows XP SP3. The automatic download of Science News produces an epub that contains only a table of contents and pages with nothing on them but the calibre footer.

This is using the recipe included in calibre.

Testing the recipe using:

ebook-convert ScienceNews.recipe .epub --test -vv --debug-pipeline debug

resulted in the output:

Resolved conversion options
calibre version: 0.9.6
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': u'debug',
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_download_recipe': False,
'dont_split_on_page_breaks': True,
'duplicate_links_in_toc': False,
'embed_font_family': None,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': None,
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x018CC9F0>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.OutputProfile object at 0x018CCBD0>,
'page_breaks_before': None,
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'search_replace': None,
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'test': True,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
Trying to get latest version of recipe: science_news
Using downloaded builtin recipe
1% Fetching feeds...
1% Fetching feed Science News / News Items...
1% Trying to download cover...
<img class="thumbnail print" alt="issue" src="/view/scale/id/346547/width/225/height/225" />
34% Downloading cover from http://www.sciencenews.org/view/scal...height/225.jpg
1% Generating masthead...
Synthesizing mastheadImage
1% Starting download [4 thread(s)]...
Downloading
Fetching http://www.sciencenews.org/index.php...om_dehydration
Downloading
Fetching http://www.sciencenews.org/index.php...nsion_slowdown
Processing images...
Fetching http://pixel.quantserve.com/pixel/p-7daKFnhj4RYR-.gif
Recursion limit reached. Skipping links in http://www.sciencenews.org/index.php...nsion_slowdown
http://www.sciencenews.org/index.php...nsion_slowdown saved to C:\DOCUME~1\rouilj\LOCALS~1\Temp\calibre_0.9.6_tmp_qsdk4w\arxik6_plumber\feed_0\article_1\index.xhtml
Downloaded article: Glimpse at early universe finds expansion slowdown from http://www.sciencenews.org/index.php...nsion_slowdown
17% Article downloaded: Glimpse at early universe finds expansion slowdown
Processing images...
Recursion limit reached. Skipping links in http://www.sciencenews.org/index.php...om_dehydration
http://www.sciencenews.org/index.php...om_dehydration saved to C:\DOCUME~1\rouilj\LOCALS~1\Temp\calibre_0.9.6_tmp_qsdk4w\arxik6_plumber\feed_0\article_0\index.xhtml
Downloaded article: Trees worldwide a sip away from dehydration from http://www.sciencenews.org/index.php...om_dehydration
34% Article downloaded: Trees worldwide a sip away from dehydration
34% Feeds downloaded to C:\DOCUME~1\rouilj\LOCALS~1\Temp\calibre_0.9.6_tmp_qsdk4w\arxik6_plumber\index.html
34% Download finished
Input debug saved to: C:\tmp\debug\input
Parsing all content...
Parsing index.html ...
Forcing index.html into XHTML namespace
Parsing feed_0/article_0/index.html ...
Forcing feed_0/article_0/index.html into XHTML namespace
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Parsing feed_0/article_1/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/article_1/index.html as HTML
Referenced file u'feed_1/index.html' not found
Reading TOC from NCX...
Parsed HTML written to: C:\tmp\debug\parsed
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Structured HTML written to: C:\tmp\debug\structure
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 5 items of level: div_1
Found 2 items of level: div_2
Found 2 items of level: p_2
Found 2 items of level: div_4
Ignoring level p_2
Ignoring level div_4
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Trimming u'feed_0/article_1/images/img1.jpg' from manifest
Processed HTML written to: C:\tmp\debug\processed
Creating EPUB Output...
67% Running EPUB Output plugin
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{u'feed_0/article_0/index.html': u'feed_0/article_0/index_u1.html',
u'feed_0/article_1/index.html': u'feed_0/article_1/index_u3.html',
u'feed_0/index.html': u'feed_0/index_u2.html'}
Splitting markup on page breaks and flow limits, if any...
Looking for large trees in feed_0/article_1/index_u3.html...
No large trees found
Looking for large trees in feed_0/article_0/index_u1.html...
No large trees found
Looking for large trees in index.html...
No large trees found
Looking for large trees in feed_0/index_u2.html...
No large trees found
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to C:\tmp\ScienceNews.epub
Output saved to C:\tmp\ScienceNews.epub
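
For anyone who wants to skim a log like the one above, here is a minimal Python sketch that pulls out the lines that look relevant to missing content. The choice of patterns is my own guess at what matters (parse fallbacks, missing referenced files, the recursion limit, trimmed files); it is not a list calibre defines, and the function name is made up for illustration.

```python
import re

# Assumed "interesting" messages, picked by hand from the log in this post.
SUSPICIOUS = (
    re.compile(r"Initial parse failed"),
    re.compile(r"Referenced file .* not found"),
    re.compile(r"Recursion limit reached"),
    # Matches individual trimmed files, not the "Trimming unused files" stage header.
    re.compile(r"Trimming u?'[^']+' from manifest"),
)


def suspicious_lines(log_text):
    """Return (line_number, text) pairs for log lines matching any pattern."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), 1)
        if any(pattern.search(line) for pattern in SUSPICIOUS)
    ]
```

Saving the console output to a file and passing its text to suspicious_lines() gives a short list to start from.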

The epub produced is 21834 bytes while Science News is usually 60+ pages in length and hence much larger.
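
To see where those bytes go, the epub can be opened as an ordinary zip archive. Below is a minimal Python sketch, assuming only the standard zipfile module, that lists each HTML file in the archive with its uncompressed size. The function name is hypothetical and the script is separate from calibre itself.

```python
import sys
import zipfile


def list_epub_documents(epub_path):
    """Return sorted (name, uncompressed_size) pairs for the HTML files in an EPUB."""
    with zipfile.ZipFile(epub_path) as epub:
        return sorted(
            (info.filename, info.file_size)
            for info in epub.infolist()
            if info.filename.lower().endswith((".html", ".xhtml", ".htm"))
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_epub.py C:\tmp\ScienceNews.epub
    for name, size in list_epub_documents(sys.argv[1]):
        print(f"{size:>8}  {name}")
```

Article files that are only a few hundred bytes each would point to pages holding just the navigation bar and footer rather than the article text.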

Thanks for any ideas.

-- rouilj