11-22-2012, 10:33 PM   #1
rouilj
Device: nook tablet
Science News recipe for calibre producing epub without content

I filed the original report as a bug against calibre at:
https://bugs.launchpad.net/calibre/+bug/1082233
and the calibre author suggested I post here.

Here are the details:

I am running calibre 0.9.6 under Windows XP SP3. The automatic download of Science News produces an epub that consists only of a table of contents and pages with nothing but the calibre footer on them.

This is using the recipe included in calibre.

Testing the recipe using:

ebook-convert ScienceNews.recipe .epub --test -vv --debug-pipeline debug

resulted in the output:

Resolved conversion options
calibre version: 0.9.6
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': u'debug',
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_download_recipe': False,
'dont_split_on_page_breaks': True,
'duplicate_links_in_toc': False,
'embed_font_family': None,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': None,
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x018CC9F0>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.OutputProfile object at 0x018CCBD0>,
'page_breaks_before': None,
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'search_replace': None,
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'test': True,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
Trying to get latest version of recipe: science_news
Using downloaded builtin recipe
1% Fetching feeds...
1% Fetching feed Science News / News Items...
1% Trying to download cover...
<img class="thumbnail print" alt="issue" src="/view/scale/id/346547/width/225/height/225" />
34% Downloading cover from http://www.sciencenews.org/view/scal...height/225.jpg
1% Generating masthead...
Synthesizing mastheadImage
1% Starting download [4 thread(s)]...
Downloading
Fetching http://www.sciencenews.org/index.php...om_dehydration
Downloading
Fetching http://www.sciencenews.org/index.php...nsion_slowdown
Processing images...
Fetching http://pixel.quantserve.com/pixel/p-7daKFnhj4RYR-.gif
Recursion limit reached. Skipping links in http://www.sciencenews.org/index.php...nsion_slowdown
http://www.sciencenews.org/index.php...nsion_slowdown saved to C:\DOCUME~1\rouilj\LOCALS~1\Temp\calibre_0.9.6_tmp_qsdk4w\arxik6_plumber\feed_0\article_1\index.xhtml
Downloaded article: Glimpse at early universe finds expansion slowdown from http://www.sciencenews.org/index.php...nsion_slowdown
17% Article downloaded: Glimpse at early universe finds expansion slowdown
Processing images...
Recursion limit reached. Skipping links in http://www.sciencenews.org/index.php...om_dehydration
http://www.sciencenews.org/index.php...om_dehydration saved to C:\DOCUME~1\rouilj\LOCALS~1\Temp\calibre_0.9.6_tmp_qsdk4w\arxik6_plumber\feed_0\article_0\index.xhtml
Downloaded article: Trees worldwide a sip away from dehydration from http://www.sciencenews.org/index.php...om_dehydration
34% Article downloaded: Trees worldwide a sip away from dehydration
34% Feeds downloaded to C:\DOCUME~1\rouilj\LOCALS~1\Temp\calibre_0.9.6_tmp_qsdk4w\arxik6_plumber\index.html
34% Download finished
Input debug saved to: C:\tmp\debug\input
Parsing all content...
Parsing index.html ...
Forcing index.html into XHTML namespace
Parsing feed_0/article_0/index.html ...
Forcing feed_0/article_0/index.html into XHTML namespace
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Parsing feed_0/article_1/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/article_1/index.html as HTML
Referenced file u'feed_1/index.html' not found
Reading TOC from NCX...
Parsed HTML written to: C:\tmp\debug\parsed
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Structured HTML written to: C:\tmp\debug\structure
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 5 items of level: div_1
Found 2 items of level: div_2
Found 2 items of level: p_2
Found 2 items of level: div_4
Ignoring level p_2
Ignoring level div_4
div_1 left margin stats: Counter()
div_1 right margin stats: Counter()
div_2 left margin stats: Counter()
div_2 right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Trimming u'feed_0/article_1/images/img1.jpg' from manifest
Processed HTML written to: C:\tmp\debug\processed
Creating EPUB Output...
67% Running EPUB Output plugin
Found non-unique filenames, renaming to support broken EPUB readers like FBReader, Aldiko and Stanza...
{u'feed_0/article_0/index.html': u'feed_0/article_0/index_u1.html',
u'feed_0/article_1/index.html': u'feed_0/article_1/index_u3.html',
u'feed_0/index.html': u'feed_0/index_u2.html'}
Splitting markup on page breaks and flow limits, if any...
Looking for large trees in feed_0/article_1/index_u3.html...
No large trees found
Looking for large trees in feed_0/article_0/index_u1.html...
No large trees found
Looking for large trees in index.html...
No large trees found
Looking for large trees in feed_0/index_u2.html...
No large trees found
The cover image has an id != "cover". Renaming to work around bug in Nook Color
EPUB output written to C:\tmp\ScienceNews.epub
Output saved to C:\tmp\ScienceNews.epub
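
A long -vv log like this one is easier to triage if only the lines that hint at trouble are pulled out. The following is a minimal sketch of a hypothetical helper (it is not part of calibre); the phrase list is an assumption based on messages that appear in the log above, and a match only marks a line as worth a look, not as the cause.

```python
import re

# Phrases copied from messages in the log above. This list is an assumption
# made for illustration; it is not a list that calibre itself defines.
SUSPECT_PATTERNS = [
    r"not found",
    r"parse failed",
    r"Recursion limit reached",
]

_SUSPECT_RX = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the log lines that contain any of the suspect phrases."""
    return [line for line in log_text.splitlines() if _SUSPECT_RX.search(line)]
```

Run against the output above, this would surface lines such as "Referenced file u'feed_1/index.html' not found" and the two "Initial parse failed" messages.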

The epub produced is only 21834 bytes, whereas a Science News download is usually 60+ pages long and hence much larger.
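
One way to confirm that the articles really are empty, rather than going by file size alone, is to list the HTML files inside the epub together with their uncompressed sizes. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and it assumes the epub is an ordinary zip archive, as epub files are.

```python
import zipfile


def list_epub_html(epub_path):
    """Return (name, uncompressed size in bytes) for each HTML file in the epub."""
    with zipfile.ZipFile(epub_path) as zf:
        return [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.lower().endswith((".html", ".xhtml", ".htm"))
        ]
```

Calling list_epub_html(r'C:\tmp\ScienceNews.epub') and printing the result should show whether the article files hold anything beyond a few hundred bytes of footer markup.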

Thanks for any ideas.

-- rouilj
12-25-2012, 03:21 PM   #2
davidnye
Device: Nook
I tried fiddling with the recipe and can't get it to work either. Anyone else have any ideas? This is a really great little mag, and I miss having it on my Nook even though I subscribe to the paper version.
02-24-2013, 01:59 PM   #3
davidnye
Fixed it. Here is an updated recipe for Science News Recent Issues:

Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
'''
sciencenews.org
'''
from calibre.web.feeds.news import BasicNewsRecipe

class ScienceNewsIssue(BasicNewsRecipe):
    title                 = u'Science News Recent Issues'
    __author__            = u'Darko Miletic, Sujata Raman and Starson17'
    description           = u'''Science News is an award-winning weekly
    newsmagazine covering the most important research in all fields of science.
    Its 16 pages each week are packed with short, accurate articles that appeal
    to both general readers and scientists. Published since 1922, the magazine
    now reaches about 150,000 subscribers and more than 1 million readers.
    These are the latest News Items from Science News. This recipe downloads
    the last 30 days worth of articles.'''
    category              = u'Science, Technology, News'
    publisher             = u'Society for Science & the Public'
    oldest_article        = 30
    language = 'en'
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    timefmt               = ' [%A, %d %B, %Y]'
    recursions = 1
    remove_attributes = ['style']

    # Keys must be calibre conversion option names, e.g. 'comments' (plural).
    conversion_options = {'linearize_tables'  : True
                        , 'comments'          : description
                        , 'tags'              : category
                        , 'publisher'         : publisher
                        , 'language'          : language
                        }

    extra_css = '''
                .content_description{font-family:georgia ;font-size:x-large; color:#646464 ; font-weight:bold;}
                .content_summary{font-family:georgia ;font-size:small ;color:#585858 ; font-weight:bold;}
                .content_authors{font-family:helvetica,arial ;font-size: xx-small ;color:#14487E ;}
                .content_edition{font-family:helvetica,arial ;font-size: xx-small ;}
                .exclusive{color:#FF0000 ;}
                .anonymous{color:#14487E ;}
                .content_content{font-family:helvetica,arial ;font-size: medium ; color:#000000;}
                .description{color:#585858;font-family:helvetica,arial ;font-size: large ;}
                .credit{color:#A6A6A6;font-family:helvetica,arial ;font-size: xx-small ;}
                '''

    keep_only_tags = [ dict(name='div', attrs={'class':'content_content'}),
                       dict(name='ul', attrs={'id':'toc'})
                     ]

    feeds       = [(u"Science News Current Issues", u'http://www.sciencenews.org/view/feed/type/edition/name/issues.rss')]
    
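    # Links found on each issue page are followed only if they match one of
    # these patterns (dots escaped so they match a literal '.').
    match_regexps = [
            r'www\.sciencenews\.org/view/feature/id/',
            r'www\.sciencenews\.org/view/generic/id'
            ]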

    def get_cover_url(self):
        # Find the current issue thumbnail on the home page; None if absent.
        cover_url = None
        index = 'http://www.sciencenews.org/view/home'
        soup = self.index_to_soup(index)
        link_item = soup.find(name='img', alt="issue")
        if link_item:
            # src is a site-relative path, so prepend the host.
            cover_url = 'http://www.sciencenews.org' + link_item['src'] + '.jpg'

        return cover_url

    def preprocess_html(self, soup):
        # Rename every <span> to <div> before conversion.
        for tag in soup.findAll(name=['span']):
            tag.name = 'div'
        return soup
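
For anyone adapting this recipe: as far as I understand the calibre documentation, match_regexps decides which links on a downloaded page are followed when recursions is greater than zero. The sketch below imitates that filter with plain re so the patterns can be tried outside calibre. The patterns mirror the recipe's match_regexps with the dots escaped. The function here is a hypothetical stand-in, not calibre's own code, and the use of search (match anywhere in the URL) with case-insensitive matching is an assumption, though search seems implied because the patterns start at 'www' rather than 'http://'.

```python
import re

# Same patterns as in the recipe, with the dots escaped.
MATCH_REGEXPS = [
    r'www\.sciencenews\.org/view/feature/id/',
    r'www\.sciencenews\.org/view/generic/id',
]


def is_link_wanted(url, patterns=MATCH_REGEXPS):
    """Return True if any pattern is found anywhere in the URL.

    Hypothetical stand-in for calibre's link filter; search semantics and
    case-insensitive matching are assumptions of this sketch.
    """
    return any(re.search(p, url, re.IGNORECASE) for p in patterns)


# The article URL below is made up for illustration.
print(is_link_wanted('http://www.sciencenews.org/view/feature/id/12345/title/example'))
print(is_link_wanted('http://www.sciencenews.org/view/home'))
```

The first call should report a wanted link and the second should not, which is the behaviour the recipe relies on to pull in articles from each issue page while skipping navigation links.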