03-19-2012, 04:41 PM   #1
apiontek
Modifying recipe for Wordpress blog with comments

Hi, I'm wondering if anyone can help me out. I'm new to scripting and writing recipes, so apologies in advance. I've been trying to figure this out myself, but I've hit a wall.

My end goal is to get recent articles, with their comment threads, from a blog I follow. A simple custom recipe gets the articles but not the comment threads, so I've been researching how to include the comments.

Via this thread, I found the recipe shared here for another WordPress blog. I've tested it, and it works, comments included. So that's good.

Now I'm trying to adapt that recipe for a WordPress blog I follow called Savage Minds. Here's my modified version of the recipe:

Spoiler:
Code:
class Savage_minds(BasicNewsRecipe):
    title          = u'Savage Minds'
    oldest_article = 7
    max_articles_per_feed = 100

    no_stylesheets = True

    keep_only_tags = dict(name='div', attrs={'id':'content'})
    remove_tags = [dict(name='div', attrs={'class':'meta clear'}),
        dict(name='div', attrs={'class':'snap_nopreview sharing robots-nocontent'}),
        dict(name='div', attrs={'id':'respond'}),
        ]

    feeds          = [(u'Savage Minds Blog', u'http://savageminds.org/feed/')]


It seems like it should work, based on what I understand of the other recipe and the HTML code it's parsing, but I'm doing something wrong, because all I get is an index and no articles. Here is the --test -vv output:

Spoiler:
Code:
Resolved conversion options
calibre version: 0.8.43
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': None,
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_download_recipe': False,
 'duplicate_links_in_toc': False,
 'enable_heuristics': False,
 'extra_css': None,
 'filter_css': None,
 'fix_indents': True,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x02FD0390>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0,
 'linearize_tables': False,
 'lrf': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_inline_navbars': False,
 'output_profile': <calibre.customize.profiles.OutputProfile object at 0x02FD0570>,
 'page_breaks_before': None,
 'password': None,
 'prefer_metadata_cover': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': None,
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': '',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': '',
 'sr1_search': '',
 'sr2_replace': '',
 'sr2_search': '',
 'sr3_replace': '',
 'sr3_search': '',
 'tags': None,
 'test': True,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'username': None,
 'verbose': 2}
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
1% Fetching feed Savage Minds Blog...
1% Trying to download cover...
1% Generating masthead...
Synthesizing mastheadImage
1% Starting download [4 thread(s)]...
Downloading
Downloading
Fetching file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ej_hde_feeds2disk.html
Fetching file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\sqw63j_feeds2disk.html
WARNING: Encoding detection confidence 99%
Processing images...
Recursion limit reached. Skipping links in file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ej_hde_feeds2disk.html
file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ej_hde_feeds2disk.html saved to C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ywj4yi_plumber\feed_0\article_0\ej_hde_feeds2disk.xhtml
WARNING: Encoding detection confidence 99%
Processing images...
Recursion limit reached. Skipping links in file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\sqw63j_feeds2disk.html
file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\sqw63j_feeds2disk.html saved to C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ywj4yi_plumber\feed_0\article_1\sqw63j_feeds2disk.xhtml
Downloaded article: Grading Papers from http://savageminds.org/2012/03/19/grading-papers/
17% Article downloaded: Grading Papers
Downloaded article: Statement of Teaching Philosophy from http://savageminds.org/2012/03/13/statement-of-teaching-philosophy/
34% Article downloaded: Statement of Teaching Philosophy
34% Feeds downloaded to C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ywj4yi_plumber\index.html
34% Download finished
Parsing all content...
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Parsing index.html ...
Forcing index.html into XHTML namespace
Parsing feed_0/article_1/index.html ...
Forcing feed_0/article_1/index.html into XHTML namespace
Parsing feed_0/article_0/index.html ...
Forcing feed_0/article_0/index.html into XHTML namespace
Referenced file u'feed_1/index.html' not found
Reading TOC from NCX...
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 7 items of level: div_1
Found 2 items of level: div_2
Found 4 items of level: p_2
Found 2 items of level: div_4
Ignoring level p_2
Ignoring level div_4
div_1  left margin stats: Counter()
div_1  right margin stats: Counter()
div_2  left margin stats: Counter()
div_2  right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Creating OEB Output...
67% Creating OEB Output
The cover image has an id != "cover". Renaming to work around bug in Nook Color
OEB output written to C:\Users\apiontek\Dropbox\Reading\Calibre Recipes\Savage_minds_blog
Output saved to   C:\Users\apiontek\Dropbox\Reading\Calibre Recipes\Savage_minds_blog


I don't get it. It looks like it's failing to download all the articles and then failing to parse them, but I can't understand why.

Any help would be appreciated.

Last edited by apiontek; 03-19-2012 at 04:55 PM. Reason: adding spoiler tags
03-20-2012, 11:23 AM   #2
apiontek
Solved.

Well, I figured out a solution. From continuing to browse here, I saw someone else had used
Code:
use_embedded_content = False
to solve something that looked similar. With that, I've come up with a solution that's working:

Spoiler:
Code:
#!/usr/bin/env python

__license__ = 'GPL v3'
'''
Savage Minds
'''

from calibre.web.feeds.news import BasicNewsRecipe


class Savage_Minds(BasicNewsRecipe):
    title = u'Savage Minds'
    description = 'Notes and Queries in Anthropology - A Group Blog'
    cover_url = 'http://savageminds.org/wp-content/themes/SM2009Test/images/sidebar/sidebox.jpg'
    # Fetch each article's web page rather than using the text embedded in the feed
    use_embedded_content = False
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True

    feeds = [(u'Savage Minds Entries', u'http://savageminds.org/feed/')]

    # Keep the main content area, which holds the post and its comment thread
    keep_only_tags = [dict(name='div', attrs={'id': 'content'})]
    # Drop post metadata, sharing widgets, the reply form, avatars and comment permalinks
    remove_tags = [
        dict(name='div', attrs={'class': 'meta clear'}),
        dict(name='div', attrs={'class': 'snap_nopreview sharing robots-nocontent'}),
        dict(name='div', attrs={'id': 'respond'}),
        dict(name='div', attrs={'class': 'c-grav'}),
        dict(name='span', attrs={'class': 'c-permalink'}),
    ]
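A note on why use_embedded_content = False probably matters here. This is my reading of the log, not something confirmed in the thread: WordPress feeds often carry each post's full text inside a content:encoded element, and when the recipe leaves use_embedded_content unset, calibre may build the articles from that embedded text instead of fetching the web page. The "Fetching file:" lines in the log that point at local feeds2disk.html temp files are consistent with that. Embedded feed text has no div with id "content", so keep_only_tags would leave nothing to show. The sketch below uses only the Python standard library to check whether feed items carry embedded content. The sample feed and the function name embedded_content_report are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the RSS "content" module (content:encoded)
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Made-up sample feed, not the real Savage Minds feed
SAMPLE_RSS = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>Full post in feed</title>
      <link>http://example.com/full-post/</link>
      <description>Short summary.</description>
      <content:encoded><![CDATA[<p>Entire article body, no comments.</p>]]></content:encoded>
    </item>
    <item>
      <title>Summary only</title>
      <link>http://example.com/summary-only/</link>
      <description>Short summary.</description>
    </item>
  </channel>
</rss>"""


def embedded_content_report(rss_text):
    """Return (title, has_embedded_content) for every item in an RSS feed."""
    root = ET.fromstring(rss_text)
    report = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("{%s}encoded" % CONTENT_NS)
        report.append((title, body is not None and bool(body.strip())))
    return report


for title, embedded in embedded_content_report(SAMPLE_RSS):
    print(title, "->", "embedded content" if embedded else "summary only")
```

If a feed reports embedded content for its items, setting use_embedded_content = False in the recipe is the documented way to make calibre download the linked pages, which is where the comment thread lives.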


It seems like even when I change "oldest_article" to, say, 14, or 20, Calibre still only downloads the latest two articles, but in the long run 7 days is fine, so I guess I'm not going to worry about it.
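On the two-article limit: the log above shows 'test': True, and as far as I know calibre's --test option caps each feed at two articles no matter what oldest_article and max_articles_per_feed say. A rough sketch of how the limits would interact is below. It is a simplified model under that assumption, not calibre's actual code, and the function name select_articles and the sample data are made up.

```python
from datetime import datetime, timedelta


def select_articles(articles, oldest_article=7, max_articles_per_feed=100,
                    test=False, now=None):
    """Simplified model of per-feed selection; articles are (title, published) pairs."""
    now = now or datetime.now()
    if test:
        # Assumption: test mode overrides the recipe's per-feed cap with 2
        max_articles_per_feed = 2
    cutoff = now - timedelta(days=oldest_article)
    recent = [a for a in articles if a[1] >= cutoff]
    return recent[:max_articles_per_feed]


now = datetime(2012, 3, 20)
# Six made-up posts published 0, 3, 6, 9, 12 and 15 days before "now"
feed = [("Post %d" % i, now - timedelta(days=3 * i)) for i in range(6)]

print(len(select_articles(feed, oldest_article=20, test=True, now=now)))
print(len(select_articles(feed, oldest_article=20, test=False, now=now)))
print(len(select_articles(feed, oldest_article=7, test=False, now=now)))
```

In this model, raising oldest_article changes nothing while test mode is on, which matches what I saw. A full download without --test should pick up everything inside the date window.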