Old 03-19-2012, 04:41 PM   #1
apiontek
Member
apiontek began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Mar 2012
Device: Kindle Touch
Modifying recipe for Wordpress blog with comments

Hi, I'm wondering if anyone can help me out. I'm new to scripting and to writing recipes, so apologies in advance. I've been trying to figure this out myself, but I've hit a wall.

My end goal is to get recent articles, with their comment threads, from a blog I follow. A simple custom recipe gets the articles but not the comments, so I've been researching how to include them.

Via this thread, I found the recipe shared here for another WordPress blog. I've tested it, and it works, comments included. So that's good.

Now I'm trying to adapt that recipe for a WordPress blog I follow called Savage Minds. Here's my modified version of the recipe:

Spoiler:
Code:
class Savage_minds(BasicNewsRecipe):
    title          = u'Savage Minds'
    oldest_article = 7
    max_articles_per_feed = 100

    no_stylesheets = True

    keep_only_tags = dict(name='div', attrs={'id':'content'})
    remove_tags = [dict(name='div', attrs={'class':'meta clear'}),
        dict(name='div', attrs={'class':'snap_nopreview sharing robots-nocontent'}),
        dict(name='div', attrs={'id':'respond'}),
        ]

    feeds          = [(u'Savage Minds Blog', u'http://savageminds.org/feed/')]


It seems like it should work, based on what I understand of the other recipe and the HTML it's parsing, but I'm doing something wrong: all I get is an index and no articles. Here is the --test -vv output:

Spoiler:
Code:
Resolved conversion options
calibre version: 0.8.43
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0,
 'book_producer': None,
 'change_justification': 'original',
 'chapter': None,
 'chapter_mark': 'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_download_recipe': False,
 'duplicate_links_in_toc': False,
 'enable_heuristics': False,
 'extra_css': None,
 'filter_css': None,
 'fix_indents': True,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x02FD0390>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0,
 'linearize_tables': False,
 'lrf': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_inline_navbars': False,
 'output_profile': <calibre.customize.profiles.OutputProfile object at 0x02FD0570>,
 'page_breaks_before': None,
 'password': None,
 'prefer_metadata_cover': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': None,
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': '',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': '',
 'sr1_search': '',
 'sr2_replace': '',
 'sr2_search': '',
 'sr3_replace': '',
 'sr3_search': '',
 'tags': None,
 'test': True,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'username': None,
 'verbose': 2}
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
1% Fetching feed Savage Minds Blog...
1% Trying to download cover...
1% Generating masthead...
Synthesizing mastheadImage
1% Starting download [4 thread(s)]...
Downloading
Downloading
Fetching file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ej_hde_feeds2disk.html
Fetching file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\sqw63j_feeds2disk.html
WARNING: Encoding detection confidence 99%
Processing images...
Recursion limit reached. Skipping links in file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ej_hde_feeds2disk.html
file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ej_hde_feeds2disk.html saved to C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ywj4yi_plumber\feed_0\article_0\ej_hde_feeds2disk.xhtml
WARNING: Encoding detection confidence 99%
Processing images...
Recursion limit reached. Skipping links in file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\sqw63j_feeds2disk.html
file:C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\sqw63j_feeds2disk.html saved to C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ywj4yi_plumber\feed_0\article_1\sqw63j_feeds2disk.xhtml
Downloaded article: Grading Papers from http://savageminds.org/2012/03/19/grading-papers/
17% Article downloaded: Grading Papers
Downloaded article: Statement of Teaching Philosophy from http://savageminds.org/2012/03/13/statement-of-teaching-philosophy/
34% Article downloaded: Statement of Teaching Philosophy
34% Feeds downloaded to C:\Users\apiontek\AppData\Local\Temp\calibre_0.8.43_tmp_k2rjur\ywj4yi_plumber\index.html
34% Download finished
Parsing all content...
Parsing feed_0/index.html ...
Initial parse failed, using more forgiving parsers
Parsing feed_0/index.html as HTML
Parsing index.html ...
Forcing index.html into XHTML namespace
Parsing feed_0/article_1/index.html ...
Forcing feed_0/article_1/index.html into XHTML namespace
Parsing feed_0/article_0/index.html ...
Forcing feed_0/article_0/index.html into XHTML namespace
Referenced file u'feed_1/index.html' not found
Reading TOC from NCX...
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Parsing stylesheet.css ...
Found 7 items of level: div_1
Found 2 items of level: div_2
Found 4 items of level: p_2
Found 2 items of level: div_4
Ignoring level p_2
Ignoring level div_4
div_1  left margin stats: Counter()
div_1  right margin stats: Counter()
div_2  left margin stats: Counter()
div_2  right margin stats: Counter()
Cleaning up manifest...
Trimming unused files from manifest...
Creating OEB Output...
67% Creating OEB Output
The cover image has an id != "cover". Renaming to work around bug in Nook Color
OEB output written to C:\Users\apiontek\Dropbox\Reading\Calibre Recipes\Savage_minds_blog
Output saved to   C:\Users\apiontek\Dropbox\Reading\Calibre Recipes\Savage_minds_blog


I don't get it. It looks like it's failing to download the articles and then failing to parse them, but I can't understand why.

Any help would be appreciated.

Last edited by apiontek; 03-19-2012 at 04:55 PM. Reason: adding spoiler tags
Old 03-20-2012, 11:23 AM   #2
apiontek
Solved.

Well, I figured it out. While browsing here further, I saw that someone else had used
Code:
use_embedded_content = False
to solve a similar-looking problem. With that setting added, I've come up with a recipe that works:

Spoiler:
Code:
#!/usr/bin/env python

__license__ = 'GPL v3'
'''
Savage Minds
'''

from calibre.web.feeds.news import BasicNewsRecipe


class Savage_Minds(BasicNewsRecipe):
    title = u'Savage Minds'
    description = 'Notes and Queries in Anthropology - A Group Blog'
    cover_url = 'http://savageminds.org/wp-content/themes/SM2009Test/images/sidebar/sidebox.jpg'
    # Fetch each article's web page instead of using the text embedded in the feed
    use_embedded_content = False
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = False
    no_stylesheets = True

    feeds = [(u'Savage Minds Entries', u'http://savageminds.org/feed/')]

    # Keep only the main content div of each article page
    keep_only_tags = [dict(name='div', attrs={'id': 'content'})]
    # Drop page elements that are not wanted in the e-book
    remove_tags = [
        dict(name='div', attrs={'class': 'meta clear'}),
        dict(name='div', attrs={'class': 'snap_nopreview sharing robots-nocontent'}),
        dict(name='div', attrs={'id': 'respond'}),
        dict(name='div', attrs={'class': 'c-grav'}),
        dict(name='span', attrs={'class': 'c-permalink'}),
    ]
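
A likely reason this setting matters, offered as a guess rather than a confirmed diagnosis: the log in the first post shows calibre fetching local feeds2disk.html files, which suggests it was using the article text embedded in the feed. That embedded text would not contain the page's div with id "content", so keep_only_tags would have nothing to keep. The sketch below uses only the Python standard library, not calibre, and the two HTML snippets are made-up stand-ins for the real markup. The helper name has_div_with_id is hypothetical.

```python
from html.parser import HTMLParser


class DivIdFinder(HTMLParser):
    """Records whether a <div> with the wanted id appears in the markup."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'div' and dict(attrs).get('id') == self.wanted_id:
            self.found = True


def has_div_with_id(html, wanted_id):
    """Return True if the HTML contains <div id="wanted_id">."""
    finder = DivIdFinder(wanted_id)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical stand-ins, NOT the real Savage Minds markup.
feed_embedded_html = '<p>Post text as delivered inside the RSS item.</p>'
full_page_html = (
    '<html><body><div id="header">Site header</div>'
    '<div id="content"><p>Post text.</p><ol><li>A comment.</li></ol></div>'
    '</body></html>'
)

print(has_div_with_id(feed_embedded_html, 'content'))  # False
print(has_div_with_id(full_page_html, 'content'))      # True
```

If the guess is right, a filter keyed on id "content" can only match once the full article page is downloaded, which is what use_embedded_content = False asks for.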


It seems that even when I change "oldest_article" to, say, 14 or 20, Calibre still only downloads the latest two articles. In the long run 7 days is fine, though, so I'm not going to worry about it.
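
One possible explanation for the two-article limit, assuming the recipe was still being run with --test as in the first post: calibre's test mode downloads at most two articles per feed, regardless of oldest_article. The sketch below is a plain-Python illustration of that interaction, not calibre's actual code. The function select_articles and the sample posts are hypothetical.

```python
from datetime import datetime, timedelta


def select_articles(articles, now, oldest_article_days, max_articles, test_mode):
    """Filter (title, published) pairs by age, then cap the count.

    In test mode the cap is forced down to 2, mimicking --test.
    """
    cutoff = now - timedelta(days=oldest_article_days)
    recent = [a for a in articles if a[1] >= cutoff]
    limit = min(max_articles, 2) if test_mode else max_articles
    return recent[:limit]


now = datetime(2012, 3, 20)
# Hypothetical posts published 0, 3, 6, 9, 12 and 15 days ago, newest first
articles = [('Post %d' % i, now - timedelta(days=3 * i)) for i in range(6)]

for days in (7, 14):
    with_test = select_articles(articles, now, days, 100, test_mode=True)
    without_test = select_articles(articles, now, days, 100, test_mode=False)
    print(days, len(with_test), len(without_test))
```

Under these assumptions, raising oldest_article changes how many articles pass the age filter, but the count in test mode stays at two. Running the recipe without --test should show the difference.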