Old 11-04-2014, 12:36 PM   #1
EnergyLens
Device: Kobo Aura HD, Kindle Paperwhite
Blogger/Blogspot Comment Parser & More

This is a news recipe which, as configured, downloads the latest post from The Archdruid Report (in my mind some of the most insightful commentary anywhere). It can also be configured to download all posts on the front page, or a collection of posts from any given month.

The core of this recipe, and the feature most valuable to readers of this particular weekly blog, takes follow-up comments by the author/moderator and inserts each one immediately after the comment to which he is responding. This was not a simple hack.
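
To illustrate the reordering idea, here is a minimal Python sketch. It is not the recipe's actual code. It assumes the comments have already been parsed into dictionaries in page order, and that each moderator reply carries a "to" field naming the commenter it addresses. The function name interleave_replies, the dictionary keys, and the sample names are all hypothetical.

```python
def interleave_replies(items, moderator):
    """Move each moderator reply directly after the comment it answers.

    items: comments in page order, each a dict with "author" and "text";
           moderator replies also carry "to" (assumed, hypothetical shape).
    """
    threads = []  # each entry is [comment, reply, reply, ...]
    for item in items:
        if item["author"] == moderator and item.get("to"):
            # Walk backwards to find the latest comment by the addressee.
            for thread in reversed(threads):
                if thread[0]["author"] == item["to"]:
                    thread.append(item)
                    break
            else:
                # No matching commenter found: leave the reply in place.
                threads.append([item])
        else:
            threads.append([item])
    # Flatten the threads back into a single ordered list.
    return [entry for thread in threads for entry in thread]


sample = [
    {"author": "Alice", "text": "A1"},
    {"author": "Bob", "text": "B1"},
    {"author": "Moderator", "to": "Alice", "text": "re A"},
    {"author": "Moderator", "to": "Bob", "text": "re B"},
    {"author": "Carol", "text": "C1"},
]
ordered = [c["text"] for c in interleave_replies(sample, "Moderator")]
```

The hard part in practice is probably working out who each reply addresses from the raw HTML, which this sketch simply takes as given.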

It also combines subsequent pages of comments into a single page. The recipe is a good example of using preprocess_html and postprocess_html.
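
Merging the comment pages can be pictured roughly as follows. This is only a sketch under assumptions: fetch is a hypothetical callable that returns the comments on one page plus the URL of the next page (or None), and collect_comment_pages is a made-up name. In a recipe, logic like this could be called from preprocess_html, but how the actual recipe wires it up is not shown here.

```python
def collect_comment_pages(first_url, fetch, max_pages=50):
    """Gather comments from a chain of paginated comment pages.

    fetch(url) -> (comments, next_url_or_None) is an assumed helper;
    max_pages is a safety limit, not something the blog defines.
    """
    comments = []
    seen = set()
    url = first_url
    # Stop on the last page, on a repeated URL, or at the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_comments, url = fetch(url)
        comments.extend(page_comments)
    return comments


# Fake three-page comment thread standing in for real HTTP fetches.
fake_pages = {
    "p1": (["c1", "c2"], "p2"),
    "p2": (["c3"], "p3"),
    "p3": (["c4"], None),
}
merged = collect_comment_pages("p1", fake_pages.__getitem__)
```

Tracking the URLs already visited guards against a "next page" link that loops back on itself.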

The recipe seems to work with arbitrary Blogger/Blogspot blogs. You will see two other blogs pre-configured in the recipe. Just comment out the blog you don't want, or put the blog you do want last in the list.

Additionally, you can use this recipe as a stand-alone Python program (assuming you have Calibre installed, as it relies on the Calibre command-line tools). You can test an arbitrary Blogger/Blogspot blog this way:

Rename ADR.text to BlogParse.py and make it executable, then run:

./BlogParse.py http://myblog.blogspot.com/post.html

or

./BlogParse.py http://myblog.blogspot.com/post.html mod='Moderator Name'

The second form is for blogs where the author has not configured a VCard that can be parsed from the HTML source.
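
For anyone curious how the two invocations above might be told apart, here is a small Python sketch of the argument handling. It is an assumption about one reasonable approach, not the script's actual code; parse_args is a hypothetical name. Note that the shell strips the quotes, so the script receives mod=Moderator Name as a single argument.

```python
def parse_args(argv):
    """Split arguments (without the program name) into a post URL
    and a dict of key=value options such as mod=Moderator Name."""
    url = None
    options = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        # A URL may contain "=" in its query string, so only treat the
        # argument as an option when the key does not look like a URL.
        if sep and "://" not in key:
            options[key] = value
        elif url is None:
            url = arg
        else:
            raise ValueError("unexpected argument: %r" % arg)
    if url is None:
        raise ValueError("a post URL is required")
    return url, options


url, options = parse_args(
    ["http://myblog.blogspot.com/post.html", "mod=Moderator Name"]
)
moderator = options.get("mod")  # None means: fall back to the VCard
```

In a real script the list would come from sys.argv[1:].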

The command-line version is intended only for individual blog entries, not the homepage of the blog.

Hope someone finds this useful! It was a fun project.

-- original files removed. see latest below...

Last edited by EnergyLens; 11-23-2014 at 11:02 AM. Reason: I realized that the version I posted was unnecessarily draconian about images as the primary target is almost entirely text.