11-04-2014, 12:36 PM | #1 |
Hack
Posts: 34
Karma: 12
Join Date: Dec 2009
Device: Kobo Aura HD, Kindle Paperwhite
|
Blogger/Blogspot Comment Parser & More
This is a news recipe which, as configured, downloads the latest post from The Archdruid Report (in my mind some of the most insightful commentary anywhere). It can also be configured to download all posts on the front page, or a collection of posts from any given month.
The core of this recipe, and the feature most valuable to readers of this particular weekly blog, takes follow-up comments by the author/moderator and inserts each one immediately after the comment it responds to. This was not a simple hack. It also combines subsequent pages of comments into a single page. It is a good example of using preprocess_html and postprocess_html.

The recipe seems to work with arbitrary Blogger/Blogspot blogs. You will see two other blogs pre-configured in the recipe. Just comment out the blog you don't want, or put the blog you do want last in the list.

Additionally, you can use this recipe as a stand-alone Python program (assuming you have Calibre installed, as it relies on the Calibre command-line tools). You can test an arbitrary Blogger/Blogspot blog this way: rename ADR.text to BlogParse.py, make it executable, and run either

./BlogParse.py http://myblog.blogspot.com/post.html

or

./BlogParse.py http://myblog.blogspot.com/post.html mod='Moderator Name'

The second form is for blogs where the author has not configured a VCard that can be parsed from the HTML source. The command-line version is only intended for individual blog entries, not the homepage of the blog.

Hope someone finds this useful! It was a fun project.

-- original files removed. see latest below...
Last edited by EnergyLens; 11-23-2014 at 11:02 AM. Reason: I realized that the version I posted was unnecessarily draconian about images as the primary target is almost entirely text. |
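For readers who want a feel for the comment re-threading described in this post, here is a minimal sketch. It is not the recipe's code. It assumes comments are already extracted as (author, text) tuples, and that each paragraph of a moderator reply opens with the addressee's name followed by a comma; both are assumptions made for illustration. The function name thread_replies is hypothetical.

```python
def thread_replies(comments, moderator):
    """Move each moderator reply paragraph directly after the comment it answers.

    comments  -- list of (author, text) tuples in page order
    moderator -- display name of the blog author/moderator
    Assumption: a reply paragraph starts with "<commenter name>, ...".
    """
    entries = []   # dicts: {"author", "text", "replies"}
    latest = {}    # lowercased commenter name -> their most recent entry

    for author, text in comments:
        if author == moderator:
            unmatched = []
            for para in text.split("\n\n"):
                # Text before the first comma is treated as the addressee.
                name = para.split(",", 1)[0].strip().lower()
                target = latest.get(name)
                if target is not None:
                    target["replies"].append(para)
                else:
                    unmatched.append(para)
            # Paragraphs with no recognizable addressee stay in place.
            if unmatched:
                entries.append({"author": author,
                                "text": "\n\n".join(unmatched),
                                "replies": []})
        else:
            entry = {"author": author, "text": text, "replies": []}
            entries.append(entry)
            latest[author.lower()] = entry

    # Flatten: each comment followed by the replies addressed to it.
    flat = []
    for entry in entries:
        flat.append((entry["author"], entry["text"]))
        flat.extend((moderator, reply) for reply in entry["replies"])
    return flat


page = [
    ("Alice", "First!"),
    ("Bob", "Question?"),
    ("JMG", "Bob, good question.\n\nAlice, thanks."),
]
for author, text in thread_replies(page, "JMG"):
    print(author, "|", text)
```

Exact name matching like this is brittle (nicknames, misspellings), which is presumably why the actual recipe uses fuzzier matching.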
11-05-2014, 01:30 AM | #2 |
Updated to include images
I tried to 'architect' this code such that the BloggerSoup class could be replaced by a 'TypePadSoup' or other blog platform that may have the same problem with unthreaded comments...
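A rough sketch of what that kind of swap could look like, assuming a shared base class. Only the names BloggerSoup and TypePadSoup come from this post; the BlogSoup base class, the extract_comments method, the load_comments helper, and both markup patterns are invented for illustration and do not reflect the recipe's code or either platform's real HTML.

```python
import re
from abc import ABC, abstractmethod


class BlogSoup(ABC):
    """Hypothetical platform-neutral interface (name invented for this sketch)."""

    @abstractmethod
    def extract_comments(self, html):
        """Return a list of (author, text) tuples in page order."""


class BloggerSoup(BlogSoup):
    # Invented, simplified markup; real Blogger templates differ.
    _COMMENT = re.compile(
        r'<dt class="author">(.*?)</dt>\s*<dd class="body">(.*?)</dd>', re.S)

    def extract_comments(self, html):
        return [(a.strip(), b.strip()) for a, b in self._COMMENT.findall(html)]


class TypePadSoup(BlogSoup):
    # Invented markup again; only the pattern differs from BloggerSoup.
    _COMMENT = re.compile(
        r'<div class="comment" data-author="(.*?)">(.*?)</div>', re.S)

    def extract_comments(self, html):
        return [(a.strip(), b.strip()) for a, b in self._COMMENT.findall(html)]


def load_comments(parser, html):
    """Downstream code depends only on the BlogSoup interface."""
    return parser.extract_comments(html)
```

With this shape, the re-threading logic never needs to know which platform produced the comments; it only receives the (author, text) list.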
Last edited by EnergyLens; 11-05-2014 at 02:19 AM. |
11-19-2014, 01:41 AM | #3 |
Updated, many improvements, cleaner code
Extensive revisions/cleaner code; more robust matching; indicators (*) when comments have responses.
And, if anyone expresses interest, I will upload the version that includes auto-recognition of Book Titles / Names as well as extraction of URLs and Book Titles.
-- file removed
Last edited by EnergyLens; 11-23-2014 at 11:02 AM. |
11-22-2014, 11:50 AM | #4 |
Wouldn't you know...
The very next ADR entry turned up another weakness, which can be avoided easily by changing one value in one line of code:
change: if diffmatch_2 >= .5
to: if diffmatch_2 >= .6
BUT, this only avoids the weakness rather than fixing it, and I am working on yet another revision of the ListAligner algorithm, which looks like it is going to be leaner and cleaner... |
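To illustrate why a small threshold change can matter, here is a sketch that assumes diffmatch_2 is a similarity ratio between 0 and 1, comparable to what Python's difflib.SequenceMatcher returns. The post does not say how diffmatch_2 is actually computed, so the is_match helper and the sample names are hypothetical.

```python
from difflib import SequenceMatcher


def is_match(a, b, threshold):
    """Accept a pairing only if the similarity ratio reaches the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


# "martin" and "marcus" share only "mar": ratio = 2*3/12 = 0.5
print(is_match("martin", "marcus", 0.5))  # True  (borderline pair accepted)
print(is_match("martin", "marcus", 0.6))  # False (rejected at the stricter cutoff)

# "marcus" and "marcia" share "marc": ratio = 2*4/12, about 0.67
print(is_match("marcus", "marcia", 0.6))  # True  (still accepted)
```

A pair sitting exactly at 0.5 passes the old test and fails the new one, which is the kind of borderline false match a higher cutoff screens out, at the cost of possibly missing some genuine but loosely worded matches.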
11-23-2014, 11:04 AM | #5 |
Here is the latest.
It is currently set to JMG's other blog, The Well of Galabes. Import from File, then edit to change to another of the blogs which are pre-configured in the recipe, then re-import. |
Tags |
blogger, comments, news, pages, recipe |