Old 11-04-2014, 12:36 PM   #1
EnergyLens
Hack
EnergyLens began at the beginning.
 
Posts: 34
Karma: 12
Join Date: Dec 2009
Device: Kobo Aura HD, Kindle Paperwhite
Blogger/Blogspot Comment Parser & More

This is a news recipe which, as configured, downloads the latest post from The Archdruid Report (in my mind some of the most insightful commentary anywhere). It can also be configured to download all posts on the front page, or a collection of posts from any given month.

The core of this recipe, and the feature most valuable to readers of this particular weekly blog, takes follow-up comments by the author/moderator and inserts each one immediately after the comment it responds to. This was not a simple hack.
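For anyone curious about the general idea, here is a minimal sketch of that kind of reordering. It is not the code from the recipe. It assumes comments arrive as (author, text) pairs in page order and that the moderator opens each reply paragraph with the commenter's name followed by a comma, which is only a guess at the convention. The function name thread_replies and the sample names in the usage note are made up for illustration.

```python
from collections import defaultdict


def thread_replies(comments, moderator):
    """Return comments with each moderator reply moved after its target.

    comments: list of (author, text) tuples in page order.
    A moderator paragraph of the form "Name, ..." is treated as a reply to
    the most recent earlier comment by "Name" (an assumed convention).
    """
    replies = defaultdict(list)   # index of target comment -> reply texts
    last_seen = {}                # author name -> index of latest comment
    kept = []                     # comments that stay in their place

    for author, text in comments:
        if author != moderator:
            last_seen[author] = len(kept)
            kept.append((author, text))
            continue
        for para in text.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            name, sep, _ = para.partition(",")
            target = last_seen.get(name.strip())
            if sep and target is not None:
                replies[target].append(para)
            else:
                # No recognizable addressee: leave the paragraph in place.
                kept.append((moderator, para))

    threaded = []
    for i, comment in enumerate(kept):
        threaded.append(comment)
        threaded.extend((moderator, r) for r in replies[i])
    return threaded
```

Calling thread_replies(comments, "JMG") on a list where JMG answers Alice and Bob in one combined comment would return Alice's comment, JMG's reply to Alice, Bob's comment, then JMG's reply to Bob.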

It also combines subsequent pages of comments into a single page. The recipe is a good example of using preprocess_html and postprocess_html.
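In case the two hooks are unfamiliar, here is a bare skeleton showing where they sit in a recipe class. This is only a sketch: the class name BloggerSketch, its title, and the tag cleanup inside the hooks are placeholders, not what the attached recipe does. The try/except around the import is there only so the snippet can also be run outside Calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback so the sketch can be read and run outside Calibre.
    BasicNewsRecipe = object


class BloggerSketch(BasicNewsRecipe):
    title = 'Blogger sketch'  # hypothetical title

    def preprocess_html(self, soup):
        # Called on each downloaded page before it is parsed for links
        # and images. Placeholder cleanup: drop <script> tags.
        for tag in soup.findAll('script'):
            tag.extract()
        return soup

    def postprocess_html(self, soup, first_fetch):
        # Called on each downloaded page after links and images have been
        # processed; first_fetch is True for the first page of an article.
        # Placeholder cleanup: drop <iframe> tags.
        for tag in soup.findAll('iframe'):
            tag.extract()
        return soup
```

Both hooks receive the parsed page and must return it, so any rearranging of comments would happen on the soup object between those two points.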

The recipe seems to work with arbitrary Blogger/Blogspot blogs. You will see two other blogs pre-configured in the recipe. Just comment out the blog you don't want, or put the blog you do want last in the list.

Additionally, you can use this recipe as a stand-alone Python program (assuming you have Calibre installed, as it relies on Calibre command-line tools). You can test an arbitrary Blogger/Blogspot blog in this way:

Rename ADR.text to BlogParse.py and make it executable.

./BlogParse.py http://myblog.blogspot.com/post.html

or

./BlogParse.py http://myblog.blogspot.com/post.html mod='Moderator Name'

The second form is for blogs where the author has not configured a vCard that can be parsed from the HTML source.

The command line version is only intended for individual blog entries, not the homepage of the blog.
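For illustration, the two invocations above could be told apart with something like the following. This is a sketch under assumptions, not the argument handling from the actual file: parse_args is a made-up helper name, and the only thing taken from the post is the URL argument followed by an optional mod=... argument.

```python
def parse_args(argv):
    """Split argv into (url, moderator); moderator is None if not given."""
    if len(argv) < 2:
        raise ValueError("usage: BlogParse.py URL [mod='Moderator Name']")
    url = argv[1]
    moderator = None
    for arg in argv[2:]:
        key, sep, value = arg.partition('=')
        if sep and key == 'mod':
            # The shell has already removed the quotes around the name.
            moderator = value
    return url, moderator
```

In a script this would be called as parse_args(sys.argv). When no mod= argument is given, the moderator comes back as None, which is the case where the name would have to be found in the page itself.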

Hope someone finds this useful! It was a fun project.

-- Original files removed. See latest below...

Last edited by EnergyLens; 11-23-2014 at 11:02 AM. Reason: I realized that the version I posted was unnecessarily draconian about images as the primary target is almost entirely text.
Old 11-05-2014, 01:30 AM   #2
EnergyLens
Updated to include images

I tried to 'architect' this code such that the BloggerSoup class could be replaced by a 'TypePadSoup' or other blog platform that may have the same problem with unthreaded comments...

Last edited by EnergyLens; 11-05-2014 at 02:19 AM.
Old 11-19-2014, 01:41 AM   #3
EnergyLens
Updated, many improvements, cleaner code

Extensive revisions/cleaner code; more robust matching; indicators (*) when comments have responses.

And, if anyone expresses interest, I will upload the version that includes auto-recognition of book titles and names, as well as extraction of URLs and book titles.

-- file removed

Last edited by EnergyLens; 11-23-2014 at 11:02 AM.
Old 11-22-2014, 11:50 AM   #4
EnergyLens
Wouldn't you know...

The very next ADR entry turned up another weakness, which can be avoided easily by changing one value in one line of code:

change:
if diffmatch_2 >= .5

to:
if diffmatch_2 >= .6

But this only avoids the weakness. I am working on yet another revision of the ListAligner algorithm, which looks like it is going to be leaner and cleaner...
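To show why a small change in that cutoff matters, here is a sketch of threshold-based fuzzy matching. It assumes the value being compared is a similarity ratio between 0 and 1, like the one difflib.SequenceMatcher produces. The post does not say how diffmatch_2 is actually computed, so best_match and everything inside it are illustrative only.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.6  # the cutoff suggested above, raised from .5


def best_match(name, candidates, threshold=THRESHOLD):
    """Return the candidate most similar to name, or None below threshold."""
    best, best_score = None, 0.0
    for cand in candidates:
        # ratio() is 1.0 for identical strings and 0.0 for nothing shared.
        score = SequenceMatcher(None, name.lower(), cand.lower()).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None
```

A borderline pair that scores exactly 0.5 is accepted with threshold=0.5 and rejected with the default of 0.6, so raising the cutoff trades a few missed matches for fewer wrong ones.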
Old 11-23-2014, 11:04 AM   #5
EnergyLens
Here is the latest.

It is currently set to JMG's other blog, The Well of Galabes.
Import from File, then edit to change to another of the blogs which are pre-configured in the recipe, then re-import.
Attached Files
File Type: zip Blogger.recipe.zip (15.7 KB, 244 views)