09-01-2011, 05:31 PM #7
haroldtreen
Member
Join Date: Aug 2011
Location: Toronto, Canada
Device: Kindle 3
Thanks again, Kovid!

So I realized that the code looks for the HTML tags that hold the content you want to clean up. Even though I don't understand much of the Python, I can see roughly what is going on.
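To illustrate the idea of keeping only the content inside particular tags, here is a minimal sketch using Python's standard `html.parser` module. It is not calibre's own code, and the class name `article` in the example is a hypothetical stand-in for whatever tag the real page uses.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class KeepOnlyTag(HTMLParser):
    """Collect text inside elements whose class includes target_class (hypothetical)."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # > 0 while we are inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside the kept element
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering a kept element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def keep_only(html, target_class):
    parser = KeepOnlyTag(target_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Everything outside the matching element (menus, footers) is simply dropped, which is the gist of "looking for the tags that hold the info".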

I changed Darko's code to pull each article's URL from the website; that page is then cleaned with the AutoClean feature.
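Pulling the article URLs off a list page amounts to collecting the `href` of certain links. A minimal standard-library sketch follows; the link class `article-link` and the base URL are hypothetical placeholders, not Instapaper's real markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> tags whose class includes link_class."""

    def __init__(self, base_url, link_class):
        super().__init__()
        self.base_url = base_url
        self.link_class = link_class
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        classes = (attrs.get("class") or "").split()
        if href and self.link_class in classes:
            # Relative links like "/read/2" are resolved against the list page URL.
            self.urls.append(urljoin(self.base_url, href))


def article_urls(html, base_url, link_class="article-link"):
    collector = LinkCollector(base_url, link_class)
    collector.feed(html)
    collector.close()
    return collector.urls
```

The collected URLs are what would then be handed to the cleanup step.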

With that, I believe this recipe now does exactly what I want:

1) Pull all unread articles from Instapaper
2) Download a readability version of each article
3) Archive all the articles

As of now, the only problems are:

1) Anyone with more than 6 pages of unread articles won't get ALL their articles.

2) All articles are archived as part of the cleanup. There should be a way to archive each article only after its URL has been fetched... but that sort of Python is beyond me. If any developer knows how, I would love to see it.

3) Articles downloaded with this recipe seem to have fewer images than before...

I looked at one web page in three ways to see what might be going on:

- When downloaded with the recipe, it has no images.
- When taken from the "text only" feature of Instapaper, it contains multiple images (although many of them weren't meant to be part of the article).
- When viewed with Readability inside Chrome, it shows correctly with one image.

This is me being a perfectionist though. As long as all the content gets downloaded, I'm happy.
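For problems 1 and 2, the logic could look roughly like the sketch below. It is plain Python, not calibre code: `fetch_page`, `fetch` and `archive` are hypothetical callables standing in for "get page N of unread articles", "download one article" and "archive one article", and wiring them into a real recipe is not shown.

```python
from typing import Callable, Iterable, List, Optional


def collect_all_pages(fetch_page: Callable[[int], List[str]],
                      max_pages: int = 100) -> List[str]:
    """Request unread-list pages until one adds nothing new (no fixed 6-page limit)."""
    urls: List[str] = []
    seen = set()
    for page in range(1, max_pages + 1):  # max_pages is only a safety cap
        # dict.fromkeys drops duplicates within a page while keeping order.
        new = [u for u in dict.fromkeys(fetch_page(page)) if u not in seen]
        if not new:
            break  # empty page, or the site repeated the last page
        seen.update(new)
        urls.extend(new)
    return urls


def download_then_archive(urls: Iterable[str],
                          fetch: Callable[[str], Optional[str]],
                          archive: Callable[[str], None]) -> List[str]:
    """Archive an article only if its content was actually fetched."""
    downloaded: List[str] = []
    for url in urls:
        try:
            content = fetch(url)
        except Exception:
            content = None  # network error etc.: leave the article unread
        if content:
            downloaded.append(url)
    for url in downloaded:
        archive(url)
    return downloaded
```

Stopping on a page that brings no new links, rather than after a fixed page count, also covers sites that keep serving the last page instead of an empty one.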

I'm going to post the new recipe in my earlier post above. I'll include # comments so that others with no coding background can modify it to their liking.

Cheers!