#2611
creator of calibre
Posts: 45,209
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@JvdW: nrcnext uses the parse_index function to get a list of articles, and the website has changed, so it fails. Unfortunately, as I don't read Dutch, it's hard for me to fix.
#2612
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
If I have asked this before, please forgive me, but I can't remember how.
I have an RSS feed whose links look like http://www.nfl.com/goto?id=09000d5d81a38fd4, but each link is automatically changed to something like http://www.nfl.com/preseason/story/0...ffers-torn-mcl when the article loads. How the heck can I get the URL that is produced when the article loads? To get the print version, all I need to do is:
Code:
def print_version(self, url):
    # Swap the article path segment for the printable one.
    print_url = url.replace('/article/', '/printable')
    print('THE PRINTABLE URL IS: ' + print_url)
    return print_url
I want print_version to receive http://www.nfl.com/preseason/story/0...ffers-torn-mcl, but instead I get: http://www.nfl.com/goto?id=09000d5d81a38fd4
Thanks. And Starson17, I'm doing like you said and making one big template, with comments and all, of how to do certain things so I can cut and paste the "tricks". Thanks for the advice.
#2613
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
There are two ways to handle this. The first is to skip the idea of getting the print version. Just use keep_only_tags, remove_tags, etc. to keep what you want from the main non-print article. That's my preferred solution. The other is to treat the link as being obfuscated.
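One way to sketch the redirect side of this, assuming the goto?id= link is a plain HTTP redirect (if the site rewrites the URL with JavaScript, this will not help): open the feed URL, ask the response which address it ended up at, and only then apply the replace from the question. This is a simpler swap-in, not calibre's own obfuscated-article mechanism. The helper name resolve_final_url and the opener argument are made up for illustration.

```python
from urllib.request import urlopen


def resolve_final_url(url, opener=urlopen):
    """Return the URL the server ends up serving after any HTTP redirects.

    `opener` is a hypothetical hook so the function can be tried without
    network access; by default it is the standard library's urlopen.
    """
    response = opener(url)
    try:
        return response.geturl()
    finally:
        response.close()


def print_version(url, opener=urlopen):
    # Resolve the goto?id=... style link first, then apply the same
    # replace used in the question to build the printable URL.
    final_url = resolve_final_url(url, opener)
    return final_url.replace('/article/', '/printable')
```

In a recipe the same idea would live inside the print_version method, with the resolved URL taking the place of the one handed in from the feed.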
#2614
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
Thanks. I guess that is all the fun in this. Some of the feeds are hard as crap to figure out, while others are easy. I think the easy ones tend to be designed by professionals who actually take the time to follow general organizational patterns; that's my thought. But in some cases it could simply be a site trying to make itself impossible to parse... Anyway, thanks again. I did do that stuff in UltraEdit and I love it. What I do is keep the actual myrecipe.txt open, and when I run the batch it tells me that myrecipe.txt has been modified, so I hit yes and see the changes. I really find that to be great. I have also used the search feature to find where others did things like splits, removes, and so on.
#2615
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
If you were faced with something like this, how would you remove it?
Take a look at this link: http://www.nfl.com/gamecenter/201009...cap/full-story Notice it has the fantasy football in it.
Spoiler:
I've tried doing a
Code:
remove_tags = [dict(attrs={'style': [""]})]
Code:
def postprocess_html(self, soup):
    for tag in soup.findAll(attrs={'style': [' ']}):
        tag.extract()
    return soup
#2616
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
If that's too much, you could search to see if the table tag has a fantasy football link in it, and extract it only if it does. You can do search and replace, etc. I'd say they are "common problems with someone just learning this stuff."
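The search-and-replace route could look something like this. It is only a sketch: the 'fantasy' marker is a guess at what the link contains (check the page source for the real URL), and the pattern assumes the tables are not nested. calibre recipes accept a preprocess_regexps list of (compiled pattern, replacement function) pairs; apply_regexps is just a stand-in so the pattern can be tried outside calibre.

```python
import re

# Matches one <table>...</table> whose contents mention 'fantasy', without
# running across the boundary of any other table tag.
FANTASY_TABLE = re.compile(
    r'<table\b[^>]*>(?:(?!</?table\b).)*?fantasy(?:(?!</?table\b).)*?</table>',
    re.DOTALL | re.IGNORECASE)

# Same shape as a recipe's preprocess_regexps attribute.
preprocess_regexps = [(FANTASY_TABLE, lambda match: '')]


def apply_regexps(html, regexps=preprocess_regexps):
    # Stand-in for what happens to the raw page before it is parsed.
    for pattern, replacement in regexps:
        html = pattern.sub(replacement, html)
    return html
```

Tables that do not mention the marker are left alone, so the box score tables on the same page should survive.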
#2617
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
Spoiler:
My understanding of the above is that it should find all instances of the <table> tag and then look inside each one for the https and http links specified. If it finds either of them, it will extract it from the soup; otherwise it will continue on. Then it returns the soup without those links. Yet that doesn't happen.
#2618
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
New Recipe for Georgia Outdoor News.
The only issue with this is that some of the links do not have actual titles; the text simply states Read More. If anyone cares to fix that, feel free. This version only includes a print_version() of the page (aka without the pretty pictures). I might update it in the future to include the pics from the non-print version. I didn't do the entire page, only the hunting section for deer, waterfowl, and wildlife management, and then fishing for bass, trout, and fishing & lake reports. Enjoy.
P.S. When loaded on the Kindle 2 it seems to cut the text off on the right-hand side. I don't know if this is a bug, because I saw something similar posted in the bug reports for calibre. But it appears the content is within a table and the user is forced to pan. Maybe someone can help me figure this issue out. Thanks
Last edited by TonytheBookworm; 09-04-2010 at 01:36 PM. Reason: Fixed Table issues. Thanks Starson17 :)
#2619
Enthusiast
Posts: 25
Karma: 10
Join Date: Nov 2008
Device: PRS505, Kindle 3G
|
Hi! I'd like to learn how to do some feeds. I've read the tutorial, and the site I'm after doesn't quite work. Are there any tips/examples for FeedBurner-based feeds?
Ideally, I'd like to create a recipe for The Daily Mash: http://feeds.feedburner.com/thedailymash
Thanks for any help you can give!
#2620
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Code:
def preprocess_html(self, soup):
    for article in soup.findAll('table'):
#2621
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
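You can use:
conversion_options = {'linearize_tables' : True}
or something like:
Code:
def postprocess_html(self, soup, first_fetch):
    # Rename table markup to plain divs so the text reflows instead of forcing the reader to pan.
    for t in soup.findAll(['table', 'tr', 'td']):
        t.name = 'div'
    return soup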
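To see what that renaming does without running a full recipe, here is the same idea applied to a plain HTML string. This is just a regex stand-in for illustration (the function name tables_to_divs is made up); inside a recipe, the soup-based postprocess_html is the natural place for it.

```python
import re

# Opening or closing table, tr, and td tags; attributes are left untouched.
TABLE_TAGS = re.compile(r'<(/?)(?:table|tr|td)\b', re.IGNORECASE)


def tables_to_divs(html):
    # Keep the optional slash and swap only the tag name for 'div'.
    return TABLE_TAGS.sub(r'<\1div', html)
```

Only the three tag names from the post are touched, so anything else in the page passes through unchanged.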
#2622
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
AWWW!!! So that is why that was in there. I saw that in one of the other recipes but wasn't sure why it was there. Let me see if I understand, and correct me if I'm wrong. In the postprocess it is finding all instances of table, tr, and td and then changing their name to div, or making them div tags if you will. One last thing while on the subject: I wasn't too clear on the postprocess_html parameters. It takes 3 arguments. The first 2 I understand, but I'm confused about first_fetch, because in some recipes I noticed they use first. So are these reserved words, and if so, what do they do exactly? Thanks again. Learning so much from you!!!
#2623
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
First, when you get it pulling the feed, you will say, hey, that's not how I want it to look. So then you do like I did and go, hmmm, how do I remove the stuff? I started searching the recipes for remove and came across remove_tags, remove_tags_after, and so on, and then also keep_only_tags. I tried those methods; if they worked I patted myself on the back, and if they didn't I posted segments of my code (or in some cases the whole code) in spoiler and code tags. The good folks on this site will generally help you out in a timely manner, given you put forth the effort. I know Starson17 has helped me big time, along with a few others.
Bottom line: yes, it is complicated to learn (heck, I'm still figuring it out), but once you start to get the basics, you develop an arsenal to attack almost any feed you are faced with. I for one feel defeated when I work on something for hours and then someone comes along and simply does it instead of explaining what they did. Yes, I'm grateful that they do that, yet by the same token I feel let down because I haven't learned anything. So give it a try and let us know where we can help.
Here, take a look at this to give you an idea. This should work for you, but read the comments in it so you can get an understanding of how I went about it. The only thing that I can't figure out on this is how to remove the style tags to get rid of the digg links and so forth at the bottom.
Spoiler:
Last edited by TonytheBookworm; 09-04-2010 at 03:24 PM. Reason: added Recipe |
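For FeedBurner feeds like The Daily Mash, one thing that can trip people up is that the item link may point at FeedBurner's redirector instead of the article itself. A pattern seen in a number of recipes is to prefer the feedburner_origlink field when the feed provides it. Below is a rough sketch with the article as a plain dict so it runs on its own; whether this particular feed carries feedburner_origlink is an assumption, so check the feed XML first.

```python
# Same (title, url) shape a recipe uses for its feeds list.
feeds = [
    ('The Daily Mash', 'http://feeds.feedburner.com/thedailymash'),
]


def get_article_url(article):
    """Pick the real article URL from a parsed feed entry.

    `article` is treated as a dict-like feed entry here; in a recipe this
    would be a method that also takes `self`.
    """
    # Prefer the original link if FeedBurner supplied one, else the normal link.
    return article.get('feedburner_origlink') or article.get('link')
```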
#2624
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
A typo was pointed out in the West Hawaii Today online feed: the local feed entry was missing a comma.
Here is the updated version with the comma in it.
#2625
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
You probably haven't used it much, but there is a recursions parameter that causes the recipe to follow links. The result is that links on the article page are fetched and work within the ebook. (By default it's off, so links aren't followed/fetched.) I have a recipe of food recipes. The main food recipe on page 1 of the article may have a link to another food recipe, like a sauce or a side dish. I have recursion turned on to fetch those related recipes. first_fetch is True only on the first page.
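Roughly, that setup looks like the sketch below. It is a plain stand-in class so it runs outside calibre (a real recipe would subclass BasicNewsRecipe), and the class name and counters are invented purely to show when first_fetch is True.

```python
class FoodRecipesSketch(object):
    # Follow links found on each article page one level deep (0 = off).
    recursions = 1

    def __init__(self):
        self.first_pages = 0
        self.linked_pages = 0

    def postprocess_html(self, soup, first_fetch):
        if first_fetch:
            # First page of an article, e.g. the main food recipe.
            self.first_pages += 1
        else:
            # A page reached by following a link, e.g. the sauce recipe.
            self.linked_pages += 1
        return soup
```

So first_fetch is not a reserved word, just the name of the third argument; a recipe that calls it first is using the same value under a different name.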