MobileRead Forums > E-Book Software > Calibre > Recipes

Old 03-14-2010, 08:57 PM   #1606
Ekips
Member
Ekips began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Mar 2010
Device: PW2, K3gb(x2), K3w, K4, k5(x3) PRS-505s, Stanza for ipod
Quote:
Originally Posted by Starson17 View Post
I'm a beginner, too. Kovid's been riding herd on my efforts, but I'll see if I can help you.

Your recipe looks pretty good. Minor cleanup: You might want to change the def print_version to this:
Code:
    def print_version(self, url):
          # str.replace() returns a new string, so assign the result back to url
          url = url.replace('?OTC-RSS&ATTR=News', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Royals', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Football', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Tv', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Usa', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=Film', '?print=yes')
          url = url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
          return url
Each replace() returns a new string rather than changing url in place, so assign the result back to url each time. That lets you do them sequentially in the body and return url at the end, instead of doing a single modification of url in the return line.


I ran the recipe in test mode, so I only pulled two feeds with two articles each. I didn't see any references to Flash. I did see some text "Advertisement" and some "Add a Comment" links that were left. Can you tell me exactly what feed/article you want help on?

Add this to your remove_tags to kill the "Add a Comment" :
Code:
,dict(name='a', attrs={'class':'add_a_comment'})
Do you know the best way to find these?

Use Firefox,
install the Firebug add-on,
open the page you're having trouble with,
find the item you want to remove on the original page (CTRL-F),
right click that item and select "Inspect Element"

It tells you the name, and id or class label of the element.
Then just put that into your remove_tag list.

The "Add a Comment" junk was in an <a> tag with id='addComment' and class= 'add_a_comment'. You could pull it with reference to either the id or the class.
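
As a small sketch of the two options just described, either entry below should target the same <a> tag. The variable names remove_by_class and remove_by_id are only for illustration; in a recipe the dict goes straight into remove_tags.

```python
# Two ways to describe the same "Add a Comment" link found with Firebug.
remove_by_class = dict(name='a', attrs={'class': 'add_a_comment'})
remove_by_id = dict(name='a', attrs={'id': 'addComment'})

# Only one of them is needed in the recipe's list.
remove_tags = [remove_by_id]
```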

Also, you can condense your 3 removes into one. Here is the line:
Code:
dict(name='div', attrs={'class':['slideshow','float-left','ltbx-slideshow ltbx-btn-ss']})
The 3 keeps can be condensed the same way.

Last comment - I usually add "remove_javascript = True" unless there's some reason not to use it.
Thanks for that, cleaned it up a fair bit, code looks trim too.

There's a few that come back with the
Quote:
You need Flash Player 8 or higher to view video content with the ROO Flash Player. Click here to download and install it.
http://www.thesun.co.uk/sol/homepage...&ATTR=Football
that one for example.

I think it's
Code:
<div id="vxFlashPlayer"><div id="vxFlashPlayerContent" style="width: 380px; height: 278px;">
that is doing it, I'm going to try removing that one and let it run.

A few are coming back blank, and the £ is coming up as Ł, so I still have some tweaking to do, but I'm finding it interesting (and very distracting).

How do you run the recipe in test mode? I've been running the thing in calibre and downloading the full feeds, takes ages each time
Old 03-14-2010, 10:31 PM   #1607
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Ekips View Post
How do you run the recipe in test mode? I've been running the thing in calibre and downloading the full feeds, takes ages each time
ebook-convert myrecipe.recipe output_dir --test -vv
See here and here.

It's much faster. I'll try to look at your problem pages tomorrow (if you haven't already solved the problems).
Old 03-15-2010, 07:44 PM   #1608
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Ekips View Post
I think it's
Code:
<div id="vxFlashPlayer"><div id="vxFlashPlayerContent" style="width: 380px; height: 278px;">
that is doing it, I'm going to try removing that one and let it run.
Yes, it's that one.
Code:
dict(name='div', attrs={'id':'vxFlashPlayer'})
will remove it.
Old 03-15-2010, 10:15 PM   #1609
Hamlet53
Nameless Being
 
Revised SFBG recipe

I had requested a recipe for the San Francisco Bay Guardian, and it was included in the latest release of Calibre. Unfortunately, the stock recipe downloads only a small part of the weekly paper. I understand why: on the main RSS page of the SFBG web site, the link labeled "Main Site (everything)" is not everything at all. Using the stock recipe as a guide, I have prepared the expanded version here, which obtains not everything but at least a lot more, in case anyone else is interested.

Spoiler:

from calibre.web.feeds.news import BasicNewsRecipe

class SanFranciscoBayGuardian(BasicNewsRecipe):
    title = u'San Francisco Bay Guardian'
    language = 'en'
    __author__ = 'Krittika Goyal'
    oldest_article = 31 #days
    max_articles_per_feed = 25
    #encoding = 'latin1'

    no_stylesheets = True
    #remove_tags_before = dict(name='div', attrs={'id':'story_header'})
    #remove_tags_after = dict(name='div', attrs={'id':'shirttail'})
    remove_tags = [
        dict(name='iframe'),
        #dict(name='div', attrs={'class':'related-articles'}),
        #dict(name='div', attrs={'id':['story_tools', 'toolbox', 'shirttail', 'comment_widget']}),
        #dict(name='ul', attrs={'class':'article-tools'}),
        #dict(name='ul', attrs={'id':'story_tabs'}),
    ]


    feeds = [
        ('sfbg', 'http://www.sfbg.com/rss.xml'),
        ('politics', 'http://www.sfbg.com/politics/rss.xml'),
        ('blogs', 'http://www.sfbg.com/blog/rss.xml'),
        ('pixel_vision', 'http://www.sfbg.com/pixel_vision/rss.xml'),
        ('bruce', 'http://www.sfbg.com/bruce/rss.xml'),
    ]


    #def preprocess_html(self, soup):
    #    story = soup.find(name='div', attrs={'id':'story_body'})
    #    td = heading.findParent(name='td')
    #    td.extract()
    #    soup = BeautifulSoup('<html><head><title>t</title></head><body></body></html>')
    #    body = soup.find(name='body')
    #    body.insert(0, story)
    #    return soup


 
Old 03-16-2010, 02:04 AM   #1610
jonny109
Junior Member
jonny109 began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Mar 2010
Device: Sony pocket edition
321gold custom recipe request

would love a custom recipe to create an ebook of all the articles listed on this page:

http://www.321gold.com/archives/archive.php

Thanks!
Old 03-16-2010, 01:49 PM   #1611
Ekips
Member
Ekips began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Mar 2010
Device: PW2, K3gb(x2), K3w, K4, k5(x3) PRS-505s, Stanza for ipod
Quote:
Originally Posted by Starson17 View Post
Yes, it's that one.
Code:
dict(name='div', attrs={'id':'vxFlashPlayer'})
will remove it.
Sorted that. Also sorted the £ showing up as Ł; the fix was
Code:
encoding= 'iso-8859-1'
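
A likely explanation for the Ł, sketched in Python 3: the byte 0xA3 is £ in ISO-8859-1 but Ł in ISO-8859-2, so a page decoded with the wrong Latin codec shows exactly this swap. Which codec was actually being guessed before the fix is an assumption here, not something confirmed in the thread.

```python
# The pound sign as the single byte an ISO-8859-1 page would send.
raw = '\u00a3'.encode('iso-8859-1')   # b'\xa3'

# Decoding with the codec the page was written in gives the pound sign back.
right = raw.decode('iso-8859-1')

# Decoding the same byte as ISO-8859-2 (the assumed wrong guess) gives L-with-stroke.
wrong = raw.decode('iso-8859-2')

print(right, wrong)
```

Setting encoding in the recipe tells calibre which codec to use instead of leaving it to guess.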
Tweaked a few more bits and got the main picture to show up. OK, it shows up at the end, but it's there.

Does the order you put the keep tags affect the order they show up?

Spoiler:
class AdvancedUserRecipe1268409464(BasicNewsRecipe):
    title = u'The Sun'
    __author__ = 'Chaz Ralph'
    description = 'News from The Sun'
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True
    extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    charset = 'iso-8859-1'
    encoding= 'iso-8859-1'
    remove_javascript = True

    keep_only_tags = [
        dict(name='div', attrs={'class':'medium-centered'})
        ,dict(name='div', attrs={'class':'article'})
        ,dict(name='div', attrs={'class':'clear-left'})
        ,dict(name='div', attrs={'class':'text-center'})
    ]

    remove_tags = [dict(name='div', attrs={'class':'slideshow'})
        ,dict(name='div', attrs={'class':'float-left'})
        ,dict(name='div', attrs={'class':'ltbx-slideshow ltbx-btn-ss'})
        ,dict(name='a', attrs={'class':'add_a_comment'})
        ,dict(name='div', attrs={'id':'vxFlashPlayerContent'})
        ,dict(name='div', attrs={'id':'k1006094r1c1t5w380h529'})
        ,dict(name='div', attrs={'id':'tum_login_form_container'})
        ,dict(name='div', attrs={'class':'discHeader'})
        ,dict(name='div', attrs={'class':'margin-bottom-neg-2'})
    ]


    feeds = [(u'News', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article312900.ece')
        ,(u'Sport', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247732.ece')
        ,(u'Football', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247739.ece')
        ,(u'Gizmo', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247829.ece')
        ,(u'Bizarre', u'http://www.thesun.co.uk/sol/homepage/feeds/rss/article247767.ece')]

    def print_version(self, url):
        url.replace('?OTC-RSS&ATTR=News', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Royals', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Football', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Tv', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Usa', '?print=yes')
        url.replace('?OTC-RSS&ATTR=Film', '?print=yes')
        url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
        return url


that's the updated recipe.

I've been playing with firebug and also installed Python 2.6 and been learning a little of that
Old 03-16-2010, 02:40 PM   #1612
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Ekips View Post
Does the order you put the keep tags affect the order they show up?
AFAIK, the order has no effect. If a tag is not in the keep list, it gets dropped (along with all of its contents).
Old 03-16-2010, 07:08 PM   #1613
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.

Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by Starson17 View Post
AFAIK, the order has no effect. If a tag is not in the keep list, it gets dropped (along with all of its contents).

Actually it does have an effect. If for example you have content like this:

PHP Code:
<div class="a"></div>
garbage
<div class="b"></div>
garbage
<div class="c"></div>
And you put in your recipe this:

Code:
keep_only_tags=[dict(name='div',attrs={'class':['c','a','b']})]
You will get this result:

Code:
<div class="c"></div>
<div class="a"></div>
<div class="b"></div>
Therefore the order is important.
Old 03-16-2010, 08:29 PM   #1614
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiklop74 View Post
Actually it does have an effect. ...
You will get this result:

Code:
<div class="c"></div>
<div class="a"></div>
<div class="b"></div>
Therefore the order is important.
Interesting. I would have expected it to scan each tag it finds, as it finds it, against the filter list of the keep_only tags. That would produce an output that was in the same order as the input, a result that I would have expected.

To get the order-dependent result above, it looks like Calibre's recipe code scans the entire page against the first item in the list (c), then scans the entire page again against the second item (a) and finally scans the entire page a third time against the last item in the list (b).
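
The scanning behaviour guessed at above can be modelled with a toy sketch. This is not calibre's actual code: the page is a plain list of (class, text) pairs and keep_only is a made-up helper, and it assumes each entry in the keep list is matched in its own pass over the page. Whether a list of classes inside a single dict behaves the same way is not something this sketch establishes.

```python
# Toy page: (css_class, text) pairs in document order. Not calibre's data model.
page = [('a', 'first'), ('junk', 'garbage'), ('b', 'second'),
        ('junk', 'garbage'), ('c', 'third')]

def keep_only(page, specs):
    """Scan the whole page once per spec, in the order the specs are listed."""
    kept = []
    for wanted in specs:
        kept.extend(item for item in page if item[0] == wanted)
    return kept

print(keep_only(page, ['c', 'a', 'b']))
# [('c', 'third'), ('a', 'first'), ('b', 'second')]
```

Scanning once per spec is what makes the output follow the list order rather than the page order; a single pass that checked every tag against the whole list would keep the page order instead.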
Old 03-16-2010, 09:32 PM   #1615
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.

Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
That is how it works. And it is actually a good feature as it enables you to easily reorder pieces of page in case where it is necessary.
Old 03-16-2010, 10:31 PM   #1616
Hamlet53
Nameless Being
 
East Bay Express

Possible to get East Bay Express added?

http://www.eastbayexpress.com/ebx/Home
 
Old 03-17-2010, 07:05 AM   #1617
Ekips
Member
Ekips began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Mar 2010
Device: PW2, K3gb(x2), K3w, K4, k5(x3) PRS-505s, Stanza for ipod
Quote:
Originally Posted by Starson17 View Post
Yes, it's that one.
Code:
dict(name='div', attrs={'id':'vxFlashPlayer'})
will remove it.
Hi, back again.

Been tweaking and playing and trying to figure out why I'm still pulling up all the slide show tags in the print version.

I checked the Job Details and noticed the url.replace was not working.

This is my cleaned up (thanks to Starson17) url.replace code

Code:
    def print_version(self, url):
          url.replace('?OTC-RSS&ATTR=News' , '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Royals', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Our+Boys', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Football', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Tv', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Usa', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Film', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
          return url
And this is part of the Job Details

Code:
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/campaigns/our_boys/2895923/Soldiers-killed-in-Afghan-blast.html?OTC-RSS&ATTR=Our+Boys
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/2895648/Grieving-dads-drug-warning.html?OTC-RSS&ATTR=News
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/2895808/Soup-poison-bid-at-posh-school.html?OTC-RSS&ATTR=News
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/campaigns/our_boys/2895647/Royal-Navy-sends-Swiftsure-class-attack-submarine-to-Falkland-Islands-to-boost-security.html?OTC-RSS&ATTR=Our+Boys
So I #'d out the original url.replace and wrote in a single one, like so.

Code:
    def print_version(self, url):
          return url.replace('OTC-RSS&ATTR=News', 'print=yes')

#    def print_version(self, url):
#          url.replace('?OTC-RSS&ATTR=News' , '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Royals', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Football', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Tv', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Usa', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=Film', '?print=yes')
#          url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
#          return url
And on checking the Job Details, it worked.

Code:
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/campaigns/our_boys/2895923/Soldiers-killed-in-Afghan-blast.html?OTC-RSS&ATTR=Our+Boys
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/2895648/Grieving-dads-drug-warning.html?print=yes
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/2895808/Soup-poison-bid-at-posh-school.html?print=yes
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/campaigns/our_boys/2895647/Royal-Navy-sends-Swiftsure-class-attack-submarine-to-Falkland-Islands-to-boost-security.html?OTC-RSS&ATTR=Our+Boys
I left out the '?' in the code so I thought I might as well check that.

Code:
    def print_version(self, url):
          url.replace('OTC-RSS&ATTR=News' , 'print=yes'),
          url.replace('?OTC-RSS&ATTR=Royals', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Our+Boys', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Gizmo', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Boxing', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Cricket', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Football', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Rugby+Union', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Tv', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Bizarre', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Usa', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=Film', '?print=yes'),
          url.replace('?OTC-RSS&ATTR=HomePage', '?print=yes')
          return url
But that didn't work either.

Code:
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/campaigns/our_boys/2895923/Soldiers-killed-in-Afghan-blast.html?OTC-RSS&ATTR=Our+Boys
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/2895648/Grieving-dads-drug-warning.html?OTC-RSS&ATTR=News
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/2895808/Soup-poison-bid-at-posh-school.html?OTC-RSS&ATTR=News
Downloading
Fetching http://www.thesun.co.uk/sol/homepage/news/campaigns/our_boys/2895647/Royal-Navy-sends-Swiftsure-class-attack-submarine-to-Falkland-Islands-to-boost-security.html?OTC-RSS&ATTR=Our+Boys
So can anyone point out what I've done wrong?

Is there a way to replace everything after
Code:
.html?
with the
Code:
?Print=Yes
no matter what it says?
Old 03-17-2010, 07:18 AM   #1618
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiklop74 View Post
That is how it works. And it is actually a good feature as it enables you to easily reorder pieces of page in case where it is necessary.
Thanks for the information - it will be a useful trick.
Old 03-17-2010, 10:12 AM   #1619
tulsa
Zealot
tulsa has a complete set of Star Wars action figures.

Posts: 135
Karma: 488
Join Date: Mar 2010
Location: Tulsa, OK, USA
Device: Kindle 2, Sony PRS 900
Tulsa World News Feed

Hello all, I've been trying to get a custom recipe for the following feed:
http://www.tulsaworld.com/site/rss/rss.aspx?group=1

As an example, one of the stories links to here:
Code:
http://www.tulsaworld.com/news/article.aspx?subjectid=337&articleid=20100317_13_A6_PopeBe928321&rss_lnk=1
The print version of the same news article is like this:
Code:
http://www.tulsaworld.com/site/printerfriendlystory.aspx?articleid=20100317_13_A6_PopeBe928321
In the manual for calibre it says to have the URL recreated so the print version gets imported into calibre for the recipe, but I can't figure it out!

Can anybody help me out?
thanks!
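
One possible way to rebuild the URL, as a sketch only: pull the articleid parameter out of the feed link and append it to the printer-friendly address shown above. It assumes Python 3's urllib.parse and that every article link carries an articleid parameter; PRINT_URL is a name made up for this example. Inside a recipe the function would be a method taking self as its first argument.

```python
from urllib.parse import urlparse, parse_qs

# Printer-friendly address taken from the example in the post.
PRINT_URL = 'http://www.tulsaworld.com/site/printerfriendlystory.aspx?articleid='

def print_version(url):
    """Build the printer-friendly URL from the articleid query parameter."""
    query = parse_qs(urlparse(url).query)
    article_ids = query.get('articleid')
    if not article_ids:
        return url  # no articleid found: fall back to the original link
    return PRINT_URL + article_ids[0]

feed_link = ('http://www.tulsaworld.com/news/article.aspx?subjectid=337'
             '&articleid=20100317_13_A6_PopeBe928321&rss_lnk=1')
print(print_version(feed_link))
```

Parsing the query string rather than slicing the URL by position keeps this working if the order of the parameters changes.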
Old 03-17-2010, 10:28 AM   #1620
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Ekips View Post
Is there a way to replace everything after
Code:
.html?
with the
Code:
?Print=Yes
no matter what it says?
Yes.

I don't think you actually want to do that, though. Look carefully at what you wrote, since doing exactly what you propose would put in two "??".


Aside from that, yes, it's possible.....

I'm not at home, so I can't write it for you, but there are lots of sample recipes to look at. One way would be to simply use a compiled regular expression to modify url.

You'd need "import re" in the imports and you'd:
Code:
    def print_version(self, url):
          search and replace in the url string ".html?.*first_char_after_string_to_be_replaced" with ".html?Print=Yesfirst_char_after_string_to_be_replaced"
          return url
Note that '?' is a special character in a regex so you'll need to "escape" it as '\?'
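
A minimal sketch of that regex approach, assuming the print pages accept the lowercase 'print=yes' used in the recipe. It is written as a plain function so it can be tried outside calibre; in a recipe it would take self as its first argument.

```python
import re

def print_version(url):
    # Swap everything from ".html?" to the end of the URL for ".html?print=yes".
    # The dot and the question mark are escaped because both are regex metacharacters.
    return re.sub(r'\.html\?.*$', '.html?print=yes', url)

link = ('http://www.thesun.co.uk/sol/homepage/news/campaigns/our_boys/'
        '2895923/Soldiers-killed-in-Afghan-blast.html?OTC-RSS&ATTR=Our+Boys')
print(print_version(link))
```

Because the pattern matches whatever follows ".html?", new feed names such as Our+Boys need no extra lines, and the replacement keeps a single "?".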

If you have a decent search tool, just look at some built-in recipes that have "import re" in them for the syntax you need. Or look at string tools for Python. I've got lots of samples at home, and I just cut and paste them, so I can't remember the syntax. If you're still having trouble, I'm sure kiklop can give you code, or I'll look for some when I get home. Just think of "url" as a string that you use Python string manipulation functions to modify before it gets returned.
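
On treating url as a string: Python strings are immutable, so replace() returns a new string and leaves the original alone. That is a likely reason the multi-line print_version posted earlier in the thread fetched unchanged URLs while the single "return url.replace(...)" worked. The trailing commas make no difference either way. The function names broken and fixed below are only for illustration.

```python
def broken(url):
    url.replace('?OTC-RSS&ATTR=News', '?print=yes')   # result is thrown away
    return url                                        # still the original string

def fixed(url):
    # Assign each result back to url so the changes accumulate.
    for section in ('News', 'Our+Boys', 'Football'):
        url = url.replace('?OTC-RSS&ATTR=' + section, '?print=yes')
    return url

link = ('http://www.thesun.co.uk/sol/homepage/news/2895648/'
        'Grieving-dads-drug-warning.html?OTC-RSS&ATTR=News')
print(broken(link) == link)   # True: nothing changed
print(fixed(link))
```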
All times are GMT -4.