Closed Thread
 
01-11-2010, 11:19 AM   #1126
LessPaul
Connoisseur
Posts: 50
Karma: 160
Join Date: Jan 2008
Location: Dewitt, MI
Device: Kindle Paperwhite 2021 / PC / iPad
Here is my first attempt at a custom recipe. It is for the German language course feeds at DW-World.de. I will reuse this same recipe to access the DW-World news feeds, but this is the one I completed first.

I do have one small problem. At the top and bottom of every article are a set of (unwanted) links. The HTML source is:
Code:
<p class="actionFooter"><a href="/dw/article/0,,4529629,00.html">DW-WORLD.DE</a><span>*|*</span><a href="javascript:window.print()">Drucken</a>
</p>
This code occurs at both the top and bottom of the page. Of course the URL number varies from page to page. Note there is a CR between </a> and </p>.

Tips on the best way to eliminate this would be much appreciated. I tried both "remove_tags" and "preprocess_regexps," but in both cases I managed to eliminate not only the offending code, but the entire content of the page. Ooops.
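A possible explanation for the whole page vanishing, assuming the regex used a greedy `.*` (the actual pattern isn't shown in the post): with re.DOTALL, `.*` runs from the first actionFooter paragraph all the way to the last `</p>` on the page, taking the article with it. The sketch below uses a non-greedy `.*?` instead, written in the (compiled pattern, function) form that calibre's preprocess_regexps expects. The `apply_preprocess()` helper and the sample HTML are hypothetical, only there to test the rule outside calibre.

```python
import re

# calibre's preprocess_regexps: a list of (compiled regex, callable) pairs.
# The callable gets the match object and returns the replacement string.
# ".*?" is non-greedy, so each match stops at the FIRST </p> after the
# actionFooter opening tag; re.DOTALL lets "." cross the CR before </p>.
preprocess_regexps = [
    (re.compile(r'<p class="actionFooter">.*?</p>', re.DOTALL | re.IGNORECASE),
     lambda match: ''),
]


def apply_preprocess(html, rules):
    """Hypothetical helper: run each (pattern, func) rule over the HTML in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = (
    '<p class="actionFooter"><a href="/dw/article/0,,4529629,00.html">DW-WORLD.DE</a>'
    '<a href="javascript:window.print()">Drucken</a>\n</p>'
    '<p>Article text.</p>'
    '<p class="actionFooter"><a href="/dw/article/0,,4529629,00.html">DW-WORLD.DE</a>\n</p>'
)

print(apply_preprocess(sample, preprocess_regexps))  # <p>Article text.</p>

# For contrast, the greedy version eats everything up to the LAST </p>:
greedy = re.compile(r'<p class="actionFooter">.*</p>', re.DOTALL)
print(repr(greedy.sub('', sample)))  # '' (the article paragraph is gone too)
```

The same non-greedy idea applies to any "from this tag to its closing tag" rule; for a single, cleanly marked element like this one, remove_tags is usually the simpler choice.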

Thanks much, Paul

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2009, Less Paul <LessPaul at gmail.com>'
'''
dw-world.de
'''

from calibre.web.feeds.news import BasicNewsRecipe

class DW_World_courses(BasicNewsRecipe):
    title                 = 'DW-World - German Courses'
    __author__            = 'LessPaul'
    description           = "German language courses and lesson feeds from the multi-language German news site DW-World.de"
    publisher             = 'Deutsche Welle'
    category              = 'German, Language, Education'
    oldest_article        = 30
    max_articles_per_feed = 100
    language              = 'de'
    lang                  = 'de-DE'
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True

    conversion_options = { 'tags'             : category,
                           'publisher'        : publisher,
                           'language'         : lang
                         }

    feeds = [
        (u'Deutsch als Fremdsprache', u'http://rss.dw-world.de/rdf/DKfeed_dkmix_de'),
        (u'Deutsch im Fokus', u'http://rss.dw-world.de/rdf/DKfeed_dif_de'),
        (u'Alltagsdeutsch', u'http://rss.dw-world.de/rdf/DKfeed_alltagsdeutsch_de'),
        (u'Wort der Woche', u'http://rss.dw-world.de/rdf/DKfeed_wortderwoche_de'),
        (u'Sprachbar', u'http://rss.dw-world.de/rdf/DKfeed_sprachbar_de'),
        (u'Stichwort', u'http://rss.dw-world.de/rdf/DKfeed_stichwort_de'),
        (u'Top-Thema mit Vokabeln', u'http://rss.dw-world.de/rdf/DKfeed_topthemamitvokabeln_de'),
        (u'Langsam gesprochene Nachrichten', u'http://rss.dw-world.de/rdf/DKfeed_lgn_de'),
    ]

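    def print_version(self, url):
        # Fetch DW's print-friendly popup page instead of the full article page:
        # keep only the last path segment of the article URL (the file name)
        # and append it to the popup_printcontent base URL.
        target = url.rpartition('/')[2]
        print_url = 'http://www.dw-world.de/popups/popup_printcontent/' + target
        return print_url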
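To check what print_version produces without running calibre, here is the same logic pulled out as a plain function. The sample article URL is an assumption, modeled on the /dw/article/0,,4529629,00.html link in the footer HTML above; note that if a feed link ever ended in a slash, rpartition('/') would return an empty file name.

```python
def print_version(url):
    """Standalone copy of the recipe's print_version logic, for testing."""
    # rpartition('/') splits on the LAST slash; index 2 is everything after it.
    target = url.rpartition('/')[2]
    return 'http://www.dw-world.de/popups/popup_printcontent/' + target


print(print_version('http://www.dw-world.de/dw/article/0,,4529629,00.html'))
# http://www.dw-world.de/popups/popup_printcontent/0,,4529629,00.html
```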
01-11-2010, 03:19 PM   #1127
ajrmd
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2009
Device: nook
WSJ resolved

Quote:
Originally Posted by kovidgoyal View Post
Log in to WSJ and then go to the URL http://online.wsj.com/page/us_in_todays_paper.html

Save the HTML and post it here.
For some strange reason, the bug went away and I am now able to download today's paper without difficulty. If a new one arises, I'll post a dump. Must have been an H1N1 virus? Thanks nonetheless.
01-11-2010, 06:08 PM   #1128
syfr
Member
Posts: 15
Karma: 10
Join Date: Jan 2010
Device: kindle2
Could someone help with the Christian Science Monitor recipe? Perhaps the website has changed?

Thank you
01-11-2010, 08:48 PM   #1129
lorenzov
Member
Posts: 23
Karma: 12
Join Date: Jan 2010
Location: Edinburgh, UK
Device: SONY PRS600, Apple iPhone 3G
hi Paul,

You can use one of the tag include/remove options. As the tag is the same at both top and bottom, just add:

Code:
remove_tags = [dict(name='p', attrs={'class':'actionFooter'})]
and this should do the job for you!
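For anyone wondering what that dict actually matches: it asks calibre to delete every `<p>` whose class is actionFooter, together with everything inside it. The sketch below mimics that matching with only the standard-library HTMLParser so it can be tried outside calibre. `TagRemover` and `remove_tags()` are hypothetical helpers, not calibre's implementation (calibre uses its own parser), and the exact-string class comparison and lack of handling for implicitly closed `<p>` tags are simplifications.

```python
from html.parser import HTMLParser


class TagRemover(HTMLParser):
    """Hypothetical helper: drop every <name ...> element whose attributes
    include all of `attrs` (exact string match), plus everything inside it."""

    def __init__(self, name, attrs):
        super().__init__(convert_charrefs=False)  # keep entities like &amp; verbatim
        self.name = name
        self.attrs = attrs
        self.depth = 0  # > 0 while we are inside a matched element
        self.out = []

    def _matches(self, tag, attrs):
        found = dict(attrs)
        return tag == self.name and all(found.get(k) == v for k, v in self.attrs.items())

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.name:
                self.depth += 1  # same-named element nested inside the match
        elif self._matches(tag, attrs):
            self.depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == self.name:
                self.depth -= 1
        else:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append('&%s;' % name)

    def handle_charref(self, name):
        if not self.depth:
            self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        if not self.depth:
            self.out.append('<!--%s-->' % data)


def remove_tags(html, name, attrs):
    """Return `html` with every matching element removed."""
    parser = TagRemover(name, attrs)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


page = ('<p class="actionFooter"><a href="/dw/article/0,,4529629,00.html">DW-WORLD.DE</a>'
        '<a href="javascript:window.print()">Drucken</a>\n</p>'
        '<p>Article &amp; text.</p>'
        '<p class="actionFooter"><a href="/dw/article/0,,4529629,00.html">DW-WORLD.DE</a>\n</p>')
print(remove_tags(page, 'p', {'class': 'actionFooter'}))  # <p>Article &amp; text.</p>
```

Because the match is on the element rather than on raw text, the CR between `</a>` and `</p>` doesn't matter here, which is one reason remove_tags tends to be less fragile than a regex for this kind of cleanup.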

If you want more control over the look of the output, add your own CSS (via extra_css). More useful stuff here:
http://calibre-ebook.com/user_manual...ownloaded-html

lorenzo
01-11-2010, 09:28 PM   #1130
lorenzov
Member
Posts: 23
Karma: 12
Join Date: Jan 2010
Location: Edinburgh, UK
Device: SONY PRS600, Apple iPhone 3G
chr_mon.recipe fix

I haven't seen this one working before, so the attached fix might give you slightly different results from what you're used to; at least the pages are not blank!

Kovid, I couldn't find a ticket in the bug tracker, but hopefully this is one more thing off your list.
Attached Files
File Type: zip chr_mon.zip (1.6 KB, 175 views)
01-12-2010, 02:23 PM   #1131
jayman
Enthusiast
Posts: 33
Karma: 10
Join Date: Dec 2009
Device: iphone
AJKD request

Does anyone want to try making a recipe for The American Journal of Kidney Disease (www.ajkd.org)? I can PM whoever takes it on with my account details...
01-12-2010, 02:25 PM   #1132
Krittika Goyal
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Quote:
Originally Posted by jayman View Post
Anyone want to try and make a recipe for The American Journal of Kidney Disease? www.ajkd.org. I can pm whomever with account details...
I'll give it a shot
01-12-2010, 02:56 PM   #1133
jayman
Enthusiast
Posts: 33
Karma: 10
Join Date: Dec 2009
Device: iphone
Quote:
Originally Posted by Krittika Goyal View Post
I'll give it a shot
You're awesome. I'll pm you my account details.
01-12-2010, 07:46 PM   #1134
elmoglick
Groupie
Posts: 165
Karma: 206
Join Date: Dec 2007
Location: Kansas City
Device: Kindle1, Kindle DX, Kindle DXG
PC Magazine recipe

I haven't had much luck trying to create a recipe for PC Magazine from scratch - I just end up with headlines and graphic boxes. Any help would be most appreciated.
01-12-2010, 10:08 PM   #1135
lorenzov
Member
Posts: 23
Karma: 12
Join Date: Jan 2010
Location: Edinburgh, UK
Device: SONY PRS600, Apple iPhone 3G
PC Mag recipe

Have a look at this one; the product review feed is not included, and it is a first go with nothing fancy under the hood...

I have already noticed a few things that could be improved (e.g. some articles span more than one page, and some pictures could be removed), but it should get you started!

lorenzo
Attached Files
File Type: zip pcMag.zip (1.2 KB, 170 views)
01-13-2010, 01:06 AM   #1136
DrStreet
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2010
Device: Sony Daily Edition
The News & Observer

Could someone please help me set up a recipe for the News & Observer from Raleigh, NC? I have tried for 3 days to get this to work, but I keep ending up with unwanted headings in the table of contents, and the text of the news stories is shadowed, with only the words "tool name" visible. The feeds that I am trying to retrieve are:

Code:
feeds = [
    ('Cover', 'http://www.newsobserver.com/100/index.rss'),
    ('News', 'http://www.newsobserver.com/102/index.rss'),
    ('Politics', 'http://www.newsobserver.com/105/index.rss'),
    ('Business', 'http://www.newsobserver.com/104/index.rss'),
    ('Sports', 'http://www.newsobserver.com/103/index.rss'),
    ('College Sports', 'http://www.newsobserver.com/119/index.rss'),
    ('Lifestyles', 'http://www.newsobserver.com/106/index.rss'),
    ('Editorials', 'http://www.newsobserver.com/158/index.rss'),
]

Any help would be appreciated.
01-13-2010, 01:49 AM   #1137
elmoglick
Groupie
Posts: 165
Karma: 206
Join Date: Dec 2007
Location: Kansas City
Device: Kindle1, Kindle DX, Kindle DXG
Quote:
Originally Posted by lorenzov View Post
have a look at this one; the product review feed is not included and it is a first go with nothing fancy under the hood...
lorenzo
Thanks, Lorenzo. I'll give it a shot!
01-13-2010, 10:58 AM   #1138
Krittika Goyal
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Quote:
Originally Posted by DrStreet View Post
Could someone please help me setup a recipe for the News & Observer from Raleigh, NC. I have tried for 3 days to get this to work, but I keep ending up with unwanted headings in the table of contents, and the text of the news stories shadowed with only the words "tool name" visible.
See the attached recipe. It will be included in the next calibre release.
Attached Files
File Type: zip observer.recipe.zip (822 Bytes, 168 views)

Last edited by Krittika Goyal; 01-13-2010 at 01:04 PM.
01-13-2010, 12:57 PM   #1139
Briand
Member
Posts: 10
Karma: 10
Join Date: Dec 2009
Location: Halifax, Nova Scotia
Device: Sony PRS-300
Hi,

I'm wondering if someone could please take a look at The Atlantic recipe. It doesn't seem to download anything but the menu. I see someone has already mentioned the same problem with the Christian Science Monitor.

Thanks
Brian
Sony Pocket Edition
01-13-2010, 01:17 PM   #1140
DrStreet
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2010
Device: Sony Daily Edition
Thank you Krittika, the recipe for the News & Observer worked perfectly.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Custom column read ? | pchrist7 | Calibre | 2 | 10-04-2010 02:52 AM
Archive for custom screensavers | sleeplessdave | Amazon Kindle | 1 | 07-07-2010 12:33 PM
How to back up preferences and custom recipes? | greenapple | Calibre | 3 | 03-29-2010 05:08 AM
Donations for Custom Recipes | ddavtian | Calibre | 5 | 01-23-2010 04:54 PM
Help understanding custom recipes | andersent | Calibre | 0 | 12-17-2009 02:37 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.