Old 08-01-2009, 01:52 PM   #631
kovidgoyal
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, you should just need to remove the feeds you don't want.
Old 08-01-2009, 03:24 PM   #632
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by hackettt View Post
Kovid and Darko —

Is there something I must do besides eliminating the code for other sections I do not want?
You are complicating things without real need. This is what you need to change in your recipe:

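Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1249153260(BasicNewsRecipe):
    title          = u'DailyMail'
    oldest_article = 2
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'cp1252'

    # Keep only the article body.
    keep_only_tags = [dict(name='div', attrs={'id':'js-article-text'})]

    # Drop the related-items and icon-link blocks inside the body.
    remove_tags = [dict(name='div', attrs={'class':['relatedItems','article-icon-links-container']})]

    # Discard everything after the social-links heading.
    remove_tags_after = dict(name='h3', attrs={'class':'social-links-title'})

    # List only the feeds you want; here, just Sports.
    feeds          = [(u'Sports', u'http://www.dailymail.co.uk/sport/index.rss')]

    def print_version(self, url):
        # Strip any query string, then request the printer-friendly page.
        main = url.partition('?')[0]
        return main + '?printingPage=true'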
Old 08-03-2009, 04:45 PM   #633
fogus
Member
Posts: 19
Karma: 10
Join Date: Mar 2009
Device: Sony PRS-505
I did a search but I didn't find anything for the following idea:

I read a lot of fixed-width (often 80-character) texts. Does anyone have a script to reflow these into proper paragraphs?

Some examples:
http://www.ietf.org/rfc/rfc793.txt (RFC: TCP)
http://www.gutenberg.org/files/345/345.txt (Dracula from Gutenberg) (Yes, I know there is an HTML version of that one.)
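
A minimal sketch of one way to do this in Python, assuming paragraphs in the source text are separated by blank lines. The function name unwrap_paragraphs is made up for illustration and is not part of calibre. It will also flatten tables, ASCII diagrams and indented code, which the RFC example contains, so treat it as a starting point only.

```python
import re

def unwrap_paragraphs(text):
    """Join hard-wrapped lines into one line per paragraph.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    # Normalise Windows line endings, then split on runs of blank lines.
    text = text.replace('\r\n', '\n')
    blocks = re.split(r'\n\s*\n', text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse line breaks and indentation inside the block to single spaces.
        words = block.split()
        if words:
            paragraphs.append(' '.join(words))
    # Emit one line per paragraph, with a blank line between paragraphs.
    return '\n\n'.join(paragraphs)
```

To use it, read the downloaded .txt file into a string, pass it through unwrap_paragraphs, and write the result back out before converting.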
Old 08-03-2009, 08:45 PM   #634
Malakai
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2009
Device: Sony PRS-505
Hey guys, I'm trying to grab a print version using the advanced version, but I need url.replace to change two things in the URL.

Here is an example of the original URL:

http://www.dpreview.com/reviews/olympusep1/?from=rss

This is the URL for the print version:

http://www.dpreview.com/reviews/prin...iew=OlympusEP1

How do I get it to remove the /?from=rss at the end?

This is what I currently have:

def print_version(self, url):
    return url.replace('http://www.dpreview.com/reviews/', 'http://www.dpreview.com/reviews/print.asp?review=')
Old 08-04-2009, 09:26 AM   #635
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Try this:

Code:
    def print_version(self, url):
        baseurl = url.rpartition('/?')[0]
        turl = baseurl.partition('/reviews/')[2]
        return 'http://www.dpreview.com/reviews/print.asp?review=' + turl
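
The same rewrite can be tried outside calibre as a plain function. This is a sketch only: the name dpreview_print_url is hypothetical, and unlike the method above it returns the URL unchanged when there is no /reviews/ part. It also keeps the review name in whatever case the feed supplies; the print URL quoted earlier shows OlympusEP1, and whether the site accepts lowercase has not been checked here.

```python
def dpreview_print_url(url):
    """Map a dpreview review URL to its print.asp form."""
    # Drop the query string (e.g. '?from=rss') and any trailing slash.
    path = url.partition('?')[0].rstrip('/')
    # Whatever follows '/reviews/' is taken to be the review name.
    review = path.partition('/reviews/')[2]
    if not review:
        # Not a review URL; hand it back unchanged.
        return url
    return 'http://www.dpreview.com/reviews/print.asp?review=' + review

print(dpreview_print_url('http://www.dpreview.com/reviews/olympusep1/?from=rss'))
```

Inside a recipe, print_version(self, url) could simply return dpreview_print_url(url).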
Old 08-04-2009, 11:39 AM   #636
Malakai
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2009
Device: Sony PRS-505
Thank you so much for that. The only thing missing is the pictures, lol.
How do I retain those in the finished EPUB?
Old 08-06-2009, 05:47 AM   #637
jbambridge
Kindle DX
Posts: 21
Karma: 10
Join Date: Aug 2009
Location: The Netherlands
Device: iPad and Kindle DX
Problem parsing the Guardian RSS feed:

I have tried to update the Guardian recipe to fix some problems caused by changes to the website. I am almost there, but the odd article causes the following error in ebook-convert:
Quote:
Parsing feed_0/article_7/index.html ...
Traceback (most recent call last):
File "cli.py", line 254, in <module>
File "cli.py", line 246, in main
File "calibre\ebooks\conversion\plumber.pyo", line 657, in run
File "calibre\ebooks\conversion\plumber.pyo", line 761, in create_oebbook
File "calibre\ebooks\oeb\reader.pyo", line 72, in __call__
File "calibre\ebooks\oeb\reader.pyo", line 588, in _all_from_opf
File "calibre\ebooks\oeb\reader.pyo", line 243, in _manifest_from_opf
File "calibre\ebooks\oeb\reader.pyo", line 176, in _manifest_add_missing
File "calibre\ebooks\oeb\base.pyo", line 988, in fget
File "calibre\ebooks\oeb\base.pyo", line 917, in _parse_xhtml
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
The modified recipe is as follows:

Quote:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
www.guardian.co.uk
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Guardian(BasicNewsRecipe):

    title = u'My Guardian'
    language = _('English')
    encoding = 'utf-8'
    oldest_article = 7
    max_articles_per_feed = 20
    remove_javascript = True
    simultaneous_downloads = 1
    use_embedded_content = False
    recursions = 0
    filter_regexps = [r'\.g\.doubleclick\.net']

    timefmt = ' [%a, %d %b %Y]'

    keep_only_tags = [dict(id=['article-wrapper', 'main-article-info'])]

    no_stylesheets = True
    extra_css = 'h2 {font-size: medium;} \n h1 {text-align: left;}'

    feeds = [
        ('Front Page', 'http://feeds.guardian.co.uk/theguardian/rss'),
        # ('UK', 'http://feeds.guardian.co.uk/theguardian/uk/rss'),
        # ('Business', 'http://www.guardian.co.uk/business/rss'),
        # ('Politics', 'http://feeds.guardian.co.uk/theguardian/politics/rss'),
        # ('Culture', 'http://feeds.guardian.co.uk/theguardian/culture/rss'),
        # ('Money', 'http://feeds.guardian.co.uk/theguardian/money/rss'),
        # ('Life & Style', 'http://feeds.guardian.co.uk/theguardian/lifeandstyle/rss'),
        # ('Travel', 'http://feeds.guardian.co.uk/theguardian/travel/rss'),
        # ('Environment', 'http://feeds.guardian.co.uk/theguardian/environment/rss')
    ]

    def print_version(self, url):
        return url + '/print'
Any ideas what the error means?

John
Old 08-06-2009, 10:11 AM   #638
jbambridge
Kindle DX
Posts: 21
Karma: 10
Join Date: Aug 2009
Location: The Netherlands
Device: iPad and Kindle DX
One extra thought:

Checking:
Quote:
File "calibre\ebooks\oeb\base.pyo", line 917, in _parse_xhtml
in the source code shows that this is a part of the code that removes empty <a></a> tags. This is indeed the case on the example I gave where the publisher has left a strange link in the text.

Adding
Code:
remove_tags = [dict(name='a')]
is a workaround, although this also destroys valid <a> tags.

My Python is not up to fixing the _parse_xhtml code myself, though.

Can anyone suggest a better workaround (one that doesn't delete any valid content) or a fix to the Python code?

John

P.S. I've attached the offending article as an example of the empty <a> tags. index.txt is the output after processing by the recipe, and problem.txt is the original HTML file.
Attached Files
File Type: txt index.txt (6.0 KB, 241 views)
File Type: txt problem.txt (101.2 KB, 309 views)
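
Until a fixed release is available, one narrower workaround might be to strip only the empty anchors and leave links with content alone. The sketch below is an illustration, not calibre's own fix: EMPTY_ANCHOR and strip_empty_anchors are made-up names, and the regular expression will also remove empty named anchors such as <a name="x"></a>. It assumes the installed calibre supports the recipe attribute preprocess_regexps, a list of (compiled pattern, replacement function) pairs applied to the downloaded HTML.

```python
import re

# Matches an <a ...> element whose body is empty or whitespace only.
EMPTY_ANCHOR = re.compile(r'<a\b[^>]*>\s*</a>', re.IGNORECASE)

def strip_empty_anchors(html):
    """Remove empty <a></a> pairs, leaving anchors that have content."""
    return EMPTY_ANCHOR.sub('', html)

# In a recipe, the same pattern could be supplied as (assumption, untested here):
# preprocess_regexps = [(EMPTY_ANCHOR, lambda match: '')]
```

Running strip_empty_anchors over the attached problem.txt would be a quick way to check that only the stray link disappears.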
Old 08-06-2009, 10:41 AM   #639
kovidgoyal
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This will be fixed in the next release.
Old 08-06-2009, 11:45 AM   #640
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
I tried creating a recipe for the new version and was not able to make conversion_options work.

Is it operational at all?

This is what I tried:

Code:
    conversion_options = {  'tags':'aa,bb'
                          , 'publisher': 'pub'
                          , 'comments':  'desc'
                          , 'language': 'en'
                          }
Old 08-06-2009, 11:55 AM   #641
kovidgoyal
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
EDIT: Actually, looking at the code, it should be.
Old 08-06-2009, 12:02 PM   #642
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by kovidgoyal View Post
EDIT: Actually, looking at the code, it should be.
Well, it does not work. Do you want me to open an issue for this?

Last edited by kovidgoyal; 08-06-2009 at 12:23 PM.
Old 08-06-2009, 12:24 PM   #643
kovidgoyal
creator of calibre
Posts: 45,378
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by kiklop74 View Post
Well, it does not work. Do you want me to open an issue for this?
Never mind, I'm looking at it now.
Old 08-06-2009, 10:57 PM   #644
jj2me
Guru
Posts: 820
Karma: 8820388
Join Date: Dec 2008
Device: Sony PRS-505, -350; Kindle 3 3G, DX, PW 2; various tablets
Smithsonian Magazine - crappy edition

Since I couldn't find the Smithsonian Magazine in a search of this thread, and it's my sister's favorite magazine, I humbly submit this bare-minimum effort (I don't know Python) in case anyone else might like it and doesn't mind skipping over some poor formatting.

It merely assembles the RSS feeds from this page. Note that I set oldest_article = 30 for this monthly magazine. Change as you see fit.
Attached Files
File Type: txt Smithsonian_recipe.txt (2.8 KB, 303 views)
Old 08-07-2009, 03:39 AM   #645
AprilHare
Wizard
Posts: 2,981
Karma: 11862367
Join Date: Apr 2008
Device: Sony Reader PRS-T2
Attached are the errors I got when I tried to download the Sydney Morning Herald; they were too long for a simple post.
Attached Files
File Type: txt SMH_error.txt (252.5 KB, 277 views)