Old 02-17-2009, 12:23 PM   #226
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
This is my final version of the recipe, which looks OK in the ebook viewer:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234144423(BasicNewsRecipe):
    title          = u'Cincinnati Enquirer'
    oldest_article = 7
    language       = _('English')
    __author__     = 'Joseph Kitzmiller'
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    encoding = 'cp1252'
    extra_css = ' p {font-size: medium; font-weight: normal;} '
    
    keep_only_tags = [dict(name='div', attrs={'class':'padding'})]
    
    remove_tags = [
                     dict(name=['object','link','table','embed'])
                    ,dict(name='div',attrs={'id':'pluckcomments'})
                    ,dict(name='div',attrs={'class':'articleflex-container'})
                  ]
   
    feeds          = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
Old 02-17-2009, 12:27 PM   #227
kitzj0
Member
Posts: 13
Karma: 10
Join Date: Feb 2009
Device: PRS-505
I am sure as well. I am happy with what I got working; it is not really a big deal to go through the Sony library. You have been a tremendous help, kiklop! Perhaps if I had hundreds of feeds it would be a pain, but luckily I am only having issues with the one feed.
Old 02-17-2009, 12:28 PM   #228
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
But it does not work in Adobe Digital Editions, so this is clearly a bug in EPUB generation. Reported as issue #1874.
Old 02-17-2009, 03:02 PM   #229
xianfox
Ebook Addict
Posts: 225
Karma: 2136
Join Date: Jul 2003
Location: Appleton, Wisconsin, USA
Device: Onyx BOOX Note Air 4C, Palma
I'm really close on a custom recipe for my local paper. Here is the code I currently have:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234841996(BasicNewsRecipe):
    title          = u'Appleton Post Crescent'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript     = True
    html2lrf_options = ['--ignore-tables']    
    html2epub_options = 'linearize_tables = True' 
    remove_tags = [dict(name='div', attrs={'class':'article-tools'})]
    keep_only_tags     = [dict(name='div', attrs={'class':['article-headline', 'article-bodytext']})]
    
    feeds          = [(u'Latest Headlines', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlatest.pbs&mime=xml'), (u'Local News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlocal.pbs&mime=xml'), (u'Sports', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSsports.pbs&mime=xml')]
The problem is, in the head section of their pages they have a malformed comment that looks like this:

Code:
<!--- OAS MACRO --->
My Sony Reader won't display the resulting output due to this malformed comment. I've tested it by manually removing it from the generated epub file and it works flawlessly.

Can anyone help with a brief bit of code that I can add to my recipe to remove this stubborn comment?

Thanks
Old 02-17-2009, 03:38 PM   #230
kovidgoyal
creator of calibre
Posts: 45,396
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by xianfox View Post
I'm really close on a custom recipe for my local paper. Here is the code I currently have:

Code:
class AdvancedUserRecipe1234841996(BasicNewsRecipe):
    title          = u'Appleton Post Crescent'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript     = True
    html2lrf_options = ['--ignore-tables']    
    html2epub_options = 'linearize_tables = True' 
    remove_tags = [dict(name='div', attrs={'class':'article-tools'})]
    keep_only_tags     = [dict(name='div', attrs={'class':['article-headline', 'article-bodytext']})]
    
    feeds          = [(u'Latest Headlines', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlatest.pbs&mime=xml'), (u'Local News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlocal.pbs&mime=xml'), (u'Sports', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSsports.pbs&mime=xml')]
The problem is, in the head section of their pages they have a malformed comment that looks like this:

Code:
<!--- OAS MACRO --->
My Sony Reader won't display the resulting output due to this malformed comment. I've tested it by manually removing it from the generated epub file and it works flawlessly.

Can anyone help with a brief bit of code that I can add to my recipe to remove this stubborn comment?

Thanks
This will be removed automatically in the next release of calibre. Incidentally, there's nothing wrong with the code; it's a bug in Adobe DE that causes it to fail to handle the comment.
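
Until that release is out, a recipe-side workaround could look roughly like the sketch below. It assumes the comment always has the form quoted above, and it relies on the preprocess_regexps attribute of BasicNewsRecipe, which takes a list of (compiled pattern, replacement function) pairs. The names OAS_COMMENT, apply_regexps and sample are made up for illustration; apply_regexps only stands in for what calibre does with the list during download, so the snippet can be tried outside calibre.

```python
import re

# Matches the comment quoted above, tolerating extra dashes and whitespace.
OAS_COMMENT = re.compile(r'<!-{2,}\s*OAS MACRO\s*-{2,}>', re.IGNORECASE)

# Inside a recipe class, this list would be a class attribute.
preprocess_regexps = [(OAS_COMMENT, lambda match: '')]


def apply_regexps(html, rules):
    """Illustrative stand-in: apply each (pattern, function) rule in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Hypothetical page head containing the offending comment.
sample = ('<html><head><!--- OAS MACRO ---><title>t</title></head>'
          '<body><p>Hi</p></body></html>')
cleaned = apply_regexps(sample, preprocess_regexps)
```

In a real recipe only the import, the pattern and the preprocess_regexps line would be needed; the rest is there to check the pattern against a sample string.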
Old 02-17-2009, 03:45 PM   #231
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Commenting A Recipe

Quote:
Originally Posted by kiklop74 View Post
The original Ars Technica recipe did have a problem with article length. Here is a completely rewritten recipe that works well. Tested with both LRF and EPUB.
Hi,

Thanks for the Ars Technica recipe.

Big request: would you mind commenting each segment of the source code so that I know what each is doing? I think that would help me to figure out how I can solve similar problems in other recipes I've experimented with.

I need to know which lines of code in your revised Ars Technica recipe fetch the rest of an article that is spread across two or more Web pages.

Thanks in advance...

Xanthan Gum
Old 02-17-2009, 03:47 PM   #232
xianfox
Ebook Addict
Posts: 225
Karma: 2136
Join Date: Jul 2003
Location: Appleton, Wisconsin, USA
Device: Onyx BOOX Note Air 4C, Palma
Quote:
Originally Posted by kovidgoyal View Post
This will be removed automatically in the next release of calibre. Incidentally, there's nothing wrong with the code; it's a bug in Adobe DE that causes it to fail to handle the comment.
Thanks for the info regarding this. Your attention to detail is much appreciated.
Old 02-17-2009, 04:00 PM   #233
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by XanthanGum View Post
Hi,
I need to know which lines of code in your revised Ars Technica recipe fetch the rest of an article that is spread across two or more Web pages.
Actually, this recipe does not handle that case. I did not come across such a page on Ars Technica. To support it, I would need to add more code.

Would you mind pointing me to a specific story link that goes on two pages?
Old 02-17-2009, 04:43 PM   #234
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Ars Technica Article in two parts

Quote:
Originally Posted by kiklop74 View Post
Actually, this recipe does not handle that case. I did not come across such a page on Ars Technica. To support it, I would need to add more code.

Would you mind pointing me to a specific story link that goes on two pages?
kiklop74,

Here's an example. It's the second article that appears on the Ars Technica home page tonight:

http://arstechnica.com/gaming/news/2...a-bad-idea.ars

Xanthan Gum
Old 02-17-2009, 04:57 PM   #235
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Ars Technica Not Fetching Entire Article

Quote:
Originally Posted by XanthanGum View Post
kiklop74,

Here's an example. It's the second article that appears on the Ars Technica home page tonight:

http://arstechnica.com/gaming/news/2...a-bad-idea.ars

Xanthan Gum
kiklop74,

It seems that the revised Ars Technica article is not fetching the second half of the article.

Xanthan
Old 02-17-2009, 05:40 PM   #236
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Quote:
Originally Posted by XanthanGum View Post
kiklop74,

It seems that the revised Ars Technica article is not fetching the second half of the article.

Xanthan
That should be: "...the revised Ars Technica recipe..."

Xanthan Gum
Old 02-17-2009, 06:45 PM   #237
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
As I already stated above, the recipe was never designed to fetch multi-page articles.
Old 02-17-2009, 07:07 PM   #238
Hypernova
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
Physicsworld recipes

Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234495609(BasicNewsRecipe):
    title          = u'Physicsworld'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_tags_before = dict(name='h1')
    remove_tags_after = [dict(name='div', attrs={'id':'shareThis'})]
    preprocess_regexps = [
        (re.compile(r'<div id="shareThis">.*</body>', re.DOTALL|re.IGNORECASE),
         lambda match: '</body>'),
    ]
    feeds          = [
                          (u'Headlines News', u'http://feeds.feedburner.com/PhysicsWorldNews')
                      ]
Note that for calibre to get the full text of every article, you need to log in. Making a custom recipe with login is, however, beyond my skill.
Old 02-17-2009, 07:56 PM   #239
kovidgoyal
creator of calibre
Posts: 45,396
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by Hypernova View Post
Code:
import re

class AdvancedUserRecipe1234495609(BasicNewsRecipe):
    title          = u'Physicsworld'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_tags_before = dict(name='h1')
    remove_tags_after = [dict(name='div', attrs={'id':'shareThis'})]
    preprocess_regexps = [
        (re.compile(r'<div id="shareThis">.*</body>', re.DOTALL|re.IGNORECASE),
         lambda match: '</body>'),
    ]
    feeds          = [
                          (u'Headlines News', u'http://feeds.feedburner.com/PhysicsWorldNews')
                      ]
Note that for calibre to get the full text of every article, you need to log in. Making a custom recipe with login is, however, beyond my skill.
The next release of calibre will have a recipe for Physics World with login.
Old 02-17-2009, 08:34 PM   #240
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Updated Ars Technica recipe with support for multi-page articles.
Attached Files
File Type: zip ars_technica.zip (1.3 KB, 430 views)
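
The attachment itself is not reproduced here. Purely as an illustration of the general idea behind multi-page support, and not as the contents of the zip, the sketch below follows a "next page" link until there is none left and collects the pages in order. Everything in it is hypothetical: the NEXT_LINK pattern, the fetch_all_pages helper and the FAKE_SITE dictionary are invented for the example, and a real recipe would fetch pages through calibre and match whatever markup the site actually uses.

```python
import re

# Hypothetical marker for a "next page" link; real sites will differ.
NEXT_LINK = re.compile(r'<a\s+class="next"\s+href="([^"]+)"', re.IGNORECASE)


def fetch_all_pages(first_url, fetch, max_pages=20):
    """Follow "next" links from first_url and return the page bodies in order."""
    pages, seen, url = [], set(), first_url
    # Stop when there is no next link, on a loop, or at the safety limit.
    while url and url not in seen and len(pages) < max_pages:
        seen.add(url)
        html = fetch(url)
        pages.append(html)
        match = NEXT_LINK.search(html)
        url = match.group(1) if match else None
    return pages


# Fake two-page article so the sketch runs without network access.
FAKE_SITE = {
    'page1': '<p>First half.</p><a class="next" href="page2">Next</a>',
    'page2': '<p>Second half.</p>',
}
pages = fetch_all_pages('page1', FAKE_SITE.__getitem__)
```

The seen set and max_pages limit are there so that a site whose last page links back to the first cannot make the loop run forever.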
All times are GMT -4.