Custom recipes (archive, read-only) - Page 16

kiklop74 · 02-17-2009, 01:23 PM

This is my final version of the recipe, which looks OK in the ebook viewer:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234144423(BasicNewsRecipe):
    title                 = u'Cincinnati Enquirer'
    oldest_article        = 7
    language              = _('English')
    __author__            = 'Joseph Kitzmiller'
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    encoding              = 'cp1252'
    extra_css             = ' p {font-size: medium; font-weight: normal;} '

    # Keep only the div that holds the article.
    keep_only_tags = [dict(name='div', attrs={'class':'padding'})]

    # Drop embedded objects, tables, the comments block and the
    # 'articleflex-container' div.
    remove_tags = [
        dict(name=['object','link','table','embed']),
        dict(name='div', attrs={'id':'pluckcomments'}),
        dict(name='div', attrs={'class':'articleflex-container'}),
    ]

    feeds = [(u'Cincinnati Enquirer', u'http://rss.cincinnati.com/apps/pbcs.dll/section?category=rssenq01&mime=xml')]

    def preprocess_html(self, soup):
        # Remove inline 'style' and 'face' attributes from every tag.
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup

kitzj0 · 02-17-2009, 01:27 PM

I am sure as well. I am happy with what I got working; it's not really a big deal to go through the Sony library. You have been a tremendous help, kiklop! Perhaps if I had hundreds of feeds it would be a pain, but luckily I am only having issues with the one feed.

kiklop74 · 02-17-2009, 01:28 PM

But it does not work in Adobe Digital Editions, so this is clearly a bug in EPUB generation. Reported as issue #1874.

xianfox · 02-17-2009, 04:02 PM

I'm really close on a custom recipe for my local paper. Here is the code I currently have:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234841996(BasicNewsRecipe):
    title                 = u'Appleton Post Crescent'
    oldest_article        = 7
    max_articles_per_feed = 100
    remove_javascript     = True
    # Table handling for LRF and EPUB output.
    html2lrf_options      = ['--ignore-tables']
    html2epub_options     = 'linearize_tables = True'
    remove_tags           = [dict(name='div', attrs={'class':'article-tools'})]
    keep_only_tags        = [dict(name='div', attrs={'class':['article-headline', 'article-bodytext']})]

    feeds = [
        (u'Latest Headlines', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlatest.pbs&mime=xml'),
        (u'Local News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlocal.pbs&mime=xml'),
        (u'Sports', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSsports.pbs&mime=xml'),
    ]

The problem is, in the head section of their pages they have a malformed comment that looks like this:

Code:

<!--- OAS MACRO --->

My Sony Reader won't display the resulting output due to this malformed comment. I've tested it by manually removing it from the generated epub file and it works flawlessly.

Can anyone help with a brief bit of code that I can add to my recipe to remove this stubborn comment?

Thanks

kovidgoyal · 02-17-2009, 04:38 PM

Quote:

Originally Posted by xianfox

I'm really close on a custom recipe for my local paper. Here is the code I currently have:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234841996(BasicNewsRecipe):
    title                 = u'Appleton Post Crescent'
    oldest_article        = 7
    max_articles_per_feed = 100
    remove_javascript     = True
    # Table handling for LRF and EPUB output.
    html2lrf_options      = ['--ignore-tables']
    html2epub_options     = 'linearize_tables = True'
    remove_tags           = [dict(name='div', attrs={'class':'article-tools'})]
    keep_only_tags        = [dict(name='div', attrs={'class':['article-headline', 'article-bodytext']})]

    feeds = [
        (u'Latest Headlines', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlatest.pbs&mime=xml'),
        (u'Local News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlocal.pbs&mime=xml'),
        (u'Sports', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSsports.pbs&mime=xml'),
    ]

The problem is, in the head section of their pages they have a malformed comment that looks like this:

Code:

<!--- OAS MACRO --->

My Sony Reader won't display the resulting output due to this malformed comment. I've tested it by manually removing it from the generated epub file and it works flawlessly.

Can anyone help with a brief bit of code that I can add to my recipe to remove this stubborn comment?

Thanks

This will be removed automatically in the next release of calibre. Incidentally, there's nothing wrong with the code; it's a bug in Adobe DE that causes it to fail to handle the comment.
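
For anyone on an older calibre release, a recipe could strip the comment itself before the page is parsed. The following is only a minimal sketch using Python's standard `re` module: the helper name and the sample HTML are made up for illustration, and the pattern assumes the comment always opens with `<!---` and closes with `--->`.

```python
import re

# Matches comments written with three hyphens, e.g. <!--- OAS MACRO --->
# (assumption: the site always uses this exact opening and closing).
MALFORMED_COMMENT = re.compile(r'<!---.*?--->', re.DOTALL)

def strip_malformed_comments(html):
    """Return html with every three-hyphen comment removed."""
    return MALFORMED_COMMENT.sub('', html)

# Hypothetical page fragment for demonstration only.
sample = ('<html><head><!--- OAS MACRO ---><title>t</title></head>'
          '<body><p>Hi</p></body></html>')
cleaned = strip_malformed_comments(sample)
```

Inside a recipe, the same pattern would presumably go into `preprocess_regexps` (the attribute used in the Physicsworld recipe later in this thread), paired with `lambda match: ''`.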

XanthanGum · 02-17-2009, 04:45 PM

Quote:

Originally Posted by kiklop74

The original Ars Technica recipe did have a problem with article length. Here is a completely rewritten recipe that works well. Tested with both LRF and EPUB.

Hi,

Thanks for the Ars Technica recipe.

Big request: would you mind commenting each segment of the source code so that I know what each is doing? I think that would help me to figure out how I can solve similar problems in other recipes I've experimented with.

I need to know which lines of code in your revised Ars Technica recipe fetch the rest of an article that is spread across two or more Web pages.

Thanks in advance...

Xanthan Gum

xianfox · 02-17-2009, 04:47 PM

Quote:

Originally Posted by kovidgoyal

This will be removed automatically in the next release of calibre. Incidentally, there's nothing wrong with the code; it's a bug in Adobe DE that causes it to fail to handle the comment.

Thanks for the info regarding this. Your attention to detail is much appreciated.

kiklop74 · 02-17-2009, 05:00 PM

Quote:

Originally Posted by XanthanGum

Hi,
I need to know which lines of code in your revised Ars Technica recipe fetch the rest of an article that is spread across two or more Web pages.

Actually, this recipe does not handle that case. I have not come across such a page on Ars Technica. To support it I would need to add more code.

Would you mind pointing me to a specific story link that goes on two pages?

XanthanGum · 02-17-2009, 05:43 PM

Quote:

Originally Posted by kiklop74

Actually, this recipe does not handle that case. I have not come across such a page on Ars Technica. To support it I would need to add more code.

Would you mind pointing me to a specific story link that goes on two pages?

kiklop74,

Here's an example. It's the second article that appears on the Ars Technica home page tonight:

http://arstechnica.com/gaming/news/2...a-bad-idea.ars

Xanthan Gum

XanthanGum · 02-17-2009, 05:57 PM

Quote:

Originally Posted by XanthanGum

kiklop74,

Here's an example. It's the second article that appears on the Ars Technica home page tonight:

http://arstechnica.com/gaming/news/2...a-bad-idea.ars

Xanthan Gum

kiklop74,

It seems that the revised Ars Technica article is not fetching the second half of the article.

Xanthan

XanthanGum · 02-17-2009, 06:40 PM

Quote:

Originally Posted by XanthanGum

kiklop74,

It seems that the revised Ars Technica article is not fetching the second half of the article.

Xanthan

That should be: "...the revised Ars Technica recipe..."

Xanthan Gum

kiklop74 · 02-17-2009, 07:45 PM

As I already stated above, the recipe was never designed to fetch multi-page articles.

Hypernova · 02-17-2009, 08:07 PM

Code:

import re

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234495609(BasicNewsRecipe):
    title                 = u'Physicsworld'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_tags_before    = dict(name='h1')
    remove_tags_after     = [dict(name='div', attrs={'id':'shareThis'})]
    # Cut everything from the "shareThis" div to the end of the body.
    preprocess_regexps = [
        (re.compile(r'<div id="shareThis">.*</body>', re.DOTALL|re.IGNORECASE),
         lambda match: '</body>'),
    ]
    feeds = [(u'Headlines News', u'http://feeds.feedburner.com/PhysicsWorldNews')]

Note that to ensure calibre can fetch the whole article, you need to log in. Writing a custom recipe with login is, however, beyond my skill.

kovidgoyal · 02-17-2009, 08:56 PM

Quote:

Originally Posted by Hypernova

Code:

import re

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1234495609(BasicNewsRecipe):
    title                 = u'Physicsworld'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    remove_javascript     = True
    remove_tags_before    = dict(name='h1')
    remove_tags_after     = [dict(name='div', attrs={'id':'shareThis'})]
    # Cut everything from the "shareThis" div to the end of the body.
    preprocess_regexps = [
        (re.compile(r'<div id="shareThis">.*</body>', re.DOTALL|re.IGNORECASE),
         lambda match: '</body>'),
    ]
    feeds = [(u'Headlines News', u'http://feeds.feedburner.com/PhysicsWorldNews')]

Note that to ensure calibre can fetch the whole article, you need to log in. Writing a custom recipe with login is, however, beyond my skill.

The next release of calibre will include a recipe for Physics World with login support.
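
For readers who want to try a login recipe themselves, calibre recipes can set `needs_subscription` and override `get_browser`. The sketch below is not the recipe that shipped with calibre: the login URL and the form field names are placeholders that would have to be replaced with the site's real ones, and the stub class exists only so the code can be imported outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stub so the sketch can be imported outside calibre (assumption).
    class BasicNewsRecipe(object):
        username = password = None

        def get_browser(self):
            raise NotImplementedError('only available inside calibre')


class PhysicsWorldLoginSketch(BasicNewsRecipe):
    title              = u'Physicsworld (login sketch)'
    needs_subscription = True  # calibre then asks for username and password

    # Placeholder values: check the site's real login page and form.
    LOGIN_URL      = 'http://example.com/login'
    USERNAME_FIELD = 'username'
    PASSWORD_FIELD = 'password'

    feeds = [(u'Headlines News', u'http://feeds.feedburner.com/PhysicsWorldNews')]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN_URL)
            br.select_form(nr=0)  # assumes the login form is the first form
            br[self.USERNAME_FIELD] = self.username
            br[self.PASSWORD_FIELD] = self.password
            br.submit()
        return br
```

The browser returned by `get_browser` is then used for every download, so the session established by the login carries over to the article pages.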

kiklop74 · 02-17-2009, 09:34 PM

Updated Ars Technica recipe with multi-page news support.
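
The updated recipe itself is not reproduced in this archive. As a rough illustration of the general idea only (not kiklop74's actual code), multi-page support usually means following each page's "next" link and joining the article bodies. Everything in this sketch is hypothetical: the in-memory pages, the `body` and `next` class names, and the `fetch` stand-in for a real download.

```python
import re

# Hypothetical in-memory "site": URL -> HTML. A real recipe would download these.
PAGES = {
    '/story': '<div class="body">Part one.</div><a class="next" href="/story/2">Next</a>',
    '/story/2': '<div class="body">Part two.</div>',
}

BODY_RE = re.compile(r'<div class="body">(.*?)</div>', re.DOTALL)
NEXT_RE = re.compile(r'<a class="next" href="([^"]+)"')

def fetch(url):
    """Stand-in for a network fetch (assumption for this sketch)."""
    return PAGES[url]

def collect_article(url, max_pages=10):
    """Follow 'next' links and join the body text of every page visited."""
    parts, seen = [], set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        body = BODY_RE.search(html)
        if body:
            parts.append(body.group(1))
        nxt = NEXT_RE.search(html)
        url = nxt.group(1) if nxt else None
    return '\n'.join(parts)

article = collect_article('/story')
```

The `seen` set and the `max_pages` cap guard against pages that link back to each other.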
