Old 07-28-2017, 11:53 AM   #1
jma1
Connoisseur
jma1 began at the beginning.
 
Posts: 82
Karma: 10
Join Date: Dec 2015
Device: Kindle
Washington Examiner byline - photos

Is it possible to include the byline (author and date) and photos in this recipe? I created it from other scripts; it is not in the standard recipe list. It works well for the article heading and body as-is. Thanks.

------------------------------------------------------------------------
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2016, Kovid Goyal <kovid at kovidgoyal.net>

from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    # Build a tag spec that matches elements carrying any of the given classes
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class WashingtonExaminer(BasicNewsRecipe):
    title = u'Washington Examiner'
    oldest_article = 2
    language = 'en'
    remove_empty_feeds = True
    extra_css = """
    body{font-family: Arial,sans-serif }
    .caption{font-size: x-small}
    .author,.datePub{font-size: small}
    """

    __author__ = 'Kovid Goyal'
    simultaneous_downloads = 4
    max_articles_per_feed = 20
    use_embedded_content = False
    compress_news_images = True
    compress_news_images_auto_size = 8
    no_stylesheets = True
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}

    feeds = [
        ('News', 'http://www.washingtonexaminer.com/rss/news'),
        ('Politics', 'http://www.washingtonexaminer.com/rss/politics'),
        ('Editorial', 'http://www.washingtonexaminer.com/rss/editorials'),
        ('Policy', 'http://washingtonexaminer.com/rss/policy'),
        ('Opinion', 'http://www.washingtonexaminer.com/rss/opinion'),
        ('Columnists', 'http://www.washingtonexaminer.com/rss/columnists'),
        ('Magazine', 'http://www.washingtonexaminer.com/rss/magazine'),
    ]

    # Copied in from another recipe; it does not get the images to show
    def preprocess_html(self, soup):
        # Lazy-loaded images keep the real URL in data-src
        for img in soup.findAll(attrs={'data-src': True}):
            img['src'] = img['data-src']
        # Keep only the first h1 (the article headline)
        all_h1s = soup.findAll('h1')
        for h1 in all_h1s[1:]:
            h1.extract()
        return soup
Old 07-29-2017, 03:41 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You will need to remove auto_cleanup = True and use keep_only_tags/remove_tags instead.
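A minimal sketch of what such tag specs can look like, reusing the classes() helper from the recipe above. It runs without calibre; in a real recipe the two lists would be class attributes of the BasicNewsRecipe subclass, replacing auto_cleanup = True. Every class name used here is a hypothetical placeholder, not taken from the Washington Examiner pages.

```python
def classes(names):
    """Return a tag spec matching any element that carries at least one of
    the space-separated class names (same idea as the recipe's helper)."""
    wanted = frozenset(names.split(' '))
    return dict(attrs={
        'class': lambda x: bool(x) and bool(frozenset(x.split()).intersection(wanted))})


# Hypothetical class names, for illustration only.
keep_only_tags = [
    dict(name='h1'),                  # the headline
    classes('byline article-body'),   # byline block and article text
]
remove_tags = [
    classes('social-share related-links'),  # clutter inside the kept tags
]

# The spec is just a predicate on the element's class attribute.
matches = keep_only_tags[1]['attrs']['class']
print(matches('article-body wide'))  # True
print(matches('sidebar'))            # False
print(matches(None))                 # False (element without a class)
```

The real class or itemprop values have to be read from the page source of an article, since they are specific to the site.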
Old 08-28-2017, 09:09 PM   #3
jma1
Connoisseur
jma1 began at the beginning.
 
Posts: 82
Karma: 10
Join Date: Dec 2015
Device: Kindle
Washington Examiner article body font size

Kovid,
I redid the recipe with keep_only_tags/remove_tags. I now get the articles with the byline author and date consistently, and I believe some of the pictures.
However, I cannot get the article body down to a normal (smaller) font size with extra_css. The extra_css rule removes the bolding of the article body text, but the font size does not change. Could you advise on that? Thanks in advance.

Current recipe

#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2016, Kovid Goyal <kovid at kovidgoyal.net>

from __future__ import (unicode_literals, division, absolute_import, print_function)
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    # Build a tag spec that matches elements carrying any of the given classes
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class WashingtonExaminer(BasicNewsRecipe):
    title = u'Washington Examiner'
    __author__ = 'Kovid Goyal'
    oldest_article = 2
    max_articles_per_feed = 10
    use_embedded_content = False
    compress_news_images = True
    compress_news_images_auto_size = 8
    no_stylesheets = True
    encoding = 'utf8'

    language = 'en'
    remove_empty_feeds = True

    extra_css = 'body{font-size: 0.8em; font-weight: normal }'
    # extra_css = '.author{font-weight: normal; font-size: x-small}'
    # extra_css = '.caption{font-size: x-small}'

    ignore_duplicate_articles = {'title', 'url'}
    keep_only_tags = [
        dict(itemprop=['headline', 'author', 'datePublished', 'articleBody']),
        dict(name='h1'),
        classes('article-body featured-image'),
    ]

    feeds = [
        ('News', 'http://www.washingtonexaminer.com/rss/news'),
    ]

    def preprocess_html(self, soup):
        # Lazy-loaded images keep the real URL in data-src
        for img in soup.findAll(attrs={'data-src': True}):
            img['src'] = img['data-src']
        # Keep only the first h1 (the article headline)
        all_h1s = soup.findAll('h1')
        for h1 in all_h1s[1:]:
            h1.extract()
        return soup


URL for the new source RSS feeds -

http://www.washingtonexaminer.com/rs...n=%2Fnation%2F
Old 08-28-2017, 10:39 PM   #4
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Add !important to the CSS rule to make sure it overrides anything in the input document.
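As a rough illustration, assuming the body rule from the recipe above, the marker goes at the end of each declaration, before the semicolon. Whether the device then shows a smaller font is not confirmed in this thread.

```python
# The recipe's body rule with !important appended to each declaration.
# In the recipe this string would be the extra_css class attribute.
extra_css = 'body{font-size: 0.8em !important; font-weight: normal !important}'
print(extra_css)
```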
Old 08-29-2017, 11:10 AM   #5
jma1
Connoisseur
jma1 began at the beginning.
 
Posts: 82
Karma: 10
Join Date: Dec 2015
Device: Kindle
I tried changing the rule to the following, but the article body text still does not get any smaller.

extra_css = 'body{font-size: x-small !important; font-weight: normal }'