Washington Examiner byline - photos

jma1 · 07-28-2017, 11:53 AM

Is it possible to include the byline (author and date) and photos in this recipe? I put it together from other scripts; it is not in the standard recipe list. It works well for the article heading and body as-is. Thanks.

------------------------------------------------------------------------
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2016, Kovid Goyal <kovid at kovidgoyal.net>

from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    # Match tags that carry at least one of the space-separated class names
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class WashingtonExaminer(BasicNewsRecipe):
    title = u'Washington Examiner'
    oldest_article = 2
    language = 'en'
    remove_empty_feeds = True
    extra_css = """
    body{font-family: Arial,sans-serif }
    .caption{font-size: x-small}
    .author,.datePub{font-size: small}
    """

    __author__ = 'Kovid Goyal'
    simultaneous_downloads = 4
    max_articles_per_feed = 20
    use_embedded_content = False
    compress_news_images = True
    compress_news_images_auto_size = 8
    no_stylesheets = True
    auto_cleanup = True
    ignore_duplicate_articles = {'title', 'url'}

    feeds = [
        ('News', 'http://www.washingtonexaminer.com/rss/news'),
        ('Politics', 'http://www.washingtonexaminer.com/rss/politics'),
        ('Editorial', 'http://www.washingtonexaminer.com/rss/editorials'),
        ('Policy', 'http://washingtonexaminer.com/rss/policy'),
        ('Opinion', 'http://www.washingtonexaminer.com/rss/opinion'),
        ('Columnists', 'http://www.washingtonexaminer.com/rss/columnists'),
        ('Magazine', 'http://www.washingtonexaminer.com/rss/magazine'),
    ]

    # copied in, not working to present images
    def preprocess_html(self, soup):
        # Promote lazy-loaded image URLs to the real src attribute
        for img in soup.findAll(attrs={'data-src': True}):
            img['src'] = img['data-src']
        # Keep only the first h1 on the page
        all_h1s = soup.findAll('h1')
        for h1 in all_h1s[1:]:
            h1.extract()
        return soup

kovidgoyal · 07-29-2017, 03:41 AM

You will need to remove auto_cleanup = True and use keep_only_tags/remove_tags instead.
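A minimal sketch of what that change might look like, reusing the classes() helper from the recipe above. The itemprop values and the class names are assumptions about the site's markup, and 'social-share' and 'related' are purely hypothetical placeholders; inspect the actual article HTML before relying on any of them. The matcher check at the end runs without calibre.

```python
# Sketch only: tag specs that would stand in for auto_cleanup = True.

def classes(names):
    # Same helper as in the recipe: match tags having any of the given classes.
    q = frozenset(names.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})

# Inside the recipe class, drop auto_cleanup = True and add something like:
keep_only_tags = [
    # assumed microdata attributes for headline, byline, date and body
    dict(itemprop=['headline', 'author', 'datePublished', 'articleBody']),
    # assumed class names for the body and lead image
    classes('article-body featured-image'),
]
remove_tags = [
    classes('social-share related'),   # hypothetical class names
    dict(name=['script', 'style']),
]

# How the class matcher behaves on a raw class attribute value:
match = keep_only_tags[1]['attrs']['class']
print(bool(match('featured-image wide')))  # True
print(bool(match('sidebar')))              # False
print(bool(match(None)))                   # False (tag without a class)
```

Anything not matched by keep_only_tags is discarded, so the byline and image containers have to be listed explicitly; remove_tags then trims unwanted pieces from what is left.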

jma1 · 08-28-2017, 09:09 PM

Kovid,
I redid the recipe with keep_only_tags/remove_tags. I now get the articles with byline author and date consistently, and, I believe, some of the pictures.
However, I cannot get the article body down to a normal (smaller) font size with extra_css. The extra_css rule removes the bolding of the article body text, but the font size does not change. Could you advise on that? Thanks in advance.

Current recipe

Spoiler:

URL for the new source RSS feeds -

http://www.washingtonexaminer.com/rs...n=%2Fnation%2F

kovidgoyal · 08-28-2017, 10:39 PM

Add an !important to the css rule to make sure it overrides anything in the input document.
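As an illustration of that suggestion, a small sketch that builds the extra_css string with !important on every declaration. The css_rule helper is hypothetical (it is not part of calibre); writing the string by hand works just as well.

```python
def css_rule(selector, declarations, important=True):
    # Hypothetical helper: render one CSS rule, optionally marking every
    # declaration !important so it wins over rules in the input document.
    suffix = ' !important' if important else ''
    body = '; '.join('%s: %s%s' % (prop, value, suffix)
                     for prop, value in declarations)
    return '%s { %s }' % (selector, body)

# Value to assign to extra_css inside the recipe class
extra_css = css_rule('body', [('font-size', '0.8em'),
                              ('font-weight', 'normal')])
print(extra_css)
# body { font-size: 0.8em !important; font-weight: normal !important }
```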

jma1 · 08-29-2017, 11:10 AM

I tried changing the rule to the following, but the article body text still does not get any smaller.

extra_css = 'body{font-size: x-small !important; font-weight: normal }'
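One possible explanation, offered as a guess rather than a confirmed diagnosis: a font-size set on body reaches the text only through inheritance, so any descendant that still carries its own font-size (for example in an inline style attribute) keeps that size however the body rule is marked. Under that assumption, two things might be worth trying: targeting the text elements directly, and dropping inline styles with the recipe's remove_attributes setting. The selector below is an assumption about the page markup, and strip_font_size is a hypothetical helper that only shows the effect on a single style string.

```python
import re

# Recipe attributes to try (inside the recipe class):
extra_css = 'p { font-size: small !important; font-weight: normal }'
remove_attributes = ['style']  # strip inline style attributes from all tags

# Hypothetical helper: what removing a font-size declaration from an inline
# style string amounts to, if only that one declaration should go.
FONT_SIZE = re.compile(r'font-size\s*:[^;]*;?\s*', re.IGNORECASE)

def strip_font_size(style):
    return FONT_SIZE.sub('', style).strip()

print(strip_font_size('font-size: 18px; color: #333'))  # color: #333
print(repr(strip_font_size('FONT-SIZE:1.4em')))         # ''
```

If the text is still too large after this, the size is probably being set somewhere other than the downloaded HTML, such as the reading device's own font setting.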


Recipe from the spoiler in jma1's post of 08-28-2017, 09:09 PM (Washington Examiner article body font size):

#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2016, Kovid Goyal <kovid at kovidgoyal.net>

from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
from calibre.web.feeds.news import BasicNewsRecipe


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class WashingtonExaminer(BasicNewsRecipe):
    title = u'Washington Examiner'
    __author__ = 'Kovid Goyal'
    oldest_article = 2
    max_articles_per_feed = 10
    use_embedded_content = False
    compress_news_images = True
    compress_news_images_auto_size = 8
    no_stylesheets = True
    encoding = 'utf8'
    language = 'en'
    remove_empty_feeds = True
    extra_css = 'body{font-size: 0.8em; font-weight: normal }'
    # extra_css = '.author{font-weight: normal; font-size: x-small}'
    # extra_css = '.caption{font-size: x-small}'
    ignore_duplicate_articles = {'title', 'url'}

    keep_only_tags = [
        dict(itemprop=['headline', 'author', 'datePublished', 'articleBody']),
        dict(name='h1'),
        classes('article-body featured-image'),
    ]

    feeds = [
        ('News', 'http://www.washingtonexaminer.com/rss/news'),
    ]

    def preprocess_html(self, soup):
        for img in soup.findAll(attrs={'data-src': True}):
            img['src'] = img['data-src']
        all_h1s = soup.findAll('h1')
        for h1 in all_h1s[1:]:
            h1.extract()
        return soup
