Need Recipe Help for an Alaskan Site
I've been working on a recipe for a few hours. I can get article titles if I just use the plain feed at alaskalandmine.com/feed or /rss, but when I use the browser inspector to find the element that holds the body text and plug that into the recipe, the recipe never actually downloads the article pages. The site itself uses the URL convention site/landmines/specificArticle. I also tried ebook-convert, and it failed to pull up the article pages at all, and instead used poor links from the /feed or /rss download.
Any help is appreciated. I'd like help getting this recipe working, but I'd also like to understand what I'm getting wrong so I can do better going forward. I have a lot of Alaskan media sites to do and am stuck on the first one I tried. Code below:
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe


class AlaskaLandmineAdvanced(BasicNewsRecipe):
    title = 'Alaska Landmine'
    description = 'Insider Alaska political news and gossip - Full Text Extraction'
    oldest_article = 30
    max_articles_per_feed = 100
    auto_cleanup = False

    keep_only_tags = [
        dict(name='div', attrs={'id': 'ktmain'})
    ]

    remove_tags = [
        dict(name='div', attrs={'class': ['sharedaddy', 'jp-relatedposts', 'wpcnt', 'yarpp-related']}),
        dict(name='div', attrs={'id': ['comments', 'respond', 'secondary', 'wpdiscuz-loading-bar', 'wpdiscuz-comment-message']})
    ]

    feeds = [
        ('Alaska Landmine', 'https://alaskalandmine.com/feed/'),
    ]

    calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36'
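
One way to narrow this down without calibre is to look at what the feed itself provides for each item: the link, and whether the full article body is already embedded in a content:encoded element. calibre's BasicNewsRecipe has a use_embedded_content setting that controls whether embedded feed content is used instead of fetching each article link. Whether that is what is happening with this site is an assumption, not something confirmed here. The sketch below uses only the Python standard library. SAMPLE_FEED and summarize_feed are hypothetical names for illustration, and the sample items are made up, not real data from the site.

```python
# Standalone diagnostic sketch (standard library only, no calibre needed).
# SAMPLE_FEED is a made-up stand-in for a real feed, not actual site data.
import xml.etree.ElementTree as ET

# Namespace used by the RSS "content" module (content:encoded).
CONTENT_NS = '{http://purl.org/rss/1.0/modules/content/}'

SAMPLE_FEED = '''<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First example article</title>
      <link>https://example.com/landmines/first-example-article/</link>
      <description>Short teaser.</description>
      <content:encoded><![CDATA[<p>Full body text of the article.</p>]]></content:encoded>
    </item>
    <item>
      <title>Second example article</title>
      <link>https://example.com/landmines/second-example-article/</link>
      <description>Short teaser only.</description>
    </item>
  </channel>
</rss>'''


def summarize_feed(xml_text):
    """Return one dict per <item>: title, link, and embedded-content length."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter('item'):
        # content:encoded is optional; treat a missing element as empty.
        body = item.findtext(CONTENT_NS + 'encoded') or ''
        rows.append({
            'title': item.findtext('title', default='').strip(),
            'link': item.findtext('link', default='').strip(),
            'embedded_chars': len(body.strip()),
        })
    return rows


if __name__ == '__main__':
    for row in summarize_feed(SAMPLE_FEED):
        print(row['embedded_chars'], row['link'], row['title'])
```

To check the real feed, its XML would need to be saved or fetched and passed to summarize_feed in place of SAMPLE_FEED. If embedded_chars turns out to be large for every item, the feed probably carries full articles, and it may be worth testing the recipe with use_embedded_content = False so that the article links are fetched and keep_only_tags is applied to the downloaded pages. If the links printed do not follow the site/landmines/specificArticle pattern, the problem is more likely in the feed's links than in the tag selectors.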