You can override
def preprocess_raw_html(self, raw, *a):
and use re.search(pattern, raw) to check whether it's the print edition. raw is a plain string, so it has no .search method of its own. Then capture the date with a regex group and parse it, importing:
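from calibre.utils.date import parse_date
from datetime import datetime, timedelta, timezone
and then check the article's age. parse_date returns a timezone-aware datetime, so compare it with an aware "now". A naive datetime.now() would raise a TypeError here.
if datetime.now(timezone.utc) - date > timedelta(days=1):
    self.abort_article('Skipping old article')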
If the page isn't the print edition, or the article is older than a day, call self.abort_article() to skip it.
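Here is a minimal sketch of that logic, pulled out into a plain function so it can be tried outside calibre. The two regexes are hypothetical: the real print-edition marker and date markup depend on the site, so adjust them after inspecting its HTML. It uses datetime.fromisoformat instead of calibre's parse_date only to stay stdlib-only, and it also skips pages where no date is found, which is a choice you may not want.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical patterns: replace with whatever the target site actually uses.
PRINT_EDITION_RE = re.compile(r'print edition', re.IGNORECASE)
DATE_RE = re.compile(r'<time[^>]*\bdatetime="([^"]+)"')


def should_skip(raw, now=None, max_age=timedelta(days=1)):
    """Return a reason string if the article should be skipped, else None."""
    if not PRINT_EDITION_RE.search(raw):
        return 'Not a print edition article'
    m = DATE_RE.search(raw)
    if m is None:
        # Assumption: treat a missing date as "can't verify", so skip it.
        return 'No publication date found'
    # Inside calibre, parse_date(m.group(1)) would do this step instead.
    date = datetime.fromisoformat(m.group(1))
    if date.tzinfo is None:
        # Assume UTC for naive timestamps so the subtraction below is valid.
        date = date.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    if now - date > max_age:
        return 'Skipping old article'
    return None

# In the recipe class, the hook would then look roughly like:
#
#     def preprocess_raw_html(self, raw, *a):
#         reason = should_skip(raw)
#         if reason:
#             self.abort_article(reason)  # aborts this article's download
#         return raw
```

preprocess_raw_html has to return the HTML for articles you keep, which is why the hook ends with return raw.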
There may be other ways to do this, so experiment a bit, and look at how other recipes handle similar filtering.