#1 |
|
Guru
Posts: 645
Karma: 85520
Join Date: May 2021
Device: kindle
|
MIT Technology Review: the recipe still works, but the header content is no longer picked up.
Code:
import re

# Class-name patterns used by the recipe
articleHeaderRegex = '^.*contentHeader__wrapper.*$'
editorLetterHeaderRegex = "^.*contentHeader--vertical__wrapper.*$"
articleContentRegex = "^.*contentbody__wrapper.*$"
imagePlaceHolderRegex = "^.*image__placeholder.*$"
advertisementRegex = "^.*sliderAd__wrapper.*$"

# Keep only the header and body of each article
keep_only_tags = [
    dict(name='header', attrs={'class': re.compile(editorLetterHeaderRegex, re.IGNORECASE)}),
    dict(name='header', attrs={'class': re.compile(articleHeaderRegex, re.IGNORECASE)}),
    dict(name='div', attrs={'class': re.compile(articleContentRegex, re.IGNORECASE)})
]

# Strip asides, icons, pull quotes, image placeholders and ads
remove_tags = [
    dict(name="aside"),
    dict(name="svg"),
    dict(name="blockquote"),
    dict(name="img", attrs={'class': re.compile(imagePlaceHolderRegex, re.IGNORECASE)}),
    dict(name="div", attrs={'class': re.compile(advertisementRegex, re.IGNORECASE)}),
]
https://github.com/kovidgoyal/calibre/blob/3dd95981398777f3c958e733209f3583e783b98c/recipes/mit_technology_review.recipe

Only the contentBody__wrapper pattern still works; it captures the body, which is most of the article. The contentHeader__wrapper pattern needs to be changed, but from what I found, different articles use different header classes:
contentArticleHeader--fullBleed__intro--30Y0q
contentArticleHeader__title--rp01p
contentArticleHeader--vertical__intro--2soVS
Can anyone help find an easier way to do this?
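One possible direction, as a rough sketch only: the three class names above all share the `contentArticleHeader` prefix, so a single pattern on that prefix might cover every variant without listing each hashed suffix. The name `header_regex`, the helper `is_header_class`, and the choice of `header` as the tag name are my assumptions, not something confirmed against the live site; the matched elements may also be nested inside each other, which would need checking in the fetched HTML.

```python
import re

# Assumption: every header variant starts with "contentArticleHeader"
# (or the older "contentHeader"); the hashed suffix is ignored.
header_regex = re.compile(r'content(?:Article)?Header', re.IGNORECASE)

# Class names taken from this post
observed_classes = [
    'contentArticleHeader--fullBleed__intro--30Y0q',
    'contentArticleHeader__title--rp01p',
    'contentArticleHeader--vertical__intro--2soVS',
]


def is_header_class(class_name):
    """Return True if the class name looks like one of the header variants."""
    return header_regex.search(class_name) is not None


# Hypothetical replacement for the two header entries in the recipe.
# The tag name 'header' is a guess; adjust it to whatever the page uses.
keep_only_tags = [
    dict(name='header', attrs={'class': header_regex}),
    dict(name='div', attrs={'class': re.compile(r'contentBody__wrapper', re.IGNORECASE)}),
]

if __name__ == '__main__':
    for name in observed_classes:
        print(name, is_header_class(name))
```

Because the pattern is unanchored, it behaves like the original `^.*…*.$` style patterns while staying shorter, and it keeps matching the old `contentHeader__wrapper` classes in case some pages still use them.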
#2 |
|
creator of calibre
Posts: 45,610
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|