Old 01-02-2022, 02:51 AM   #1
unkn0wn
Guru
 
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
The MIT Technology Review recipe still works, but the article header content is missing from the output.

Code:
    articleHeaderRegex = '^.*contentHeader__wrapper.*$'
    editorLetterHeaderRegex = "^.*contentHeader--vertical__wrapper.*$"
    articleContentRegex = "^.*contentbody__wrapper.*$"
    imagePlaceHolderRegex = "^.*image__placeholder.*$"
    advertisementRegex = "^.*sliderAd__wrapper.*$"

    keep_only_tags = [
        dict(name='header',  attrs={'class': re.compile(editorLetterHeaderRegex, re.IGNORECASE)}),
        dict(name='header',  attrs={'class': re.compile(articleHeaderRegex, re.IGNORECASE)}),
        dict(name='div',  attrs={'class': re.compile(articleContentRegex, re.IGNORECASE)})
    ]
    remove_tags = [
        dict(name="aside"),
        dict(name="svg"),
        dict(name="blockquote"),
        dict(name="img", attrs={'class': re.compile(imagePlaceHolderRegex, re.IGNORECASE)}),
        dict(name="div", attrs={'class': re.compile(advertisementRegex, re.IGNORECASE)}),
    ]

https://github.com/kovidgoyal/calibre/blob/3dd95981398777f3c958e733209f3583e783b98c/recipes/mit_technology_review.recipe


Only the contentBody__wrapper match still works; it picks up the body, which is most of the article.

The contentHeader__wrapper pattern needs to be changed, but from what I found, different articles use different header classes:

contentArticleHeader--fullBleed__intro--30Y0q
contentArticleHeader__title--rp01p
contentArticleHeader--vertical__intro--2soVS


Can anyone help find an easier way to do this?
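
One possible direction, as a rough sketch only: the three class names listed above all start with contentArticleHeader, so a single case-insensitive pattern on that prefix might cover every variant. This assumes no other header variant uses a different prefix, and it is not the fix that ended up in the recipe. The names header_class_regex and observed_classes are made up for illustration.

```python
import re

# Assumption: every header variant shares the "contentArticleHeader" prefix,
# as the three class names observed in this post do.
header_class_regex = re.compile(r'contentArticleHeader', re.IGNORECASE)

# Class names copied from the post above.
observed_classes = [
    'contentArticleHeader--fullBleed__intro--30Y0q',
    'contentArticleHeader__title--rp01p',
    'contentArticleHeader--vertical__intro--2soVS',
]

# Check that the single pattern finds the prefix in each observed class name
# and does not also catch the body wrapper class.
header_matches = [bool(header_class_regex.search(c)) for c in observed_classes]
body_matches = bool(header_class_regex.search('contentBody__wrapper'))

# Hypothetical replacement for the two header entries in keep_only_tags.
# No tag name is given, so any element with a matching class is kept.
keep_only_tags = [
    dict(attrs={'class': header_class_regex}),
    dict(name='div', attrs={'class': re.compile(r'contentbody__wrapper', re.IGNORECASE)}),
]
```

One caveat: if the title and intro elements are nested inside another element whose class also matches, the header text could end up duplicated, so this would need testing against a few real articles.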
Old 01-03-2022, 06:59 AM   #2
kovidgoyal
creator of calibre
Posts: 45,342
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
https://github.com/kovidgoyal/calibr...0171353cada942