Old 01-25-2016, 05:53 AM   #1
bubak
Multiple Page Sites

The reusable code for loading multiple-page articles is IMHO wrong. It uses preprocess_html, which is applied "after the cleanup as specified by remove_tags etc.", so no cleanup is done on the appended follow-up pages. At least that is what I see on FAZ.NET.

This site in particular offers an 'Article on one page' link, which could be followed before cleanup instead of appending pages, but I'm not sure what the correct way would be: skip_ad_pages (but this accepts soup and returns HTML, so it cannot be used when the page itself is fine) or get_article_url (but then the article might have to be loaded twice).

Couldn't we have a function that takes and returns the same kind of object, soup or text, and is applied right after the article content is loaded?
Old 01-25-2016, 06:34 AM   #2
kovidgoyal
creator of calibre
preprocess_raw_html()

or if the URL scheme is fixed, then print_version()
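
For illustration, a minimal sketch of how the two hooks might look in a recipe. Everything site-specific here is an assumption: the link text 'Article on one page' is taken from the question, the `singlePage=true` parameter is made up, and `find_single_page_url` and `fetch_markup` are hypothetical helpers, not part of calibre. As far as I understand the hook order, preprocess_raw_html receives the downloaded markup as a string before it is parsed, so the returned page should still pass through remove_tags and the rest of the cleanup.

```python
import re
from html import unescape
from urllib.parse import urljoin

try:
    # Only importable inside calibre's recipe environment.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be exercised outside calibre.
    BasicNewsRecipe = object

# ASSUMPTION: the one-page link is a plain <a> whose text is exactly
# 'Article on one page'; the real markup must be checked on the site.
ONE_PAGE_LINK = re.compile(
    r'<a\s[^>]*href="([^"]+)"[^>]*>\s*Article on one page\s*</a>',
    re.IGNORECASE)


def find_single_page_url(raw_html, url):
    """Return the absolute one-page URL found in raw_html, or None."""
    match = ONE_PAGE_LINK.search(raw_html)
    if match is None:
        return None
    return urljoin(url, unescape(match.group(1)))


class MultiPageSketch(BasicNewsRecipe):
    title = 'Multi-page sketch'  # hypothetical recipe

    def print_version(self, url):
        # Fixed URL scheme: derive the one-page URL from the article URL.
        # ASSUMPTION: 'singlePage=true' is a made-up parameter.
        return url + ('&' if '?' in url else '?') + 'singlePage=true'

    def fetch_markup(self, url):
        # Hypothetical helper, not part of calibre's API. The UTF-8
        # decoding is an assumption about the site.
        return self.browser.open(url).read().decode('utf-8', 'replace')

    def preprocess_raw_html(self, raw_html, url):
        # No fixed scheme: look for the one-page link in the raw markup
        # and, if present, return that page's markup instead.
        full_url = find_single_page_url(raw_html, url)
        if full_url is None or full_url == url:
            return raw_html
        return self.fetch_markup(full_url)
```

In a real recipe only one of the two hooks would normally be needed: print_version when the one-page URL can be computed from the article URL, preprocess_raw_html when the link has to be read from the page first.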
Old 01-25-2016, 06:50 AM   #3
bubak
Yes, thanks. I was actually hoping for a soup version, which would be easier to parse and would avoid parsing the page twice, but I understand it would more or less duplicate preprocess_raw_html.
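
Since preprocess_raw_html only gets a string, one possible middle ground is a small event-based parser from the standard library that looks for the link without building a full soup. This is only a sketch: `LinkByText` and `single_page_href` are hypothetical names, and the link text 'Article on one page' is an assumption about the site's markup.

```python
from html.parser import HTMLParser


class LinkByText(HTMLParser):
    """Remember the href of the first <a> whose text equals `wanted`."""

    def __init__(self, wanted):
        super().__init__(convert_charrefs=True)
        self.wanted = wanted
        self.found = None   # href of the matching link, once seen
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.found is None:
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            if ''.join(self._text).strip() == self.wanted:
                self.found = self._href
            self._href = None


def single_page_href(raw_html, wanted='Article on one page'):
    """Return the href of the one-page link in raw_html, or None."""
    parser = LinkByText(wanted)
    parser.feed(raw_html)
    parser.close()
    return parser.found
```

The returned href may be relative, so it would still need to be joined with the article URL before being fetched.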