01-25-2016, 05:53 AM | #1 |
Connoisseur
Posts: 60
Karma: 10
Join Date: Dec 2010
Device: kindle
|
Multiple Page Sites
The reusable code for loading multi-page articles is, IMHO, wrong. It uses preprocess_html, which is applied "after the cleanup as specified by remove_tags etc.", so no cleanup is done on the appended pages; at least that is what I see on FAZ.NET. This site in particular offers an 'Article on one page' link, which could be followed before cleanup instead of appending pages, but I'm not sure what the correct way to do that would be: skip_ad_pages (it accepts a soup but returns HTML, so if the page is already fine one cannot use it) or get_article_url (in which case the article might have to be loaded twice). Couldn't we have a function that takes and returns the same kind of object, soup or text, and is applied right after the article content is loaded?
|
01-25-2016, 06:34 AM | #2 |
creator of calibre
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
preprocess_raw_html(), or, if the URL scheme is fixed, print_version()
|
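A minimal sketch of the preprocess_raw_html() route, for illustration only. The link pattern, the `find_single_page_url` helper, the class name, and the UTF-8 decoding are all assumptions, not taken from the FAZ.NET recipe; the real page markup has to be checked before using anything like this.

```python
import re
from urllib.parse import urljoin

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be read and exercised outside calibre.
    class BasicNewsRecipe(object):
        pass

# Hypothetical markup: assumes the single-page link looks like
# <a href="...">Article on one page</a>. Check the real page source.
SINGLE_PAGE_LINK = re.compile(
    r'<a\s[^>]*href="([^"]+)"[^>]*>\s*Article on one page\s*</a>',
    re.IGNORECASE,
)


def find_single_page_url(raw_html, base_url):
    """Return the absolute URL of the single-page version, or None."""
    match = SINGLE_PAGE_LINK.search(raw_html)
    if match is None:
        return None
    return urljoin(base_url, match.group(1))


class SinglePageRecipe(BasicNewsRecipe):
    title = 'Single-page example'  # placeholder, not a real recipe

    def preprocess_raw_html(self, raw_html, url):
        # Called with the raw page source before it is parsed, so the
        # HTML returned here still goes through remove_tags etc.
        single = find_single_page_url(raw_html, url)
        if single is None or single == url:
            return raw_html
        data = self.index_to_soup(single, raw=True)  # fetch without parsing
        if isinstance(data, bytes):
            # Assumption: the site serves UTF-8.
            data = data.decode('utf-8', 'replace')
        return data
```

This still downloads the first page and then the single-page version. If the single-page URL can be derived from the article URL alone, overriding print_version(self, url) to return the rewritten URL avoids that extra download.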
01-25-2016, 06:50 AM | #3 |
Connoisseur
Posts: 60
Karma: 10
Join Date: Dec 2010
Device: kindle
|
Yes, thanks. I was actually hoping for a soup version, which would be easier to parse and would avoid duplicate parsing, but I understand it would more or less duplicate preprocess_raw_html.
|
|