10-08-2021, 11:25 PM | #1 |
Connoisseur
Posts: 95
Karma: 10
Join Date: Sep 2020
Device: kindle paperwhite3/Oasis2
|
keep_only_tags and the order of the related contents
Hi,
The order of the classes in keep_only_tags controls the order in which the related contents are displayed on the HTML page, right? E.g., these two keep_only_tags will display the contents on the page in a different order:

keep_only_tags = [
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderHed')}),  <---
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderDek')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderByline')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderPublishDate')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderLedeBlock')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderCaption')}),
]

keep_only_tags = [
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderDek')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderByline')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderPublishDate')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderLedeBlock')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderCaption')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderHed')}),  <---
]
10-09-2021, 12:12 AM | #2 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, they do. If you don't care about order, combine them all into one with this function:
Code:
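def prefixed_classes(classes):
    # classes: space-separated class-name prefixes to accept
    q = frozenset(classes.split(' '))

    def matcher(x):
        # x: value of a tag's class attribute; empty or missing never matches
        if x:
            for candidate in frozenset(x.split()):
                for prefix in q:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}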
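For illustration, here is a rough sketch of how the function above might replace the six separate specs from the first post with a single one. The function is repeated so the snippet runs on its own, and the matcher is called directly on a few class strings instead of inside a recipe. The `-abc123` suffix and the `ArticleBody` class are made up for the example and are not taken from the actual page.

```python
def prefixed_classes(classes):
    # classes: space-separated class-name prefixes to accept
    q = frozenset(classes.split(' '))

    def matcher(x):
        if x:
            for candidate in frozenset(x.split()):
                for prefix in q:
                    if candidate.startswith(prefix):
                        return True
        return False
    return {'attrs': {'class': matcher}}


# One spec covering all six prefixes from the first post.
# Note: prefixes are separated by single spaces, since a stray empty
# prefix would match every class name.
keep_only_tags = [
    prefixed_classes(
        'SplitScreenContentHeaderHed SplitScreenContentHeaderDek '
        'SplitScreenContentHeaderByline SplitScreenContentHeaderPublishDate '
        'SplitScreenContentHeaderLedeBlock SplitScreenContentHeaderCaption'
    ),
]

# Call the matcher directly on some class attribute values.
matches = keep_only_tags[0]['attrs']['class']
print(matches('SplitScreenContentHeaderHed-abc123 other'))  # True
print(matches('ArticleBody'))                               # False
print(matches(None))                                        # False
```

With a single combined spec, the list order no longer decides which block comes first, so this only fits the case where the order does not matter to you.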
10-09-2021, 05:51 AM | #3 | |
Connoisseur
Posts: 95
Karma: 10
Join Date: Sep 2020
Device: kindle paperwhite3/Oasis2
|