keep_only_tags and the order of the related contents
Hi,
Does the order of the entries in keep_only_tags control the order in which the matching content appears on the generated HTML page?
For example, I would expect these two keep_only_tags lists to produce the same content in a different order (the only difference is where the Hed entry sits):
keep_only_tags = [
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderHed')}),  # <--- Hed first
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderDek')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderByline')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderPublishDate')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderLedeBlock')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderCaption')}),
]
keep_only_tags = [
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderDek')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderByline')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderPublishDate')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderLedeBlock')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderCaption')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderHed')}),  # <--- Hed last
]
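To make the question concrete, here is a small sketch of the behaviour I am assuming: the specs are walked in list order and each spec's matches are appended to the output. PAGE and keep_only are made-up stand-ins for illustration only, not calibre's API, and the class suffixes are invented.

```python
import re

# Hypothetical stand-in for a parsed page: (class name, text) pairs in page order.
PAGE = [
    ("SplitScreenContentHeaderHed-abc", "Headline"),
    ("SplitScreenContentHeaderDek-abc", "Summary"),
    ("SplitScreenContentHeaderByline-abc", "Byline"),
]

def keep_only(page, specs):
    """Assumed behaviour: walk the specs in list order, appending each spec's matches."""
    kept = []
    for spec in specs:
        pattern = spec["attrs"]["class"]
        kept.extend(text for cls, text in page if pattern.search(cls))
    return kept

hed_first = [
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderHed')}),
    dict(attrs={'class': re.compile('^SplitScreenContentHeaderDek')}),
]
hed_last = list(reversed(hed_first))

print(keep_only(PAGE, hed_first))  # ['Headline', 'Summary']
print(keep_only(PAGE, hed_last))   # ['Summary', 'Headline']
```

If the real processing works like this, the output order follows the list order rather than the order of the elements on the original page.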