Quote:
Originally Posted by kovidgoyal
You can find dt first and then call find dd on each dt.
Unlike, e.g., <ul><li>...</li></ul>, a <dt> does not "contain" its <dd> elements; they are siblings inside the <dl>.
When I iterate over all <dt> elements and call findAll('dd') on each one, I get every <dd> in the whole index:
Code:
for section in index.findAll('dt'):
    section_title = self.tag_to_string(section).strip()
    self.log('Found section ', section_title)
    articles = []
    for article in section.findAll('dd'):
        # returns every dd in the index, including those that belong to later dt sections
        pass
How can I tell when I have reached the dd elements that belong to the next dt section?
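One way around this is to stop treating each <dt> as a container and walk the list in document order instead: every <dt> opens a new section, and every <dd> is attached to the most recent <dt>. The sketch below shows that rule using Python's standard-library html.parser so it runs on its own. The names DefinitionListGrouper and group_index are hypothetical, and it assumes a single flat <dl> whose entries are plain text (nested markup such as links is flattened to its text). In a calibre recipe the same idea could be applied with BeautifulSoup by looping over section.findNextSiblings(['dt', 'dd']) and breaking at the first tag whose name is 'dt'.

```python
from html.parser import HTMLParser


class DefinitionListGrouper(HTMLParser):
    """Hypothetical helper: group each <dd> under the preceding <dt>.

    Assumes one flat <dl> with plain-text entries; nested markup inside
    a <dt>/<dd> is reduced to its text content.
    """

    def __init__(self):
        super().__init__()
        self.sections = []    # list of (dt_text, [dd_text, ...])
        self._current = None  # 'dt' or 'dd' while inside such an element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ('dt', 'dd'):
            # A new dt/dd implicitly closes the previous one (handles unclosed tags).
            self._flush()
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        # Close the open entry on its own end tag or at the end of the list.
        if tag == self._current or tag == 'dl':
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()

    def _flush(self):
        if self._current is None:
            return
        text = ''.join(self._buffer).strip()
        if self._current == 'dt':
            # Each <dt> starts a new section.
            self.sections.append((text, []))
        elif self.sections:
            # Each <dd> belongs to the most recent <dt>; a leading orphan <dd> is dropped.
            self.sections[-1][1].append(text)
        self._current = None
        self._buffer = []


def group_index(html):
    """Return [(section_title, [article_text, ...]), ...] for a <dl> index."""
    parser = DefinitionListGrouper()
    parser.feed(html)
    parser.close()
    return parser.sections


if __name__ == '__main__':
    sample = '<dl><dt>Politik<dd><a href="#">one</a><dd>two<dt>Kultur<dd>three</dl>'
    for title, articles in group_index(sample):
        print(title, articles)
```

Because a section only ends when the next <dt> (or the closing </dl>) appears, the "which section am I in?" question never arises: the dd elements of the following section are simply attached to the new <dt>.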
Regarding the masthead: all I could find on the publisher's website is the corresponding online logo. To avoid confusion between Spiegel Online and Der Spiegel, I would stick with the Wikipedia logo for now. There is an SVG source that renders the logo; does this help?