Quote:
Originally Posted by BuzzKill
Although this works for extracting the full post contents, I want to add another bit of info at the beginning of each post: the author of the post.
It's already in the post; you're removing it with your "keep_only_tags" line.
If you don't want the extra stuff in that div tag, you can keep just the name by keeping only the <a> tag whose title starts with "Posts by", like this:
Code:
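keep_only_tags = [
    # The author link: an <a> tag whose title contains "Posts by" (any case)
    dict(name='a', attrs={'title': re.compile(r'Posts by.*', re.DOTALL | re.IGNORECASE)}),
    # The post body itself
    dict(name='div', attrs={'class': 'entry'}),
]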
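If you want to check what that regex actually picks up before running the whole recipe, here's a rough stdlib-only sketch. It doesn't use calibre or BeautifulSoup at all; it just mimics the <a>-tag rule with Python's html.parser, assuming (as I understand BeautifulSoup does) that the pattern is tested against the title with search(). The class name AuthorLinkFinder and the sample HTML are made up for illustration, so the real forum markup may differ.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the keep_only_tags entry above.
AUTHOR_TITLE = re.compile(r'Posts by.*', re.DOTALL | re.IGNORECASE)


class AuthorLinkFinder(HTMLParser):
    """Hypothetical helper: collects the text of <a> tags whose title matches AUTHOR_TITLE."""

    def __init__(self):
        super().__init__()
        self.authors = []
        self._in_author_link = False

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; value may be None
            title = dict(attrs).get('title') or ''
            # search() finds the pattern anywhere in the title text
            if AUTHOR_TITLE.search(title):
                self._in_author_link = True
                self.authors.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self._in_author_link = False

    def handle_data(self, data):
        # Only keep text that sits inside a matching author link
        if self._in_author_link:
            self.authors[-1] += data


# Made-up sample markup: one author link, one post body, one unrelated link.
sample = '''
<div>
  <a href="/member/1" title="Posts by BuzzKill">BuzzKill</a> (Member)
</div>
<div class="entry">Post text here.</div>
<a href="/thread/2" title="Next thread">Next</a>
'''

finder = AuthorLinkFinder()
finder.feed(sample)
finder.close()
print(finder.authors)  # ['BuzzKill']
```

Only the link titled "Posts by BuzzKill" gets through; the "Next thread" link is ignored, which is what you want the first keep_only_tags entry to do.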
I used a regex, so don't forget to add this at the top: