Old 12-12-2010, 09:18 AM   #2
Starson17
Quote:
Originally Posted by BuzzKill
Although this works for extracting the full post contents, I want to add another bit of info at the beginning of each post: the author of the post.
It's already in the post; you're removing it with your "keep_only_tags" line.

If you don't like the additional stuff in the div tag, you could keep the name by keeping only the <a> tag with the "Posts by" title using this:
Code:
    keep_only_tags = [
        # the author link, matched by its "Posts by ..." title
        dict(name='a', attrs={'title': re.compile(r'Posts by.*', re.DOTALL | re.IGNORECASE)}),
        # the post body
        dict(name='div', attrs={'class': 'entry'}),
    ]
I used a regex, so don't forget to add this at the top of your recipe:
Code:
import re
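
If you want to check what that regex will accept before rebuilding the recipe, here is a minimal sketch you can run outside calibre. The sample titles and the helper name title_matches are made up for illustration. It also assumes the title attribute is tested with search semantics, which is my understanding of how BeautifulSoup applies a compiled pattern; treat that as an assumption, not a guarantee.

```python
import re

# Same pattern as in keep_only_tags above.
title_pattern = re.compile(r'Posts by.*', re.DOTALL | re.IGNORECASE)


def title_matches(title):
    """Return True if a title attribute value would satisfy the pattern.

    Hypothetical helper: mimics an attribute test using re.search,
    which is assumed (not confirmed here) to be how the parser applies it.
    """
    if title is None:
        # An <a> tag with no title attribute cannot match.
        return False
    return title_pattern.search(title) is not None


# Made-up title values, only to exercise the pattern.
samples = ['Posts by BuzzKill', 'posts by someone else', 'Permalink', None]
for sample in samples:
    print(repr(sample), title_matches(sample))
```

Because of re.IGNORECASE, the lowercase "posts by" title passes too, and any <a> tag without a matching title is dropped along with everything else outside the kept tags.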