11-01-2011, 02:46 PM   #4
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by scissors
Hi Chaps.

Can someone confirm that

preprocess_regexps = [
(re.compile(r'<head>.*</head>', re.IGNORECASE | re.DOTALL), lambda match: '<head></head>')]

OR

preprocess_regexps = [
(re.compile(r'<head>.*?</head>', re.IGNORECASE | re.DOTALL), lambda match: '<head></head>')]

should completely remove a downloaded page's <head> section?
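Not necessarily. The head tag might have attributes (e.g. <head profile="...">), and both of your patterns require the > immediately after "head", so they would not match such a tag. I'd have to check to be sure, but this should work: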
Code:
preprocess_regexps = [(re.compile(r'<head.*</head>', re.IGNORECASE | re.DOTALL), lambda match: '<head></head>')]
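A quick way to compare the patterns from this thread is to run them against a few sample pages in plain Python. This is a minimal sketch, not calibre's own code: it assumes calibre applies each preprocess_regexps entry roughly as pattern.sub(function, html), and the strip_head helper and the sample HTML strings are made up for illustration. The strict pattern is a hypothetical tighter variant, not one proposed in the thread.

```python
import re

# All patterns use IGNORECASE (so <HEAD> matches) and DOTALL (so "." can
# cross the newlines inside the head section).
FLAGS = re.IGNORECASE | re.DOTALL

greedy = re.compile(r'<head>.*</head>', FLAGS)       # first pattern in the question
lazy = re.compile(r'<head>.*?</head>', FLAGS)        # second pattern in the question
with_attrs = re.compile(r'<head.*</head>', FLAGS)    # pattern from the reply
# Hypothetical variant: \b stops it from starting on <header>, [^>]*
# allows attributes, and .*? stops at the first </head>.
strict = re.compile(r'<head\b[^>]*>.*?</head>', FLAGS)


def strip_head(pattern, html):
    """Replace each match with an empty head, like the recipe's lambda does."""
    return pattern.sub(lambda match: '<head></head>', html)


# Hypothetical sample pages.
plain = "<html><head>\n<title>T</title>\n</head><body>x</body></html>"
attrs = ('<html><head profile="http://example.com">\n<title>T</title>\n'
         '</head><body>x</body></html>')
# A stray "</head>" later in the page (here inside a comment).
twice = "<head><title>T</title></head><body>keep</body><!-- </head> -->"

if __name__ == "__main__":
    for name, pat in [("greedy", greedy), ("lazy", lazy),
                      ("with_attrs", with_attrs), ("strict", strict)]:
        print(name, repr(strip_head(pat, attrs)))
```

On a page with a bare <head>, all four patterns give the same result. On the attrs page, only with_attrs and strict match; greedy and lazy leave it unchanged. The greedy/lazy choice only matters when "</head>" appears more than once: on the twice page, greedy .* runs to the last "</head>" and swallows the body, while .*? stops at the first one.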