09-03-2011, 05:56 PM   #1
macpablus
Bad DOCTYPE declaration causes BeautifulSoup (BS) to crash

After some investigation, I discovered that this DOCTYPE declaration is causing my recipe to fail:

Code:
<!DOCTYPE html 
	PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
As you can see, the quoted string that opens after PUBLIC is never closed: the closing quote after Strict//EN is missing.
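
For anyone who wants to confirm the problem on their own copy of the page, here is a minimal sketch that flags lines of a declaration whose double quotes do not pair up. The sample string is the declaration quoted above (whitespace approximated), and `unterminated_literal_lines` is a made-up helper name, not part of calibre or BeautifulSoup. It assumes quoted strings never span more than one line, which holds for this declaration but not in general.

```python
# Illustrative sample: the declaration as quoted in the post (whitespace approximated).
BAD_DOCTYPE = (
    '<!DOCTYPE html \n'
    '\tPUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN\n'
    '\t"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)


def unterminated_literal_lines(declaration):
    """Hypothetical helper: return 1-based line numbers with an odd number of double quotes.

    Assumes a quoted string never spans more than one line.
    """
    return [
        number
        for number, line in enumerate(declaration.splitlines(), 1)
        if line.count('"') % 2 == 1
    ]


if __name__ == '__main__':
    print(unterminated_literal_lines(BAD_DOCTYPE))
```

On the sample, only the PUBLIC line is reported, which is where the closing quote is missing.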

So far, I've tried to solve the matter with this:

Code:
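preprocess_regexps = [
    # Non-greedy match so only the first declaration is replaced; the dot in "strict.dtd" is escaped.
    (re.compile(r'<!DOCTYPE html .*?strict\.dtd">', re.DOTALL|re.IGNORECASE),
     lambda match: '<!DOCTYPE html>'),
]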
and this:

Code:
    def parse_declaration(self, i):
        """Treat a bogus SGML declaration as raw data. Treat a CDATA
        declaration as a CData object."""
        j = None
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1:
                k = len(self.rawdata)
            data = self.rawdata[i+9:k]
            j = k+3
            self._toStringSubclass(data, CData)
        else:
            try:
                j = SGMLParser.parse_declaration(self, i)
            except SGMLParseError:
                # Could not parse the DOCTYPE declaration
                # Try to just skip the actual declaration
                match = re.search(r'<!DOCTYPE([^>]*?)>', self.rawdata,
                                  re.MULTILINE)
                if match:
                    toHandle = self.rawdata[i:match.end()]
                else:
                    toHandle = self.rawdata[i:]
                self.handle_data(toHandle)
                j = i + len(toHandle)
        return j
But the result's the same:

Quote:
Python function terminated unexpectedly
No articles found, aborting (Error Code: 1)
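
To rule out the pattern itself, the substitution can be tried outside calibre. This is a minimal sketch, assuming the page source matches the declaration quoted earlier. `SAMPLE_PAGE` and `replace_doctype` are made-up names for illustration, the pattern is a slightly stricter variant of the one above (non-greedy, with the dot escaped), and nothing here shows how calibre applies `preprocess_regexps` internally.

```python
import re

# Illustrative sample page: the malformed declaration followed by a stub document.
SAMPLE_PAGE = (
    '<!DOCTYPE html \n'
    '\tPUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN\n'
    '\t"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html><head><title>t</title></head><body></body></html>'
)

# DOTALL lets "." cross the line breaks inside the declaration.
DOCTYPE_RE = re.compile(r'<!DOCTYPE html .*?strict\.dtd">',
                        re.DOTALL | re.IGNORECASE)


def replace_doctype(html):
    """Hypothetical helper: swap the first matching declaration for a minimal one."""
    return DOCTYPE_RE.sub('<!DOCTYPE html>', html, count=1)


if __name__ == '__main__':
    print(replace_doctype(SAMPLE_PAGE))
```

If the same check also passes on the real page source, the pattern is probably not the culprit, and the failure may come from somewhere other than the declaration.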
Any ideas?