Thread: plucker help
05-04-2008, 09:10 PM   #2
hacker
Technology Mercenary
Join Date: Feb 2003
Location: East Lyme, CT
Device: Direct Neural Implant
Plucker deals with real HTML, not fake HTML

Quote:
Originally Posted by richasta
the error i get is:

Processing http://www.dailyreckoning.com.au/...
Retrieved ok.
Error: Runtime error parsing document http://www.dailyreckoning.com.au/: unexpected char in declaration: '<'
Parsing failed.
---- all 0 pages retrieved and parsed ----

any ideas?
Plucker deals with clean, proper HTML, not the horribly broken and invalid constructs used on that site.

This is especially important when dealing with XML, because the XML specification itself treats any well-formedness error as a fatal error: once the parser hits one, it must stop normal processing. That is exactly what happens with Plucker.
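To illustrate that strict behavior, here is a minimal Python sketch using the standard library's ElementTree parser as a stand-in for a strict XML parser. It is not Plucker's own parser, and the helper name `parses_as_xml` is hypothetical. A single mis-nested tag is enough to make the whole document fail.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if markup is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # A strict XML parser gives up at the first well-formedness error.
        return False
    return True


print(parses_as_xml("<p>Hello <b>world</b></p>"))  # True
print(parses_as_xml("<p>Hello <b>world</p></b>"))  # False: tags are mis-nested
```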

So you'll either have to ask the site's maintainers to clean up their HTML, or clean it up yourself: either in an inline filter, or by fetching the pages and running them through tidy (or a similar tool) locally before Plucker parses them.
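As a rough illustration of the "clean it up yourself" option, here is a minimal Python sketch of a tag balancer built on the standard library's `html.parser`. It is not Plucker's inline-filter API and not a replacement for tidy; `TagBalancer`, `balance_html`, and `VOID_TAGS` are hypothetical names. It closes elements left open, drops stray closing tags, and re-escapes text, but it treats `script`/`style` contents as ordinary text and silently drops comments and doctype declarations.

```python
from html import escape
from html.parser import HTMLParser

# HTML elements that never take a closing tag (partial list).
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class TagBalancer(HTMLParser):
    """Re-emits HTML with every element properly nested and closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the cleaned document
        self.stack = []  # non-void elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Quote and escape every attribute value; bare attributes get "".
        attr_text = "".join(
            f' {name}="{escape(value or "", quote=True)}"' for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text} />")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag with no matching opener: drop it
        # Close any elements still open inside this one, innermost first.
        while True:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text arrives with entities decoded; re-escape &, < and >.
        self.out.append(escape(data, quote=False))


def balance_html(markup: str) -> str:
    parser = TagBalancer()
    parser.feed(markup)
    parser.close()
    # Close whatever the page left open at the end of the document.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


print(balance_html("<p>Hello <b>world</p></b>"))
# <p>Hello <b>world</b></p>
```

The idea would be to save the page, run it through something like this (or, better, through tidy), and point Plucker at the cleaned local copy. A real cleanup tool handles far more of the broken constructs found in the wild than this sketch does.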