07-26-2006, 06:06 AM   #1
goybert
Plucker: Help needed with spidering

I have been having trouble with spidering the following website,
At first I had several filters on and couldn't get past the first page. For testing purposes, I removed all the filters and set max depth to 2, but I still could not get past the first page. Here is the progress text:

Initializing Plucker spidering engine...

Updating channel: falconer...
Pluckerdir is 'C:\Program Files\Plucker'...
Using proxy '' with authentication for user ''...
ZLib compression turned on
Using exclusion list C:\Program Files\Plucker\exclusionlist.txt
Using exclusion list C:\Program Files\Plucker\exclusionlist.txt
---- 0 collected, 1 to do ----
Retrieved ok.
Parsed ok.
---- all 1 pages retrieved and parsed ----
Writing out collected data...
Writing document 'falconer' to file C:\Program Files\Plucker\channels/falconer/falconer.pdb
Converted 2:
Default charset is MIBenum 2252 (windows-1252)
New document <PluckerIndexDocument 'plucker:/~special~/index' at 9611924> added
Converted 1: plucker:/~special~/index
New document <PluckerMetadataDocument 'plucker:/~special~/metadata' at 9568372> added
Converted 5: plucker:/~special~/metadata
Wrote 1 <= plucker:/~special~/index
Wrote 2 <=
Wrote 5 <= plucker:/~special~/metadata
Unknown items encountered:
</tbody>: ['']
<tbody>: ['']
Installing channel output to destinations...
Setting new due date...
Tasks completed for all channels.

If anyone could point out what I have been doing wrong, I'd be much obliged.

UPD: Well, I have succeeded in spidering the site after downloading Sunrise XP, with one minor setback: Sunrise turned out to be a sneaky son of a bitch whose regexp filters default to "exclude", so I spent an hour trying to download the entire internet (I got about 18% done, according to the progress bar). That problem is gone, but another one arose before me, the problem of thread removal, which I sadly failed to solve.
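
For anyone hitting the same thing, here is a rough Python sketch of why the filter mode matters so much. It is not how Plucker or Sunrise work internally. The link graph, the `crawl` function, and the "include"/"exclude" semantics are all made up for illustration, and I am assuming the start page counts as depth 0, which real tools may count differently.

```python
import re
from collections import deque

# Hypothetical link graph standing in for real pages (no network access).
LINKS = {
    "http://site.example/": ["http://site.example/a", "http://other.example/"],
    "http://site.example/a": ["http://site.example/b"],
    "http://site.example/b": [],
    "http://other.example/": ["http://other.example/x"],
    "http://other.example/x": [],
}


def crawl(start, max_depth, pattern, mode):
    """Breadth-first crawl, at most max_depth link hops away from start.

    mode="include": follow only links that match pattern.
    mode="exclude": follow only links that do NOT match pattern.
    (Assumed semantics, for illustration only.)
    """
    rx = re.compile(pattern)
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for link in LINKS.get(url, []):
            matched = rx.search(link) is not None
            allowed = matched if mode == "include" else not matched
            if allowed and link not in seen:
                seen.append(link)
                queue.append((link, depth + 1))
    return seen


# Same pattern, opposite modes.
on_site = crawl("http://site.example/", 2, r"site\.example", "include")
off_site = crawl("http://site.example/", 2, r"site\.example", "exclude")
print(on_site)
print(off_site)
```

With the same pattern, "include" keeps the crawl on the site, while "exclude" skips the site's own pages and follows every outside link instead, which is presumably how I ended up wandering off across the web.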

Last edited by goybert; 07-26-2006 at 07:44 AM.