Quote:
The only Perl thing I could find were some 4-year-old files in CVS.
|
Right, that stuff is ancient, old, crufty, and probably nowhere near a working solution. That's why it doesn't appear in most releases. It was also not entirely authored by me. Pat and I worked on that for a day or two, and decided to just drop that approach altogether.
Also, I'm not releasing my Perl spider at this point, because there is no compelling reason to, and it isn't ready for public consumption. I'm constantly tweaking it and adding functions in my diminishing spare time (having a daughter, a new house, and an influx of freelance work can do that to you).
What's more interesting, though, is that JPluck has no problems converting the sample documents I have here (2.1.6b, at this point), and it takes about 1/20th as long to convert them as Sunrise does. Sunrise dies right around 80%, after taking about 15 minutes to try to convert a single site.
JPluck, by comparison, takes about 1-2 minutes. I don't care if both tools convert them to PDF, or text, or separate image files. The point I'm trying to make here is that JPluck still beats Sunrise in speed and function, and both are from the same author.
Sunrise is definitely significantly slower than JPluck on the same hardware, converting the same (validated HTML) sites. With that information in hand, what would the difference be? Why such a dramatic difference?
At one point, you stated (either here or in one of your news articles; I don't recall which, and it seems to have disappeared at this point) that Sunrise included NONE of the JPluck code. Now you seem to imply that it does, but that parts of it were rewritten to work better. Which is it?