06-14-2005, 08:11 PM #2
hacker
Technology Mercenary
Posts: 617
Karma: 2561
Join Date: Feb 2003
Location: East Lyme, CT
Device: Direct Neural Implant
Well, it seems to work... but Laurens already knows I'm his hardest critic, so here goes (and some of these are the same bugs that plague JPluck):
  1. Changing a site's depth parameters breaks every subsequent fetch until you close and restart Sunrise (JPluck has the same bug). To reproduce: create a site that points to the Wired News site, with a depth of 3, restricted to the directory. Fetch it, then go back into it, change the depth setting to "No restriction", and fetch again. BOOM.
  2. No table support
  3. No robots.txt support
  4. No support for random or fixed delays between requests. NOT adding this will rapidly get Sunrise blocked by many sites; many sites already block JPluck for this exact reason.
  5. No specific parameters for compression (or for no compression, which in some cases is desirable for display speed). There are only "Default" and "Best". How about "None"? And what about separate settings for image and non-image compression?
  6. No way to modify specific AppInfo data (Beam, CopyProtect, etc.)
  7. No way to set keyboard accelerators on the menus for shortcuts (à la GTK+ 2 or KDE dynamic accelerators)
  8. And, as with JPluck, using Sunrise to fetch VERY large documents fails, even with 3 GB of RAM allocated to the JVM.
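Items 3 and 4 describe what well-behaved crawlers usually do: check a site's robots.txt before fetching a page, and pause between requests instead of firing them back-to-back. Here is a minimal sketch of both checks, in Java since Sunrise and JPluck run on the JVM. The class name `PoliteFetchPolicy`, the "Sunrise" user-agent token and the 1 to 3 second delay window are all hypothetical, and the robots.txt handling is deliberately simplified (prefix `Disallow` rules only; no `Allow`, wildcards or `Crawl-delay`). This is not how either program actually works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical helper, not part of Sunrise or JPluck: a sketch of "polite"
 * fetching (honor robots.txt Disallow rules, wait a random interval between requests).
 */
public class PoliteFetchPolicy {

    /**
     * Returns true if the path may be fetched. Simplified: only User-agent and
     * Disallow lines are read, rules are plain path prefixes, and a group naming
     * this exact agent takes precedence over the "*" group.
     */
    public static boolean isAllowed(String robotsTxt, String userAgent, String path) {
        List<String> specificRules = new ArrayList<>();
        List<String> wildcardRules = new ArrayList<>();
        boolean specificGroupSeen = false;
        boolean inSpecific = false;   // current group names our agent
        boolean inWildcard = false;   // current group names "*"
        boolean lastWasAgent = false; // consecutive User-agent lines share one group

        for (String raw : robotsTxt.split("\r?\n")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank lines and junk are skipped
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (!lastWasAgent) { inSpecific = false; inWildcard = false; } // new group starts
                if (value.equals("*")) {
                    inWildcard = true;
                } else if (value.equalsIgnoreCase(userAgent)) {
                    inSpecific = true;
                    specificGroupSeen = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "allow everything", so it adds no rule.
                if (field.equals("disallow") && !value.isEmpty()) {
                    if (inSpecific) specificRules.add(value);
                    if (inWildcard) wildcardRules.add(value);
                }
            }
        }
        List<String> rules = specificGroupSeen ? specificRules : wildcardRules;
        for (String prefix : rules) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    /** Random pause in [minMillis, maxMillis] (inclusive) to wait before the next request. */
    public static long nextDelayMillis(Random rnd, long minMillis, long maxMillis) {
        if (minMillis < 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("bad delay range");
        }
        // nextDouble() is in [0, 1), so the offset falls in [0, max - min].
        return minMillis + (long) (rnd.nextDouble() * (maxMillis - minMillis + 1));
    }

    public static void main(String[] args) throws InterruptedException {
        String robots = String.join("\n",
                "User-agent: *",
                "Disallow: /cgi-bin/",
                "",
                "User-agent: BadBot",
                "Disallow: /");
        Random rnd = new Random();
        for (String path : new String[] {"/news/index.html", "/cgi-bin/search"}) {
            if (!isAllowed(robots, "Sunrise", path)) { // "Sunrise" token is hypothetical
                System.out.println("skip " + path);
                continue;
            }
            long pause = nextDelayMillis(rnd, 1000, 3000); // 1-3 s window is an assumption
            System.out.println("fetch " + path + " after " + pause + " ms");
            Thread.sleep(pause);
        }
    }
}
```

Randomizing the pause, rather than using a fixed interval, makes the request pattern look less like an automated burst, which is the behavior item 4 says gets JPluck blocked.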
It's definitely a good start for people just getting their feet wet with Plucker distilling, but for high-performance, production-quality distilling (thousands of documents per hour), it doesn't quite fit the bill... yet.

Keep at it Laurens, I'll keep offering my feedback.