07-16-2005, 09:10 AM   #2
hacker
Technology Mercenary
Posts: 617
Karma: 2561
Join Date: Feb 2003
Location: East Lyme, CT
Device: Direct Neural Implant
Plucker Desktop is a bit old and relies on a version of wxWindows with a nasty progress callback bug: you have to wiggle your mouse around the progress window for it to continue fetching. It's a pain in the butt. It also ships with version 1.6 of the viewer, and we're up to 1.8 in public releases.

That said, Plucker used to support conversion of PDF documents through a web-based service, but no longer does. It may reappear in a future release, if Plucker makes another one, but Plucker Desktop does not support this feature natively.

You might find the suggestions in this thread useful: one in bash, one in Python. I use Perl myself to convert PDF (and Word, PostScript, etc.) documents to Plucker format.
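For anyone who wants a starting point, here is a rough Python sketch of that kind of pipeline. It is not the script from the thread or my Perl code, just an illustration: it assumes pdftohtml (from xpdf/poppler) and plucker-build are both on your PATH, that plucker-build will accept a file: URL as its home document, and the function names build_commands and convert are made up for this example. Check the flags against your installed versions before relying on it.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_path, out_dir, doc_name=None):
    """Return the two command lines: PDF -> HTML, then HTML -> Plucker.

    Hypothetical helper; the tool names and flags are assumptions
    about a typical pdftohtml / plucker-build install.
    """
    pdf = Path(pdf_path)
    html = Path(out_dir) / (pdf.stem + ".html")
    name = doc_name or pdf.stem

    # -noframes: emit a single HTML file; -i: skip embedded images.
    to_html = ["pdftohtml", "-noframes", "-i", str(pdf), str(html)]

    # -H: home URL to start from; -M 1: do not follow links off that page;
    # -f: output file name prefix; -N: document name shown in the viewer.
    to_pdb = [
        "plucker-build",
        "-H", html.absolute().as_uri(),
        "-M", "1",
        "-f", pdf.stem,
        "-N", name,
    ]
    return to_html, to_pdb


def convert(pdf_path, out_dir, doc_name=None):
    """Run both steps, stopping at the first one that fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(pdf_path, out_dir, doc_name):
        subprocess.run(cmd, check=True)
```

Calling convert("manual.pdf", "build", doc_name="Manual") would run both steps in order. Building the command lists separately makes it easy to print them and try each one by hand first.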

For the national characters, you should probably avoid Plucker Desktop, since it is very old and may not contain the internationalization support that has been in the current parser for a year and a half. Use the Python code directly (Spider.py is what you want on Windows, or plucker-build on Linux, which is just a symlink to Spider.py anyway).

Many of the texts I convert have non-ASCII characters, and they convert very well with Plucker's Python and C++ distillers.

I'm not sure what you meant by your scheduled update question, though. Can you explain further?