Quote:
Originally Posted by wwang
Speed would NOT be an issue, I guess. If I could run all the threads at the same time, each going to a different site, I would still be limited to the 2 threads/connections issue.
This is actually very tricky to get right, especially if you want to avoid fetching duplicate content in separate threads.
For example, let's say you want to fetch Slashdot, Freshmeat, and a Technorati RSS feed. Slashdot has a link in one of its articles to Freshmeat, and Freshmeat links back to Slashdot on its page. If you have Slashdot's fetch running in one thread and Freshmeat's fetch running in another, how do you stop each of them from fetching the other's content in duplicate (wasting bandwidth and fetch threads)? One answer is a shared pool of memory that every fetch thread checks before it downloads a URL. There are other solutions, but this is one possibility.
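To make the shared-pool idea concrete, here is a minimal Python sketch under stated assumptions: every fetch thread consults one lock-protected set of already-claimed URLs before downloading anything. This is a hypothetical illustration, not code taken from any Plucker distiller. The `SharedVisited` class, the `LINKS` table (a made-up link graph standing in for real HTTP fetches), and the `crawl`/`crawl_sites` helpers are all invented for the example.

```python
import threading
from urllib.parse import urldefrag, urlsplit, urlunsplit


class SharedVisited:
    """Thread-safe record of URLs that some fetch thread has already claimed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()

    @staticmethod
    def normalize(url):
        # Drop the #fragment and lowercase the scheme and netloc (host) so
        # trivially different spellings of the same page map to one key.
        url, _ = urldefrag(url)
        parts = urlsplit(url)
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           parts.path or "/", parts.query, ""))

    def claim(self, url):
        """Return True only for the first thread to claim this URL."""
        key = self.normalize(url)
        with self._lock:  # check and insert happen atomically
            if key in self._seen:
                return False
            self._seen.add(key)
            return True


# Hypothetical link graph: Slashdot and Freshmeat link to each other.
LINKS = {
    "http://slashdot.org/": ["http://slashdot.org/article", "http://freshmeat.net/"],
    "http://slashdot.org/article": ["http://freshmeat.net/"],
    "http://freshmeat.net/": ["http://slashdot.org/", "http://freshmeat.net/projects"],
    "http://freshmeat.net/projects": [],
}


def crawl(start, visited, fetched, fetched_lock, max_depth=2):
    """Depth-limited crawl that skips any URL another thread already claimed."""
    stack = [(start, 0)]
    while stack:
        url, depth = stack.pop()
        if not visited.claim(url):
            continue  # this thread or another one already has it
        with fetched_lock:
            fetched.append(url)  # a real spider would download the page here
        if depth < max_depth:
            for link in LINKS.get(url, []):
                stack.append((link, depth + 1))


def crawl_sites(starts):
    """Run one fetch thread per start URL, all sharing one visited set."""
    visited = SharedVisited()
    fetched, fetched_lock = [], threading.Lock()
    threads = [threading.Thread(target=crawl,
                                args=(s, visited, fetched, fetched_lock))
               for s in starts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fetched


if __name__ == "__main__":
    for url in sorted(crawl_sites(["http://slashdot.org/", "http://freshmeat.net/"])):
        print(url)
```

Because the membership check and the insert happen under the same lock, two threads can never both win `claim()` for the same page, so each of the four pages is fetched exactly once no matter how the threads interleave. One trade-off of this simple design is that whichever thread claims a page first also fixes the depth at which it is crawled, so a page reached deeply by one thread may be explored less far than if another thread had reached it at a shallower depth.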
Quote:
But I run into the problem that large files make it impossible to do more than 1 document at a time, because of the Plucker file-size corruption issue. ARGHHHH!!!! So, help?
Plucker's Python and C++ distillers don't suffer from these problems. I regularly build 700MB+ Plucker documents that work perfectly (though, of course, they take a very long time to build).
You might try another distiller; there are about half a dozen that output the Plucker format (two in Java, Sunrise, cplucker, Plucker's Python distiller, a third-party C++ distiller, PDAConvert, jSyncManager and several others).
Quote:
You say wait for your commercial product, but for how long? I would like to have something that works in the meantime... Please, can you give me some practical suggestions?
The commercial product Laurens is writing is not going to use the Plucker format (unless he has changed his mind). If and when that commercial version is released, he is dropping support for Plucker, so if you've migrated all of your documents to Sunrise, you'll be left with no real choice.
Either you get stuck with the bugs in Sunrise, or you buy his commercial product. He's hoping you buy the commercial product, obviously. But there's another option: just keep using Plucker, and don't build your documents with the proprietary Sunrise distiller.