Sitescooper: The Basics
03-11-2004, 12:48 AM
Herein I will attempt to impart my so far very limited understanding of Sitescooper. The hardest part for me was getting everything working to begin with, so hopefully this will help alleviate that pain for a few others. All my discussion is based on an OS of Windows XP Pro, converting to iSilo and html, and reading on a Palm Tungsten E. But much of it should be readily applicable to other methods. If I make any errors in the following, please do not be shy about correcting me. I've only been at this for a few days myself.
03-11-2004, 12:50 AM
Part 1 - Setup
This part can be the hardest and is probably why you see so many posts by folks who just couldn't get it working at all. But follow along with me and you'll probably get by just fine.
First get perl. Perl is a scripting language often used on websites for nifty tricks and all sorts of things. I won't pretend to be knowledgeable about it. Sitescooper is written in perl, so in order for it to run, your OS must have a copy. Most Windows machines will not have it already. (I think that many Linux installs include it by default.) So go here (http://www.activestate.com/Products/Download/Download.plex?id=ActivePerl) to ActiveState and download the latest build. I download it in the "msi" format, which is a Windows installer file. If that will not run on your machine you need to update your Windows Installer package. (To be honest, I'm not sure how that's done, though I know I've done it... Perhaps someone can interject on this? You might try Windows Update?) There is also a download option called the "AS Package," which is a zip file. Don't know much about it. I do see that on the sidebar it says that you cannot uninstall from the AS Package. Anyway, install perl. Make sure that you answer yes to the question that will put perl in the PATH. You'll see why later. And I think that you need to reboot after this install, though it won't tell you so. Can't hurt.
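If you want to double check that the installer really did put perl in the PATH, you can ask the system to look it up. This is only a minimal sketch in Python (not part of Sitescooper, and it assumes you have Python 3 around); the helper name find_tool is made up for illustration.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on the PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    perl_path = find_tool("perl")
    if perl_path:
        print("perl found at:", perl_path)
    else:
        print("perl is not on the PATH; re-run the installer or add it by hand")
```

If this reports that perl is missing right after an install, opening a fresh command window (or rebooting, as suggested above) is worth trying before anything else.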
Okay, that's half of it. Now we need the program itself. Follow along with what Alexander says in the intro post and get the "bleeding edge" version from sitescooper.org. This link (http://sitescooper.org/devel/sitescooper-full.zip) will download the latest Windows version in a zip file. Unzip it wherever you like, but put it somewhere that you think it will stay. I put mine in my C:\Program Files\Palm directory.
Do you use iSilo? Make sure that you have a copy of iSiloXC. This is the command line version. You can get it here (http://www.isilox.com/download/index.htm). Unzip the file and copy the file iSiloXC.exe into the top level of the "sitescooper-3.1.3" folder. This is what Sitescooper will use to convert your scooped files. **EDIT** If this does not work by itself, put a copy of iSiloXC.exe in your "C:\Windows\System32" folder.
So that's pretty much it for setup. But you'll see later that there's lots of tweaking to be done to get things just like you want them.
03-11-2004, 12:51 AM
Part 2 - Basic Scooping
So what's in here? Lots of stuff with strange extensions but no .exe files so how the heck do you get it running? Well for those of you who want to leap into the breach, here's the quick way to get started. Write a simple batch file. That's a little text file that runs shell commands. Open your favorite text editor and type this (without the quotes): "perl sitescooper.pl -misilox". (Note that this will work only for iSilo users. If you want to use Plucker or something else, see below in Part 4.) Save it as "sitescooper.bat" also in the top level of your "sitescooper-3.1.3" folder. The .bat extension is critical, as this is what lets Windows know that there are commands to be run. Now if you double click on the sitescooper.bat, a command window will open and stuff will start happening. A text file will pop up with hundreds and hundreds of sites. Scan through the document and put an "X" in the brackets in front of any site that you want scooped. Don't be in a rush to get a ton of sites right away. It's a huge overwhelming list. Pick one or two that you want to see work and put the X's in. Save the text file and close it. The action will continue in the cmd window. Then it will vanish. Now do a sync. If all has gone well and I haven't made too many errors, your scooped files should be in your RAM and readable from iSilo! Congrats!
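If you would rather generate the batch file than type it, the step above can be sketched like this. It is a rough Python illustration, not something Sitescooper ships; the function names are invented, and the optional "pause" line is just the standard cmd command that waits for a keypress.

```python
from pathlib import Path


def batch_file_text(switch="-misilox", keep_open=False):
    """Build the text of a one-command batch file for Sitescooper."""
    lines = ["perl sitescooper.pl " + switch]
    if keep_open:
        # "pause" keeps the command window open until a key is pressed.
        lines.append("pause")
    # Batch files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines) + "\r\n"


def write_batch_file(folder, switch="-misilox", keep_open=False):
    """Write sitescooper.bat into the given folder and return its path."""
    target = Path(folder) / "sitescooper.bat"
    # Write bytes so the CRLF endings are not translated a second time.
    target.write_bytes(batch_file_text(switch, keep_open).encode("ascii"))
    return target


if __name__ == "__main__":
    print(batch_file_text(), end="")
```

Calling write_batch_file with the path of your own "sitescooper-3.1.3" folder would drop the file next to sitescooper.pl, which is where the batch file needs to live.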
So assuming all went well, what now? Well, if you want to add or subtract sites, you can go into your "tmp" directory and directly edit the file there called "site_choices.txt". But there's a better (IMHO) way. If you picked any sites from the site_choices file, edit it and remove them. Create a folder in the top level of the sitescooper directory called "sites". Now browse the folder called site samples. Any site that you want, copy it into the sites folder. Now if nothing is marked in the site_choices file, Sitescooper will read from your sites directory and scoop anything that's there. (Even if something is marked in the site_choices file, I think that Sitescooper will do both those files and the files in the sites directory...) Be aware that a lot of the .site files are outdated and some no longer work. If everything seems right, but it's not working, then try a different site before you give up on it.
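Copying the .site files by hand works fine, but if you keep a list of favorites the same step can be scripted. Here is a small Python sketch under a few assumptions: the folder paths and file names are whatever you pass in (the names in the example are hypothetical), and copy_site_files is my own helper, not part of Sitescooper.

```python
import shutil
from pathlib import Path


def copy_site_files(samples_dir, sites_dir, names):
    """Copy the named .site files from the samples folder into the sites folder.

    Returns the list of names that were actually found and copied.
    """
    samples = Path(samples_dir)
    sites = Path(sites_dir)
    # Create the "sites" folder if it does not exist yet.
    sites.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        source = samples / name
        if source.is_file():
            shutil.copy2(source, sites / source.name)
            copied.append(name)
    return copied


if __name__ == "__main__":
    # Hypothetical folder and file names, for illustration only.
    done = copy_site_files("site_samples", "sites", ["example_news.site"])
    print("copied:", done)
```

Names that are not found are simply skipped, so compare the returned list with what you asked for.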
03-11-2004, 01:06 AM
Part 3 - Basic Troubleshooting
But what if it didn't go well? Well, there's a LOT that can go wrong, considering the many different elements that have to work together to make this happen. Since I'm a newbie myself, I won't try to guess what will go wrong for you. I'll just show you some ways to diagnose what's going on and some places to browse and tweak.
First thing to do if your sites aren't scooping is watch the progress of the program. Unfortunately, that's where batch files don't work so well, because as soon as they're done, the window closes and it all happens far too fast to read. But there is another way. From the start menu, click "run". Type "cmd". This gives you the same command prompt window that your batch file runs from. Now change to the sitescooper directory. Just type "cd c:\program files\palm\sitescooper-3.1.3" or whatever the actual address is for you. (Note that you can also type "cd c:\prog*\pal*\site*" or something like that. Wildcards rock! But wait until you see the power of regular expressions. Coming soon...) Now type the same command that we put into the batch file: "perl sitescooper.pl -misilox". Now when it's done, the window stays open and you can scroll back and have a look. This is probably the single easiest way to see where things are going wrong. Most of the time for me, the problem was that nothing was happening here. Perl wasn't being found or the sites were in the wrong place, etc, etc. Also, if you can't make heads or tails of it, you can copy it all out and paste it into a help request. Then hopefully someone else can decipher it and let you know what's happening.
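Another way to keep the output around for a help request is to capture it into a file. The following is a minimal Python sketch of that idea, assuming Python 3.7 or later; run_and_log and the log file name are my own inventions, and the command you pass in is whatever you would otherwise type at the prompt.

```python
import subprocess
import sys


def run_and_log(command, log_path, cwd=None):
    """Run a command, save everything it prints to log_path, return (code, text).

    Raises FileNotFoundError if the program itself cannot be found,
    which is also a useful clue (for example, perl missing from the PATH).
    """
    result = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error messages into the same stream
        text=True,
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(result.stdout)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Harmless demonstration: run the Python interpreter itself.
    code, text = run_and_log([sys.executable, "--version"], "scoop.log")
    print("exit code:", code)
```

For a real run you would pass something like ["perl", "sitescooper.pl", "-misilox"] as the command and your Sitescooper folder as cwd, then paste the contents of the log file into your post.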
The next place to check is the documentation. Now I don't want to offend, especially as this whole program impresses the **** out of me, but the docs could be better written. At the least, they could use a better table of contents. But hey, at least you've got some, which is more than you can often say. They are in html form in the "doc" folder. When you open the index file you will not immediately see any links to the other documents. Never fear, they're way down at the bottom of the page. There are good descriptions of how to install on different systems that you should double check. Keep in mind that the docs are also dated and don't reflect some of the latest changes. Still, they are how I've learned most of what I've gotten working, so use 'em.
If these don't help, then the next step is to start asking. I'll help if I can, and there's lots of others who know more, I'm sure.
03-11-2004, 01:18 AM
Part 4 - Plucker, DOC, and others
Well, as I said, I've only used iSilo and html so far. What I can tell you is where to change things to get your other systems started.
The first step is to edit your "sitescooper.cf" file. This is the configuration file that Sitescooper defaults to. The key in here is to tell Sitescooper the location of the conversion tool that you are using. You'll want to check the documentation on this, and the config file itself is pretty heavily commented, so it shouldn't be too hard to figure out.
The next thing is to change the command switch. The -misilox switch tells Sitescooper to use iSiloXC.exe to do the conversion. For the other formats substitute the following switches:
-doc for DOC format
-plucker for Plucker
-richreader for Richreader format
-html for html format
Pretty straightforward, right? These are enumerated in the documentation. Of these, I've used -html successfully. The one thing to note for using it is that by default Sitescooper will dump the scoops into the "tmp/txt" subdirectory. If you are having problems with other methods, you can convert to html and then use whatever desktop software you have to convert to your final format.
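To keep the switches straight, here is the list above written out as a small lookup table. It is just a Python sketch; the dictionary keys and the build_command helper are names I made up, and only the switches themselves come from the list above.

```python
# Output format -> Sitescooper command line switch, as listed above.
FORMAT_SWITCHES = {
    "isilox": "-misilox",
    "doc": "-doc",
    "plucker": "-plucker",
    "richreader": "-richreader",
    "html": "-html",
}


def build_command(fmt, extra=()):
    """Return the command (as a list of words) for the given output format."""
    try:
        switch = FORMAT_SWITCHES[fmt.lower()]
    except KeyError:
        raise ValueError("unknown format: " + fmt) from None
    return ["perl", "sitescooper.pl", switch, *extra]


if __name__ == "__main__":
    print(" ".join(build_command("html")))
```

The printed line is what you would put in the batch file or type at the command prompt; any additional switches can be passed in through extra.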
03-11-2004, 01:33 AM
Part 5 - What next?
And where to go from here? This is as far as I can take you this evening. But there's so much to explore. Read the docs on constructing .site files and build or modify some. Actually, just read the docs in general. There's a ton of good stuff in there. For example, the basic html form that the documents are output in can be changed with html templates. You can also tweak the way the files are named as well as other parameters with some command line switches. (Personally I don't like the way that Sitescooper defaults to a "Date - Name" convention. If you include the switch -nodates in your command line, the dates will drop out.) Also explore the immense power of regular expressions (see the documentation on how to build .site files and also Alexander's post (http://www.mobileread.com/forums/showthread.php?t=1489)).
There's lots of power here. All we've done in this intro is (hopefully) get the engine started. I've got a little project that I'm tinkering with to warm up and then I hope to tackle some more challenging sites. And I haven't even begun to explore the huge list of sites that have already had .site files written for them...
So please let me know if this document helps you. I hope it answers more questions than it raises. Soon I hope to add a section on changing the output templates, as well as a description of my comics project. I also hope that others will contribute what they've done with Sitescooper and any cool tricks they've found.
I will install Perl/Sitescooper in the next few days and follow your instructions.
03-21-2004, 04:02 PM
Anyone have any comments on this intro so far? Has anyone used it successfully? I think Alex followed along here and got his working...
Just a couple of points that I have learned along the way. My work computer worked following only the steps outlined before. However, my home computer gave me a little more trouble. There were three issues and they may help with install problems.
Multiple profiles - My home machine has my wife's Palm profile too. When Sitescooper used the default Palm install app, it would put up a window asking which profile I wanted to install under. I wanted this operation to be completely autonomous, so this wouldn't do. I found that if you edit the sitescooper.cf file (your site settings) you can get around this. There is an early line that says:
# PilotInstallDir: $HOME/pilot/install
Remove the "#" in front and replace the location with something like the following (note that this line shows where the file is located on my drive; yours may be different):
PilotInstallDir: C:\Program Files\Palm\Ignatz\iSiloI
The "iSiloI" subdirectory inside your Palm install is where iSilo puts files to be installed. It appears to me that Hotsync checks here automatically to see if there is anything to install.
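If you would rather not edit the file by hand, the change above can be sketched in a few lines of Python. This assumes only what is shown above, namely that the setting sits on a single line starting with an optional "#" followed by "PilotInstallDir:"; the function name is my own, and if the line is not found the text is returned unchanged.

```python
import re

# Matches the setting on one line, whether or not it is commented out.
_PILOT_LINE = re.compile(r"^[ \t]*#?[ \t]*PilotInstallDir:.*$", re.MULTILINE)


def set_pilot_install_dir(config_text, new_dir):
    """Uncomment the PilotInstallDir line and point it at new_dir."""
    replacement = "PilotInstallDir: " + new_dir
    # A function is used as the replacement so that the backslashes in
    # Windows paths are inserted literally rather than treated as escapes.
    return _PILOT_LINE.sub(lambda match: replacement, config_text, count=1)


if __name__ == "__main__":
    sample = "# PilotInstallDir: $HOME/pilot/install\n"
    print(set_pilot_install_dir(sample, r"C:\Program Files\Palm\Ignatz\iSiloI"), end="")
```

To use it for real you would read sitescooper.cf into a string, pass it through the function with your own install folder, and write the result back. Keep a backup copy of the original file first.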
Command window disappears after running .bat file - If you run Sitescooper off a .bat file, you may have noticed that as soon as the command finishes, the window disappears. If you are trying to catch errors, this is frustrating. Here's the solution. On the line following the "perl sitescooper.pl..." simply add the command, "pause". This will make the window stay open until you press a key.
Sitescooper can't find iSiloXC.exe - For some reason on my home machine, Sitescooper could not find iSiloXC.exe, though it was in the top level of the Sitescooper folder. Simple solution to this is to put a copy of iSiloXC.exe in the "C:\Windows\System32" folder. This is where most (all?) command-line executables reside.
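Before copying files around, it can help to see where (if anywhere) the converter is visible. This Python sketch only checks the two places discussed here, the Sitescooper folder and the PATH; it does not know how Sitescooper itself searches, and locate_converter is a made-up helper name.

```python
import shutil
from pathlib import Path


def locate_converter(sitescooper_dir, exe_name="iSiloXC.exe"):
    """Look for the converter in the Sitescooper folder, then on the PATH.

    Returns the path as a string, or None if it was not found in either place.
    """
    local_copy = Path(sitescooper_dir) / exe_name
    if local_copy.is_file():
        return str(local_copy)
    return shutil.which(exe_name)


if __name__ == "__main__":
    print(locate_converter("."))
```

If this prints None, neither location has a copy, and putting the file into a folder that is on the PATH (such as the System32 folder mentioned above) is the next thing to try.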
As I say, I would love to hear feedback or problems to improve this tutorial!
03-24-2004, 11:28 AM
Ignatz, I think you did a wonderful tutorial. Thank you! It helped even my lazy butt to install Sitescooper and to discover the beauty of it. Here I have two interesting links to support your tutorial:
This is a step-by-step guide to writing a .site file for your favorite site:
This is an explanation of all possible .site parameters:
10-31-2004, 01:52 PM
Just happened to find this dedicated forum for Sitescooper so decided to contribute something since I simply love Sitescooper. :)
I've been a Sitescooper user for over a year and had done a couple of guides for the community at SPUG (http://www.spug.net/).
Jumpstart Guide to using Sitescooper with Plucker or IsiloX (http://changroy.googlepages.com/sitescooperguide)
How to perform Concurrent Scooping with Sitescooper (http://changroy.googlepages.com/sitescooperguide2)
Hope those interested in trying or finetuning this great tool will find these useful.
11-09-2004, 08:05 PM
I'm trying sitescooper on WinXP (SP2), with a few problems. Any ideas?
Note: I downloaded the current version of iSiloXC and changed the sitescooper.cf file to point to the proper name. I couldn't find an old version of iSiloC32.exe, which might be leading to my problems.
The error I get is:
Running: iSiloXC.exe -y -U -Is300 -Ic -Id -i"2004-Nov-09: USA Today" -d9 "C:\Program Files\Sitescooper\sitescooper-3.1.2\tmp\txt\2004_11_09_USA_Today\2004_11_0
Unrecognized option: -y
Unrecognized option: -U
Unrecognized option: -Is300
Unrecognized option: -Ic
Unrecognized option: -Id
Unrecognized option: -i2004-Nov-09: USA Today
Unrecognized option: -d9
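When the captured output is long, it can be handy to pull out just the switches the converter complained about. The sketch below is plain Python and assumes only the message format shown in this log ("Unrecognized option:" followed by the switch); rejected_options is a name I made up, and it does not diagnose why the switches were rejected.

```python
def rejected_options(output):
    """Collect the switches reported as 'Unrecognized option:' in the output."""
    prefix = "Unrecognized option:"
    found = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            found.append(line[len(prefix):].strip())
    return found


if __name__ == "__main__":
    sample = "Unrecognized option: -y\nUnrecognized option: -U\n"
    print(rejected_options(sample))
```

If every switch on the command line shows up in the list, as it does here, that points at the converter and the command line not agreeing with each other rather than at one bad setting.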
11-10-2004, 12:02 PM
I've been kind of looking at the sitescooper site, and googling and looking around to try to find help, but am not finding many signs of life. Is sitescooper abandoned? Or maybe just "complete" and not in need of any changes for now since it's Perl-based?
But surely I'm not the only one to have this problem. (Unless I'm the only one using Sitescooper.)
Do I need to try to figure it out and modify sitescooper myself? Hope not! :)
11-10-2004, 12:12 PM
Support is pretty much abandoned, but it should still work fine with iSiloX. There was a patch released once to make it work with the latest version of iSiloXC, and not having that patch is probably why it doesn't work for you. Bob, I think you must either go back to iSiloXC 3.x, OR use a small modification of sitescooper to make it work with iSiloXC 4.x: