Old 02-04-2003, 10:13 AM   #3
stodghill
Junior Member
Posts: 4
Karma: 14
Join Date: Oct 2002
Location: Tokyo, Japan
Version 3.3 of IsiloX has the ability to exclude specific URLs or URL ranges from the spidered document.

Set the articleindex as the page to be retrieved. Under the links tab, choose 1. At the bottom of that tab is a button labeled "URL filters". Click on this button.

Click on add exclusion. Type in "www.iht.com/" and set that as a wildcard. Everything beginning with that expression will be excluded, which at this point is everything on the IHT site.

Next, add the inclusion filters. www.iht.com/articles will be one, since every article in the paper falls under this subheading with a numeric code assigned to it. I also added the homepage, www.iht.com/articleindex, as a regular expression. I don't know whether that is necessary or not.
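
For anyone who wants to reason through what the filters should let in, here is a rough Python sketch of the logic as I understand it. This is not how IsiloX implements it internally; it assumes that inclusion filters take priority over exclusion filters and that a wildcard behaves like a simple "begins with" match. The function names and the sample URLs are made up for illustration.

```python
# Rough model of the filter setup described above (an assumption,
# not IsiloX's actual implementation).
EXCLUDE_PREFIXES = ["www.iht.com/"]
INCLUDE_PREFIXES = ["www.iht.com/articles", "www.iht.com/articleindex"]


def strip_scheme(url):
    """Drop a leading http:// or https:// so URLs compare against the filters."""
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url


def should_fetch(url):
    """Return True if the URL would be kept under the assumed filter rules."""
    bare = strip_scheme(url)
    # Assumed rule: an inclusion match wins over any exclusion match.
    if any(bare.startswith(prefix) for prefix in INCLUDE_PREFIXES):
        return True
    # Otherwise anything matching an exclusion is dropped.
    if any(bare.startswith(prefix) for prefix in EXCLUDE_PREFIXES):
        return False
    # URLs that match no filter at all are left alone.
    return True


if __name__ == "__main__":
    # Hypothetical sample URLs, just to exercise the rules.
    for sample in (
        "http://www.iht.com/articles/12345.html",
        "http://www.iht.com/articleindex",
        "http://www.iht.com/sports/",
    ):
        print(sample, should_fetch(sample))
```

Under this model, "www.iht.com/articles" treated as a plain prefix does not cover "www.iht.com/articleindex", which is one reason a separate entry for the index page might be useful.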