MobileRead Forums > Miscellaneous > Archive > Mobile Sites

01-31-2003, 04:19 AM #1
stodghill
Junior Member
Posts: 4
Karma: 14
Join Date: Oct 2002
Location: Tokyo, Japan
http://www.iht.com/articleindex.html

This has every article in the paper.

Set an exclusion URL filter for www.iht.com as a wildcard, and an exception for www.iht.com/articles as a wildcard, because all of the articles are in this subdirectory.
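
A minimal Python sketch of the filter rule described above, for readers who want the logic spelled out. It is not IsiloX's code. It assumes that a wildcard pattern matches any URL beginning with it, that the http:// prefix is ignored when comparing, and that an exception always overrides an exclusion. The names EXCLUDE, EXCEPTIONS and is_allowed are made up for illustration.

```python
# Hypothetical sketch of exclusion + exception URL filtering (not IsiloX's code).
# Assumptions: a "wildcard" pattern matches any URL that begins with it,
# the scheme (http://) is ignored, and an exception overrides an exclusion.

EXCLUDE = ["www.iht.com/"]           # exclude everything on the site...
EXCEPTIONS = ["www.iht.com/articles"]  # ...except the articles subdirectory


def strip_scheme(url: str) -> str:
    """Drop a leading http:// or https:// so patterns can omit it."""
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            return url[len(scheme):]
    return url


def is_allowed(url: str) -> bool:
    """Return True if the spider should retrieve this URL."""
    bare = strip_scheme(url)
    # An exception wins even if an exclusion also matches.
    if any(bare.startswith(p) for p in EXCEPTIONS):
        return True
    return not any(bare.startswith(p) for p in EXCLUDE)


for u in [
    "http://www.iht.com/articles/12345.html",
    "http://www.iht.com/sports.html",
    "http://www.example.com/",
]:
    print(u, is_allowed(u))
```

Under these assumed rules, http://www.iht.com/articleindex.html does not begin with www.iht.com/articles, so it would itself be excluded unless given its own exception; whether IsiloX applies the filters to the starting page is left open in this sketch.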

Unfortunately, after clicking through each link you will have to scroll down a bit, because the paper's directory is present on every page, but you can get the whole paper in about 400k.
02-03-2003, 04:08 PM #2
Mark
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2003
Just wondering, as a new IsiloX user: what do you mean by setting the exclude URLs for www.iht.com as a wildcard, etc., and the exceptions for www.iht.com/articles? I hope this isn't a dumb question or otherwise bothersome, but there it is. Thanks
02-04-2003, 10:13 AM #3
stodghill
Junior Member
Posts: 4
Karma: 14
Join Date: Oct 2002
Location: Tokyo, Japan
Version 3.3 of IsiloX has the ability to exclude specific URLs or URL ranges from being included in the spidered document.

Set the articleindex as the page to be retrieved. Under the Links tab, set the link depth to 1. At the bottom of that tab is a button labeled "URL filters". Click on this button.

Click on Add Exclusion, type in "www.iht.com/" and set it as a wildcard. Everything beginning with that expression will be excluded, which at this point is everything on the IHT site.

Next, add the inclusion filters. www.iht.com/articles will be one, since every article in the paper falls under this subdirectory with a numeric code assigned to it. I also added the homepage, www.iht.com/articleindex, as a regular expression. I don't know whether that is necessary.
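
Taken together, the steps above amount to: collect the links on the start page (link depth 1), then keep only those that pass the URL filters. A rough Python sketch of that idea, assuming prefix-style wildcard matching, that relative links resolve against the articleindex page, and that pages linked from the articles are not followed. The sample HTML, article numbers, and function names are invented; this is not how IsiloX is actually implemented.

```python
# Hypothetical sketch of "link depth 1" plus URL filters (not IsiloX's code).
from html.parser import HTMLParser
from urllib.parse import urljoin

START = "http://www.iht.com/articleindex.html"
EXCLUDE = ("www.iht.com/",)
EXCEPTIONS = ("www.iht.com/articles", "www.iht.com/articleindex")


class LinkCollector(HTMLParser):
    """Gather href values from <a> tags on one page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def allowed(url: str) -> bool:
    """Prefix match, ignoring the scheme; exceptions beat exclusions."""
    bare = url.split("://", 1)[-1]
    if bare.startswith(EXCEPTIONS):
        return True
    return not bare.startswith(EXCLUDE)


def depth_one_links(index_html: str, base: str = START) -> list[str]:
    """Links found on the start page only; links inside them are not followed."""
    parser = LinkCollector()
    parser.feed(index_html)
    urls = [urljoin(base, h) for h in parser.hrefs]
    # dict.fromkeys drops duplicates while keeping page order
    return [u for u in dict.fromkeys(urls) if allowed(u)]


# Made-up index page for illustration.
sample = """
<a href="/articles/101.html">Story A</a>
<a href="/weather.html">Weather</a>
<a href="/articles/102.html">Story B</a>
<a href="/articles/101.html">Story A again</a>
"""
print(depth_one_links(sample))
```

With the made-up sample, only the two article URLs survive; the weather page is dropped by the site-wide exclusion, and the repeated link is fetched once.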
02-04-2003, 11:19 PM #4
Mark
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2003
Great, thanks for the info. I am using version 3.25, which I think is most of the reason I didn't understand...
02-10-2003, 03:34 PM #5
captainao
Enthusiast
Posts: 34
Karma: 3184
Join Date: Nov 2002
Location: NYC
Device: Axim x51v;T|X;NX73;SEK750
Too bad there's no way to fool the site into thinking iSilo is JavaScript-capable; then the headers would be removed. Or is this possible?



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
International Herald Tribune: European Edition | Raoul O'Malley | Calibre | 1 | 05-02-2010 12:20 AM
Boston Herald Bashes iPad | Lotus Esprit | Apple Devices | 65 | 04-23-2010 09:52 AM
It's the year of the e-reader ... - The Sydney Morning Herald | AprilHare | News | 0 | 01-07-2010 10:18 PM
Chicago Tribune now available on the Kindle! | daffy4u | Amazon Kindle | 14 | 08-11-2008 01:10 PM
Herald Tribune on how e-books spur sales | Alexander Turcic | News | 0 | 08-05-2005 05:09 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.