#1 |
Junior Member
Posts: 8
Karma: 18
Join Date: Feb 2003
|
Just curious -- has anyone had any luck grabbing the Zap2it.com TV news page (or any of their other individual pages)? It's at http://tv.zap2it.com/news/tvnewsdaily_headlines.html. I'm trying to get only the links listed under "TV NEWS DAILY", but no luck -- it grabs links from all over the page, making for a very large file (and, more importantly, taking a long time to download and convert).
I'm only a moderately knowledgeable HTML guy; I've tried to deconstruct the Zap2it.com news page into something that can be grabbed by iSiloX. However, no luck -- I get a huge file, even with the link level set to 1 in iSiloX and only following links below the root. Thought someone else here might have a clue as to how to do this. I've searched the archives, but couldn't find any mention of Zap2it. (Also, is there a FAQ and/or "tricks of the trade" page for figuring out how to get just the info you want out of a web page for iSiloX?) TIA... Jeff
#2 |
Fully Converged
Posts: 18,175
Karma: 14021202
Join Date: Oct 2002
Location: Switzerland
Device: Too many to count here.
|
Jeff,
Check the new links section; I've included it there. Unfortunately, there is no official FAQ yet on this topic.

This is the idea: In iSiloX, under Channel Properties->Links, make sure that Follow Offsite Links is ON. There you will also find the URL Filters menu.

What I almost always do is EXCLUDE *every* link first. Yes, everything:
* (type wildcard)

Now we want to make EXCEPTIONS to the above exclusion. For that, I look closely at the links that we want iSiloX to follow. In Zap2it, I noticed that all the links you want contain /tvnewsdaily.html?, for example http://tv.zap2it.com/news/tvnewsdaily.html?30388. So let's add that inclusion filter:
\/tvnewsdaily\.html\? (type regular expression, more on that in a moment)

In addition, I looked at the TV News pages and noticed that there are occasionally little images included with the text. We want those too. So another inclusion filter:
\.jpg (type regular expression)

Regular expressions are a way to define patterns. They are easy to grasp but can become quite complex. Use the search engine on this forum, where I posted some interesting links to RegEx tutorials.

Greets
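For anyone who wants to sanity-check the two inclusion patterns before typing them into iSiloX, here is a minimal Python sketch. It only mimics the "exclude everything, then allow exceptions" idea using Python's re module; it is not how iSiloX itself evaluates its filters, and the function name is_followed and every sample URL except the tvnewsdaily.html?30388 one are made up for illustration.

```python
import re

# Inclusion filters from the post (type: regular expression).
INCLUDE_PATTERNS = [
    re.compile(r"\/tvnewsdaily\.html\?"),  # individual TV News Daily stories
    re.compile(r"\.jpg"),                  # images embedded in the stories
]


def is_followed(url: str) -> bool:
    """Return True if the URL matches at least one inclusion exception.

    Everything is excluded by default (the '*' wildcard filter), so a
    link survives only when one of the inclusion patterns matches it.
    This is an assumption about the filter semantics, not iSiloX code.
    """
    return any(pattern.search(url) for pattern in INCLUDE_PATTERNS)


if __name__ == "__main__":
    samples = [
        "http://tv.zap2it.com/news/tvnewsdaily.html?30388",  # from the post
        "http://tv.zap2it.com/images/example.jpg",           # hypothetical
        "http://tv.zap2it.com/movies/index.html",            # hypothetical
    ]
    for url in samples:
        print(is_followed(url), url)
```

Note that the headlines page itself (tvnewsdaily_headlines.html) does not match the first pattern, because an underscore follows "tvnewsdaily" instead of ".html?". In this sketch only the story links and the .jpg images pass the filter.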