Old 03-13-2005, 01:55 AM   #1
hacker
Technology Mercenary
Really now, what is the point of RSS?

Before you jump on me, hear me out... I'm actually soliciting some ideas and suggestions, and I have a real purpose behind this post, so I hope this thread gets nice and lengthy and opinionated. I want everyone to respond and contribute...

Feeds, blogs, syndication, RSS, XML, Atom, OPML... it goes by many different names, and there are about 13 different incompatible formats among them. They're all XML, so that makes them fairly easy to parse... almost.

I've personally run into dozens of feeds, exported from very popular websites, that don't even validate as proper feeds. Technically, as developers (or those who parse feed content with tools we write), we're supposed to reject such a feed as invalid; the XML specification requires it. But users don't care, they just want the content. Herein lies the complexity... and the paradox.
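To make that concrete, here's a minimal Python sketch (illustrative only, not any particular reader's code) of how a strict parser throws away an entire feed over one stray ampersand copied from HTML:

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One bare "&" (in "AT&T") is all it takes for a strict parser
# to reject the whole feed.
broken = '<rss version="2.0"><channel><title>AT&T News</title></channel></rss>'
fixed = '<rss version="2.0"><channel><title>AT&amp;T News</title></channel></rss>'

print(is_well_formed(broken))  # False
print(is_well_formed(fixed))   # True
```

The content difference between the two feeds is invisible to a human reader, which is exactly why users blame the tool rather than the feed.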

But what are feeds really useful for? You're only given a "teaser" in the feed which, when followed, leads you to the site's full-size page with the complete article. Why would anyone want these "teasers" on a PDA? Without some serious clipping and transcoding of those full-size pages, you're wasting a ton of space on your PDA just to read news articles, if you follow anything below that top level.

For most websites, feeds are simply used as "commercials" to help drive traffic to the site, and thus bring in some advertising revenue (banner ads). So why are they such a fad among mobile and PDA users? I haven't yet found a single useful feed that provides the followed content in a consistent mobile format (except ours, of course). They all just link to an overly heavy, banner-ad-ridden, full-size webpage. These aren't fun to read on a PDA.

So here we go, an impromptu survey to solicit some discussion and opinions:

What do you use feeds for?
  • Are you reading them with a specialized reader?
  • A PDA or other mobile/handheld device?
  • A desktop browser?
  • Something else?
What is missing from your "feed" experience?
  • Better content?
  • More usable features?
  • Better integration with devices or browsers?
  • Something else?
How are you finding your favorite feeds?
  • A specialized feed-based search engine?
  • Google? Yahoo? MSN?
  • Something else?
There are literally hundreds of tools out there to read, fetch, convert, integrate, migrate, and do all kinds of other things to feeds and other syndicated content... Lots of blogs and blog software can export directly to syndicated content (Drupal, WordPress, Movable Type, and others, for example).

Thousands of people are using feeds and syndicated content... but why?
Old 03-13-2005, 02:44 AM   #2
Pride Of Lions
just kinda geeky
I used to use NewsMac to convert my feeds to iSilo (via iSiloXC) before the recent releases of iSiloX allowed RSS feeds, and I missed being able to read the full story or be able to customize the links. Now with iSiloX, I have a little more control over the RSS feed and how it appears on my Zodiac2.

I agree with you that the limited feed is too little info, but the full link is too much "other" info, and the "other" info is what is taking too much space on the handheld. I wish to learn more about how to train iSiloX to cut out that "other" content, but until then it's a delicate balance between what I want and what I get.

One example of an RSS feed that lets me glance at the headlines and decide whether I want to read the article is the BBC News channel. I created a custom channel with all of the RSS feeds set to a link depth of 1 off-site link. Then, as I glance through all of the possible headlines and synopses, I can choose which ones I want to read further. The one drawback is that I have to convert everything to get the ones I want, including all of the ones I don't want. I figure that there are probably plenty of tools within iSiloX that I can use to further tailor my BBC News reading experience, but I haven't learned them yet, and they might not work on such a dynamic site as the BBC News site.

So, yes, I wish that I could make iSiloX give me only the headlines that interest me, but until then I make allowances for the size of the channel. The BBC News channel is about 14MB every day. It's huge, but it also contains a lot of news from all over the world. I keep it all in my Zodiac2's gargantuan RAM, or on either one of the 2 SD cards that it holds (1GB each). If memory were more of an issue for me, I'd be more concerned with channel size, but I'm spoiled.

I usually find my feeds by checking to see if a site I like offers an RSS option. If it does, it gets sent to iSiloX for conversion; and if it doesn't, it still gets sent to iSiloX for conversion. The main difference is that I can more easily glance through what the site has to offer via the feed's synopses than through the converted-channel method.

I hope that answers some of your questions. It sounds like you're looking to build a better mousetrap. Hopefully you'll find a way to make RSS meaningful and worthwhile for PDAs and desktops alike.
POL9A
Old 03-13-2005, 08:04 AM   #3
Laurens
Jah Blessed
Quote:
Originally Posted by hacker
I've personally run into dozens of feeds, exported from very popular websites, that don't even validate as proper feeds. Technically, as developers (or those who parse feed content with tools we write), we're supposed to reject such a feed as invalid; the XML specification requires it. But users don't care, they just want the content. Herein lies the complexity... and the paradox.
Use a fault-tolerant parser to process the feeds. Usually, feed parsing issues are due to relatively harmless errors such as unknown entity names caused by copying HTML directly into the feed. An RSS parser should be able to process ill-formed content, just like browsers have to deal with all sorts of HTML soup.

Quote:
Originally Posted by hacker
I haven't yet found a single useful feed that provides the followed content in a consistent mobile format (except ours, of course). They all just link to an overly-heavy, banner-ad-ridden, full-size webpage. These aren't fun to read on a PDA.
Use link rewriting to make the links point to PDA-friendly "printable" versions of pages. Both Sunrise and JPluck have supported this for a long time already. This way you can make PDA-friendly versions of many sites that don't have a dedicated "mobile" version.
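The rewriting itself is trivial; here's a hypothetical Python sketch (`pagewanted=print` is the parameter the NYT used at the time, and every site has its own convention):

```python
from urllib.parse import urlsplit, urlunsplit

def to_printable(url, print_param="pagewanted=print"):
    """Rewrite an article URL to its 'printable' variant by appending a
    site-specific query parameter (print_param is just an example)."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    query = query + "&" + print_param if query else print_param
    return urlunsplit((scheme, netloc, path, query, fragment))

print(to_printable("http://example.com/article?id=123"))
# http://example.com/article?id=123&pagewanted=print
```

The hard part, as the rest of this thread shows, is not the string manipulation but knowing which parameter each site wants.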

Newsfeeds are especially useful for PDAs because they can cut through the fluff and link directly to articles. Furthermore, they can be presented with a consistent layout, irrespective of the site they originate from.
Old 03-13-2005, 12:02 PM   #4
hacker
Technology Mercenary
Quote:
Originally Posted by Laurens
Use a fault-tolerant parser to process the feeds. Usually, feed parsing issues are due to relatively harmless errors such as unknown entity names caused by copying HTML directly into the feed. An RSS parser should be able to process ill-formed content, just like browsers have to deal with all sorts of HTML soup.
Actually, no. If an XML document is not well-formed, it MUST be rejected. This is not a guideline, it is a rule. Any RSS (or XML) parser that does not adhere to that is ignoring the specification.

That being said, adding some "massaging" of the content prior to parsing could help the XML parse as well-formed, assuming the broken XML can be easily fixed. Again, the users don't care about broken or invalid feeds; they just want the content "at all costs".
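Such pre-parse "massaging" might look something like this rough Python sketch (the entity table is a tiny sample; real feeds break in more ways than these two):

```python
import re
import xml.etree.ElementTree as ET

# HTML entity names that are undefined in plain XML -- a tiny sample.
HTML_ENTITIES = {"&nbsp;": "&#160;", "&copy;": "&#169;", "&mdash;": "&#8212;"}

def massage(xml_text):
    """Best-effort cleanup of two common feed errors before strict parsing."""
    for name, numeric in HTML_ENTITIES.items():
        xml_text = xml_text.replace(name, numeric)
    # Escape bare ampersands that don't begin a recognizable entity.
    return re.sub(r"&(?!#?\w+;)", "&amp;", xml_text)

broken = "<item><title>AT&T &nbsp; Fish &amp; Chips</title></item>"
massaged = massage(broken)
ET.fromstring(massaged)  # now parses without error
print(massaged)
```

Which is exactly the problem: the fix is ten lines, so every tool quietly does it, and the feed author never finds out.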

So what do we do? Adhere to the specification, to bring some awareness to broken feeds, or make the users happy, and ignore the specification, bringing us back into the mess that HTML created for us?

But, like the problem with "HTML soup", if we just fix the problems with invalid XML, we're going to be back in the same boat that we are with HTML, and the whole point of XML is rendered irrelevant. If content authors don't realize that their feeds are broken, there is no motivation to fix them. If we transparently fix them, there's no reason for authors to correct their end. It's a double-edged sword.

There's a good article on XML.com on this subject titled "XML on the Web Has Failed". It's worth the read.

Quote:
Use link rewriting to make the links point to PDA-friendly "printable" versions of pages. Both Sunrise and JPluck have supported this for a long time already. This way you can make PDA-friendly versions of many sites that don't have a dedicated "mobile" version.
And this is exactly why JPluck and Sunrise and Sitescooper will consistently fail... they don't scale and self-heal as sites change. You have to maintain templates for each site that describe which links to follow, what content to keep, and what content to strip out or ignore. As the site changes, your template has to change. If you have 5,000 templates for 5,000 websites, it's a maintenance nightmare. JPluck had .jxl files, Sunrise has SDL files, Sitescooper has .site files. It's all the same thing.

This is a major factor in what killed Sitescooper: the user community maintaining those templates found it was just too much work to keep them current. Every time a site added a new nested table tag, changed the CMS providing the content, or reinvented its layout, the template had to be changed.

I've come up with an approach in a tool I've written that tries to be a bit smarter about examining the upstream links found in the newsfeed's RSS, to render per-site "templates" unnecessary. It's a lot of work, though, and I can only code against the 2,000 or so sample feed sites I know are providing "broken" content links. It's definitely not fun.

Quote:
Newsfeeds are especially useful for PDA's because they can cut through the fluff and link directly to articles. Furthermore, they can be presented with a consistent layout, irrespective of the site they originate from.
Again, not quite. Newsfeeds give you one or two sentences that serve as a teaser for the article. Clicking on the article link provided in the feed leads you to the content provider's full-size webpage. This is most certainly NOT useful on a PDA; not without a lot of slicing and dicing of the fluff surrounding the content.

I think once content providers start learning how to use feeds properly, and start building their XML in a way that consistently produces well-formed output, we'll be in a better position. Right now, fewer than 30% of content authors do (based on the random 2,000-feed test suite I have here). Having 13 incompatible "standard" formats and versions doesn't help either.

Great comments so far... keep them coming.
Old 03-13-2005, 01:06 PM   #5
gadgetguru
Addict
Actually, I do use them with Sunrise. Sure, the followed links are graphics-heavy, but somewhere in there is the 'meat' of the article. Some contain only the first page of a multi-page article, but most of the time the 'meat' is on that first page.

As for broken XML, the way Sunrise overlooks these errors is quite good, and most popular XML sites render well. Sure, some might need rewriting for 'nested tables' or other stuff, but most sites render well regardless. Note that the XML covers mostly just the first page, since the linked article is pure HTML; that's where most of the formatting hell is... As for the .sdl and .jxl files, as far as I know they mostly contain the link location and parsing parameters like link depth, which you would need anyway to download the RSS feed; they aren't really recoded per site unless you want to block specific portions of it.

As for adhering to standards, do you really believe that feed authors will change them just for, say, a few PDA users? As long as it works with the major RSS readers, they don't care... just as most HTML coders make their sites work well only in IE, and not even in the second-place Firefox.

The size of the XML with one level of links is not that large for most sites. Unless you link very deep, most are under 1MB (compressed), and with gargantuan card sizes going for so little, most people just want the content offline; especially those who don't have 'unlimited' wireless access, since per-KB charges over WAP or 2.5G phone networks can be quite expensive.

More on file sizes: you could shrink these further if you forgo graphics or block specific links. That entails additional work, but most software, like Laurens's excellent Sunrise, provides a good GUI that even a non-techie like me can use. It's mostly a one-time affair (for each site) anyway.

Some sites do not provide mobile content, or block mobile content unless you are affiliated with them or have paid for said content; that's where most RSS feeds come in. I do hope that major sites like CNN or the NYT (you are not connected with them, hacker, are you?) don't realize this, lest they block it once again, as they have done for mobile content in the past.

And finally, RSS was written with the desktop in mind. RSS is meant to allow easy perusal of headlines, to help alleviate the content overload from multiple news sources (debatable, since most websites get their content from just a handful of news organizations).

The way we power-PDA users link to them and download content is not what the content providers wanted. In a way, we are not adhering to the spirit of RSS itself, but as long as we don't break any laws, we, the end users, don't care.
Old 03-13-2005, 01:11 PM   #6
TadW
Uebermensch
What do you use feeds for?
I use FeedDemon on Windows to follow around 200 news sources, almost every day. Of those 200, I've selected around 20 to also follow with Newsbreak on my PPC when I'm on the go.

What is missing from your "feed" experience?
I prefer full-length feed items. I know some people prefer summary items, but that is not how I read news. So many feeds, especially commercial ones that want you to visit the main site, don't offer the full content of the news article, which is a pity. Often I don't follow a news item, not because I don't find it interesting, but because I don't want to open another browser window.

How are you finding your favorite feeds?
I usually first check my favorite sites to see if they offer RSS feeds. Then I go to a public web aggregator like Bloglines and see what feeds other people with the same interests as mine are reading.
Old 03-13-2005, 01:11 PM   #7
Laurens
Jah Blessed
Quote:
Originally Posted by hacker
Actually, no. If an XML document is not well-formed, it MUST be rejected. This is not a guideline, it is a rule. Any RSS (or XML) parser that does not adhere to that is ignoring the specification.
Who cares about the spec? As long as the RSS parser processes valid feeds correctly, I don't see a problem with attempting to process ill-formed feeds.

Quote:
Originally Posted by hacker
That being said, adding some "massaging" of the content prior to parsing could help the XML parse as well-formed, assuming the broken XML can be easily fixed. Again, the users don't care about broken or invalid feeds; they just want the content "at all costs".
Exactly, which is why your argument doesn't hold.

Quote:
Originally Posted by hacker
But, like the problem with "HTML soup", if we just fix the problems with invalid XML, we're going to be back in the same boat that we are with HTML, and the whole point of XML is rendered irrelevant. If content authors don't realize that their feeds are broken, there is no motivation to fix them. If we transparently fix them, there's no reason for authors to correct their end. It's a double-edged sword.
RSS is a lost cause. That's why aggregators like FeedDemon do enforce well-formedness when processing Atom feeds. For RSS it's just too late.

Quote:
Originally Posted by hacker
And this is exactly why JPluck and Sunrise and Sitescooper will consistently fail... they don't scale and self-heal as sites change. You have to maintain templates for each site that describe which links to follow, what content to keep, and what content to strip out or ignore. As the site changes, your template has to change. If you have 5,000 templates for 5,000 websites, it's a maintenance nightmare. JPluck had .jxl files, Sunrise has SDL files, Sitescooper has .site files. It's all the same thing.
The NYT and other scripts have worked well for months. Also, scripts require almost no maintenance; almost all of them consist of only two or three lines of JavaScript. For example:

Code:
// Rewrite first-level article links to point at the printable version.
if (link.depth == 1) {
  link.uri += "&pagewanted=print";
}
I concede that the existing approach is indeed problematic when something changes at a site and existing scripts have to be updated: users have to download the SDLs and copy documents manually. That's why I'm working on an "auto-update" mechanism for my commercial product. This feature is, coincidentally, also based on RSS/RDF.

Quote:
Originally Posted by hacker
Again, not quite. Newsfeeds give you one or two sentences that serve as a teaser for the article. Clicking on the article link provided in the feed leads you to the content provider's full-size webpage. This is most certainly NOT useful on a PDA; not without a lot of slicing and dicing of the fluff surrounding the content.
Again, you need link rewriting to make feeds useful for PDAs.
Old 03-13-2005, 01:14 PM   #8
Alexander Turcic
Fully Converged
Quote:
Originally Posted by Laurens
Again, you need link rewriting to make feeds useful for PDAs.
Which is, alas, often against the policies of the RSS provider, at least if you publish the results online. Some sites even do referer checking to make sure that you are not rewriting to "print" versions.
Old 03-13-2005, 01:28 PM   #9
hacker
Technology Mercenary
Quote:
Originally Posted by Laurens
Who cares about the spec? As long as the RSS parser processes valid feeds correctly, I don't see a problem with attempting to process ill-formed feeds.
It is exactly this ignorance of the standards that causes most of the work we have to do to bring content back into compliance so it can be parsed by proper tools. It is exactly this ignorance of the standards that causes all of the browser hacks, quirks mode, and the other things that complicate web development.

So to answer your question, I care about the spec. But then again, I write proper code that adheres to the spec, not hacks that work around it.

Quote:
Exactly, which is why your argument doesn't hold.
No. My argument holds, because no post-processing should be required to make an XML document well-formed. Period. If post-processing is required, then the XML document is broken and should be rejected as not well-formed. There is no leeway here.

But then again, you don't care about the spec, so do what you want with it; you're inventing your own standards.

Quote:
That's why I'm working on an "auto-update" mechanism for my commercial product. This feature is, coincidentally, also based on RSS/RDF.
Instead of writing scalable tools, you continue to work around the problem. To each his own. While I applaud your efforts, I disagree with them on many levels, but that is what drives us... and our users: choice. My goal is to remove the human from the process. Your goal is apparently to increase the human involvement in the process.

Quote:
Again, you need link rewriting to make feeds useful for PDA's.
That is one approach; I have another. Again, to each his own.

Great discussion..
Old 03-13-2005, 01:32 PM   #10
Laurens
Jah Blessed
Quote:
Originally Posted by Alexander
Which is, alas, often against the policies of the RSS provider. At least if you publish the results online. Some sites even do referer checking to make sure that you are not rewrting to "print" versions.
Sunrise allows you to set the "Referer" header for each individual link through scripting. Simply set the Referer to the original URL and rewrite the link to point to the printable version. There's a lengthy explanation on this in the scripting reference.
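Outside of Sunrise's own scripting, the same idea can be sketched in plain Python (illustrative only; the URLs are made up and the request is built but never sent):

```python
import urllib.request

def printable_request(original_url, printable_url):
    """Build a request for the printable page that presents the
    original article URL as its Referer."""
    req = urllib.request.Request(printable_url)
    req.add_header("Referer", original_url)
    return req

req = printable_request(
    "http://example.com/article?id=123",
    "http://example.com/article?id=123&pagewanted=print",
)
print(req.full_url)
print(req.get_header("Referer"))
```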
Old 03-13-2005, 01:33 PM   #11
Alexander Turcic
Fully Converged
Quote:
Originally Posted by Laurens
Sunrise allows you to set the "Referer" header for each individual link through scripting. Simply set the Referer to the original URL and rewrite the link to point to the printable version. There's a lengthy explanation on this in the scripting reference.
Very cool and very powerful.
Old 03-13-2005, 01:41 PM   #12
TadW
Uebermensch
Quote:
Originally Posted by hacker
It is exactly this ignorance of the standards that causes most of the work we have to do to bring content back into compliance so it can be parsed by proper tools. It is exactly this ignorance of the standards that causes all of the browser hacks, quirks mode, and the other things that complicate web development.
Before I switched to FeedDemon, I was using another Windows tool called NewzCrawler. One of its biggest pains (from a consumer's perspective!) was its strict adherence to XML standards. If you follow a lot of news feeds, you'll know how many fail to validate (around a quarter of the feeds I used to read didn't work with NewzCrawler). So where does that leave me as a consumer? Just because some lazy webmasters don't follow the rules, am I the one who has to suffer?

I think it is wishful thinking to assume that all feeds are going to be valid XML one day.
Old 03-13-2005, 01:44 PM   #13
Laurens
Jah Blessed
Quote:
Originally Posted by hacker
No. My argument holds, because no post-processing should be required to make an XML document well-formed. Period. If post-processing is required, then the XML document is broken and should be rejected as not well-formed. There is no leeway here.
What can I say? The real world just isn't perfect.

Quote:
Originally Posted by hacker
Instead of writing scalable tools, you continue to work around the problem. To each his own. While I applaud your efforts, I disagree with them on many levels, but that is what drives us... and our users: choice. My goal is to remove the human from the process. Your goal is apparently to increase the human involvement in the process.
What exactly is your "scalable solution" then? Please enlighten us.
Old 03-13-2005, 01:58 PM   #14
hacker
Technology Mercenary
Quote:
Originally Posted by Laurens
Sunrise allows you to set the "Referer" header for each individual link through scripting. Simply set the Referer to the original URL and rewrite the link to point to the printable version. There's a lengthy explanation on this in the scripting reference.
You mean "forge", not set. Many content providers (myself included) are beginning to reject forged referers when the host and the referer don't match properly. It's easy to do, and I've been showing more and more content providers how to do it, to help save their bandwidth and continue to serve their users.
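A server-side check like that can be as simple as comparing hosts. An illustrative Python sketch (not any particular provider's setup, and only one crude heuristic among many):

```python
from urllib.parse import urlparse

def referer_plausible(referer, request_host):
    """Reject requests whose Referer names a host other than the one
    being served; an empty Referer is common and is let through."""
    if not referer:
        return True
    return urlparse(referer).netloc == request_host

print(referer_plausible("http://example.com/article", "example.com"))  # True
print(referer_plausible("http://example.com/article", "other.org"))    # False
```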

One of the main problems with feeds and feed parsers is that they don't properly adhere to the standards (again with the standards) for caching; they just continue to pound the server for the feed over and over, even when they shouldn't and even when the content hasn't changed. This is a large (and growing) problem.

It's the same with these spiders run on the client side, and it's the primary reason why content providers block and ban them.

Users want the content as fast as possible, and decide to slam the server to get it.

The content providers want to give their users a responsive browsing experience, but they can't if 1,000 separate spiders are slamming into their site, ignoring caching rules, robots.txt, deep-linking restrictions, and so on.

This leads to blocking, banning, and other techniques to stop the users from abusing the server's resources.

Fun times; cat and mouse and all.
Old 03-13-2005, 03:25 PM   #15
Laurens
Jah Blessed
Quote:
Originally Posted by hacker
You mean "forge", not set. Many content providers (myself included) are beginning to reject forged referers when the host and the referer don't match properly. It's easy to do, and I've been showing more and more content providers how to do it, to help save their bandwidth and continue to serve their users.
The reference explains that you should use the Referer property only as a last resort.

I've emailed with webmasters of several big sites (including CNET) about the link-rewriting capability, and none of them had any objections. On the contrary, they understood that link rewriting actually helps reduce bandwidth usage. Printable versions usually have no navigation or banner images and contain the entire article. Many NYT articles, for instance, are split across multiple pages in the "normal" version, requiring multiple requests to obtain them in their entirety.

Quote:
Originally Posted by hacker
One of the main problems with feeds and feed parsers is that they don't properly adhere to the standards (again with the standards) for caching; they just continue to pound the server for the feed over and over, even when they shouldn't and even when the content hasn't changed. This is a large (and growing) problem.

It's the same with these spiders run on the client side, and it's the primary reason why content providers block and ban them.
Sunrise goes to great lengths to reduce bandwidth usage. It has a download cache that works with the ETag, Last-Modified and Expires headers, so it can perform conditional If-Modified-Since/If-None-Match requests. There's also the option to end an update prematurely if the source URL's content hasn't changed.
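For reference, a conditional fetch of that sort is just a matter of echoing two headers from the previous response back to the server; a minimal sketch (header values are made up):

```python
def conditional_headers(etag=None, last_modified=None):
    """Build request headers for a conditional GET. If the feed is
    unchanged, the server answers 304 Not Modified with no body."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag               # ETag from the last fetch
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # Last-Modified from the last fetch
    return headers

print(conditional_headers(etag='"abc123"',
                          last_modified="Sun, 13 Mar 2005 08:00:00 GMT"))
```

A 304 response costs a few hundred bytes instead of the full feed, which is exactly the bandwidth saving the ill-behaved aggregators throw away.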
MobileRead.com is a privately owned, operated and funded community.