I really don't want to have to play with yet another browser, but I suppose if Google Chrome gains any appreciable market share, I'll have to install it so I can test the website I am responsible for. :sigh:
The Google ethical question is interesting. I've heard Taylor's concerns before, but not so clearly stated as they ended up being in this thread.
Quote:
Originally Posted by Taylor514ce
Both. The "Cache" and the advertising system are linked. To list specific items of concern to me:
1. Caching content in the first place. I have concerns about privacy, and authorial control.
2. Using the cached content (used for search results) to drive advertising.
Regarding the first point: the data doesn't have to be stored for indexing to take place, and I appreciate that Taylor's concern here is that Google can continue to offer copies of content he has removed, for example.
Let's suppose that one is responsible for providing critical information, e.g. the current status of an epidemic. One would need to be able to control whether "stale" versions of that content are displayed, and Google's cache could serve exactly such stale content.
However, is it reasonable to require content publishers to "opt in" to allowing this data to be cached for viewing, when, again, this is the model upon which the web was built? Perhaps the "no-cache" directive should carry more weight than a "gentleman's agreement." There might be problems enforcing this internationally, however; it would probably need to be appended to something like the Berne Convention.
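To make the "gentleman's agreement" concrete: the entire mechanism today is a robots.txt file (plus per-page meta tags) that crawlers consult voluntarily. Here is a minimal sketch using Python's standard-library parser; the example.com URLs are placeholders.
Code:
from urllib.robotparser import RobotFileParser

# A robots.txt asking all crawlers to stay out of /status/. This is
# the "gentleman's agreement": nothing enforces it; well-behaved
# crawlers simply choose to honor it.
robots_txt = """\
User-agent: *
Disallow: /status/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/status/epidemic.html"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/about.html"))            # True

# Caching, as opposed to crawling, is governed by an even weaker
# per-page convention: <meta name="robots" content="noarchive">,
# which compliant engines read as "index me, but don't serve a
# cached copy." Honoring it is just as voluntary.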
Regarding the second point: is it reasonable that Google assumes that content publishers want their sites to be listed within a Google search result, regardless of the fact that Google will place advertisements on that page? Is it legal? Is it an ethical use of publicly available, but privately owned data?
I'm not an expert on the law, by any means. But I can readily come up with examples in the physical world in which directory listings of publicly available information are provided either for a fee or via advertising-supported means. For example, most telephone companies in the US provide listings of residential and business phone numbers, and advertising is often sold in the same printed volumes. (In fact, in recent years, the phone company I deal with has taken to charging a fee not to list this data.) Names and addresses are collected by numerous agencies and sold to direct marketers. I don't like this process, but it is apparently legal. Photographs of scenic villages are used to advertise venues located in those villages. I don't believe the owners of the houses and gardens included in these photographs even need to be asked whether they want their property in such a photograph, so long as the view is from a public location, such as the street. The distinction, again, seems to be where to draw the line between "public" and "private."
Quote:
Originally Posted by Taylor514ce
Ideally, I want a web that is driven by content authors. I think the future web will have "search agents" rather than "search engines". Rather than go to a "search engine" which consumes sites and offers results/ads, we'll be able to construct custom search agents and send out our own spiders and bots, which will return results without violating copyright and without weighting the results based on advertising campaigns. Personal systems will become powerful enough to create your own individual "search cache", and web sites will cooperate in communicating and updating "subscribers". In essence, "Google" will become irrelevant.
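Taylor's "search agent" idea is worth making concrete. Below is a rough sketch, in Python, of the one step such an agent might repeat: check robots.txt, fetch a page into a purely private index, and republish nothing. The agent name and URL are placeholders, and a real agent would obviously need crawl scheduling, politeness delays, and the "subscriber" update machinery Taylor describes.
Code:
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Extract the <title> text from an HTML page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

def index_page(url, cache):
    """Fetch one page, if robots.txt permits, and record its title
    in a purely local search cache. Nothing is republished."""
    root = "{0.scheme}://{0.netloc}".format(urlparse(url))
    rp = RobotFileParser()
    rp.set_url(urljoin(root, "/robots.txt"))
    rp.read()
    if not rp.can_fetch("personal-search-agent", url):
        return
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = TitleParser()
    parser.feed(html)
    cache[url] = parser.title.strip()

my_cache = {}
index_page("https://example.com/", my_cache)
print(my_cache)

As axel77 pointed out, a primitive version of the distributed half of this vision already exists: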
Quote:
Originally Posted by axel77
To some very primitive degree, this is the way gnutella searches work. You send out your query string with your destination IP, and it spreads autonomously over the gnutella network; every node that has a hit, or knows of a hit, sends the answer to your IP. However, it's a very primitive search technique which a) only matches content from the title line, and b) takes eons to get a good answer list.
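For anyone who hasn't seen it, the mechanics are easy to simulate. Here is a toy sketch of TTL-limited query flooding in Python. It is heavily simplified: real Gnutella routes answers back to the requester rather than appending them to a shared list, but the flood itself and the title-only matching are the essentials.
Code:
import random

class Node:
    def __init__(self, name, titles):
        self.name = name
        self.titles = titles   # this node's shared content (title lines only)
        self.neighbors = []    # overlay links to other nodes
        self.seen = set()      # query ids already handled (loop prevention)

    def query(self, qid, text, ttl, hits):
        if qid in self.seen or ttl == 0:
            return
        self.seen.add(qid)
        # (a) matching happens only against title lines
        for title in self.titles:
            if text.lower() in title.lower():
                hits.append((self.name, title))
        # (b) the query floods outward hop by hop, which is why a
        # good answer list takes so long to accumulate
        for neighbor in self.neighbors:
            neighbor.query(qid, text, ttl - 1, hits)

random.seed(0)
nodes = [Node("node%d" % i, ["document %d about topic %d" % (i, i % 3)])
         for i in range(10)]
for node in nodes:
    node.neighbors = random.sample([n for n in nodes if n is not node], 3)

hits = []
nodes[0].query(qid=1, text="topic 1", ttl=4, hits=hits)
print(hits)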
That slowness is one problem I can see with this kind of content-author-driven search service. Another is that it really only distributes the caching problem even more widely.
It also assumes that users (owners of private computers) will be willing to devote their own resources to being part of a search mechanism, rather than putting up with ads (and the bias that comes with them). Based on my observations of seeder vs. leecher ratios on BitTorrent networks, I'm not optimistic.
But let's assume that some non-commercial search entity exists and is effective. One would need to be able to determine whether to allow searching of one's content by commercial or non-commercial means, and the present robots.txt and no-cache directives would not be sufficient, because excluding commercial engines with them would also exclude such non-commercial services. The question might then become: do we need a "non-commercial" directive, or a "commercial-ok" directive? I.e., should allowing commercial search engines to index and/or cache one's site be opt-in, or opt-out?
I don't know the answer to this. I suspect that the vast majority of people who create web content would prefer to be included in both commercial and non-commercial search engines, even if both were viable. So from a usability standpoint, opt-out would make sense. From a legal and ethical standpoint, however, I can see preferring opt-in. Either way, again we would need something stronger than a "gentleman's agreement," which would probably require an international convention.
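For what it's worth, the mechanics for such a directive already exist: robots.txt rules are keyed by user-agent, so all that's technically missing is an agreed vocabulary of agent classes. A sketch, with invented class names (this is not any real standard):
Code:
from urllib.robotparser import RobotFileParser

# Hypothetical convention: publishers grant or deny access by agent
# *class* rather than by naming individual crawlers. The class names
# "ads-funded-search" and "noncommercial-search" are invented here.
robots_txt = """\
User-agent: ads-funded-search
Disallow: /

User-agent: noncommercial-search
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("ads-funded-search", "https://example.org/page.html"))     # False
print(rp.can_fetch("noncommercial-search", "https://example.org/page.html"))  # True

Of course, that only restates the problem: nothing obliges a commercial crawler to identify itself as one, which is exactly why something stronger than a gentleman's agreement would be needed.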
Well, those are my current thoughts on this complex issue....