#16
Enthusiast
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
Picking up b_k's idea
It is now possible to retrieve and include the contents of a linked article and have it displayed in either HTML or LaTeX. To achieve this, an additional flag, -r, had to be (re-)introduced, and the syntax of the -f flag was extended. Its syntax is now
Code:
-f <URL>;<start>;<stop>
Unless -r is set, there won't be any downloads, irrespective of whether any <start> or <stop> tags are given. Details on the usage can be found in my personal .getfeedrc, which I attached. And maybe a few words of caution should be said (before I get flamed).
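To make the syntax concrete, a hypothetical invocation might look like this (the URL and tags are invented for illustration; real values depend on the site's markup, as later examples in this thread show):
Code:
getfeed.pl -r -f 'http://www.example.com/news/rss.xml;<div class="story">;<div class="footer">'
With -r set, getfeed fetches each linked article and keeps only the text between the <start> and <stop> tags.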
And if you want to know whether this is something for you, just have a look at the attached PDF. Hoping that someone finds this useful...

Last edited by Tommy; 01-28-2007 at 11:16 AM.
#17
Entrepreneur
Posts: 36
Karma: 10
Join Date: Oct 2007
Location: California
Device: Iliad v2
I am struck by how cool this could be if it were done legitimately. What I mean is: if you came up with a way to pay an author for his or her reportage, and a way to select what you were willing to pay for an article, you could actually create something really useful out of this, rather than trying to steal the content out from under a web site that is using it to generate the advertising revenue which pays the salaries of the people running the site in the first place.

It is too bad that "real" newspapers are so hung up on "protecting" their cash cow (which is hemorrhaging, but somehow they can't start raising a new cow before it dies) that they don't really "get" this opportunity.

--Chuck
#18
creator of calibre
Posts: 45,359
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I just noticed this thread. I've made a lot more progress on this, though for the SONY Reader: I can generate beautifully formatted LRF files with a nice hierarchical table of contents from the RSS feeds of the NYTimes, the BBC and Newsweek. It uses the print version of the articles, so no pictures, but otherwise it generates a very pretty ebook.

It's based on a pretty simple plug-in system that should allow people to write plugins for their favorite feeds. All part of libprs500.
#19
Enthusiast
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
As for ads coming along: first, an RSS feed does not transport any ads. Secondly, the ads will actually be read if they are on the page... admittedly only by the tool, but so what? When it comes to counting hits, it doesn't matter, and the content provider can still tell the advertiser how many hits he got on this particular page. And thirdly, depending on where and how the ad is placed, it will still appear in the ebook.

Tommy
#20
Entrepreneur
Posts: 36
Karma: 10
Join Date: Oct 2007
Location: California
Device: Iliad v2
I agree with you, Tommy, that pulling the RSS feed and putting it on the Iliad is a perfectly legitimate use of the feed. I wrote something similar to your perl script in python. The "stealing" part involves fetching the whole story, stripping off the window dressing it had on its web site, and putting that on the Iliad. The provider of the feed expects you to click a link in your RSS reader and go to their web site, which will display a bunch of annoying ads and, on the off chance you click on one, will pay them a bit of coin. So if you suck the story off the site, strip out their ads and such, and put it on the Iliad, they think of that as 'stealing' their content, just like they complain when people put their web page in a frame with someone else's advertising outside the frame.

I make no claim as to the rightness or wrongness of this, but for better or worse it is the current business model people like Reuters, AP, etc. use to "monetize" their work (that is code for getting paid for having people do this all day long). I managed to get AP to tell me what it would cost to push the whole story to an Iliad, and they said between $400 and $600 per story, depending on how many people it was being sent to. (I know that probably doesn't make sense, but they see it as a way of collecting a fraction of the money you will be making off the story, as sized by your readership. They are stuck in the magazine/newspaper model where the number of subscribers determines what you can charge for ads: if you have a lot of subscribers, you can charge a lot for ads and make more per page, etc.)

Personally I'd like to cut AP out of the loop: basically create an automated system whereby people could submit a story for publication, pay them a fixed price for it, and then put together a newspaper from the best stories. But some people can't write, and other people are carrying some sekrit agenda (like they work for Microsoft in their day job), so out of the chute I don't want to pay people $500 a story but rather $1 a story, then publish it and figure out some way of measuring their credibility; as their credibility index went up, I would be happy to pay them more. Sort of like reading Slashdot at a high moderation level. I figure an honest, hard-working journalist who reports a balanced account of the story is worth 500x more than one who is being compensated to be the mouthpiece of some special interest. Unfortunately there isn't a "Special Interest Lapdog" registry.

So the value-add of an Associated Press is that they have, in theory, screened their journalists and pay them an appropriate amount to keep them honest. If someone wrote two decent articles a week and got paid $500 each for them, that would be a pretty decent wage in many parts of the USA.

Anyway, to hammer the point home: ask any "famous" blogger for permission to pull their blog entries and publish them in your e-paper magazine. I expect most of them would ask you to pay them for that right, and if you said "But I don't pay anything to read your blog on Blogger", they would say they get advertising revenue from visits to their blog page that they wouldn't get from you. So if you re-published them without their permission, they might call it 'stealing' from them.

--Chuck
#21
Member
Posts: 21
Karma: 12
Join Date: Sep 2007
Device: Irex ILiad
Dear Tommy,
as X-mas is near ;-) could you please provide a hint on how to getfeed "Der Standard" and "Spiegel" properly? I have tried several configs, and the most reasonable one for me would be:
Code:
-f http://derStandard.at/?page=rss&ressort=Newsroom;<!-- google_ad_section_start -->;<!-- google_ad_section_end -->
Unfortunately, even worse is:
Code:
-f http://www.spiegel.de/schlagzeilen/rss/0,5291,,00.xml;<h4>;<div class="spDottedLine">
Kind regards
Harald

Last edited by fodiator; 12-21-2007 at 10:00 AM.
#22
Connoisseur
Posts: 65
Karma: 256
Join Date: Nov 2007
Location: Switzerland
Device: Iliad, Kindle K3, iPad , iPhone, etc...
Some feeds are harder than others, as the script needs to remove 'code' from the newsfeed.

A modification I made (easy enough if you look at the scripts) is to get the scripts to download the 'print' version rather than the web version. (Usually, if you go to the print view, you will see its URL is a modification of the original URL you got from the RSS feed.) The print version often (I haven't checked the Spiegel) has less code and formatting, so the scraping works better; see the sketch below for the kind of rewrite I mean.

Tommy, are you still around these parts? If so, I could send you my modifications for inclusion, if you wish.
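To illustrate the kind of rewrite involved (the URL scheme here is invented; every site does this differently), the print-version trick boils down to a small regular-expression substitution on the article URL:
Code:
#!/usr/bin/perl
# Sketch: derive a hypothetical print-version URL from an article URL.
use strict;
use warnings;

my $url = 'http://www.example.com/news/article1234.html';

# Assumption for this sketch: the site's print view just appends a query parameter.
(my $print_url = $url) =~ s{(article\d+\.html)$}{$1?print=true};

print "$print_url\n";  # http://www.example.com/news/article1234.html?print=true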
#23
Enthusiast
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
Hi Harald,
First, sorry for the late reply; I saw your message just today.

I had a look at the "articles" of the Standard, and all I found is essentially some JavaScript code. Therefore the tags you provided for that feed cannot pull anything from the respective article, so, as the previous poster mentioned, one might need to change the code to get at the actual articles buried somewhere out there.

As for the Spiegel, I saw that the feed doesn't provide a description tag?! So all we get there are the headlines..., but the links work! One of the LaTeX errors you receive for Spiegel articles is due to the start tag you specified:
Code:
<h4>
I regret I cannot look deeper into this right now, but as I'll be heading off for holidays today, I'm a bit in a hurry.

Guten Rutsch, and for the EN-speakers among us: Happy New Year,
Tommy
#24
Enthusiast
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
My email would be tommy.berndt(at)gmx.de. But as I'll be away for a fortnight, it'll take some time until I have a look at the code. So I think it would be better if you published your version directly here in the forum, so that everyone can access it immediately. I uploaded my current version of getfeed.pl together with a config file that might (or might not) be useful...

Tommy
#25
Member
Posts: 21
Karma: 12
Join Date: Sep 2007
Device: Irex ILiad
Suggestion
Hi,
I have jumped into perl and fiddled around with the getfeed code to bring DerStandard.at to work. Although my implementation is quite ugly (hard-coded) and I still have some problems concerning charmaps and special characters, the result is promising.

I found out that Tommy's improvement of defining start and stop tags in the getfeedrc file would not necessarily be enough for more complex web services. I would therefore like to discuss the idea of implementing a kind of module (containing perl code) to handle the specific formatting of index and content pages; a sketch of what I mean follows below. As I am a Perl newcomer, I would not dare to propose how this could best be done, so feedback is kindly welcome! Nevertheless, I would be glad to provide my getfeed patch if there is any interest.

Kind regards
Harald

Last edited by fodiator; 01-11-2008 at 08:44 AM.
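To make the module idea concrete, here is one entirely hypothetical shape it could take (getfeed has no such interface yet; every name below is invented for discussion):
Code:
package GetFeed::Site::DerStandard;
# Hypothetical per-site module: getfeed would load this and call
# extract_article() with the raw HTML of an article page.
use strict;
use warnings;

sub extract_article {
    my ($html) = @_;
    # Site-specific logic replaces the generic <start>/<stop> tag matching.
    if ($html =~ m{<!-- google_ad_section_start -->(.*?)<!-- google_ad_section_end -->}s) {
        return $1;
    }
    return '';  # nothing found; the caller could fall back to the headline
}

1;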
#26
Connoisseur
Posts: 65
Karma: 256
Join Date: Nov 2007
Location: Switzerland
Device: Iliad, Kindle K3, iPad , iPhone, etc...
Sorry, been away.

Attached is the changed file. My extra option is -P (for printed version); it takes a parameter of the form from-url;to-url;, where from-url is a normal regular expression. An example of my command line would be:
Code:
getfeed.pl -o BBC.tex -F tex -S ../res/iliad.sty -s -r -t BBC -C "/usr/texbin/pdflatex -interaction=nonstopmode BBC.tex" -f "http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml;<div class="headline">;<div class="footer">" -P 'http:\/\/(.*);http://newsvote.bbc.co.uk/mpapps/pagetools/print/;'
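Judging from that -P argument, the transform presumably works like this (a sketch of the idea only, not the actual getfeed code): the from-url pattern captures part of the original URL, and the to-url string is prepended to the capture. The BBC story URL below is invented for illustration:
Code:
#!/usr/bin/perl
# Sketch of the assumed -P rewrite: capture with from-url, prefix with to-url.
use strict;
use warnings;

my $from = qr{http:\/\/(.*)};
my $to   = 'http://newsvote.bbc.co.uk/mpapps/pagetools/print/';

my $url = 'http://news.bbc.co.uk/2/hi/technology/7100000.stm';
$url = $to . $1 if $url =~ $from;
print "$url\n";
# http://newsvote.bbc.co.uk/mpapps/pagetools/print/news.bbc.co.uk/2/hi/technology/7100000.stm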
#27
Enthusiast
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
Hi,
back from holidays (unfortunately, but all things have to end sometime...).

The idea technobear implemented is actually a very neat one... provided you have something like a regexp view to easily see how the original URL converts to the "printed view" URL. I have to admit, I wouldn't have come up with that regexp. But now that the idea is out, and if all printed-view URLs are really that easy to transform, it's absolutely worth incorporating. (However, it currently works only if a single feed is given; with more, the second and all following feeds will fail, so a little further hacking will be needed.)
#28
Enthusiast
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
RE: Suggestion
Hi Harald,
But I'm afraid that this road would/will (further) alienate non-perl-speaking people from getfeed, unless some default behaviour remains in the core for standard feeds that don't need special processing. However, the more I think about it, the more I warm to the idea... (some ideas have already started popping up). If anyone has already thought out something in this direction, please speak out!

Tommy

Last edited by Tommy; 02-03-2008 at 04:07 AM.
#29
Enthusiast
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
Hi all,
here comes a new version of getfeed which incorporates both of the ideas thetechnobear and fodiator proposed above. Some (sort of) documentation:
Code:
getfeed V0.9e (c) by T.Berndt
This program comes with ABSOLUTELY NO WARRANTY.
usage: getfeed [...] [-o <outfile>] [-f] <feed> [<feed_1> ...]
-f <feed>[;<start>;<stop>;<filter>;<server>;<srcURL>;<toURLa>;<toURLb>]
: <feed> is a URL or a filename.
-d <directory> : saves output into <directory>
-o <outfile> : saves output into <outfile>
-t <title> : Title of this news' edition
-r : Retrieve and append linked articles. Default: no
-R <file> : Reads <file> instead of .getfeedrc
-e <charset> : Use <charset> for encoding. Default: utf-8
-F <format> : Output format: html(obvious) or tex(LaTeX) Default: html
-S <style> : Reads <style> and adds its content as style-information.
-P <package> : Adds a \usepackage{<package>} into the LaTeX-file
-C <cmd> : Execute <cmd>
-m : format text in two columns
-a : Auto-name the output as news_YYYYMMDD.<format> Default: no
-v : Print debugging info to STDERR/<log>.
-s : Suppress all output. Default: no (i.e. not silent)
-l <log> : Writes debugging information to <log>
Run getfeed -v -h for more information!
getfeed reads news-feeds and converts them into either an HTML-or LaTeX-file.
The feeds currently understood are RSS, ATOM and RDF {0.91, 1.0, 2.0}.
WARNING: As can be read above, fodiator's idea to facilitate plugins has been realised by simply calling an external program to "massage" the current item's page and return its result to getfeed for inclusion. Of course, this opens every door for malign code to wreak havoc on your computer, so it's up to you to check that program carefully beforehand. I chose this approach as (i) it lets users provide their own logic in any language they like, (ii) it doesn't impose any artificial restrictions like interfaces or APIs, and (iii) it is the simplest approach to realise.

Second WARNING: I haven't checked this feature myself! I only wrote two sample programs, caller.pl and callee.pl, as a proof of principle.

Hoping you find it useful...

Regards,
Tommy

--- please note, the "plugin" mechanism doesn't work yet :-( I just checked it. ---

UPDATE: The "plugin" mechanism has been fixed and is working now! I uploaded the latest version (0.9e) of getfeed along with an example "plugin" (callee.pl). This program does nothing but turn the text into upper case, to illustrate the usage of this feature. However, it might also serve as a template or starting point for your own "plugins"; a sketch of such a filter follows below.

Last edited by Tommy; 02-02-2008 at 06:07 AM. Reason: bug-report
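For anyone wanting to write such a "plugin", a filter in the spirit of callee.pl might look like the sketch below. This is a minimal sketch: the post only says the example upper-cases the text, so reading the page from STDIN and writing the result to STDOUT is an assumption here; check the attached caller.pl/callee.pl for the real calling convention.
Code:
#!/usr/bin/perl
# Sketch of a callee.pl-style "plugin": read the article text,
# transform it (here: upper-case it), and hand the result back.
# NOTE: the STDIN/STDOUT interface is assumed, not confirmed.
use strict;
use warnings;

while (my $line = <STDIN>) {
    print uc $line;
}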
#30
Member
Posts: 21
Karma: 12
Join Date: Sep 2007
Device: Irex ILiad
Quote:
Thread | Thread Starter | Forum | Replies | Last Post |
Classic The Daily | hjordanh | Barnes & Noble NOOK | 3 | 02-05-2010 10:48 AM |
Daily notifications? | devilsadvocate | Feedback | 8 | 01-22-2010 12:24 PM |
Daily Dilbert | billbadger | Calibre | 2 | 12-09-2009 02:42 PM |
Daily Comics | billbadger | Calibre | 0 | 12-08-2009 07:22 PM |
Amazon Daily | daffy4u | Amazon Kindle | 13 | 06-04-2008 07:07 PM |