01-25-2008, 10:46 AM | #16 |
creator of calibre
Posts: 44,022
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
All feed requests should go here:
https://libprs500.kovidgoyal.net/ticket/405 |
01-25-2008, 01:45 PM | #17 | |
Fanatic
Posts: 525
Karma: 1300001
Join Date: Jan 2008
Location: Keene, New Hampshire
Device: iPad Mini, iPad Pro, Fire 8", iPhone, PaperWhite 2
|
Quote:
|
|
01-25-2008, 01:46 PM | #18 |
creator of calibre
Posts: 44,022
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Just register an account at https://libprs500.kovidgoyal.net/register, then log in and go to the ticket page; it will let you add a comment. Add a comment with your request.
|
01-25-2008, 02:03 PM | #19 | |
Fanatic
Posts: 525
Karma: 1300001
Join Date: Jan 2008
Location: Keene, New Hampshire
Device: iPad Mini, iPad Pro, Fire 8", iPhone, PaperWhite 2
|
Quote:
|
|
01-25-2008, 02:28 PM | #20 |
creator of calibre
Posts: 44,022
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Now you just have to wait for some kindly soul to write the profile for you.
|
01-25-2008, 04:01 PM | #21 |
Fanatic
Posts: 525
Karma: 1300001
Join Date: Jan 2008
Location: Keene, New Hampshire
Device: iPad Mini, iPad Pro, Fire 8", iPhone, PaperWhite 2
|
|
01-25-2008, 04:21 PM | #22 |
creator of calibre
Posts: 44,022
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It isn't going to be me; I prefer to work on the infrastructure of libprs500 and only add feeds if I want to use them. But several people have expressed an interest in writing feeds, so hopefully one of them is interested in Middle East news.
|
01-25-2008, 04:54 PM | #23 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
The Old Man,
You didn't have to wait long; attached is a quick-and-dirty profile that will download the first 10 articles in the following Jerusalem Post feeds:
Front Page
Israel News
International News
Middle East News
Editorials

kovidgoyal,
The last bit of code fixed the problem with pubdate in the profile for Agenzia Fides. I am still having some problems with how the summary is displayed (cosmetic but ugly): various HTML tags are shown as text, most notably <b></b> and <br>.

Meanwhile I have started on one for the Christian Science Monitor, and they have one wild way of directing you to the files. The href points to (and later on in a <link></link>) you are pointed to:
http://rss.csmonitor.com/~r/feeds/to...4s01-woaf.html
which resolves to
http://www.csmonitor.com/2008/0124/p04s01-woaf.html
with the print version being at
http://www.csmonitor.com/2008/0124/p04s01-woaf.htm
The rub is that if you change the original address to
http://rss.csmonitor.com/~r/feeds/to...04s01-woaf.htm
it too resolves to the .html file.

At first I thought this was going to be an easy one: the date is in the number 222417173, so all we have to do is convert it to an ASCII date, format it as '/%Y/%m%d/' to get /2008/0124/, and build the required address string. It doesn't work; the number resolves to 1977 01 18. I can fix it by adding 2001 01 07 as an offset (that may have to be 06). Is that likely to be legitimate? Have I overlooked something?

The Christian Science Monitor also does not return a valid pubdate, and unless you set use_pubdate = False you go nowhere. However, in examining the source for the feed, there always seem to be two date entries for each article,
articlesortdate="0222880260.000000"
articlelocaldate="0222885964.644872"
which seem to be the epoch dates of the files. Would it not be possible to capture either or both? Can I get at them in my profiles? I am a bit unsure what declarations would have to be made. |
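On the date question: read as Unix time, a value like 0222880260 lands in 1977, as reported. A minimal sketch of one alternative reading, assuming the articlesortdate / articlelocaldate values count seconds from 2001-01-01 00:00:00 UTC (the reference date used by Apple's Cocoa date APIs). That is an assumption, not something the feed documents, and `APPLE_EPOCH`, `parse_article_date` and `date_path` are hypothetical names, not part of libprs500:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the attribute values are seconds since 2001-01-01 00:00:00 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def parse_article_date(value):
    """Convert a string such as "0222880260.000000" to an aware UTC datetime."""
    return APPLE_EPOCH + timedelta(seconds=float(value))


def date_path(value):
    """Format the date the way the article URLs do, e.g. /2008/0124/."""
    return parse_article_date(value).strftime("/%Y/%m%d/")


print(date_path("0222880260.000000"))  # -> /2008/0124/
```

Under that assumption the sample value falls on the same day as the /2008/0124/ path in the article URL. The result is in UTC, so articles published near midnight could land on the neighbouring day compared with the site's own local date.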
01-25-2008, 05:22 PM | #24 |
Fanatic
Posts: 525
Karma: 1300001
Join Date: Jan 2008
Location: Keene, New Hampshire
Device: iPad Mini, iPad Pro, Fire 8", iPhone, PaperWhite 2
|
Thank you. Now I will attempt to use it. Wish me luck.
|
01-25-2008, 05:43 PM | #25 |
creator of calibre
Posts: 44,022
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@Deputy-Dawg
Why not let the Christian Science Monitor's servers figure out the date mapping for you? Here's some code that should do just that:
Code:
def print_version(self, url):
    # Follow the feed's redirect to find the final article URL
    resolved_url = self.browser.open(url).geturl()
    # Drop the trailing "l" so that ".html" becomes the ".htm" print version
    return resolved_url.strip()[:-1]
As for article date, I'm afraid there isn't any way to access that short of re-implementing the parse_feeds function. |
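The `[:-1]` slice in that snippet relies on every resolved URL ending in .html. A slightly more defensive sketch of just the string step, with the network call left out so it runs offline; `to_print_url` is a hypothetical helper name, not a libprs500 function:

```python
def to_print_url(resolved_url):
    """Map an article URL ending in .html to its .htm print version.

    URLs that do not end in .html are returned unchanged (apart from
    surrounding whitespace) instead of losing their last character.
    """
    url = resolved_url.strip()
    if url.endswith(".html"):
        return url[:-1]  # drop the final "l"
    return url


print(to_print_url("http://www.csmonitor.com/2008/0124/p04s01-woaf.html"))
```

Inside a profile, the same check could be applied to the value returned by `geturl()` before returning it from `print_version`.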
01-25-2008, 11:27 PM | #26 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
I am attaching a copy of the profile for the Christian Science Monitor. I am having a problem that you may have to see to understand. For reference, every article in the feed has a structure like this:
<div class="apple-rss-article apple-rss-read" onclick="javascript:handleArticleClick(this)" showSeparator="true" articlesortdate="0223013377.017225" articlesorttitle="gaza busts out of its blockade" articlesortsource="" sourceindex="0" articlesortid="00000000000000000010" articlelocaldate="0223013377.017225" articleid="a91c09df43f4cf6a33ffed73cecf111efe81204 a">
  <div class="apple-rss-article-footer"></div>
  <div class="apple-rss-article-head" >
    <div class="apple-rss-unread-dot"><img src="file://localhost/System/Library/Frameworks/PubSub.framework/Versions/A/Resources/PubSubAgent.app/Contents/Resources/unread.tif" width="9" height="9" /></div>
    <div class="apple-rss-subject" title="Gaza busts out of its blockade"><a href="http://rss.csmonitor.com/~r/feeds/top/~3/222417168/p01s04-wome.html">Gaza busts out of its blockade</a></a></div>
    <div class="apple-rss-summary" >A new hole opens in the Arab-Israeli peace strategy of isolating Hamas.</div>
    <div class="apple-rss-date" title="Today, 10:09 PM">Today, 10:09 PM</div>
  </div>
  <div class="apple-rss-article-body-container">
    <div class="apple-rss-article-body">
      A new hole opens in the Arab-Israeli peace strategy of isolating Hamas.
      <p><a href="http://rss.csmonitor.com/~a/feeds/top?a=rt0NVe"><img src="http://rss.csmonitor.com/~a/feeds/top?i=rt0NVe" border="0" /></a></p>
      <div class="feedflare">
        <a href="http://rss.csmonitor.com/~f/feeds/top?a=7LSTtWD"><img src="http://rss.csmonitor.com/~f/feeds/top?i=7LSTtWD" border="0" /></a>
        <a href="http://rss.csmonitor.com/~f/feeds/top?a=bYiAxtD"><img src="http://rss.csmonitor.com/~f/feeds/top?i=bYiAxtD" border="0" /></a>
        <a href="http://rss.csmonitor.com/~f/feeds/top?a=ISh8dED"><img src="http://rss.csmonitor.com/~f/feeds/top?i=ISh8dED" border="0" /></a>
        <a href="http://rss.csmonitor.com/~f/feeds/top?a=FL3bvEd"><img src="http://rss.csmonitor.com/~f/feeds/top?i=FL3bvEd" border="0" /></a>
      </div>
      <img src="http://rss.csmonitor.com/~r/feeds/top/~4/222417168" height="1" width="1" />
      <a class="apple-rss-article-link" href="http://rss.csmonitor.com/~r/feeds/top/~3/222417168/p01s04-wome.html">Read more…</a>
    <!-- end articlebody --></div></div>
<!-- end article --></div>

The entire block:
A new hole opens in the Arab-Israeli peace strategy of isolating Hamas.
<p><a href="http://rss.csmonitor.com/~a/feeds/top?a=rt0NVe"><img src="http://rss.csmonitor.com/~a/feeds/top?i=rt0NVe" border="0" /></a></p>
<div class="feedflare">
  <a href="http://rss.csmonitor.com/~f/feeds/top?a=7LSTtWD"><img src="http://rss.csmonitor.com/~f/feeds/top?i=7LSTtWD" border="0" /></a>
  <a href="http://rss.csmonitor.com/~f/feeds/top?a=bYiAxtD"><img src="http://rss.csmonitor.com/~f/feeds/top?i=bYiAxtD" border="0" /></a>
  <a href="http://rss.csmonitor.com/~f/feeds/top?a=ISh8dED"><img src="http://rss.csmonitor.com/~f/feeds/top?i=ISh8dED" border="0" /></a>
  <a href="http://rss.csmonitor.com/~f/feeds/top?a=FL3bvEd"><img src="http://rss.csmonitor.com/~f/feeds/top?i=FL3bvEd" border="0" /></a>
</div>
<img src="http://rss.csmonitor.com/~r/feeds/top/~4/222417168" height="1" width="1" />
is being used as the summary on the contents page. I have tried many different patterns in the preprocess_regexps section, to no avail. I also tried setting summary_length = 0 (and 100, on the off chance it did not accept 0 as an argument), and again no effect. Of course the profile is usable, but the output is ugly as sin!

Finally, is it possible to embed an HTML option in the profile? Specifically --ignore-tables; again, it is only for cosmetic effect.
Last edited by Deputy-Dawg; 01-26-2008 at 11:27 AM. Reason: Uploaded repaired profile |
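For anyone experimenting with the regexps outside a profile, here is a minimal standalone sketch of the kind of clean-up described: drop the feedflare block, strip the remaining tags, and collapse whitespace. It uses only Python's `re` module; `SUMMARY_REGEXPS` and `clean_summary` are hypothetical names, and whether libprs500 applies preprocess_regexps to summaries at all is not established in this thread:

```python
import re

# (compiled pattern, replacement function) pairs, applied in order.
SUMMARY_REGEXPS = [
    # Remove the whole <div class="feedflare">...</div> block of tracking links
    (re.compile(r'<div class="feedflare">.*?</div>', re.DOTALL | re.IGNORECASE),
     lambda match: ''),
    # Remove any remaining tags (<p>, <a>, <img>, <b>, <br>, ...)
    (re.compile(r'<[^>]+>'), lambda match: ''),
    # Collapse the runs of whitespace left behind
    (re.compile(r'\s+'), lambda match: ' '),
]


def clean_summary(html):
    """Return the plain-text part of an HTML summary fragment."""
    for pattern, func in SUMMARY_REGEXPS:
        html = pattern.sub(func, html)
    return html.strip()


sample = ('A new hole opens in the Arab-Israeli peace strategy of isolating Hamas. '
          '<p><a href="http://rss.csmonitor.com/~a/feeds/top?a=rt0NVe">'
          '<img src="http://rss.csmonitor.com/~a/feeds/top?i=rt0NVe" border="0" /></a></p> '
          '<div class="feedflare"><a href="http://rss.csmonitor.com/~f/feeds/top?a=7LSTtWD">'
          '<img src="http://rss.csmonitor.com/~f/feeds/top?i=7LSTtWD" border="0" /></a></div> '
          '<img src="http://rss.csmonitor.com/~r/feeds/top/~4/222417168" height="1" width="1" />')
print(clean_summary(sample))
```

Regex tag stripping is crude and will mangle text that contains a literal "<", so treat it as a cosmetic fix for short summaries only.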
01-25-2008, 11:34 PM | #27 |
creator of calibre
Posts: 44,022
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Set
Code:
html_description = True
html2lrf_options = ['--ignore-tables']
|
01-26-2008, 04:44 PM | #28 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2008
Device: Prs 505
|
ISO-8859-1 feed howto
Hello,
Using this wonderful program (thanks a lot, Kovid!), I have tried to add support for "Le Monde", a French newspaper. It was working pretty well, but yesterday they changed both their structure and their encoding, switching from UTF-8 to ISO-8859-1. Now my new profile captures the articles, but with weird encoding. If I add to the regex, for instance, <head><meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"></head>, my characters are correct, but all the crap is not stripped from the articles. Here is my profile. I would be very grateful for your help... |
01-26-2008, 04:59 PM | #29 |
creator of calibre
Posts: 44,022
Karma: 22669822
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Not sure what you mean. I tried it and it works fine for me. See the attached LRF.
|
01-26-2008, 05:17 PM | #30 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2008
Device: Prs 505
|
This file seems to be fine, but some French letters such as "à, ê, ù..." are not correctly displayed.
à for instance becomes r ... That is my problem, which appears only in the articles, not in the index and the abstracts. |
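Accented letters going wrong only in the article bodies is consistent with the article bytes being decoded with the wrong charset, though nothing in this thread confirms that as the cause. A minimal sketch of a common fallback, independent of libprs500: try UTF-8 strictly first, and fall back to ISO-8859-1 when the bytes are not valid UTF-8. `decode_article` is a hypothetical helper name:

```python
def decode_article(raw, fallback="iso-8859-1"):
    """Decode raw page bytes as UTF-8, or with the fallback charset if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail
        return raw.decode(fallback)


latin1_bytes = "à la carte".encode("iso-8859-1")
utf8_bytes = "à la carte".encode("utf-8")
print(decode_article(latin1_bytes) == decode_article(utf8_bytes))
```

The order matters: ISO-8859-1 accepts any byte sequence, so trying it first would silently mis-decode pages that really are UTF-8.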
|