MobileRead Forums > E-Book Software > Calibre
01-26-2008, 05:18 PM   #31
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 45,411
Karma: 27757236
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Ah, OK, I didn't see that because I don't know French. Can you give me an easy-to-recognize sentence I can use to test things?
01-26-2008, 05:32 PM   #32
Lemoine
Junior Member
Lemoine began at the beginning.
Posts: 7
Karma: 10
Join Date: Jan 2008
Device: Prs 505
In the international section

Article:
"Londres est prêt à oeuvrer à un désarmement nucléraire total"

The correct title is:

"Lemonde.fr:Londres est prêt à oeuvrer à un désarmement nucléaire total - Europe"

Thanks a lot for your help!
01-26-2008, 05:55 PM   #33
kovidgoyal
creator of calibre
Try the attached
Attached Files
File Type: zip Monde.zip (1.3 KB, 563 views)
01-26-2008, 06:09 PM   #34
Lemoine
Junior Member
Awesome!

Thanks a lot for the program and for the help!
01-26-2008, 11:41 PM   #35
Deputy-Dawg
Groupie
Deputy-Dawg has learned how to read e-books
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
From DefaultProfile:

timefmt = ' [%a %d %b %Y]' # The format of the date shown on the first page
url_search_order = ['guid', 'link'] # The order of elements to search for a URL when parsing the RSS feed
pubdate_fmt = None # The format string used to parse the publication date


This implies that only the 'guid' and 'link' elements are searched for the link. That is borne out by the fact that when you process the feed from the Denver Post with

use_pubdate = False

you get the error message

Skipping article as it does not have a link url
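The url_search_order setting quoted above suggests a simple first-match search over an item's child elements. The sketch below is not libprs500's actual implementation: it uses Python's xml.etree.ElementTree, and find_article_url is a hypothetical helper name. It illustrates why an item whose URL lives somewhere other than <guid> or <link> ends up with no link at all.

```python
import xml.etree.ElementTree as ET

# Mirrors the DefaultProfile setting quoted above.
URL_SEARCH_ORDER = ['guid', 'link']

def find_article_url(item, search_order=URL_SEARCH_ORDER):
    """Return the text of the first child element in search_order that
    holds a non-empty value, or None if none of them does.
    (Hypothetical helper, not libprs500 code.)"""
    for tag in search_order:
        elem = item.find(tag)
        # Compare with None explicitly: an Element with no children is falsy.
        if elem is not None and elem.text and elem.text.strip():
            return elem.text.strip()
    return None

item = ET.fromstring('<item><title>t</title><link>http://example.com/a</link></item>')
print(find_article_url(item))
```

Returning None here presumably corresponds to the case where the profile gives up and prints a "Skipping article" message.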

In the source of the feed, the following code appears for each article:

<li class="regularitem" xmlns:dc="http://purl.org/dc/elements/1.1/">
<h4 class="itemtitle">
<a href="http://www.denverpost.com/ci_8088727">
Man hit in crosswalk, killed
</a>
</h4>
<h5 class="itemposttime">
<span>Posted: </span>
Sat, 26 Jan 2008 20:09:37 -0700
</h5>
<div class="itemcontent" name="decodeable">
A 22-year-old Denver resident was killed in Aurora Saturday when a 71-year-old man driving a pickup ran a red light on South Parker Road, then veered into a crosswalk.
</div>
</li>

The URL for the article is contained only in the itemtitle class (the <a> tag inside <h4 class="itemtitle">).
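If the URL really is available only inside <h4 class="itemtitle">, one way to recover it from HTML like the snippet above is to note when the parser enters that heading and collect the href of any <a> tag found there. This is a minimal sketch using Python's standard html.parser, not something web2lrf provides; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class ItemTitleLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <h4 class="itemtitle">."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # class may be missing or valueless, hence the "or ''".
        classes = (attrs.get('class') or '').split()
        if tag == 'h4' and 'itemtitle' in classes:
            self.in_title = True
        elif tag == 'a' and self.in_title and attrs.get('href'):
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'h4':
            self.in_title = False

def item_title_links(html):
    parser = ItemTitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```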

Similarly, in the feeds from Izvestia the URL is contained only in the classes

mainnewstime and mainnewsnotice

and even then only as the variable part of the link, in the form:

/world/asia/20080127/97803220.html

which has to be concatenated with http://www.rian.ru to obtain the fully qualified address.
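For the Izvestia case, joining the relative path onto the site root is a one-liner with the standard library. urljoin is used here instead of plain string concatenation because it leaves already-absolute URLs untouched and avoids doubled or missing slashes. The base URL is the one mentioned above; absolute_url is a hypothetical helper name.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.rian.ru'

def absolute_url(path, base=BASE_URL):
    """Resolve a site-relative path such as /world/asia/... against base."""
    return urljoin(base, path)

print(absolute_url('/world/asia/20080127/97803220.html'))
# http://www.rian.ru/world/asia/20080127/97803220.html
```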

Is it possible to handle either of these cases in web2lrf?

BTW, a profile runs much faster in the Terminal than when embedded in the libprs500 GUI. I have also found that if I try to run more than about 3 profiles in sequence, libprs500 crashes. I can get around the problem by quitting and restarting; there is no need to remove the previously captured feeds.
01-27-2008, 01:23 PM   #36
kovidgoyal
creator of calibre
Not sure I follow. The Denver Post has its links in both <link> and <guid> elements; see, for example, http://feeds.feedburner.com/dp-news-national?format=xml

The problem is that the links are embedded in a CDATA section, so you should write print_version to handle that.

The GUI crashing should be fixed in the next release.

EDIT: Actually, since all the elements in that feed are CDATA-escaped, you're going to have to wait for the next release of libprs500 to create a feed for the Denver Post.
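Until that release, a profile could in principle clean up a CDATA-wrapped link itself, for example from inside a print_version override. The sketch below assumes the link text reaches the profile with the literal <![CDATA[ ... ]]> wrapper still attached, which is an assumption about how the feed gets parsed; strip_cdata is a hypothetical helper, and wiring it into DefaultProfile is not shown.

```python
import re

# Matches a value that is entirely wrapped in a CDATA section.
CDATA_RE = re.compile(r'^\s*<!\[CDATA\[(.*?)\]\]>\s*$', re.DOTALL)

def strip_cdata(text):
    """Return the contents of a CDATA wrapper, or the text itself if unwrapped."""
    match = CDATA_RE.match(text)
    return match.group(1).strip() if match else text.strip()

print(strip_cdata('<![CDATA[http://www.denverpost.com/ci_8088727]]>'))
# http://www.denverpost.com/ci_8088727
```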

Last edited by kovidgoyal; 01-27-2008 at 02:06 PM.
01-27-2008, 02:19 PM   #37
Deputy-Dawg
Groupie
Sigh! What it means is that the Denver Post is offering what it calls RSS feeds at no fewer than 3 different URLs and in at least two different formats. The first one, the one I was asking about, is at:

http://feeds.feedburner.com/dp-news-national


The XML feed, which can be reached from the link on the page I was using, and which is the one you found:

feed://feeds.feedburner.com/dp-news-national?format=xml

And finally, the XML version of the RSS feed that you are sent to if you click the blue "RSS" icon in your browser's address bar and follow your nose:

feed://feeds.denverpost.com/dp-news-national?format=xml

which, I think, points to the same page as the previous one.
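The feed:// prefix is an unofficial scheme that browsers use to hand a feed off to a reader; the feed itself is still fetched over plain HTTP. A minimal sketch, assuming a simple prefix swap is enough for these Denver Post URLs (feed_to_http is a hypothetical name):

```python
def feed_to_http(url):
    """Rewrite a browser-style feed:// URL to http://; leave other URLs alone."""
    prefix = 'feed://'
    if url.startswith(prefix):
        return 'http://' + url[len(prefix):]
    return url

print(feed_to_http('feed://feeds.denverpost.com/dp-news-national?format=xml'))
# http://feeds.denverpost.com/dp-news-national?format=xml
```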

I guess I will have to wait until the new version of DefaultProfile is available before working on it any further. This is a rather interesting one because it has no printer-friendly version, so to be useful at all it will need a preprocess_regexps that strips out the nasty bits and leaves only the story. Looks like fun. But.....
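As a rough illustration of the preprocess_regexps idea, assuming each entry pairs a compiled regular expression with a function that turns the match into replacement text, and that the rules are applied to the downloaded HTML in order. The "story start" and "story end" comment markers are invented placeholders, not anything the Denver Post actually emits, and preprocess is a standalone stand-in for what the profile machinery would do with the list.

```python
import re

# Hypothetical rules: the comment markers below are placeholders; a real
# profile has to use whatever the site's pages actually contain.
preprocess_regexps = [
    # Drop everything before the story body.
    (re.compile(r'^.*?<!--\s*story start\s*-->', re.DOTALL | re.IGNORECASE),
     lambda match: '<html><body>'),
    # Drop everything after the story body.
    (re.compile(r'<!--\s*story end\s*-->.*$', re.DOTALL | re.IGNORECASE),
     lambda match: '</body></html>'),
]

def preprocess(html, rules=preprocess_regexps):
    """Apply each (pattern, replacement_function) rule to html in order."""
    for pattern, replacement in rules:
        html = pattern.sub(replacement, html)
    return html
```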

As for Izvestia, I think it will be necessary to teach DefaultProfile to work with Russian syntax and the Cyrillic alphabet. I could open a ticket, but I suspect it would be a warm day in Siberia before it happened.
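Much of the Cyrillic problem comes down to decoding each page with the character set it actually uses before any parsing happens. A minimal sketch, assuming the page declares its encoding in a <meta> tag; windows-1251 was a common choice for Russian sites, but check what the site actually sends, and an HTTP Content-Type header, when present, would normally take precedence. decode_page is a hypothetical helper, and the regex is a simplification of real charset detection.

```python
import re

# Simplified: finds charset=... inside the first <meta> tag that declares one.
META_CHARSET_RE = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)

def decode_page(raw, default='utf-8'):
    """Decode raw HTML bytes using the charset declared in a <meta> tag,
    falling back to default if none is declared or it is unusable."""
    match = META_CHARSET_RE.search(raw)
    encoding = match.group(1).decode('ascii') if match else default
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or bytes that do not fit the declared charset.
        return raw.decode(default, errors='replace')
```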

Thanks again for all the assistance. BTW, just as an example of Russian syntax (in English), try this one on for size!

<div class="mainrubric">

or another (in Russian, in Cyrillic; roughly "/news list"):

<!-- /список новостей -->

And I speak every language except Greek! But that is Greek to me!
01-28-2008, 08:04 PM   #38
Deputy-Dawg
Groupie
Profile for Reuters News Service

Attached is a profile that captures several of the Reuters feeds. It proved fairly interesting to write. First, the URL that web2lrf returns for this service contains only the file ID, which took me a while to figure out. Also, they do not provide a printer-friendly page, so I had to write code that parses the text out of the display page. It was doable, but it makes the program quite slow and is apparently quite CPU-intensive; at least, the cooling fan in my MacBook Pro runs a good deal more than it usually does.
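One way to parse the text out of a display page, sketched with Python's standard html.parser: collect the text of each <p> element and collapse runs of whitespace. Treating the page's <p> text as the story body is a hypothetical rule; a real Reuters page will also carry navigation and promotional paragraphs that would need filtering, and the class and function names are illustrative.

```python
from html.parser import HTMLParser

class ParagraphTextParser(HTMLParser):
    """Collect the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.buffer = []
        self.paragraphs = []

    def flush(self):
        text = ' '.join(''.join(self.buffer).split())
        if text:
            self.paragraphs.append(text)
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            if self.in_p:
                self.flush()  # an unclosed <p> ends where the next one begins
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == 'p' and self.in_p:
            self.flush()
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.buffer.append(data)

def extract_paragraphs(html):
    parser = ParagraphTextParser()
    parser.feed(html)
    parser.close()
    if parser.in_p:
        parser.flush()  # page ended inside an unclosed <p>
    return parser.paragraphs
```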

In any event, enjoy.
Attached Files
File Type: zip reuters.py.zip (1.1 KB, 556 views)

Last edited by Deputy-Dawg; 01-31-2008 at 08:37 AM. Reason: Fixed minor typo in the code "Emviroment" to "Environment"
01-28-2008, 08:38 PM   #39
kovidgoyal
creator of calibre
They do seem to provide a print version. For example, given the ID
USN2740109620080129
the print version of the article is at
http://www.reuters.com/articlePrint?...40109620080129

Also, since you're writing a lot of feeds, can I ask you to attach them to https://libprs500.kovidgoyal.net/wiki/UserProfiles so other people can find them easily? (I'll pick them up for inclusion from there when I get the time.) You will need to create an account at https://libprs500.kovidgoyal.net/register and log in at https://libprs500.kovidgoyal.net/login before you can edit the wiki page. Thanks.
01-28-2008, 09:56 PM   #40
Deputy-Dawg
Groupie
Happy to! I already have an account; I just wasn't sure where to put them. I am working on one for the AP. It seems to work fine, except that the TOC points to the end of each article rather than the beginning.

I too thought that Reuters had a print version available, but when you go to the URL you posted, you don't actually get the print page; you are sent back to the display page. I don't know why, but...

I am attaching a copy of the AP profile to this message.
Attached Files
File Type: zip ap.py.zip (1.3 KB, 564 views)

Last edited by Deputy-Dawg; 01-31-2008 at 08:39 AM. Reason: uploaded working version of the AP Profile
01-29-2008, 05:11 PM   #41
Deputy-Dawg
Groupie
Kovidgoyal,
This is weird. I did a run with my AP profile using the --keep-downloaded-files option. I then moved the kept files into my normal user space, preserving their relative paths. I examined them in BBEdit, GoLive and Safari and found nothing wrong in either the code or the appearance of the files. Finally, I converted them to an LRF file using html2lrf, and the TOC points to the end of the stories, not the beginning. What have I done wrong? Or better yet, how do I fix it?
01-29-2008, 05:18 PM   #42
kovidgoyal
creator of calibre
Probably a bug in html2lrf. Open a bug report and I'll look at it when I get time.
01-31-2008, 08:42 AM   #43
Deputy-Dawg
Groupie
I have just uploaded new copies of the Reuters and AP profiles. The Reuters profile fixes a minor typo; the AP profile is a fully working version.
02-24-2008, 02:00 PM   #44
Lemoine
Junior Member
Description of the article disappeared in RSS feed

Hello,

The description of each article in the RSS feeds disappeared a week ago.

Is it a new feature? Is it a bug?

I really miss those descriptions and wonder how I can get them back.

Does anyone have an idea?

Thanks in advance

02-24-2008, 02:01 PM   #45
kovidgoyal
creator of calibre
In which feed?
Similar Threads
Thread Thread Starter Forum Replies Last Post
RSS Feed timezone Feedback 8 01-02-2010 06:55 PM
RSS Feed questions rambling Calibre 2 11-20-2008 05:35 AM
Working User Profile for Wired.com RSS feeds for libprs500 DaveNB Calibre 6 11-30-2007 07:00 AM
RSS Feed Updates Alexander Turcic Announcements 0 06-11-2004 04:11 PM


All times are GMT -4.

