#1 |
Member
Posts: 14
Karma: 10
Join Date: Jun 2010
Device: kindle 3
|
New recipe for Kathimerini (Greek newspaper)
Here's a first shot at a recipe for the revised Kathimerini.
It only downloads today's news. Kathimerini usually updates at around 12:00 from Tuesday to Saturday and at 18:00 on Sundays (Athens time).
Without photos (2.5 MB for this Sunday's edition): Spoiler:
With photos (10MB for this Sunday's edition): Spoiler:
A few questions:
I don't know how to add the cover image. Can anyone help? The cover picture for yesterday, February 1st, 2014, was "http://s.kathimerini.gr/resources/issue-cover/01-02-2014.jpg"
I tried using only one RSS feed, but it wouldn't download more than 30 articles. So, I added &page=1 etc. to get the older articles. Is there a better way to do that?
Is there a way to include cartoons but not other images? Apparently, the only thing that differentiates them from other images is that cartoon pages include the class "article_SKETCH" in the <body> tag. I removed all images with:
Code:
remove_tags = [dict(attrs={'class':['clearing-featured-img']})]
Last edited by jennie; 02-06-2014 at 07:54 AM. Reason: Updated recipe code |
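On the paging question above, the &page=N variants can at least be generated rather than typed out. This is only a sketch: `paged_feeds` is a hypothetical helper, and the example.com address is a placeholder because the real feed URL is not shown in the thread. It relies on calibre's `feeds` attribute accepting a list of (title, url) pairs.

```python
def paged_feeds(title, base_url, pages):
    """Return (title, url) pairs for pages 1..pages of a single RSS feed."""
    return [
        ('%s (page %d)' % (title, n), '%s&page=%d' % (base_url, n))
        for n in range(1, pages + 1)
    ]

# Placeholder URL (assumption): substitute the real Kathimerini feed address.
feeds = paged_feeds('News', 'http://example.com/rss?id=1', 3)
```

Whether the site offers a feed that returns more than 30 items in one request is not something this thread answers, so the loop only tidies up the existing workaround.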
#2 |
creator of calibre
Posts: 45,337
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Code:
def get_cover_url(self):
    import time
    return 'http://s.kathimerini.gr/resources/issue-cover/%s.jpg' % time.strftime('%d-%m-%Y')
Implement preprocess_raw_html() in your recipe and replace clearing-featured-img with 'dont-remove-me' if the page is a comic page. Then remove_tags won't affect it. |
#3 |
Member
Posts: 14
Karma: 10
Join Date: Jun 2010
Device: kindle 3
|
Hi Kovid, thanks a lot for your reply.
I still can't get the cover to work. Here's my latest code: Spoiler:
I'm using 02 instead of %d for testing purposes, because there is no issue today.
I don't really know how to program in any language, so I'm having trouble using the rest of your advice. I don't think I want to try implementing parse_feed at this point, but I did try to read up on preprocess_raw_html, with no tangible results yet. I'd really appreciate it if you could give me the complete fixed code.
I guess what I need to implement is something along the lines of:
Code:
if body contains the class "article_SKETCH" (among others):
    replace 'class':['clearing-featured-img'] with 'class':['do-not-remove']
|
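The pseudocode above could look roughly like this in Python. It is a minimal sketch, not tested against the live site: `keep_cartoon_images` is a hypothetical helper name, and the check assumes the class name appears literally inside the opening <body> tag, as described in the first post.

```python
import re

# Matches the opening <body ...> tag so its attributes can be inspected.
BODY_TAG = re.compile(r'<body\b[^>]*>', re.IGNORECASE)

def keep_cartoon_images(raw_html):
    """On cartoon pages, rename the image class so remove_tags skips it."""
    body = BODY_TAG.search(raw_html)
    if body is not None and 'article_SKETCH' in body.group(0):
        return raw_html.replace('clearing-featured-img', 'do-not-remove')
    return raw_html

# Inside the recipe class, calibre's hook would simply delegate to it:
#     def preprocess_raw_html(self, raw_html, url):
#         return keep_cartoon_images(raw_html)
```

Because the rename happens on the raw page before remove_tags is applied, the existing remove_tags line can stay exactly as it is.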
#4 |
creator of calibre
Posts: 45,337
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I'm afraid I don't have the time to write the code for you. For the cover, you need to use this:
Code:
def get_cover_url(self):
    import time
    return 'http://s.kathimerini.gr/resources/issue-cover/%s.jpg' % time.strftime('%d-%m-%Y')
|
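The cover URL quoted in the first post (01-02-2014.jpg for February 1st, 2014) is in day-month-year order, which corresponds to the format string '%d-%m-%Y'. A small standalone check, where `cover_url_for` is a hypothetical helper and not part of calibre:

```python
import datetime

COVER_URL = 'http://s.kathimerini.gr/resources/issue-cover/%s.jpg'

def cover_url_for(day):
    """Build the cover URL for a given date in day-month-year order."""
    return COVER_URL % day.strftime('%d-%m-%Y')

# February 1st, 2014 reproduces the URL quoted in the first post.
print(cover_url_for(datetime.date(2014, 2, 1)))
```

In the recipe itself, get_cover_url would pass today's date; on days without an issue that URL may not exist.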
#5 |
Member
Posts: 14
Karma: 10
Join Date: Jun 2010
Device: kindle 3
|
The problem with the cover was the indentation. Thanks!
I have updated the code in the first post with this and a couple of other tags to remove. I might check the comics situation later. If anyone else is interested in reading this newspaper and would like to give it a try, feel free. As it is now, the code gives a pretty clean result, so you could add it to the repository if there are no changes in, say, a week's time.
Last edited by jennie; 02-03-2014 at 04:51 AM. |
#6 | |
Member
Posts: 14
Karma: 10
Join Date: Jun 2010
Device: kindle 3
|
I updated the original recipe. It now downloads news in categories.
This issue persists: Quote:
|
|
#7 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2014
Device: Kindle PaperWhite
|
Could you please tell me how this works!? I'm new here. What is a recipe? Is it something that will allow me to read my paper on my kindle? What do I need to do?
|