05-18-2010, 10:54 PM | #1936 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
|
05-19-2010, 05:13 AM | #1937 |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
First question:
Is there an option to add one or more lines to the downloaded article (for example the article's signature, when the signature is a GIF sitting in a table cell (td) without a tag)?
Second question:
Some newspapers let you read the entire paper directly in the browser in various formats (a JPG for every page, or a single PDF file for every page). Is it possible to download these files? Right now I use the first JPG (or PDF) as the cover image, so I am able to find the correct page and the correct date, but it is only the first page, and at a fixed resolution. At least this would be a good way to get an overall picture of the whole newspaper, even though it does not make for comfortable reading.
Last edited by gambarini; 05-19-2010 at 05:38 AM. |
05-19-2010, 06:55 AM | #1938 |
award-winning bozo
Posts: 258
Karma: 172703
Join Date: Sep 2009
Location: Philadelphia
Device: Kobo Libra 2
|
|
05-19-2010, 07:30 AM | #1939 |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
def get_article_url(self, article):
    link = article.get('links')
    if link:
        return link[0]['href']
Now I am able to find the correct link, but I have another problem: I can't find the title. The article itself displays correctly, but on the initial page (the one listing all the articles) it has no title.
Code:
{'summary_detail': {'base': '', 'type': 'text/html', 'value': u'ROMA<br />\xabNo, non \xe8 normale\xbb. Gianfranco Fini, da presidente della Camera, non apprezza che i "suoi" deputati lavorino solo due giorni alla settimana, come \xe8 capitato di recente. E torna a stigmatizzare la pigrizia delle aule parlamentari. Cos\xec non si pu\xf2 andare avanti, \xe8 il messaggio lanciato dal numero uno di Montecitorio. <br /><br />Fini denuncia il \xabparadosso\xbb che si sta creando: tutti stigmatizza ...(continua)', 'language': None}, 'updated_parsed': time.struct_time(tm_year=2010, tm_mon=5, tm_mday=18, tm_hour=11, tm_min=29, tm_sec=24, tm_wday=1, tm_yday=138, tm_isdst=0), 'links': [{'href': u'http://www.lastampa.it/redazione/cmsSezioni/politica/201005articoli/55141girata.asp', 'type': 'text/html', 'rel': 'alternate'}, {'type': 'text/html', 'rel': 'alternate'}], 'author': u'', 'image': {'height': 0, 'width': 0, 'href': u'http://www.lastampa.it/redazione/cmssezioni/politica/201005images/fini05g.jpg', 'link': u'', 'title': u''}, 'tags': [{'term': u'POLITICA', 'scheme': None, 'label': None}], 'updated': u'Tue, 18 May 2010 13:29:24 +0200', 'summary': u'ROMA<br />\xabNo, non \xe8 normale\xbb. Gianfranco Fini, da presidente della Camera, non apprezza che i "suoi" deputati lavorino solo due giorni alla settimana, come \xe8 capitato di recente. E torna a stigmatizzare la pigrizia delle aule parlamentari. Cos\xec non si pu\xf2 andare avanti, \xe8 il messaggio lanciato dal numero uno di Montecitorio. 
<br /><br />Fini denuncia il \xabparadosso\xbb che si sta creando: tutti stigmatizza ...(continua)', 'title_detail': {'base': '', 'type': 'text/plain', 'value': u'', 'language': None}, 'href': u'http://www.lastampa.it/redazione/cmssezioni/politica/201005images/fini05g.jpg', 'link': u'', 'title': u'', 'id': u'http://www.lastampa.it/redazione/cmssezioni/politica/201005images/fini05g.jpg', 'enclosures': [{'href': u'http://www.lastampa.it/redazione/cmssezioni/politica/201005images/fini05g.jpg', 'type': u'image/jpeg'}]}
The feed appears almost identical to other feeds that work correctly:
http://www.lastampa.it/redazione/cms...s_politica.xml
Last edited by gambarini; 05-19-2010 at 07:59 AM. |
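One thing the dump shows is that the second entry in 'links' has no 'href' at all, so relying on a fixed position in that list is fragile. A minimal sketch, assuming the feed entry behaves like a plain dict; the helper name first_href is made up for this example and is not part of calibre:

```python
def first_href(article):
    """Return the first non-empty 'href' found in article['links'], or None."""
    for link in article.get('links') or []:
        href = link.get('href')
        if href:
            return href
    return None


# Entry trimmed from the dump above: the second link has no 'href'.
entry = {
    'links': [
        {'href': u'http://www.lastampa.it/redazione/cmsSezioni/politica/201005articoli/55141girata.asp',
         'type': 'text/html', 'rel': 'alternate'},
        {'type': 'text/html', 'rel': 'alternate'},
    ],
    'title': u'',
}
print(first_href(entry))
```

Inside a recipe, get_article_url could then return first_href(article) instead of indexing link[0] directly.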
05-19-2010, 07:59 AM | #1940 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
When I want to control the titles on that page, I use parse_index. Try reading up on it and see if it will solve your problem. Basically, you use it to give Calibre the title and URL you want to use.
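As a rough sketch of what parse_index hands back: a list of (section title, list of article dicts) pairs, where each dict carries 'title', 'url', 'description' and 'date'. The helpers build_index and fallback_title below are hypothetical illustrations, not calibre APIs, and the rule of borrowing the start of the summary when the feed title is empty is only an assumption about what might be acceptable here. The sample entry is simplified from the kind of data posted above.

```python
import re


def fallback_title(entry, max_len=60):
    """Use the feed title if present, else the start of the summary (hypothetical rule)."""
    title = (entry.get('title') or '').strip()
    if title:
        return title
    # Strip HTML tags such as <br /> and collapse runs of whitespace.
    text = re.sub(r'<[^>]+>', ' ', entry.get('summary') or '')
    text = ' '.join(text.split())
    return text[:max_len] or 'Untitled'


def build_index(sections):
    """sections: list of (section title, list of feed entries as plain dicts)."""
    feeds = []
    for section_title, entries in sections:
        articles = []
        for entry in entries:
            url = entry.get('url')
            if not url:
                continue  # nothing to download without a URL
            articles.append({
                'title': fallback_title(entry),
                'url': url,
                'description': '',
                'date': entry.get('updated', ''),
            })
        if articles:
            feeds.append((section_title, articles))
    return feeds


# Simplified sample entry with an empty title.
sample = [('Politica', [{
    'title': u'',
    'summary': u'ROMA<br />Fini denuncia il paradosso',
    'url': 'http://www.lastampa.it/redazione/cmsSezioni/politica/201005articoli/55141girata.asp',
    'updated': u'Tue, 18 May 2010 13:29:24 +0200',
}])]
print(build_index(sample))
```

In a real recipe the entries would come from the parsed feed or from index_to_soup rather than a hand-written list.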
|
05-19-2010, 08:46 AM | #1941 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
|
||
05-19-2010, 09:05 AM | #1942 |
Enthusiast
Posts: 49
Karma: 2062
Join Date: May 2010
Device: iPad (one)
|
mwheinz--Thanks! Works like a charm!
|
05-19-2010, 09:38 AM | #1943 | ||
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Quote:
Quote:
Even if it is not readable, it is a good way to get a general look at the newspaper, and if it is readable... |
||
05-19-2010, 09:39 AM | #1944 | |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Quote:
P.S. Excuse my poor English!
Last edited by gambarini; 05-19-2010 at 09:44 AM. |
|
05-19-2010, 10:34 AM | #1945 |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
An example, in this feed:
Code:
http://www3.lastampa.it/fotografia/feedrss.xml/
I have tried 'id', 'guid', 'link', 'links'... nothing. In the 'id' and 'link' tags I find the obfuscated link. What's wrong? |
05-19-2010, 10:49 AM | #1946 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
|
05-19-2010, 10:52 AM | #1947 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
When I try a recipe with that feed, calibre crashes while parsing the XML. There is already a similar problem with the Times Online recipe. It appears to be some kind of bug.
|
05-19-2010, 11:12 AM | #1948 | |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
Quote:
Code:
http://www.lastampa.it/redazione/cmssezioni/politica/rss_politica.xml
I'll try using the parse_index method.
Last edited by gambarini; 05-19-2010 at 11:23 AM. |
|
05-19-2010, 11:45 AM | #1949 |
Connoisseur
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
|
New Recipe
And so, here is the long-awaited recipe.
vvv.lastampa.it Italian newspaper |
05-19-2010, 02:25 PM | #1950 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Here's a standard usage. It may look complicated, but it's not that bad. A description is here.
Code:
def parse_index(self):
    # Build a list of (section title, list of article dicts) pairs.
    feeds = []
    for title, url in [
        ('National', 'http://www.nzherald.co.nz/nz/news/headlines.cfm?c_id=1'),
        ('World', 'http://www.nzherald.co.nz/world/news/headlines.cfm?c_id=2'),
        ('Politics', 'http://www.nzherald.co.nz/politics/news/headlines.cfm?c_id=280'),
        ('Crime', 'http://www.nzherald.co.nz/crime/news/headlines.cfm?c_id=30'),
        ('Environment', 'http://www.nzherald.co.nz/environment/news/headlines.cfm?c_id=39'),
    ]:
        articles = self.nz_parse_section(url)
        if articles:
            feeds.append((title, articles))
    return feeds

def nz_parse_section(self, url):
    soup = self.index_to_soup(url)
    div = soup.find(attrs={'class': 'col-300 categoryList'})
    date = div.find(attrs={'class': 'link-list-heading'})
    current_articles = []
    # Walk the link lists after the first date heading; stop at the next heading.
    for tag in date.findAllNext(attrs={'class': ['linkList', 'link-list-heading']}):
        if tag.get('class') == 'link-list-heading':
            break
        for li in tag.findAll('li'):
            a = li.find('a', href=True)
            if a is None:
                continue
            title = self.tag_to_string(a)
            url = a.get('href', False)
            if not url or not title:
                continue
            # Turn site-relative links into absolute URLs.
            if url.startswith('/'):
                url = 'http://www.nzherald.co.nz' + url
            self.log('\t\tFound article:', title)
            self.log('\t\t\t', url)
            current_articles.append({'title': title, 'url': url, 'description': '', 'date': ''})
    return current_articles
Quote:
Last edited by Starson17; 05-19-2010 at 02:27 PM. |
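The url.startswith('/') check in nz_parse_section only covers root-relative links. A possible alternative, assuming Python 3's standard urllib.parse is available (calibre at the time of this thread ran on Python 2, where the same function lived in the urlparse module), is urljoin. The names absolutize and BASE, and the sample paths, are made up for this sketch:

```python
from urllib.parse import urljoin

BASE = 'http://www.nzherald.co.nz'


def absolutize(href, base=BASE):
    """Resolve href against base; already-absolute URLs pass through unchanged."""
    return urljoin(base + '/', href)


print(absolutize('/nz/news/article.cfm?c_id=1'))
print(absolutize('http://example.com/story'))
```

Whether this is worth the change depends on whether the site ever emits links that are relative without a leading slash.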
|
|