Old 05-18-2010, 10:54 PM   #1936
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 780
Karma: 194642
Join Date: Dec 2007
Location: Argentina
Device: Kindle PaperWhite, Motorola Xoom
Quote:
Originally Posted by mwheinz View Post
Code:
    def get_article_url(self, article):
        return article.get('feedburner_origlink', None)
The quoted code is not needed. Calibre already does that by default.
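As a rough illustration of what "by default" means here, the sketch below picks an article URL from a feedparser-style entry: it prefers feedburner_origlink and otherwise falls back to the plain link. The function name pick_article_url and the sample entries are made up for this example, and the exact lookup order inside calibre may differ.

```python
def pick_article_url(article):
    """Return the original article URL from a feed entry (a dict-like object).

    Hypothetical helper, not calibre's actual code: prefer the FeedBurner
    original link when the feed provides one, else use the normal link.
    """
    url = article.get('feedburner_origlink')
    if url:
        return url
    return article.get('link')


# Sample entries (invented for illustration).
feedburner_entry = {
    'link': 'http://feeds.example.com/~r/site/~3/abc/',
    'feedburner_origlink': 'http://www.example.com/story.html',
}
plain_entry = {'link': 'http://www.example.com/other.html'}

print(pick_article_url(feedburner_entry))
print(pick_article_url(plain_entry))
```

An override of get_article_url is therefore only worth writing when a feed keeps its real link somewhere else.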
kiklop74 is offline  
Old 05-19-2010, 05:13 AM   #1937
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
First question:
Is there an option to add one or more lines to the downloaded article (for example the article's signature, when the signature is a GIF sitting inside a table cell (td) without a tag)?

Second question:
Some newspapers let you read the entire paper directly in the browser in various formats (a JPG for every page, or a single PDF file for every page). Is it possible to download these files?
At the moment I use the first JPG (or PDF) as the cover image, so I am able to find the correct page and the correct date, but it is only the first page, and at a fixed resolution.
At the very least this would be a good way to get an overall picture of the whole newspaper, even though it does not make for comfortable reading.

Last edited by gambarini; 05-19-2010 at 05:38 AM.
gambarini is offline  
Old 05-19-2010, 06:55 AM   #1938
mwheinz
award-winning bozo
mwheinz can grok the meaning of the universe.
 
Posts: 245
Karma: 157113
Join Date: Sep 2009
Location: Philadelphia
Device: Sony PRS-600
Quote:
Originally Posted by kiklop74 View Post
The quoted code is not needed. Calibre already does that by default.
It's in many of the included recipes, such as slashdot, latimes and motherjones, which is why I used it. However, I suspect the real problem was that I needed to set use_embedded_content to False.
mwheinz is offline  
Old 05-19-2010, 07:30 AM   #1939
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Code:
    def get_article_url(self, article):
        link = article.get('links')

        if link:
            return link[0]['href']
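
A slightly more defensive variant of the same idea, as a sketch only: it walks the links list and returns the first entry that actually carries an href, since the entry dump further down shows a second link without one. It is written as a plain function (the name first_href is made up) so it can be tried outside a recipe.

```python
def first_href(article):
    """Return the first usable 'href' found in article['links'], or None."""
    for link in article.get('links') or []:
        href = link.get('href')
        if href:
            return href
    return None


# Trimmed-down copy of the entry shown in this post.
entry = {
    'link': u'',
    'links': [
        {'href': u'http://www.lastampa.it/redazione/cmsSezioni/politica/201005articoli/55141girata.asp',
         'type': 'text/html', 'rel': 'alternate'},
        {'type': 'text/html', 'rel': 'alternate'},
    ],
}

print(first_href(entry))
```

Inside a recipe the same loop could form the body of get_article_url(self, article).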

Now I am able to find the correct link, but I have another problem:
I can't find the title, so the article itself displays correctly, but the initial page (the one listing all the articles) shows no title.
Code:
{'summary_detail': {'base': '', 'type': 'text/html', 'value': u'ROMA<br />\xabNo, non \xe8 normale\xbb. Gianfranco Fini, da presidente della Camera, non apprezza che i "suoi" deputati lavorino solo due giorni alla settimana, come \xe8 capitato di recente. E torna a stigmatizzare la pigrizia delle aule parlamentari. Cos\xec non si pu\xf2 andare avanti, \xe8 il messaggio lanciato dal numero uno di Montecitorio. <br /><br />Fini denuncia il \xabparadosso\xbb che si sta creando: tutti stigmatizza ...(continua)', 'language': None}, 'updated_parsed': time.struct_time(tm_year=2010, tm_mon=5, tm_mday=18, tm_hour=11, tm_min=29, tm_sec=24, tm_wday=1, tm_yday=138, tm_isdst=0), 'links': [{'href': u'http://www.lastampa.it/redazione/cmsSezioni/politica/201005articoli/55141girata.asp', 'type': 'text/html', 'rel': 'alternate'}, {'type': 'text/html', 'rel': 'alternate'}], 'author': u'', 'image': {'height': 0, 'width': 0, 'href': u'http://www.lastampa.it/redazione/cmssezioni/politica/201005images/fini05g.jpg', 'link': u'', 'title': u''}, 'tags': [{'term': u'POLITICA', 'scheme': None, 'label': None}], 'updated': u'Tue, 18 May 2010 13:29:24 +0200', 'summary': u'ROMA<br />\xabNo, non \xe8 normale\xbb. Gianfranco Fini, da presidente della Camera, non apprezza che i "suoi" deputati lavorino solo due giorni alla settimana, come \xe8 capitato di recente. E torna a stigmatizzare la pigrizia delle aule parlamentari. Cos\xec non si pu\xf2 andare avanti, \xe8 il messaggio lanciato dal numero uno di Montecitorio. 
<br /><br />Fini denuncia il \xabparadosso\xbb che si sta creando: tutti stigmatizza ...(continua)', 'title_detail': {'base': '', 'type': 'text/plain', 'value': u'', 'language': None}, 'href': u'http://www.lastampa.it/redazione/cmssezioni/politica/201005images/fini05g.jpg', 'link': u'', 'title': u'', 'id': u'http://www.lastampa.it/redazione/cmssezioni/politica/201005images/fini05g.jpg', 'enclosures': [{'href': u'http://www.lastampa.it/redazione/cmssezioni/politica/201005images/fini05g.jpg', 'type': u'image/jpeg'}]}
Is there a solution? Is it possible to extract the title directly from the downloaded article?

The feed appears almost identical to other feeds that work correctly.
http://www.lastampa.it/redazione/cms...s_politica.xml

Last edited by gambarini; 05-19-2010 at 07:59 AM.
gambarini is offline  
Old 05-19-2010, 07:59 AM   #1940
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by gambarini View Post
Now I am able to find the correct link, but I have another problem:
I can't find the title ... Is it possible to extract the title directly from the downloaded article?
When I want to control the titles on that page, I use parse_index. Try reading up on it and see if it will solve your problem. Basically, you use it to give Calibre the title and URL you want to use.
Starson17 is offline  
Old 05-19-2010, 08:46 AM   #1941
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by gambarini View Post
First question:
Is there an option to add one or more lines to the downloaded article (for example the article's signature, when the signature is a GIF sitting inside a table cell (td) without a tag)?
I'm not 100% certain what you are asking. Preprocess_html or postprocess_html will let you add anything you want. You can add tags to the html with any content, including images. On your question about the table, are you asking how to put things into a table, or how to extract it from a table? Generally, both are possible with BeautifulSoup.
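
To make the idea of adding a tag concrete, here is a small sketch. In a recipe this would be done on the BeautifulSoup object that calibre passes to preprocess_html; to keep the example self-contained it uses Python's standard xml.etree.ElementTree on a well-formed snippet instead, so it is an analogy rather than recipe code. The function name append_signature and the file name firma.gif are invented.

```python
import xml.etree.ElementTree as ET


def append_signature(xhtml, img_src):
    """Append <p><img src=... /></p> to the end of <body> and return the markup."""
    root = ET.fromstring(xhtml)
    body = root.find('body')
    if body is None:
        return xhtml  # nothing to attach the signature to
    paragraph = ET.SubElement(body, 'p')
    ET.SubElement(paragraph, 'img', {'src': img_src, 'alt': 'signature'})
    return ET.tostring(root, encoding='unicode')


page = '<html><body><p>Article text</p></body></html>'
result = append_signature(page, 'firma.gif')
print(result)
```

Going the other way, pulling an image out of a table cell, is the same kind of tree walk in reverse: find the td, then read the img inside it.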

Quote:
Second question:
Some newspapers let you read the entire paper directly in the browser in various formats (a JPG for every page, or a single PDF file for every page). Is it possible to download these files?
At the moment I use the first JPG (or PDF) as the cover image, so I am able to find the correct page and the correct date, but it is only the first page, and at a fixed resolution.
At the very least this would be a good way to get an overall picture of the whole newspaper, even though it does not make for comfortable reading.
Are you asking how to split up pdfs to get images found on pages 2 and beyond, or how to use content you already have access to?
Starson17 is offline  
Old 05-19-2010, 09:05 AM   #1942
mlstein
Enthusiast
mlstein knows what time it is
 
Posts: 49
Karma: 2062
Join Date: May 2010
Device: iPad (one)
mwheinz--Thanks! Works like a charm!
mlstein is offline  
Old 05-19-2010, 09:38 AM   #1943
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
Originally Posted by Starson17 View Post
I'm not 100% certain what you are asking. Preprocess_html or postprocess_html will let you add anything you want. You can add tags to the html with any content, including images. On your question about the table, are you asking how to put things into a table, or how to extract it from a table? Generally, both are possible with BeautifulSoup.
Yes... I need to learn more about these two functions.
Quote:
Are you asking how to split up pdfs to get images found on pages 2 and beyond, or how to use content you already have access to?
I want to download the entire newspaper into the EPUB file.
Even if it is not readable, it gives a good overview of the newspaper, and if it is readable...
gambarini is offline  
Old 05-19-2010, 09:39 AM   #1944
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
Originally Posted by Starson17 View Post
When I want to control the titles on that page, I use parse_index. Try reading up on it and see if it will solve your problem. Basically, you use it to give Calibre the title and URL you want to use.
I don't understand; can you give me an example?


p.s.

EXCUSE MY POOR ENGLISH!

Last edited by gambarini; 05-19-2010 at 09:44 AM.
gambarini is offline  
Old 05-19-2010, 10:34 AM   #1945
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
An example:
in this feed
Code:
http://www3.lastampa.it/fotografia/feedrss.xml/
I can't find any correct link.
I have tried 'id', 'guid', 'link' and 'links'... nothing.
In the 'ID' and 'LINK' tags I only find the obfuscated link.
What's wrong?
gambarini is offline  
Old 05-19-2010, 10:49 AM   #1946
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 780
Karma: 194642
Join Date: Dec 2007
Location: Argentina
Device: Kindle PaperWhite, Motorola Xoom
Quote:
Originally Posted by mwheinz View Post
It's in many of the included recipes, such as slashdot, latimes and motherjones, which is why I used it.
Those recipes were written before Kovid added support for feedburner_origlink to calibre.
kiklop74 is offline  
Old 05-19-2010, 10:52 AM   #1947
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 780
Karma: 194642
Join Date: Dec 2007
Location: Argentina
Device: Kindle PaperWhite, Motorola Xoom
Quote:
Originally Posted by gambarini View Post
An example:
in this feed
Code:
http://www3.lastampa.it/fotografia/feedrss.xml/
I can't find any correct link.
I have tried 'id', 'guid', 'link' and 'links'... nothing.
In the 'ID' and 'LINK' tags I only find the obfuscated link.
What's wrong?
When I try a recipe with that feed, calibre crashes while parsing the XML. There is already a similar problem with the Times Online recipe. It appears to be some kind of bug.
kiklop74 is offline  
Old 05-19-2010, 11:12 AM   #1948
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
Quote:
Originally Posted by kiklop74 View Post
When I try a recipe with that feed, calibre crashes while parsing the XML. There is already a similar problem with the Times Online recipe. It appears to be some kind of bug.
In this feed
Code:
http://www.lastampa.it/redazione/cmssezioni/politica/rss_politica.xml
the title tag is empty, so now I can find the original link, but not the title of the article.

I'll try to use the parse_index method.
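
For the empty title itself, one possible stopgap is to derive a title from the summary text. This is only a sketch under assumptions: the helper name fallback_title, the 60-character cut-off and the 'Untitled' placeholder are all invented, and the tag stripping is a crude regular expression rather than a real HTML parser.

```python
import re


def fallback_title(entry, max_len=60):
    """Return the entry's title, or a shortened plain-text summary if it is empty."""
    title = (entry.get('title') or '').strip()
    if title:
        return title
    # Replace tags such as <br /> with spaces, then collapse whitespace.
    text = re.sub(r'<[^>]+>', ' ', entry.get('summary') or '')
    text = ' '.join(text.split())
    if len(text) > max_len:
        text = text[:max_len].rstrip() + '...'
    return text or 'Untitled'


# Shortened version of the entry posted earlier in the thread.
entry = {
    'title': u'',
    'summary': u'ROMA<br />\xabNo, non \xe8 normale\xbb. Gianfranco Fini',
}
print(fallback_title(entry))
```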

Last edited by gambarini; 05-19-2010 at 11:23 AM.
gambarini is offline  
Old 05-19-2010, 11:45 AM   #1949
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
New Recipe

And so, this is the long-awaited recipe.

vvv.lastampa.it

Italian newspaper
Attached Files
File Type: zip LaStampa.zip (1.2 KB, 88 views)
gambarini is offline  
Old 05-19-2010, 02:25 PM   #1950
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by gambarini View Post
I don't understand; can you give me an example?
Here's a standard usage. It may look complicated, but it's not that bad. A description is here.

Code:
    def parse_index(self):
        feeds = []
        # Each section is a (title, index page URL) pair.
        for title, url in [
            ('National', 'http://www.nzherald.co.nz/nz/news/headlines.cfm?c_id=1'),
            ('World', 'http://www.nzherald.co.nz/world/news/headlines.cfm?c_id=2'),
            ('Politics', 'http://www.nzherald.co.nz/politics/news/headlines.cfm?c_id=280'),
            ('Crime', 'http://www.nzherald.co.nz/crime/news/headlines.cfm?c_id=30'),
            ('Environment', 'http://www.nzherald.co.nz/environment/news/headlines.cfm?c_id=39'),
        ]:
            articles = self.nz_parse_section(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def nz_parse_section(self, url):
        soup = self.index_to_soup(url)
        div = soup.find(attrs={'class': 'col-300 categoryList'})
        date = div.find(attrs={'class': 'link-list-heading'})

        current_articles = []
        # Walk the link lists after the first date heading; stop at the next heading.
        for tag in date.findAllNext(attrs={'class': ['linkList', 'link-list-heading']}):
            if tag.get('class') == 'link-list-heading':
                break
            for li in tag.findAll('li'):
                a = li.find('a', href=True)
                if a is None:
                    continue
                title = self.tag_to_string(a)
                url = a.get('href', False)
                if not url or not title:
                    continue
                # Turn site-relative links into absolute URLs.
                if url.startswith('/'):
                    url = 'http://www.nzherald.co.nz' + url
                self.log('\t\tFound article:', title)
                self.log('\t\t\t', url)
                current_articles.append({'title': title, 'url': url, 'description': '', 'date': ''})

        return current_articles
Basically, you use the parse_index method when you want to control the title, description and/or date on that page, and already know the URL. A common use is when you can't parse an RSS feed automatically, and have to parse a web page to get the URL. However, I've never actually used it for that. Instead, I use it when I can figure out the URL in advance, because it's simple and there is no page or RSS feed. (I believe I used it for several comics recipes to pull the previous comics). Those recipes should be in this thread somewhere under my name.
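
The shape of the data that parse_index hands back can be sketched without calibre at all. The example below builds the same list of (section title, list of article dicts) from hard-coded links. The helper names make_articles and build_index and the sample paths are invented, and urljoin from the standard library stands in for the manual startswith('/') check in the recipe code.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.nzherald.co.nz'


def make_articles(links, base_url=BASE_URL):
    """Turn (title, href) pairs into the article dicts used by parse_index."""
    articles = []
    for title, href in links:
        title = (title or '').strip()
        if not title or not href:
            continue  # skip entries without a title or a link
        articles.append({
            'title': title,
            'url': urljoin(base_url, href),  # makes relative links absolute
            'description': '',
            'date': '',
        })
    return articles


def build_index(sections):
    """Return [(section title, [article dict, ...]), ...], skipping empty sections."""
    feeds = []
    for section_title, links in sections:
        articles = make_articles(links)
        if articles:
            feeds.append((section_title, articles))
    return feeds


# Sample input (made-up paths, for illustration only).
sections = [
    ('National', [('First story', '/nz/news/article.cfm?id=1'),
                  ('', '/nz/news/article.cfm?id=2')]),
    ('World', []),
]
index = build_index(sections)
print(index)
```

In a real recipe the (title, href) pairs would come from the parsed index page, or from URLs you can work out in advance.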

Quote:
p.s.

EXCUSE MY POOR ENGLISH!
I have less trouble understanding you than many native English speakers. I'm jealous that your English is so much better than my second language. I'm sure all the Italian speakers appreciate your efforts to build recipes for Italian web-sites. Keep up the good work!

Last edited by Starson17; 05-19-2010 at 02:27 PM.
Starson17 is offline  
All times are GMT -4.