Old 05-12-2011, 09:47 AM   #1
gambarini
Question

I want to create a new recipe with a parse_index method for
this page:
http://rassegnastampa.mef.gov.it/mef...e/Default.aspx

When I do this

print self.index_to_soup(url)

I don't get the entire page, only a small part of it,
something like this:
Quote:
[html]
<table class="ResultsTable" summary="La tabella contiene gli articoli pubblicati nella rassegna stampa di giovedì 12 maggio 2011.">
<caption>
Articoli della rassegna
</caption><thead>
<tr>
<th class="DateCellShort" scope="col" id="data">Data</th>
<th class="TopicCellShort" scope="col" id="sezione">Sezione</th>
<th class="PublicationCellShort" scope="col" id="testata">Testata</th>
<th class="TitleCellShort" scope="col" id="titolo">Titolo</th>
<th class="AuthorCellShort" scope="col" id="autore">Autore</th>
<th class="OcrLinkCellShort" scope="col" id="ocr">OCR</th>
</tr>
</thead>
<tr>
<td class="DateCellShort" headers="data">12/05/2011</td>
<td class="TopicCellShort" headers="sezione">MINISTRO</td>
<td class="PublicationCellShort" headers="testata">Corriere della Sera</td>
<td class="TitleCellShort" headers="titolo"></td>
</tr>
</table>
[/html]

Why?
Old 05-12-2011, 02:44 PM   #2
Starson17
Quote:
Originally Posted by gambarini
I don't get the entire page, only a small part of it, something like this:

I get the entire page: the same thing I see when I view the page source. I assume you are running print self.index_to_soup(url) with the URL you posted, and not some later URL that is linked from that page.

When asking why a custom recipe doesn't work, you really need to post it. It takes a lot of time to build a recipe trying to match what you've already done.
Old 05-13-2011, 04:37 AM   #3
gambarini
Hi Starson17,

This is the entire recipe.
My previous post was incorrect.
I do get the entire page, like you, and it is identical to the page shown by "View Source".
My problem is:
I am not able to find anything in the page.
I have tried various combinations of attributes, with no results.


Code:
#!/usr/bin/env  python
__license__   = 'GPL v3'
__author__    = 'Lorenzo Vigentini, based on Darko Miletic, Gabriele Marini'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>, Lorenzo Vigentini <l.vigentini at gmail.com>'
description   = 'Italian daily newspaper - v1.01 (04, January 2010); 16.05.2010 new version'
'''
http://rassegnastampa.mef.gov.it/mefnazionale/Default.aspx
'''

from calibre.web.feeds.news import BasicNewsRecipe

class RassegnaMefParseIndex(BasicNewsRecipe):
    author        = 'Marini Gabriele'
    description   = 'Rassegna Stampa MEV'

    cover_url      = 'http://rassegnastampa.mef.gov.it/Mef/sorg_n/nazionale.jpg'
    title          = u'Rassegna MEF'
    publisher      = 'Ministero Economia e Finanze'
    category       = 'News, politics, culture, economy, general interest'

    language       = 'it'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content  = False
    recursion             = 10

    remove_javascript = True

    def parse_index(self):
        feeds = []

        for title, url in [
             ("Rassegna Nazionale", "http://rassegnastampa.mef.gov.it/mefnazionale/Default.aspx"),
             ("Rassegna Nazionale 2", "http://rassegnastampa.mef.gov.it/mefnazionale/")
            ]:

            soup = self.index_to_soup(url)

            articles = []

            #Main Aperture
            soup = soup.find(name='div', attr={'id':'results'})
            if soup:
                article = soup.find('tbody')
                for article in soup.findAllNext('tr'):
                    article_first = article
                    tupla = article.find(attrs={'class':'TopicCellShort'})
                    title_url = self.tag_to_string(tupla)
                    tupla = article.find(attrs={'class':'PublicationCellShort'})
                    title_url += self.tag_to_string(tupla)
                    tupla = article.find(attrs={'class':'TitleCellShort'})
                    title_url += self.tag_to_string(tupla)

                    tupla = article.find(attrs={'class':'OcrLinkCellShort'})
                    link = tupla.get('href', False)

                    date = ''
                    description =  ''
                if title_url:
                   articles.append({'title':title_url, 'url':link,'description':description, 'date':date})
            if articles:
               feeds.append((title, articles))
        return feeds

Last edited by gambarini; 05-13-2011 at 06:09 AM.
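
One likely reason nothing is found: the recipe calls soup.find(name='div', attr={'id':'results'}), with attr instead of attrs. As far as I know, BeautifulSoup's find() accepts extra keyword arguments and treats them as attribute filters, so a misspelled keyword does not raise an error; it simply matches nothing. The toy find() below is a hypothetical stand-in (it is not BeautifulSoup code, and page is made-up data) that sketches this behaviour in plain Python:

```python
# Toy stand-in for a BeautifulSoup-style find(); NOT the real library code.
def find(tags, name=None, attrs=None, **kwargs):
    """Return the first tag dict matching name and every attribute filter."""
    filters = dict(attrs or {})
    filters.update(kwargs)  # unknown keywords silently become attribute filters
    for tag in tags:
        if name is not None and tag["name"] != name:
            continue
        if all(tag["attrs"].get(key) == value for key, value in filters.items()):
            return tag
    return None


# Hypothetical miniature "page": two divs, one of them with id="results".
page = [
    {"name": "div", "attrs": {"id": "header"}},
    {"name": "div", "attrs": {"id": "results"}},
]

# 'attr' is a typo: it becomes a filter on an attribute literally named "attr".
misspelled = find(page, name="div", attr={"id": "results"})
# 'attrs' is the intended keyword: it filters on id="results".
correct = find(page, name="div", attrs={"id": "results"})

print(misspelled)  # None
print(correct)
```

If the same thing happens in the recipe, soup becomes None after that line, the "if soup:" branch is skipped, and the feed stays empty without any error message.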
Old 05-13-2011, 08:45 AM   #4
Starson17
Quote:
Originally Posted by gambarini
Hi Starson17,

This is the entire recipe.
My previous post was incorrect.
I do get the entire page, like you, and it is identical to the page shown by "View Source".
My problem is:
I am not able to find anything in the page.

I found something on the page.
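
For experimenting with the row extraction outside calibre, here is a minimal sketch in Python 3 that uses the standard library's HTMLParser as a stand-in for the soup returned by index_to_soup. The SAMPLE row is made up (modelled on the table in post #1, with an assumed <a href> inside the OCR cell and a placeholder link), and RowCollector and build_articles are hypothetical names. It reads the href from the <a> tag rather than from the <td>, and appends one article per row inside the loop:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Made-up sample modelled on the table in post #1; the OCR link is an assumption.
SAMPLE = """
<div id="results"><table class="ResultsTable">
<thead><tr><th class="TopicCellShort">Sezione</th></tr></thead>
<tbody><tr>
<td class="TopicCellShort">MINISTRO</td>
<td class="PublicationCellShort">Corriere della Sera</td>
<td class="TitleCellShort">Titolo di prova</td>
<td class="OcrLinkCellShort"><a href="/example/article1.html">OCR</a></td>
</tr></tbody>
</table></div>
"""


class RowCollector(HTMLParser):
    """Collect one dict per <tr>: <td> text keyed by CSS class, plus the OCR link."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None   # dict for the <tr> being read, None outside a row
        self.cell = None  # class of the <td> being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self.row = {"link": None}
        elif tag == "td" and self.row is not None:
            self.cell = attrs.get("class")
            if self.cell:
                self.row[self.cell] = ""
        elif tag == "a" and self.cell == "OcrLinkCellShort":
            self.row["link"] = attrs.get("href")  # href lives on <a>, not on <td>

    def handle_endtag(self, tag):
        if tag == "td":
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.row is not None and self.cell:
            self.row[self.cell] += data.strip()


def build_articles(html, base_url):
    """Turn table rows into article dicts; header rows (no <td>) are skipped."""
    parser = RowCollector()
    parser.feed(html)
    articles = []
    for row in parser.rows:
        parts = [row.get(key, "") for key in
                 ("TopicCellShort", "PublicationCellShort", "TitleCellShort")]
        title = " - ".join(part for part in parts if part)
        if title and row["link"]:  # append inside the loop, once per row
            articles.append({"title": title,
                             "url": urljoin(base_url, row["link"]),
                             "description": "", "date": ""})
    return articles


base = "http://rassegnastampa.mef.gov.it/mefnazionale/Default.aspx"
articles = build_articles(SAMPLE, base)
feeds = [("Rassegna Nazionale", articles)]
print(feeds)
```

The resulting list has the shape parse_index is expected to return: a list of (feed title, list of article dicts) tuples. In the recipe itself the same two points apply: look for the link on the <a> inside the OCR cell, and keep the articles.append call inside the for loop.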