Old 09-28-2010, 11:06 PM   #1
jefferson_frantz
I need some help with a recipe

Hello everyone. I'm new to calibre recipes and I need some help with a recipe for 'Muy Interesante' magazine (http://www.muyinteresante.es).
First, I want to change the style of the article titles, maybe just make them bold.
Second, I need to insert a <br> tag after the image in each article, so that the text appears below the image and not next to it. The attached image may explain better what I want.

Thanks in advance!

Here is my recipe:

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe
from BeautifulSoup import BeautifulSoup, Tag

class RevistaMuyInteresante(BasicNewsRecipe):

    title       = 'Revista Muy Interesante'
    __author__  = 'Jefferson Frantz'
    description = 'Revista de divulgacion'
    timefmt = ' [%d %b, %Y]'
    language = 'es_ES'

    keep_only_tags = [dict(name='div', attrs={'class':['article']}),dict(name='td', attrs={'class':['txt_articulo']})]

    remove_tags        = [
                             dict(name=['object','link','script','ul'])
                            ,dict(name='div', attrs={'id':['comment']})
                            ,dict(name='td', attrs={'class':['buttonheading']})
                            ,dict(name='div', attrs={'class':['tags_articles']})
                         ]

    remove_tags_after = dict(name='div', attrs={'class':'tags_articles'})


    def nz_parse_section(self, url):
            soup = self.index_to_soup(url)
            div = soup.find(attrs={'class':'contenido'})

            current_articles = []
            for x in div.findAllNext(attrs={'class':['headline']}):
                    a = x.find('a', href=True)
                    if a is None:
                        continue
                    title = self.tag_to_string(a)
                    url = a.get('href', False)
                    if not url or not title:
                        continue
                    if url.startswith('/'):
                         url = 'http://www.muyinteresante.es'+url
                    self.log('\t\tFound article:', title)
                    self.log('\t\t\t', url)
                    current_articles.append({'title': title, 'url':url,
                        'description':'', 'date':''})

            return current_articles


    def parse_index(self):
            feeds = []
            for title, url in [
                ('Historia',
                 'http://www.muyinteresante.es/historia-articulos'),
             ]:
               articles = self.nz_parse_section(url)
               if articles:
                   feeds.append((title, articles))
            return feeds
PS: Sorry about my English
Attached image: sample.JPG
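
The link handling in nz_parse_section above only prefixes the site root when the href starts with '/'. A minimal sketch of a more general approach, assuming Python 3 and the standard library's urljoin, is shown here. The helper name absolute_url and the two example paths marked as hypothetical are illustrations only and do not come from the site.

```python
from urllib.parse import urljoin

# Site root taken from the recipe in this thread.
BASE_URL = 'http://www.muyinteresante.es/'

def absolute_url(href, base=BASE_URL):
    """Resolve an href found on the index page against the site root."""
    return urljoin(base, href)

root_relative = absolute_url('/historia-articulos')
page_relative = absolute_url('historia/ejemplo.html')    # hypothetical path
already_absolute = absolute_url('http://example.com/x')  # hypothetical URL
print(root_relative)
```

urljoin leaves absolute URLs alone and resolves both root-relative and page-relative links, so the startswith('/') check is no longer needed.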
Old 09-28-2010, 11:32 PM   #2
TonytheBookworm
As for the image, try something like this:

Code:
def preprocess_html(self, soup):
        for img_tag in soup.findAll('img'):
            parent_tag = img_tag.parent
            if parent_tag.name == 'a':
                new_tag = Tag(soup,'p')
                new_tag.insert(0,img_tag)
                parent_tag.replaceWith(new_tag)
            elif parent_tag.name == 'p':
                if not self.tag_to_string(parent_tag) == '':
                    new_div = Tag(soup,'div')
                    new_tag = Tag(soup,'p')
                    new_tag.insert(0,img_tag)
                    parent_tag.replaceWith(new_div)
                    new_div.insert(0,new_tag)
                    new_div.insert(1,parent_tag)
        return soup


As for the bold title, add extra_css.
Say your title is in a <div class='title'>...</div> tag and you want it styled differently;
you would do this:

Code:
# First, turn off the site's own style sheets:
no_stylesheets = True

# Then add our own style(s). The selector should match the class name exactly,
# including case (here, class='title'):
extra_css = '''
                   .title{font-weight: bold; font-size: xx-large}
                   p {font-size: 4px;font-family: Times New Roman;}
                '''



###########################################################
# This removes the inline style attributes, which often
# prevent extra_css from taking effect.
###########################################################
def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
           del item['style']
        return soup
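
To try the style-stripping idea outside calibre, here is a minimal sketch using Python's standard xml.etree.ElementTree instead of BeautifulSoup. It is an illustration under assumptions: the fragment is made up, it has to be well-formed XML for ElementTree to parse it, and strip_inline_styles is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def strip_inline_styles(root):
    """Delete the style attribute from every element under root (root included)."""
    for element in root.iter():
        element.attrib.pop('style', None)
    return root

# Hypothetical fragment, loosely modelled on the article markup in this thread.
fragment = ('<div class="title" style="font-weight: normal">'
            '<img style="float: left;" src="a.jpg" />Text</div>')
root = strip_inline_styles(ET.fromstring(fragment))
cleaned = ET.tostring(root, encoding='unicode')
print(cleaned)
```

Other attributes such as class and src are left untouched, so rules in extra_css can still target them.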

Last edited by TonytheBookworm; 09-28-2010 at 11:44 PM.
Old 09-29-2010, 01:39 AM   #3
TonytheBookworm
Alright, time to call the pro in. Starson17, if you have a second, can you look at this and tell me why the image doesn't show up? My thinking is:
  1. Get the image tag.
  2. Assign its value to a new variable.
  3. Extract the image tag from the soup.
  4. Make a new tag that contains a div and a p.
  5. Put the image data back into the soup under the p tag that was created.
Not sure what I'm doing wrong, because it looks like it should work.
When I used print statements, newtag showed up as <p> </p>, but for whatever reason the image data is never inserted into that tag.
Thanks in advance for the help.

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe
from BeautifulSoup import BeautifulSoup, Tag

class RevistaMuyInteresante(BasicNewsRecipe):

    title       = 'Revista Muy Interesante'
    __author__  = 'Jefferson Frantz'
    description = 'Revista de divulgacion'
    timefmt = ' [%d %b, %Y]'
    language = 'es_ES'
    #conversion_options = {'linearize_tables' : True}
    keep_only_tags = [dict(name='div', attrs={'class':['article']}),dict(name='td', attrs={'class':['txt_articulo']})]
    remove_tags        = [
                             dict(name=['object','link','script','ul'])
                            ,dict(name='div', attrs={'id':['comment']})
                            ,dict(name='td', attrs={'class':['buttonheading']})
                            ,dict(name='div', attrs={'class':['tags_articles']})
                         ]

    remove_tags_after = dict(name='div', attrs={'class':'tags_articles'})


    


    def nz_parse_section(self, url):
            soup = self.index_to_soup(url)
            div = soup.find(attrs={'class':'contenido'})

            current_articles = []
            for x in div.findAllNext(attrs={'class':['headline']}):
                    a = x.find('a', href=True)
                    if a is None:
                        continue
                    title = self.tag_to_string(a)
                    url = a.get('href', False)
                    if not url or not title:
                        continue
                    if url.startswith('/'):
                         url = 'http://www.muyinteresante.es'+url
                    self.log('\t\tFound article:', title)
                    self.log('\t\t\t', url)
                    current_articles.append({'title': title, 'url':url,
                        'description':'', 'date':''})

            return current_articles


    def parse_index(self):
            feeds = []
            for title, url in [
                ('Historia',
                 'http://www.muyinteresante.es/historia-articulos'),
             ]:
               articles = self.nz_parse_section(url)
               if articles:
                   feeds.append((title, articles))
            return feeds
    
    def preprocess_html(self, soup):
        
        for img_tag in soup.findAll('img'):
            parent_tag = img_tag.parent
            data = img_tag
            img_tag.extract()
            newdiv = Tag(soup,'div')
            newtag = Tag(soup,'p')
            newtag.insert(0,data)
            newdiv.insert(0,newtag)
            parent_tag.insert(0,newdiv)
            
            
            
            
            
        return soup


I keep getting this:
newdiv is: <div><p></p></div>
data is: <img style="float: left;" alt="ivision-marrojo" height="225" width="300" src="/images/stories/historia/ivision-marrojo.jpg" />
newtag is: <p></p>

which tells me it is obviously picking up the image tag and has it stored,
but for whatever reason it refuses to insert it into newdiv.
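
The five steps listed earlier in this post (find the image, detach it, build a div containing a p, and put the image back under the p) can be sketched outside calibre with Python's standard xml.etree.ElementTree. This is only an illustration under assumptions, not the BeautifulSoup code from the recipe: the fragment is a shortened, well-formed stand-in for the article cell, and wrap_images is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def wrap_images(root):
    """Wrap every <img> in <div><p>...</p></div>, keeping trailing text in place."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for img in list(root.iter('img')):
        parent = parents[img]
        index = list(parent).index(img)
        div = ET.Element('div')
        p = ET.SubElement(div, 'p')
        # Text that followed the image lives on img.tail; hand it to the wrapper
        # so it stays after the new <div> instead of moving inside the <p>.
        div.tail, img.tail = img.tail, None
        parent.remove(img)
        p.append(img)
        parent.insert(index, div)
    return root

# Shortened, hypothetical stand-in for the <td class="txt_articulo"> cell.
html = '<td class="txt_articulo"><img src="/images/a.jpg" />Pocas veces...</td>'
root = wrap_images(ET.fromstring(html))
result = ET.tostring(root, encoding='unicode')
print(result)
```

Because the image ends up inside its own block-level wrapper and the article text follows that wrapper, the text has a chance to start below the image, provided no float style is still applied to the image.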

Last edited by TonytheBookworm; 09-29-2010 at 01:45 AM.
Old 09-29-2010, 11:28 AM   #4
Starson17
It's staring you in the face, but you probably haven't run into it before.

It's here (last two characters of "/>"):
Code:
data is:  <img style="float: left;" alt="ivision-marrojo" height="225" width="300" src="/images/stories/historia/ivision-marrojo.jpg" />
For some reason, this self-closing tag format makes Beautiful Soup very unhappy. Try this change to "data" in your recipe (it's a bit of a kludge to work into whatever you're doing):
Code:
            data = img_tag
            new_img_tag = Tag(soup,'img')
            new_img_tag['src'] = img_tag['src']
            data = new_img_tag
and don't call img_tag.extract(). I doubt that's the most efficient way to do this, but I'm not really sure what you're doing with the recipe.
Old 09-29-2010, 12:34 PM   #5
TonytheBookworm
Thanks, that worked for the data part of it. At least it shows correctly in the print statements. However, when I print the soup, I see no change whatsoever. It's as if the new tag isn't being inserted into the parent tag. If you get time, could you look at this? I would really like to know how to fix it so I can use the knowledge in the future. I thought it had something to do with the tables, so I linearized them. That didn't work, so I renamed the table, tr, and td tags to 'div', and that still didn't work. So could you spoon-feed me just a little bit more (I'm not full yet)? Thanks.

Edit: the soup looks like this even after the changes:

Code:
parent tag is:  <td valign="top" colspan="2" class="txt_articulo">
<img style="float: left;" alt="boton-rojo" src="/images/stories/historia/boton-rojo.jpg" width="300" height="225" />Pocas veces cae el destino del mundo en las manos de un solo hombre. La media noche del <strong>26 de septiembre de 1983</strong> pudo ser la última para millones de personas si no hubiera sido por <strong>Stanislav Petrov</strong>. En una época llena de tensiones provocadas por la<a href="/tag/Guerra Fría "> Guerra Fría </a>y el miedo a un Apocalipsis nuclear, <strong>mantuvo la calma cuando las alarmas de un satélite de la URSS avisaron de un <a href="/tag/ataque nuclear">ataque nuclear</a> inminente</strong>. Se trataba del <strong>hombre que tenía a su alcance el “botón rojo</strong>”. <br /><br /> Orbitando sobre la Tierra, los satélites de alerta temprana rusos estaban preparados para detectar cualquier proyectil que se elevase sobre la línea del horizonte. Aquella noche, Petrov, teniente coronel de la Fuerza de misiles estratégicos del <a href="/tag/Ejercito ruso">Ejercito ruso</a>, se encontraba al mando del bunker Serpukhov-15 en Moscú. A las 00.14 de la noche saltaron todos los indicadores alertando de una fuente de calor que ascendía por el este. Sus características correspondían con las de un <a href="/tag/misil nuclear">misil nuclear</a> intercontinental.  <br /><br /> A pesar de la alarma que resonó en todo el bunker, Petrov se mantuvo escéptico. Podía ser un error, así que ordenó suspender la alarma y esperar. Sin embargo, poco después volvieron a sonar las sirenas cuando los satélites detectaron cuatro fuentes de calor más. Ya había perdido mucho tiempo y como declaró en el diario <em>Moscow News</em>: “No se pueden analizar bien las cosas en sólo un par de minutos, todo lo que se puede hacer es confiar en la intuición. Tenía dos opciones: o pensar que los ataques con misiles no parten de una sola base, o que el ordenador ha perdido la cabeza”. Optó por la segunda opción y esperó unos minutos más. 
<br /><br /> La tremenda tensión que “atenazaba a todos los presentes” desapareció de golpe cuando las alarmas cesaron. <strong>Lo que en realidad ocurrió es que, en estas fechas próximas al equinoccio de otoño, los satélites, la Tierra y el Sol se alinearon provocando un extraño error en los detectores</strong>. El Sol se había elevado sobre el horizonte en el ángulo exacto para que los <a target="_blank" href="/tag/satélites">satélites</a> interpretaran sus señales térmicas como un ataque de misiles.  <br /><br /> Después de esto, Stanislav Petrov fue relegado a un puesto inferior por desacatar las normas, y el error fue ocultado por el gobierno de la <a href="/tag/URSS">URSS</a>. El reconocimiento de su hazaña, en el que más tarde se llamó <strong>“Incidente del Equinoccio de Otoño”</strong>, no vino hasta mucho tiempo después cuando recibió su primer premio, "World Citizen Award", el 21 de mayo de 2004. En 2006 viajó a EEUU y fue homenajeado por las Naciones Unidas por su valiente actuación. A pesar de todo, cada vez que se entrevistó a Petrov siempre comentaba: “En todo este tiempo no me he considerado un héroe, sólo alguien que hizo su trabajo y lo hizo bien”. <br /><br /><strong><span style="color: #888888;"> Diego López Donaire</span></strong><br /><br /><div class="article_autor">Muy Interesante</div><div class="article_fecha">29/09/2010</div></td>


Notice that the image tag at the beginning is still unchanged. Shouldn't it be <div><p><img ....... ></p></div>?

Last edited by TonytheBookworm; 09-29-2010 at 12:44 PM.
Old 09-29-2010, 01:21 PM   #6
Starson17
I'm not sure what you're asking. The images appear in the html produced with your code and my changes - they don't appear in your code without them. The img tag appears in my print of the newdiv tag with my changes, but not with your code. Do you want me to post your code with my changes, as tested?
Old 09-29-2010, 01:49 PM   #7
TonytheBookworm
Yes, if you don't mind, because I would like to see what I'm doing wrong. Thanks. As for the issue at hand: the image wrapped around the text, as the original poster showed in his screenshot. I figured I could solve the problem by removing the tables and then enclosing the image tag inside a div or p tag. That didn't work so well.

Here is the code I am using:

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe
from BeautifulSoup import BeautifulSoup, Tag

class RevistaMuyInteresante(BasicNewsRecipe):

    title       = 'Revista Muy Interesante'
    __author__  = 'Jefferson Frantz'
    description = 'Revista de divulgacion'
    timefmt = ' [%d %b, %Y]'
    language = 'es_ES'
    conversion_options = {'linearize_tables' : True}
    keep_only_tags = [dict(name='div', attrs={'class':['article']}),dict(name='td', attrs={'class':['txt_articulo']})]
    remove_tags        = [
                             dict(name=['object','link','script','ul'])
                            ,dict(name='div', attrs={'id':['comment']})
                            ,dict(name='td', attrs={'class':['buttonheading']})
                            ,dict(name='div', attrs={'class':['tags_articles']})
                         ]

    remove_tags_after = dict(name='div', attrs={'class':'tags_articles'})


    


    def nz_parse_section(self, url):
            soup = self.index_to_soup(url)
            div = soup.find(attrs={'class':'contenido'})

            current_articles = []
            for x in div.findAllNext(attrs={'class':['headline']}):
                    a = x.find('a', href=True)
                    if a is None:
                        continue
                    title = self.tag_to_string(a)
                    url = a.get('href', False)
                    if not url or not title:
                        continue
                    if url.startswith('/'):
                         url = 'http://www.muyinteresante.es'+url
                    self.log('\t\tFound article:', title)
                    self.log('\t\t\t', url)
                    current_articles.append({'title': title, 'url':url,
                        'description':'', 'date':''})

            return current_articles


    def parse_index(self):
            feeds = []
            for title, url in [
                ('Historia',
                 'http://www.muyinteresante.es/historia-articulos'),
             ]:
               articles = self.nz_parse_section(url)
               if articles:
                   feeds.append((title, articles))
            return feeds
    
    def preprocess_html(self, soup):
        
        for img_tag in soup.findAll('img'):
            parent_tag = img_tag.parent
            data = img_tag
            new_img_tag = Tag(soup,'img')
            new_img_tag['src'] = img_tag['src']
            data = new_img_tag
           
            
            newdiv = Tag(soup,'div')
            newtag = Tag(soup,'p')
            newtag.insert(0,data)
            newdiv.insert(0,newtag)
            parent_tag.insert(0,newdiv)
            print 'parent tag is: ', parent_tag
            print 'newdiv is: ', newdiv
            print 'data is: ',data
            print 'newtag is: ', newtag
            print 'the soup is: ', soup
            
            
            
        return soup
    
    def postprocess_html(self, soup, first):
        for tag in soup.findAll(name=['table', 'tr', 'td']):
            tag.name = 'div'
        return soup

Last edited by TonytheBookworm; 09-29-2010 at 01:51 PM. Reason: code added
Old 10-01-2010, 12:49 AM   #8
jefferson_frantz
Thanks, Tony, for your help!
I tried your first suggestion, but the text didn't move below the image.
It did give me an idea, though.
After some trial and error I found a solution. It's possibly not the finest, but it works for me.

Here is my new recipe:

Code:
from calibre.web.feeds.recipes import BasicNewsRecipe
from BeautifulSoup import BeautifulSoup, Tag

class RevistaMuyInteresante(BasicNewsRecipe):

    title       = 'Revista Muy Interesante'
    __author__  = 'Jefferson Frantz'
    description = 'Revista de divulgacion'
    timefmt = ' [%d %b, %Y]'
    language = 'es_ES'

    no_stylesheets = True


    #then we add our own style(s) like this:
    extra_css = '''
                       .contentheading{font-weight: bold}
                       p {font-size: 4px;font-family: Times New Roman;}
                    '''

    ###########################################################
    #this right here gets rid of all the inline styles that prevent extra_css from working a lot
    #of times....
    ###########################################################
    def preprocess_html(self, soup):
            for item in soup.findAll(style=True):
               del item['style']
            return soup

    def preprocess_html(self, soup):
            for img_tag in soup.findAll('img'):
                parent_tag = img_tag.parent
                if parent_tag.name == 'td':
                    if not parent_tag.get('class') == 'txt_articulo': break
                    imagen = img_tag
                    new_tag = Tag(soup,'p')
                    img_tag.replaceWith(new_tag)
                    div = soup.find(attrs={'class':'article_category'})
                    div.insert(0,imagen)
            return soup

    keep_only_tags = [dict(name='div', attrs={'class':['article']}),dict(name='td', attrs={'class':['txt_articulo']})]

    remove_tags        = [
                             dict(name=['object','link','script','ul'])
                            ,dict(name='div', attrs={'id':['comment']})
                            ,dict(name='td', attrs={'class':['buttonheading']})
                            ,dict(name='div', attrs={'class':['tags_articles']})
                         ]

    remove_tags_after = dict(name='div', attrs={'class':'tags_articles'})


    #TO GET ARTICLES IN SECTION
    def nz_parse_section(self, url):
            soup = self.index_to_soup(url)
            div = soup.find(attrs={'class':'contenido'})
            current_articles = []
            for x in div.findAllNext(attrs={'class':['headline']}):
                    a = x.find('a', href=True)
                    if a is None:
                        continue
                    title = self.tag_to_string(a)
                    url = a.get('href', False)
                    if not url or not title:
                        continue
                    if url.startswith('/'):
                         url = 'http://www.muyinteresante.es'+url
#                    self.log('\t\tFound article:', title)
#                    self.log('\t\t\t', url)
                    current_articles.append({'title': title, 'url':url,
                        'description':'', 'date':''})

            return current_articles


    # To GET SECTIONS
    def parse_index(self):
            feeds = []
            for title, url in [
                ('Historia',
                 'http://www.muyinteresante.es/historia-articulos'),
             ]:
               articles = self.nz_parse_section(url)
               if articles:
                   feeds.append((title, articles))
            return feeds


Thanks again!

PS: The solution for the title with the extra_css works like a charm
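
One thing worth noting about the recipe above: it defines preprocess_html twice in the same class. In Python, a later definition with the same name replaces the earlier one, so only the image-moving version would actually run and the style-stripping version is silently dropped. The sketch below illustrates this with stand-in classes. The class names and the plain list used in place of a real soup are purely illustrative and are not calibre API.

```python
class DuplicateRecipe:
    def preprocess_html(self, soup):
        # First definition: stands in for the style-stripping step.
        return soup + ['styles stripped']

    def preprocess_html(self, soup):
        # Same name, so this definition replaces the one above.
        return soup + ['image moved']


class MergedRecipe:
    def preprocess_html(self, soup):
        # Both steps live in a single method, so both run.
        soup = soup + ['styles stripped']
        return soup + ['image moved']


steps = DuplicateRecipe().preprocess_html([])
merged_steps = MergedRecipe().preprocess_html([])
print(steps)
print(merged_steps)
```

If both steps are wanted, putting them in a single preprocess_html, as in the second stand-in class, is one way to make sure neither is lost.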

Last edited by jefferson_frantz; 10-01-2010 at 12:52 AM.
Old 10-01-2010, 03:31 PM   #9
TonytheBookworm
I'm glad you figured it out. I battled with it for a while and never could get the image to move, but I see how you did it. What I don't understand is why creating the div tags the way I was doing didn't work, but whatever. If one bullet doesn't work and the other one does, that is all that matters.
Old 11-21-2010, 10:20 AM   #10
zeener
Revista Muy Interesante

Here is my recipe for Muy Interesante magazine:

http://gazambuja.pastebin.com/t33w40SF
Old 11-21-2010, 10:34 AM   #11
kovidgoyal
creator of calibre
@zeener: Are there some improvements in your version that should be merged into the builtin recipe?
Old 11-22-2010, 06:44 AM   #12
zeener
I think so. Please, I encourage @jefferson_frantz to test my version.
Old 11-22-2010, 11:56 AM   #13
kovidgoyal
Can you tell us what the improvements are? That makes it easier to test.
Old 11-22-2010, 01:40 PM   #14
zeener
  • Use RSS.
  • Get cover from website
  • Better "look & feel"
Old 11-22-2010, 02:06 PM   #15
kovidgoyal
Well, I've added the get_cover to the builtin recipe. For the rest, let's wait for comments from jefferson_frantz.