Old 08-11-2009, 06:00 PM   #1
phkoech
Member
phkoech began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Jul 2009
Device: Sony PRS-505
Recipe not working

Hello everybody,
I'm trying to grab French cooking recipes from this type of page: http://www.marmiton.org/Recettes/Rec...ses_45471.aspx, but the comments section of the page (from "les commentaires des internautes" to "Bel effet dans l'assiette et excellent.") doesn't appear in the book produced by Calibre (via the News system).
Do you have any idea how I can grab this section?

My recipe:
Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe


class Recettes(BasicNewsRecipe):
    title          = 'RecettesPrint'
    __author__ = 'Kek <kek.fr>'
    description = 'Recettes'
    oldest_article = 3
    language = _('French')
    max_articles_per_feed = 50
    no_stylesheets = True

    html2lrf_options = ['--base-font-size', '10']

    feeds =  [
             ('Recette Top', 'url from the uml feed'),
             ]

    def print_version(self, url):
        # Point recipe pages at their printable version; return any other URL unchanged.
        if 'marmiton.org/Recettes/' in url:
            url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
        return url
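
A side note on print_version: the hook should hand back a URL on every path. If the return statement sits inside the if, any URL that does not match falls through and the function returns None. Here is a minimal standalone sketch of the same rewrite; the function name to_print_url and the example URLs are hypothetical, and only the substitution itself comes from the recipe above.

```python
import re


def to_print_url(url):
    """Rewrite a marmiton recipe URL to its printable form (hypothetical helper)."""
    if 'marmiton.org/Recettes/' in url:
        # Same substitution as in the recipe's print_version.
        url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
    # Returned outside the if, so non-matching URLs pass through unchanged.
    return url


# Hypothetical URLs, used only to exercise both branches.
recipe_url = 'http://www.marmiton.org/Recettes/Recette_exemple_1.aspx'
other_url = 'http://example.com/other'
print(to_print_url(recipe_url))
print(to_print_url(other_url))
```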
Old 08-12-2009, 03:24 AM   #2
phkoech
Member
phkoech began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Jul 2009
Device: Sony PRS-505
I've tried modifying my code (see below), but the comments are still missing from Calibre's output.
That's very strange because the HTML is quite simple. The only odd things I see are:
- the comments use font size 1 (the rest of the page uses font size 2) => I suppose Calibre can handle that
- there is a bug in the HTML source: a </b> tag with no <b> before it => can I correct it with preprocess_html?


Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe


class RecettesPrint(BasicNewsRecipe):
    title          = 'RecettesPrint'
    __author__ = 'Kek <kek.fr>'
    description = 'Recettes'
    oldest_article = 3
    language = _('French')
    max_articles_per_feed = 5000
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    extra_css      = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt  }'
    html2lrf_options = ['--ignore-tables']
    html2epub_options = 'linearize_tables = True'

    def preprocess_html(self, soup):
        # Strip presentational attributes from every tag that carries them.
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(align=True):
            del item['align']
        for item in soup.findAll(valign=True):
            del item['valign']
        for item in soup.findAll(face=True):
            del item['face']
        return soup

    def print_version(self, url):
        # Point recipe pages at their printable version; return any other URL unchanged.
        if 'marmiton.org/Recettes/' in url:
            url = re.sub('Recettes/Recette', 'Recettes/Recette-Impression', url)
        return url
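
On the stray </b> question: preprocess_html receives a page that has already been parsed, so a fix to the raw markup would probably have to happen before that step. The sketch below only shows the clean-up on a plain string. The helper name strip_unmatched_close and the sample markup are hypothetical, and nothing here confirms that the stray tag is what actually hides the comments.

```python
import re


def strip_unmatched_close(html, tag='b'):
    """Drop closing tags that have no earlier unclosed opening tag (hypothetical helper)."""
    # Matches <b ...> and </b> but not <br> or <body>, thanks to the \b word boundary.
    pattern = re.compile(r'<(/?)%s\b[^>]*>' % re.escape(tag), re.IGNORECASE)
    depth = 0   # number of currently open tags
    out = []    # pieces of the cleaned document
    pos = 0     # start of the text not yet copied to out
    for match in pattern.finditer(html):
        if match.group(1) == '/':
            if depth == 0:
                # Closing tag with nothing open: copy the text before it and skip the tag.
                out.append(html[pos:match.start()])
                pos = match.end()
                continue
            depth -= 1
        else:
            depth += 1
    out.append(html[pos:])
    return ''.join(out)


# Hypothetical sample with one stray </b> before a well-formed pair.
sample = '<p>text</b> more <b>bold</b></p>'
print(strip_unmatched_close(sample))
```

Calibre's recipe documentation describes preprocess_regexps and preprocess_raw_html as hooks that see the downloaded source before parsing, so one of those would be the likely place to call a helper like this; check the documentation for your Calibre version before relying on either.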
Old 08-12-2009, 08:50 AM   #3
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 780
Karma: 194642
Join Date: Dec 2007
Location: Argentina
Device: Kindle PaperWhite, Motorola Xoom
I can just point out that html2lrf_options and html2epub_options are no longer valid. You should use the new conversion_options attribute instead, like this:

Code:
conversion_options = {
    'tags': 'aa,bb',
    'publisher': 'pub',
    'comments': 'desc',
    'language': 'en',
    'linearize_tables': True,
}
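
Applied to the recipe from post #2, this would presumably mean deleting the html2lrf_options and html2epub_options lines and setting conversion_options as a class attribute. A minimal sketch follows, assuming linearize_tables is the only setting worth carrying over (no replacement for '--ignore-tables' is suggested in this thread). The fallback base class is a stand-in so the snippet also runs outside Calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be run without Calibre installed (assumption).
    class BasicNewsRecipe(object):
        pass


class RecettesPrint(BasicNewsRecipe):
    title = 'RecettesPrint'
    description = 'Recettes'
    no_stylesheets = True
    # Replaces the old html2epub_options = 'linearize_tables = True' line.
    conversion_options = {
        'linearize_tables': True,
    }


print(RecettesPrint.conversion_options)
```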
Old 08-13-2009, 05:41 PM   #4
phkoech
Member
phkoech began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Jul 2009
Device: Sony PRS-505
Thanks for reading my code.
I've tried this correction, but nothing changed. Strange.
I've found another solution: grabbing the normal pages instead of the printable ones. In this case, I no longer have the problem. It is probably caused by unclean HTML in the printable pages.