Old 01-27-2011, 10:54 AM   #1
tolyluis
Enthusiast
Posts: 49
Karma: 196
Join Date: Jan 2011
Device: Kindle 3
David Bravo's blog v1.0 - Spanish

Hi everybody:

One of the best-known writers in the Spanish blogosphere is David Bravo, a young lawyer who writes about law, copyright and P2P technology. Always interesting.

David Bravo's Blog - laws, copyright and p2p

SOURCE CODE:

Code:
__license__   = 'GPL v3'
__author__    = 'Luis Hernandez'
__copyright__ = 'Luis Hernandez<tolyluis@gmail.com>'
    description   = 'blog sobre leyes, p2p y copyright v1.0'

'''
http://www.filmica.com/david_bravo/
'''

class AdvancedUserRecipe1294946868(BasicNewsRecipe):

    title             = u'Blog de David Bravo'
    publisher      = u'Filmica'

    __author__  = 'Luis Hernández'
    description   = 'blog sobre leyes, p2p y copyright'
    cover_url     = 'http://www.elpais.es/edigitales/image.php?foto=par/portada/1551.jpg'

    oldest_article = 365
    max_articles_per_feed = 100

    remove_javascript = True
    no_stylesheets        = True
    use_embedded_content  = False

    encoding              = 'ISO-8859-1'
    language              = 'es'
    timefmt        = '[%a, %d %b, %Y]'

    keep_only_tags     = [
                                    dict(name='div', attrs={'class':['blog','date','blogbody','comments-head','comments-body']})
                                   ,dict(name='span', attrs={'class':['comments-post']})
                                ]

    remove_tags_before = dict(name='div' , attrs={'id':['bitacoras']})
    remove_tags_after  = dict(name='div' , attrs={'id':['comments-body']})

    extra_css             = ' p{text-align: justify; font-size: 100%} body{ text-align: left; font-family: serif; font-size: 100% } h2{ font-family: sans-serif; font-size:75%; font-weight: 800; text-align: justify } h3{ font-family: sans-serif; font-size:150%; font-weight: 600; text-align: left } img{margin-bottom: 0.4em} '
  


    feeds          = [(u'Blog', u'http://www.filmica.com/david_bravo/index.rdf')]
Enjoy it!
Old 01-28-2011, 01:42 AM   #2
miwie
Connoisseur
Posts: 76
Karma: 12
Join Date: Nov 2010
Device: Android, PB Pro 602
Quote:
Originally Posted by tolyluis View Post
Unfortunately I get encoding errors with this recipe.

Code:
File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 380: invalid continuation byte
I tried ISO-8859-1, UTF-8, and cp1252 without any success.
Old 01-28-2011, 09:35 AM   #3
tolyluis
Enthusiast
Posts: 49
Karma: 196
Join Date: Jan 2011
Device: Kindle 3
Quote:
Originally Posted by miwie View Post
Unfortunately I get encoding errors with this recipe.

Code:
File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 380: invalid continuation byte
I tried ISO-8859-1, UTF-8, and cp1252 without any success.
Unfortunately, I get encoding errors with several of my self-made recipes. I must admit that I don't test my recipes properly; I just download with Calibre and it works. My English level is very basic, and it is difficult for me to grasp some concepts in the tutorials.

How can I fix this?

Anyway, Calibre downloads these recipes with no character errors; for instance, I can read this recipe with no problems. Maybe the best thing is to leave these recipes in the forum for personal use only (but my idea was to create more recipes for Spanish Calibre users).
Old 01-28-2011, 09:39 AM   #4
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by miwie View Post
Unfortunately I get encoding errors with this recipe.

Code:
File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 380: invalid continuation byte
I tried ISO-8859-1, UTF-8, and cp1252 without any success.
The error is in his name. I suspect you changed the line
encoding = 'ISO-8859-1'
That affects the output and what it sees as the input feed, but the error is in the recipe file itself, which has "Hernández" instead of "Hernandez". There were also some indent errors for me.
Try this one:
Spoiler:
Code:
__license__   = 'GPL v3'
__author__    = 'Luis Hernandez'
__copyright__ = 'Luis Hernandez<tolyluis@gmail.com>'
description   = 'blog sobre leyes, p2p y copyright v1.0'

'''
http://www.filmica.com/david_bravo/
'''

class AdvancedUserRecipe1294946868(BasicNewsRecipe):
    title             = u'Blog de David Bravo'
    publisher      = u'Filmica'
    __author__  = 'Luis Hernandez'
    description   = 'blog sobre leyes, p2p y copyright'
    cover_url     = 'http://www.elpais.es/edigitales/image.php?foto=par/portada/1551.jpg'
    oldest_article = 365
    max_articles_per_feed = 100
    remove_javascript = True
    no_stylesheets        = True
    use_embedded_content  = False
    encoding              = 'ISO-8859-1'
    language              = 'es'
    timefmt        = '[%a, %d %b, %Y]'

    keep_only_tags     = [
                                    dict(name='div', attrs={'class':['blog','date','blogbody','comments-head','comments-body']})
                                   ,dict(name='span', attrs={'class':['comments-post']})
                                ]

    remove_tags_before = dict(name='div' , attrs={'id':['bitacoras']})
    remove_tags_after  = dict(name='div' , attrs={'id':['comments-body']})

    extra_css             = ' p{text-align: justify; font-size: 100%} body{ text-align: left; font-family: serif; font-size: 100% } h2{ font-family: sans-serif; font-size:75%; font-weight: 800; text-align: justify } h3{ font-family: sans-serif; font-size:150%; font-weight: 600; text-align: left } img{margin-bottom: 0.4em} '
  


    feeds          = [(u'Blog', u'http://www.filmica.com/david_bravo/index.rdf')]
Old 01-28-2011, 09:48 AM   #5
miwie
Connoisseur
Posts: 76
Karma: 12
Join Date: Nov 2010
Device: Android, PB Pro 602
Quote:
Originally Posted by Starson17 View Post
Try this one:
Thanks!
This works.
Old 01-28-2011, 10:03 AM   #6
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by miwie View Post
Thanks!
This works.
I'm not that knowledgeable about character encoding issues, so I'd like to know if it worked for him, or if he added his name later in the process and didn't test it. The OS may treat his recipe differently if it's set up in a different language, so it doesn't generate that error, or .... maybe he added the error line after the rest was tested. I'm glad the change fixed it for you. (I searched his recipe for E1 in hex to find the problem - amazing what you can solve by reading the error msg).
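For anyone who wants to repeat that search without a hex editor, here is a minimal sketch. It assumes Python 3 (the traceback above comes from Python 2.6), and the helper name `find_non_ascii` and the sample line are made up for illustration. The sample is a bytes literal so that "á" is the single byte 0xE1, as it would be in a file saved as ISO-8859-1 or cp1252.

```python
def find_non_ascii(data: bytes):
    """Return (offset, byte value) pairs for every byte outside the ASCII range."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]


# Hypothetical recipe line; \xe1 is "á" in ISO-8859-1 / cp1252.
sample = b"__author__  = 'Luis Hern\xe1ndez'\n"

hits = find_non_ascii(sample)
for offset, value in hits:
    # Report each suspicious byte with its position in the data.
    print(f"offset {offset}: 0x{value:02x}")
```

To check a real recipe, the bytes could be read with `open(path, "rb").read()` and passed to the same helper; any hit is a candidate for the byte named in the error message.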
Old 01-28-2011, 10:11 AM   #7
tolyluis
Enthusiast
Posts: 49
Karma: 196
Join Date: Jan 2011
Device: Kindle 3
Fantástico! I caught the STUPID error. Thank you, thank you for your advice! I have a recipe with no problems (it passes the test); have a look and try it, please:

Spoiler:
Code:
__license__   = 'GPL v3'
__author__    = 'Luis Hernandez'
__copyright__ = 'Luis Hernandez<tolyluis@gmail.com>'

'''
http://www.filmica.com/david_bravo/
'''

class AdvancedUserRecipe1294946868(BasicNewsRecipe):

    title             = u'Blog de David Bravo'
    publisher      = u'Filmica'

    __author__  = 'Luis Hernandez'
    description   = 'blog'
    cover_url     = 'http://www.elpais.es/edigitales/image.php?foto=par/portada/1551.jpg'

    oldest_article = 365
    max_articles_per_feed = 100

    remove_javascript = True
    no_stylesheets        = True
    use_embedded_content  = False

    encoding              = 'ISO-8859-1'
    language              = 'es'
    timefmt        = '[%a, %d %b, %Y]'

    keep_only_tags     = [
                                    dict(name='div', attrs={'class':['blog','date','blogbody','comments-head','comments-body']})
                                   ,dict(name='span', attrs={'class':['comments-post']})
                                ]

    remove_tags_before = dict(name='div' , attrs={'id':['bitacoras']})
    remove_tags_after  = dict(name='div' , attrs={'id':['comments-body']})

    extra_css             = ' p{text-align: justify; font-size: 100%} body{ text-align: left; font-family: serif; font-size: 100% } h2{ font-family: sans-serif; font-size:75%; font-weight: 800; text-align: justify } h3{ font-family: sans-serif; font-size:150%; font-weight: 600; text-align: left } img{margin-bottom: 0.4em} '
  


    feeds          = [(u'Blog', u'http://www.filmica.com/david_bravo/index.rdf')]


The error is not the "Hernández" (maybe it gives errors too, but I corrected it and the errors didn't go away). <joking>Anyway, are you saying that my surname is baaad, boy?</joking>

The error was one description line, the fourth line of the original recipe:

Spoiler:
Code:
description   = 'blog...


I will revise all my recipes (those published in this forum and those not yet published, hehe) and I'll run the test BEFORE publishing them in the forum.

Starson17,
Old 01-28-2011, 10:29 AM   #8
tolyluis
Enthusiast
Posts: 49
Karma: 196
Join Date: Jan 2011
Device: Kindle 3
David Bravo's Blog (v1.0 code tested)

Ehem, ehem.....

Hi all, I have a new revision of this recipe; this time I tested it with:

Code:
ebook-convert "DavidBravo(es).recipe" output_dir --test -vv
No major changes made, only revised and tested.

It converts with no errors with this:

SOURCE CODE

Code:
__license__   = 'GPL v3'
__author__    = 'Luis Hernandez'
__copyright__ = 'Luis Hernandez<tolyluis@gmail.com>'

'''
http://www.filmica.com/david_bravo/
'''

class AdvancedUserRecipe1294946868(BasicNewsRecipe):

    title             = u'Blog de David Bravo'
    publisher      = u'Filmica'

    __author__  = 'Luis Hernandez'
    description   = 'spanish blog'
    cover_url     = 'http://www.elpais.es/edigitales/image.php?foto=par/portada/1551.jpg'

    oldest_article = 365
    max_articles_per_feed = 100

    remove_javascript = True
    no_stylesheets        = True
    use_embedded_content  = False

    encoding              = 'ISO-8859-1'
    language              = 'es'
    timefmt        = '[%a, %d %b, %Y]'

    keep_only_tags     = [
                                    dict(name='div', attrs={'class':['blog','date','blogbody','comments-head','comments-body']})
                                   ,dict(name='span', attrs={'class':['comments-post']})
                                ]

    remove_tags_before = dict(name='div' , attrs={'id':['bitacoras']})
    remove_tags_after  = dict(name='div' , attrs={'id':['comments-body']})

    extra_css             = ' p{text-align: justify; font-size: 100%} body{ text-align: left; font-family: serif; font-size: 100% } h2{ font-family: sans-serif; font-size:75%; font-weight: 800; text-align: justify } h3{ font-family: sans-serif; font-size:150%; font-weight: 600; text-align: left } img{margin-bottom: 0.4em} '
  


    feeds          = [(u'Blog', u'http://www.filmica.com/david_bravo/index.rdf')]
Enjoy this recipe for this fantastic blog (and its comments), a must-read for Spanish readers. Can it go into the next release of Calibre?
Old 01-28-2011, 10:29 AM   #9
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
If you have non-ASCII characters in your recipe, you need to add a coding declaration as the first line:

# -*- coding: utf-8

And of course make sure the recipe file is actually saved in UTF-8.
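Both conditions (a declaration is present, and the bytes on disk really match it) can be checked mechanically. The following is only a sketch, assuming Python 3; `declared_encoding` and `check_recipe` are hypothetical helper names, and the regular expression is the one given in PEP 263. As a simplification, it accepts a declaration on either of the first two lines without checking that the first line is a comment.

```python
import re

# Pattern from PEP 263 for a source-encoding declaration.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source: bytes):
    """Return the encoding declared on line 1 or 2 of the source, else None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def check_recipe(source: bytes):
    """Return a list of encoding problems found in the raw recipe bytes."""
    problems = []
    encoding = declared_encoding(source)
    if encoding is None and any(b > 0x7F for b in source):
        problems.append("non-ASCII bytes but no coding declaration")
    if encoding is not None:
        try:
            source.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            problems.append(f"file is not valid {encoding}")
    return problems


# Hypothetical recipe text; \u00e1 is "á".
text = "# -*- coding: utf-8 -*-\n__author__ = 'Luis Hern\u00e1ndez'\n"
good = text.encode("utf-8")          # declared UTF-8, saved as UTF-8
bad = text.encode("iso-8859-1")      # declared UTF-8, saved as ISO-8859-1
missing = "__author__ = 'Luis Hern\u00e1ndez'\n".encode("utf-8")

for name, data in [("good", good), ("bad", bad), ("missing", missing)]:
    print(name, check_recipe(data))
```

The `bad` case is the one the second sentence warns about: the declaration says UTF-8, but the editor saved the file in another encoding.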
Old 01-28-2011, 10:51 AM   #10
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kovidgoyal View Post
If you have non-ASCII characters in your recipe, you need to add a coding declaration as the first line:

# -*- coding: utf-8
Thanks. I knew there must be some way to do that.
Old 01-28-2011, 11:00 AM   #11
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by tolyluis View Post
The error is not the "Hernández" (maybe give errors, but I correct it and the errors didn't go, <joking>anyway, are you saying that my surname is baaad, boy? </joking>
Your name has "á" in it, and that non-ASCII character was stored as the single byte E1 in your recipe (which is how ISO-8859-1 and cp1252 encode it). The error message was complaining about E1, which in UTF-8 starts a multi-byte sequence and must be followed by continuation bytes; the next byte in your file was a plain "n". That error stopped the recipe. When I fixed that one, I got some indent errors, which also stopped it, but honestly, I don't know if I put them in, or if they were already there.
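The mismatch can be reproduced in a few lines. This is a minimal sketch assuming Python 3 rather than the Python 2.6 shown in the traceback; the variable names are only illustrative, and "á" is written as the escape `\u00e1` so the snippet itself stays pure ASCII.

```python
# "Hernández" as it would be stored in an ISO-8859-1 file: á is the byte 0xE1.
raw = "Hern\u00e1ndez".encode("iso-8859-1")

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as err:
    # 0xE1 announces a multi-byte UTF-8 sequence, but the next byte is "n".
    utf8_ok = False
    bad_offset = err.start
    print("UTF-8 decode failed at offset", bad_offset, "->", err.reason)

# Decoding with the encoding the bytes were written in works fine.
latin1_text = raw.decode("iso-8859-1")

# The same character saved as UTF-8 takes two bytes instead of one.
utf8_bytes = "\u00e1".encode("utf-8")
print(utf8_bytes)
```

The bytes are not wrong in themselves; they only fail when they are read with a different encoding from the one they were saved in.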

Quote:
<joking>anyway, are you saying that my surname is baaad, boy? </joking>
<smiling>Your surname is in the non-ascii hinterlands. You'll have to complain to ascii central, or move to UTF-8 land </smiling> I wish my Spanish was as good as your English.
Old 01-28-2011, 11:49 AM   #12
tolyluis
Enthusiast
Posts: 49
Karma: 196
Join Date: Jan 2011
Device: Kindle 3
Quote:
Originally Posted by kovidgoyal View Post
If you have non-ASCII characters in your recipe, you need to add a coding declaration as the first line:

# -*- coding: utf-8

And of course make sure the recipe file is actually saved in UTF-8.
I'll try to avoid non-ASCII characters in my recipes; I think that is better for all of us. In this recipe, dropping them is no trauma (at least not for me: good-bye, "á" character). My recipes are saved in ASCII format (using npp++).