Old 12-16-2008, 11:03 AM   #1
mccande
Member
Posts: 13
Karma: 10
Join Date: Oct 2008
Device: PRS-505
Problem with preprocess_regexps and Unicode

I am preparing a recipe for a Belgian newspaper in which I have to replace a styled apostrophe with a plain one (Unicode characters 0x92 and 0x27, respectively).

The formula I use is

preprocess_regexps = [
(re.compile(ru'\0092'), lambda match: ru'\u0027')
]

but I cannot get feeds2disk to start. I always receive this error message:
C:\Documents and Settings\Denis\test>feeds2disk --debug --test libe.py
Traceback (most recent call last):
File "main.py", line 167, in <module>
File "main.py", line 162, in main
File "main.py", line 133, in run_recipe
File "calibre\web\feeds\recipes\__init__.pyo", line 80, in compile_recipe
File "c:\docume~1\denis\locals~1\temp\calibre_0.4.115_s _e8f1_recipes\recipe1.py", line 4, in <module>
libe.py
NameError: name 'libe' is not defined

What is wrong with my use of the regexp?
Old 12-16-2008, 11:39 AM   #2
kovidgoyal
creator of calibre
Posts: 45,600
Karma: 28548974
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Post the full recipe.
Old 12-17-2008, 10:53 AM   #3
mccande
Recipe

Here is the recipe, which works without the regex part.
Attached Files
File Type: rar libe.rar (610 Bytes, 241 views)
Old 12-17-2008, 11:58 AM   #4
kovidgoyal
The first thing I see wrong is that

(re.compile(ru'\0092'), lambda match: ru'\u0027')

should be

(re.compile(ru'\u0092'), lambda match: ru'\u0027')

Note the missing u after the backslash in the pattern.
Old 12-18-2008, 03:24 AM   #5
mccande
Regex

Thanks, but it still does not work.
Old 12-18-2008, 04:28 AM   #6
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by mccande
Thanks but it still does not work
You are missing this at the start of your script:

Code:
import string, re

class AdvancedUserRecipe1229426345(BasicNewsRecipe):
....
Old 12-18-2008, 04:53 PM   #7
mccande
This still does not start:

import string, re

class AdvancedUserRecipe1229426345(BasicNewsRecipe):
    title = u'La Libre Belgique'
    __author__ = 'Denis McCann'
    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content = False
    no_stylesheets = True
    simultaneous_downloads = 1

    remove_tags_after = [dict(id='articleText')]

    preprocess_regexps = [
        (re.compile(ru'\u0092'), lambda match: ru'\u0027')
    ]

    keep_only_tags = [
        dict(name='p', attrs={'id':'avantTitre'}),
        dict(name='p', attrs={'id':'writer'}),
        dict(name='p', attrs={'id':'publicationDate'}),
        dict(name='div', attrs={'id':'articleHat'}),
        dict(name='div', attrs={'id':'c'}),
        dict(name='div', attrs={'id':'articleText'})
    ]

    feeds = [
        (u'A la Une', u'http://www.lalibre.be/rss/?section=10'),
        (u'Belgique', u'http://www.lalibre.be/rss/?section=10&subsection=90'),
        (u'Europe', u'http://www.lalibre.be/rss/?section=10&subsection=91'),
        (u'Bruxelles', u'http://www.lalibre.be/rss/?section=10&subsection=1083'),
        (u'Brabant', u'http://www.lalibre.be/rss/?section=10&subsection=1106'),
        (u'Economie', u'http://www.lalibre.be/rss/?section=3'),
        (u'Opinion', u'http://www.lalibre.be/rss/?section=11&subsection=118')
    ]
Old 12-18-2008, 05:18 PM   #8
kiklop74
Change the regular expression to look like this and it will work:

Code:
    
preprocess_regexps = [(re.compile(u'\u0092'), lambda match: u'\u0027')]
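Note the absence of r. The ru prefix is not valid Python: Python 2 accepts the combination only in the order ur, and Python 3 does not allow combining u and r at all.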
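
For anyone comparing the spellings tried in this thread, here is a minimal sketch of what each string literal actually contains. It is written in Python 3 syntax, where the u prefix is optional, which is an assumption on my part and not what calibre ran on at the time. The helper code_points is hypothetical and only there for inspection.

```python
# Python 3 syntax: every str is Unicode, so no u prefix is needed.
escaped = '\u0092'   # escape is processed: a single character, U+0092
raw = r'\u0092'      # raw string: six characters, starting with a backslash
octal = '\0092'      # '\00' is an octal escape (NUL), followed by '9' and '2'

def code_points(s):
    """Return the code points of s as hex strings, for inspection."""
    return [hex(ord(ch)) for ch in s]

print(len(escaped), code_points(escaped))
print(len(raw), code_points(raw))
print(len(octal), code_points(octal))
```

Only the first spelling is the single character the recipe wants to match. The version without the u after the backslash produces a NUL character followed by the digits 9 and 2, which is why the original pattern could never match the apostrophe.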
Old 12-19-2008, 09:26 AM   #9
mccande
Thanks a lot. That works and will be useful for other feeds.

The syntax of this function is far from obvious.
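
Since the syntax is not obvious, here is a rough sketch of how a list of (compiled pattern, function) pairs can be applied to a page. This illustrates the idea only and is not calibre's actual code: apply_regexps and the sample text are hypothetical, and it uses Python 3 syntax. The extra U+2019 in the pattern is also an assumption, added because byte 0x92 in Windows-1252 is the right single quotation mark, which decodes to U+2019 when the page encoding is detected correctly.

```python
import re

# Same shape as the recipe attribute: a list of (compiled pattern, function) pairs.
# U+2019 is an assumption added here; the recipe in this thread matches only U+0092.
preprocess_regexps = [
    (re.compile('[\u0092\u2019]'), lambda match: "'"),
]

def apply_regexps(html, regexps):
    """Hypothetical helper: run each (pattern, function) pair over the text in order."""
    for pattern, func in regexps:
        # The function receives the match object and returns the replacement text.
        html = pattern.sub(func, html)
    return html

sample = 'l\u0092actualit\u00e9 d\u2019aujourd\u2019hui'
cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned)
```

Each function gets the match object, so a replacement can also depend on what was matched, for example by returning match.group(1) for a pattern with a capture group.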