12-16-2008, 11:03 AM | #1 |
Member
Posts: 13
Karma: 10
Join Date: Oct 2008
Device: PRS-505
|
Problem with preprocess_regexps and Unicode
I am preparing a recipe for a Belgian newspaper in which I have to replace a styled apostrophe with a plain one (Unicode characters 0x92 and 0x27).
The formula I use is
preprocess_regexps = [ (re.compile(ru'\0092'), lambda match: ru'\u0027') ]
but I cannot get feeds2disk to start. I always receive the standard error message:
C:\Documents and Settings\Denis\test>feeds2disk --debug --test libe.py
Traceback (most recent call last):
  File "main.py", line 167, in <module>
  File "main.py", line 162, in main
  File "main.py", line 133, in run_recipe
  File "calibre\web\feeds\recipes\__init__.pyo", line 80, in compile_recipe
  File "c:\docume~1\denis\locals~1\temp\calibre_0.4.115_s _e8f1_recipes\recipe1.py", line 4, in <module>
    libe.py
NameError: name 'libe' is not defined
What is wrong with my use of the regexp? |
12-16-2008, 11:39 AM | #2 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Post the full recipe.
|
12-17-2008, 10:53 AM | #3 |
Member
Posts: 13
Karma: 10
Join Date: Oct 2008
Device: PRS-505
|
Recipe
Here is the recipe which works without the regex part.
|
12-17-2008, 11:58 AM | #4 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The first thing I see wrong is that
(re.compile(ru'\0092'), lambda match: ru'\u0027')
should be
(re.compile(ru'\u0092'), lambda match: ru'\u0027')
Note the missing u after the backslash in the first pattern. |
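To see why the missing u matters, here is a small sketch (the variable names and the sample text are illustrative only; the u prefix is kept to match the thread, and the code runs on Python 3). Without the u, Python reads \00 as an octal escape for the NUL character and treats 92 as two ordinary digits, so the pattern is three characters long and cannot match the single character U+0092.

```python
import re

# u'\0092' is NOT one character: '\00' is an octal escape (NUL),
# followed by the literal digits '9' and '2'.
wrong = u'\0092'
# u'\u0092' is the single code point U+0092.
right = u'\u0092'

sample = u'l\u0092avenir'  # made-up text containing U+0092

wrong_hits = re.compile(wrong).findall(sample)
right_hits = re.compile(right).findall(sample)

print(len(wrong), len(right))      # lengths of the two literals
print(wrong_hits, right_hits)      # matches found by each pattern
```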
12-18-2008, 03:24 AM | #5 |
Member
Posts: 13
Karma: 10
Join Date: Oct 2008
Device: PRS-505
|
Regex
Thanks, but it still does not work.
|
12-18-2008, 04:28 AM | #6 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
|
12-18-2008, 04:53 PM | #7 |
Member
Posts: 13
Karma: 10
Join Date: Oct 2008
Device: PRS-505
|
This still does not start:
import string, re

class AdvancedUserRecipe1229426345(BasicNewsRecipe):
    title = u'La Libre Belgique'
    __author__ = 'Denis McCann'
    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content = False
    no_stylesheets = True
    simultaneous_downloads = 1
    remove_tags_after = [dict(id='articleText')]
    preprocess_regexps = [ (re.compile(ru'\u0092'), lambda match: ru'\u0027') ]
    keep_only_tags = [
        dict(name='p', attrs={'id':'avantTitre'}),
        dict(name='p', attrs={'id':'writer'}),
        dict(name='p', attrs={'id':'publicationDate'}),
        dict(name='div', attrs={'id':'articleHat'}),
        dict(name='div', attrs={'id':'c'}),
        dict(name='div', attrs={'id':'articleText'})
    ]
    feeds = [
        (u'A la Une', u'http://www.lalibre.be/rss/?section=10'),
        (u'Belgique', u'http://www.lalibre.be/rss/?section=10&subsection=90'),
        (u'Europe', u'http://www.lalibre.be/rss/?section=10&subsection=91'),
        (u'Bruxelles', u'http://www.lalibre.be/rss/?section=10&subsection=1083'),
        (u'Brabant', u'http://www.lalibre.be/rss/?section=10&subsection=1106'),
        (u'Economie', u'http://www.lalibre.be/rss/?section=3'),
        (u'Opinion', u'http://www.lalibre.be/rss/?section=11&subsection=118')
    ]
|
12-18-2008, 05:18 PM | #8 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Change the regular expression to look like this and it will work:
Code:
preprocess_regexps = [(re.compile(u'\u0092'), lambda match: u'\u0027')] |
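The rule can be tried outside calibre. Each entry in preprocess_regexps is a pair: a compiled pattern, and a function that receives the match object and returns the replacement text. The sketch below assumes the pairs are applied one after another with pattern.sub(); apply_rules is a hypothetical helper written for illustration, not a calibre function, and the sample HTML is made up.

```python
import re

# The rule from the post above: replace U+0092 with a plain apostrophe.
preprocess_regexps = [(re.compile(u'\u0092'), lambda match: u'\u0027')]


def apply_rules(text, rules):
    """Hypothetical stand-in for the recipe machinery: run each rule in order."""
    for pattern, replacement in rules:
        # re accepts a callable as the replacement; it is called once per match.
        text = pattern.sub(replacement, text)
    return text


raw_html = u'<p>L\u0092Europe aujourd\u0092hui</p>'  # made-up sample
clean_html = apply_rules(raw_html, preprocess_regexps)
print(clean_html)
```

Note that ru is not a string prefix Python accepts (Python 2 allows ur, and a plain u is enough here), which is consistent with the earlier versions of the recipe failing to start.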
12-19-2008, 09:26 AM | #9 |
Member
Posts: 13
Karma: 10
Join Date: Oct 2008
Device: PRS-505
|
Thanks a lot. That works and will be useful for other feeds.
The syntax of this function is far from obvious. |
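For other feeds showing the same symptom, a sketch along the same lines. It rests on an assumption the thread does not confirm: that the stray characters are Windows-1252 punctuation bytes (0x91 to 0x94) that were decoded as Latin-1 and therefore show up as U+0091 to U+0094. The mapping table, its name, and the sample text are illustrative.

```python
import re

# Assumed mapping: C1 code points left over from Windows-1252 punctuation.
C1_TO_ASCII = {
    u'\u0091': u"'",  # cp1252 0x91, left single quote
    u'\u0092': u"'",  # cp1252 0x92, right single quote / apostrophe
    u'\u0093': u'"',  # cp1252 0x93, left double quote
    u'\u0094': u'"',  # cp1252 0x94, right double quote
}

# One rule covers all four characters; the function looks up the matched one.
preprocess_regexps = [
    (re.compile(u'[\u0091-\u0094]'), lambda match: C1_TO_ASCII[match.group(0)]),
]

sample = u'\u0093C\u0092est la vie\u0094'  # made-up sample
pattern, replacement = preprocess_regexps[0]
result = pattern.sub(replacement, sample)
print(result)

# Background: the same byte decodes differently under the two encodings.
as_cp1252 = b'\x92'.decode('cp1252')   # the typographic apostrophe U+2019
as_latin1 = b'\x92'.decode('latin-1')  # the control character U+0092
```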
|