|  12-16-2008, 11:03 AM | #1 | 
| Member  Posts: 13 Karma: 10 Join Date: Oct 2008 Device: PRS-505 | 
				
				Problem with preprocess_regexps and Unicode
			 
			
			I am preparing a recipe for a Belgian newspaper in which I have to replace a styled apostrophe with a simple one (Unicode characters 0x92 and 0x27). The formula I use is

    preprocess_regexps = [ (re.compile(ru'\0092'), lambda match: ru'\u0027') ]

but I cannot get feeds2disk to start. I always receive this error message:

    C:\Documents and Settings\Denis\test>feeds2disk --debug --test libe.py
    Traceback (most recent call last):
      File "main.py", line 167, in <module>
      File "main.py", line 162, in main
      File "main.py", line 133, in run_recipe
      File "calibre\web\feeds\recipes\__init__.pyo", line 80, in compile_recipe
      File "c:\docume~1\denis\locals~1\temp\calibre_0.4.115_s_e8f1_recipes\recipe1.py", line 4, in <module>
        libe.py
    NameError: name 'libe' is not defined

What is wrong with my use of the regexp?
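Some background on where a U+0092 "apostrophe" usually comes from, given as general encoding facts rather than anything confirmed for this particular site: in Windows-1252 the byte 0x92 is the right single quotation mark (U+2019), but when the same bytes are decoded as ISO-8859-1 (Latin-1) that byte becomes the C1 control character U+0092. The sample string below is made up for illustration.

```python
# Hypothetical sample: "l'homme" with a Windows-1252 curly apostrophe (byte 0x92)
raw = b"l\x92homme"

# Latin-1 maps byte 0x92 straight to code point U+0092 (a control character)
as_latin1 = raw.decode("latin-1")
# Windows-1252 maps byte 0x92 to U+2019, RIGHT SINGLE QUOTATION MARK
as_cp1252 = raw.decode("cp1252")

print(hex(ord(as_latin1[1])))  # 0x92
print(hex(ord(as_cp1252[1])))  # 0x2019
```

So a pattern for U+0092 matches text that was decoded as Latin-1; if the page were decoded as Windows-1252, the character to replace would be U+2019 instead.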
|  12-16-2008, 11:39 AM | #2 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Post the full recipe.
		 | 
|  12-17-2008, 10:53 AM | #3 | 
| Member  Posts: 13 Karma: 10 Join Date: Oct 2008 Device: PRS-505 | 
				
				Recipe
			 
			
			Here is the recipe, which works without the regex part.
		 | 
|  12-17-2008, 11:58 AM | #4 | 
| creator of calibre            Posts: 45,600 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			The first thing I see wrong is that

    (re.compile(ru'\0092'), lambda match: ru'\u0027')

should be

    (re.compile(ru'\u0092'), lambda match: ru'\u0027')

Note the missing u.
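Why the missing `u` matters, checked in a current Python 3 interpreter (an assumption; calibre in 2008 ran on Python 2): in a plain, non-raw literal, `\0` starts an octal escape, so `'\0092'` is a NUL character followed by the digits 9 and 2, while `'\u0092'` is the single code point U+0092.

```python
# '\0092' parses as the octal escape '\00' (NUL) followed by the two characters '9' and '2'
wrong = '\0092'
# '\u0092' is one code point, U+0092
right = '\u0092'

print(len(wrong), [hex(ord(c)) for c in wrong])  # 3 ['0x0', '0x39', '0x32']
print(len(right), hex(ord(right)))               # 1 0x92
```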
|  12-18-2008, 03:24 AM | #5 | 
| Member  Posts: 13 Karma: 10 Join Date: Oct 2008 Device: PRS-505 | 
				
				Regex
			 
			
			Thanks, but it still does not work.
		 | 
|  12-18-2008, 04:28 AM | #6 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | |
|  12-18-2008, 04:53 PM | #7 | 
| Member  Posts: 13 Karma: 10 Join Date: Oct 2008 Device: PRS-505 | 
			
			This still does not start:

    import string, re

    class AdvancedUserRecipe1229426345(BasicNewsRecipe):
        title = u'La Libre Belgique'
        __author__ = 'Denis McCann'
        oldest_article = 1
        max_articles_per_feed = 100
        use_embedded_content = False
        no_stylesheets = True
        simultaneous_downloads = 1
        remove_tags_after = [dict(id='articleText')]
        preprocess_regexps = [ (re.compile(ru'\u0092'), lambda match: ru'\u0027') ]
        keep_only_tags = [
            dict(name='p', attrs={'id':'avantTitre'}),
            dict(name='p', attrs={'id':'writer'}),
            dict(name='p', attrs={'id':'publicationDate'}),
            dict(name='div', attrs={'id':'articleHat'}),
            dict(name='div', attrs={'id':'c'}),
            dict(name='div', attrs={'id':'articleText'})
        ]
        feeds = [
            (u'A la Une', u'http://www.lalibre.be/rss/?section=10'),
            (u'Belgique', u'http://www.lalibre.be/rss/?section=10&subsection=90'),
            (u'Europe', u'http://www.lalibre.be/rss/?section=10&subsection=91'),
            (u'Bruxelles', u'http://www.lalibre.be/rss/?section=10&subsection=1083'),
            (u'Brabant', u'http://www.lalibre.be/rss/?section=10&subsection=1106'),
            (u'Economie', u'http://www.lalibre.be/rss/?section=3'),
            (u'Opinion', u'http://www.lalibre.be/rss/?section=11&subsection=118')
        ]
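One plausible reason the recipe above will not even compile, offered as an observation rather than a diagnosis made in the thread: Python has no `ru` string prefix. Python 2 accepted `ur'...'` but never `ru'...'`, and Python 3 accepts neither, so `ru'\u0027'` is a syntax error. The small helper `compiles` below is hypothetical and simply asks Python's built-in `compile()` whether a line parses.

```python
def compiles(source):
    """Return True if `source` is valid Python syntax, False on SyntaxError."""
    try:
        compile(source, "<recipe>", "exec")
    except SyntaxError:
        return False
    return True

# A line using the 'ru' prefix, as in the recipe above
print(compiles("x = ru'\\u0027'"))  # False
# The same line with only the 'u' prefix, as in the fix from post #8
print(compiles("x = u'\\u0027'"))   # True
```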
|  12-18-2008, 05:18 PM | #8 | 
| Guru            Posts: 800 Karma: 194644 Join Date: Dec 2007 Location: Argentina Device: Kindle Voyage | 
			
			Change the regular expression to look like this and it will work:

    preprocess_regexps = [(re.compile(u'\u0092'), lambda match: u'\u0027')]
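A simplified sketch of what an entry in `preprocess_regexps` does, assuming each (pattern, function) pair is run over the downloaded HTML with `pattern.sub(function, html)` before the page is parsed. `apply_preprocess_regexps` and the sample page are hypothetical, written for illustration; this is not calibre's own code.

```python
import re

# The list from post #8: a compiled pattern paired with a function that
# receives the match object and returns the replacement text
preprocess_regexps = [(re.compile(u'\u0092'), lambda match: u'\u0027')]

def apply_preprocess_regexps(html, regexps):
    """Hypothetical helper: apply every (pattern, function) pair to the page, in order."""
    for pattern, replace in regexps:
        # re.sub accepts a callable as the replacement and calls it once per match
        html = pattern.sub(replace, html)
    return html

page = u'<p>l\u0092homme</p>'  # made-up sample page
print(apply_preprocess_regexps(page, preprocess_regexps))  # <p>l'homme</p>
```

Passing a function rather than a string as the replacement is why the lambda takes a `match` argument: `re.sub` hands it each match object and inserts whatever string it returns.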
|  12-19-2008, 09:26 AM | #9 | 
| Member  Posts: 13 Karma: 10 Join Date: Oct 2008 Device: PRS-505 | 
			
			Thanks a lot.  That works and will be useful for other feeds. The syntax of this function is far from obvious. | 
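Since the syntax is not obvious, a few more entries in the same (compiled pattern, function) shape may help when adapting this to other feeds. Everything here is a made-up illustration: the `CP1252_LEFTOVERS` table, the ASCII substitutions, the comment-stripping rule and the sample HTML are assumptions, and the entries are applied with a plain `pattern.sub` loop rather than by calibre itself.

```python
import re

# Hypothetical table: Windows-1252 punctuation that shows up as C1 control
# characters after a Latin-1 decode, mapped to plain ASCII stand-ins
CP1252_LEFTOVERS = {
    u'\u0091': u"'",  # left single quotation mark
    u'\u0092': u"'",  # right single quotation mark
    u'\u0093': u'"',  # left double quotation mark
    u'\u0094': u'"',  # right double quotation mark
    u'\u0096': u'-',  # en dash
}

preprocess_regexps = [
    # One character class covers all five; the function looks up the matched character
    (re.compile(u'[\u0091\u0092\u0093\u0094\u0096]'),
     lambda match: CP1252_LEFTOVERS[match.group(0)]),
    # Drop HTML comments, which may span lines (re.DOTALL lets '.' match newlines)
    (re.compile(r'<!--.*?-->', re.DOTALL), lambda match: ''),
]

html = u'<p>\u0093Bonjour\u0094 \u0096 l\u0092\u00e9t\u00e9<!-- ad\nslot --></p>'
for pattern, replace in preprocess_regexps:
    html = pattern.sub(replace, html)
print(html)
```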
|  Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post | 
| Unicode support in K3 | tomsem | Amazon Kindle | 22 | 09-02-2010 04:14 PM | 
| Hacks 2.52 with unicode-fonts-hack? | yuenslhk | Amazon Kindle | 4 | 06-17-2010 07:00 PM | 
| PRS-500 Unicode Enabled RTF | Honza | Sony Reader Dev Corner | 33 | 03-31-2010 09:45 AM | 
| Python Unicode Demystified | ahi | Workshop | 2 | 09-18-2009 12:45 PM | 
| Unicode errors in isbndb | JvdW | Calibre | 3 | 08-01-2008 05:07 AM |