02-12-2011, 02:22 PM | #1 |
Member
Posts: 18
Karma: 36
Join Date: Feb 2011
Device: Kindle
|
preprocess_regexps and ePub-based Recipes?
Hi all,
For recipes that download an ePub and then convert it (for example, "Now Toronto"), preprocess_regexps does not seem to get used. These recipes essentially only implement build_index(self). Any idea what needs to be added to build_index(self) so that the rules defined in preprocess_regexps get processed? Alternatively, I would also be happy with another way to add regular expression processing to this type of recipe that goes beyond the three sr1_search and sr1_replace tags (1, 2, 3) in conversion_options.
Thanks, Tobias |
02-12-2011, 02:54 PM | #2 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You are free to iterate over the HTML files in the downloaded epub and run whatever regexes you like on them in build_index.
|
02-13-2011, 06:25 AM | #3 |
Member
Posts: 18
Karma: 36
Join Date: Feb 2011
Device: Kindle
|
I am not too familiar with Python programming. Is there a simple call that I can add in build_index such that anything that is defined in preprocess_regexps gets processed? Right now the code (in the "Now Toronto" example) is as follows:
Code:
preprocess_regexps = [
    (re.compile(r'foo'), lambda match: 'bar'),
]

def build_index(self):
    epub_feed = "http://feeds.feedburner.com/NowEpubEditions"
    soup = self.index_to_soup(epub_feed)
    url = soup.find(name = 'feedburner:origlink').string
    f = urllib2.urlopen(url)
    tmp = PersistentTemporaryFile(suffix='.epub')
    self.report_progress(0,_('downloading epub'))
    tmp.write(f.read())
    tmp.close()
    zfile = zipfile.ZipFile(tmp.name, 'r')
    self.report_progress(0,_('extracting epub'))
    zfile.extractall(self.output_dir)
    tmp.close()
    index = os.path.join(self.output_dir, 'content.opf')
    self.report_progress(1,_('epub downloaded and extracted'))
    return index

Tobias |
02-13-2011, 09:12 AM | #4 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There's no simple call; you have to write the code to iterate over all HTML files, read them, run the regexps on them, and write them back.
|
02-13-2011, 01:31 PM | #5 |
Member
Posts: 18
Karma: 36
Join Date: Feb 2011
Device: Kindle
|
I looked into the source for a while to get some idea of how to do this, but to no avail. There is too much I would need to do to be able to properly debug things and figure out how this works. Would you (or someone else, for that matter) be able to provide the lines that I would need to "iterate over all html files, read them run the regexps on them and write them back"? I would much appreciate this. I think such an example may be generally helpful for recipes that are based on ePub downloads.
Thanks in advance, to whoever finds the time to provide the lines. Cheers, Tobias |
02-13-2011, 01:54 PM | #6 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Code:
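import os
from calibre import walk

for path in walk('.'):
    # splitext() returns (root, ext); drop the leading dot from ext before comparing
    if os.path.splitext(path)[1][1:].lower() in ('html', 'htm'):
        with open(path, 'r+b') as f:
            raw = f.read()
            raw = raw.decode('utf-8')
            # apply each (pattern, function) pair from the recipe in order
            for pat, func in self.preprocess_regexps:
                raw = pat.sub(func, raw)
            # overwrite the file in place with the processed text
            f.seek(0)
            f.truncate()
            f.write(raw.encode('utf-8'))
|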
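For anyone who wants to try the same loop outside calibre first, here is a minimal standalone sketch. It is not calibre's own API: it swaps in the standard library's os.walk for calibre's walk, and the names apply_rules_to_tree, RULES and HTML_EXTENSIONS (including the extra '.xhtml' entry) are hypothetical, chosen only for illustration. It also assumes the files are UTF-8, as the snippet above does.

```python
import os
import re

# Hypothetical rule list in the same shape as a recipe's preprocess_regexps:
# a list of (compiled pattern, replacement function) pairs.
RULES = [
    (re.compile(r'foo'), lambda match: 'bar'),
]

# Assumption: '.xhtml' is included because many ePubs use it; adjust as needed.
HTML_EXTENSIONS = ('.html', '.htm', '.xhtml')


def apply_rules_to_tree(root, rules, encoding='utf-8'):
    """Rewrite every HTML file under root in place; return the changed paths."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Skip anything that is not an HTML file (case-insensitive match).
            if os.path.splitext(name)[1].lower() not in HTML_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                original = f.read().decode(encoding)
            # Apply each (pattern, function) pair in order.
            text = original
            for pattern, func in rules:
                text = pattern.sub(func, text)
            # Only rewrite files whose content actually changed.
            if text != original:
                with open(path, 'wb') as f:
                    f.write(text.encode(encoding))
                changed.append(path)
    return changed
```

In a recipe like the one posted earlier, something along the lines of apply_rules_to_tree(self.output_dir, self.preprocess_regexps) placed after zfile.extractall(...) and before return index would presumably do the job, but that placement is an assumption and has not been tested against calibre itself.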
02-13-2011, 04:59 PM | #7 |
Member
Posts: 18
Karma: 36
Join Date: Feb 2011
Device: Kindle
|
Awesome, thanks so much, I will give it a try.
Cheers, Tobias |