Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

Old 02-12-2011, 02:22 PM   #1
tobias2
Member
Posts: 18
Karma: 36
Join Date: Feb 2011
Device: Kindle
preprocess_regexps and ePub-based Recipes?

Hi all,

For recipes that download an ePub and then convert it (for example, "Now Toronto"), preprocess_regexps does not seem to get used. These recipes essentially only implement build_index(self). Any idea what needs to be added to build_index(self) so that the rules defined in preprocess_regexps get processed? Alternatively, I would also be happy with another way to add regular expression processing to this type of recipe that goes beyond the three search-and-replace pairs (sr1_search/sr1_replace, and likewise for 2 and 3) in conversion_options.

Thanks,

Tobias
Old 02-12-2011, 02:54 PM   #2
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You are free to iterate over the HTML files in the downloaded epub and run whatever regexes you like on them in build_index.
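As a rough illustration of what "running regexes over the HTML" amounts to, here is a minimal sketch in plain Python 3. It assumes the rules have the same shape as a recipe's preprocess_regexps, i.e. a list of (compiled pattern, replacement function) pairs. The name apply_regexps and the two example rules are made up for this sketch and are not part of calibre.

```python
import re

# Same shape as a recipe's preprocess_regexps: (compiled pattern, function) pairs.
# Both rules are hypothetical examples, not taken from any real recipe.
EXAMPLE_REGEXPS = [
    (re.compile(r'<!--.*?-->', re.DOTALL), lambda match: ''),  # drop HTML comments
    (re.compile(r'foo'), lambda match: 'bar'),                 # rule from the thread
]


def apply_regexps(html, regexps):
    """Run every (pattern, function) pair over html, in order, and return the result."""
    for pattern, func in regexps:
        html = pattern.sub(func, html)
    return html


cleaned = apply_regexps('<p>foo<!-- ad --></p>', EXAMPLE_REGEXPS)
print(cleaned)  # prints <p>bar</p>
```

The rules are applied in list order, so a later rule sees the output of the earlier ones.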
Old 02-13-2011, 06:25 AM   #3
tobias2
I am not too familiar with Python programming. Is there a simple call that I can add in build_index such that anything that is defined in preprocess_regexps gets processed? Right now the code (in the "Now Toronto" example) is as follows:

Code:
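    # Imports needed at the top of the recipe file:
    #   import os, re, urllib2, zipfile
    #   from calibre.ptempfile import PersistentTemporaryFile

    # Defined here, but never applied because build_index is overridden below
    preprocess_regexps    = [
        (re.compile(r'foo'), lambda match: 'bar'),
    ]

    def build_index(self):
        # Find the URL of the latest epub edition in the feed
        epub_feed = "http://feeds.feedburner.com/NowEpubEditions"
        soup = self.index_to_soup(epub_feed)
        url = soup.find(name = 'feedburner:origlink').string
        # Download the epub into a temporary file
        f = urllib2.urlopen(url)
        tmp = PersistentTemporaryFile(suffix='.epub')
        self.report_progress(0,_('downloading epub'))
        tmp.write(f.read())
        tmp.close()
        # Unpack the epub (a zip archive) into the recipe's output directory
        zfile = zipfile.ZipFile(tmp.name, 'r')
        self.report_progress(0,_('extracting epub'))
        zfile.extractall(self.output_dir)
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(1,_('epub downloaded and extracted'))

        return index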
Thanks,

Tobias
Old 02-13-2011, 09:12 AM   #4
kovidgoyal
There's no simple call; you have to write the code to iterate over all the HTML files, read them, run the regexps on them, and write them back.
Old 02-13-2011, 01:31 PM   #5
tobias2
I have now looked into the source for a while to get some idea of how to do this, but to no avail. There is too much I would need to learn to be able to properly debug things and figure out how this works. Would you (or someone else, for that matter) maybe be able to provide the lines that I would need to "iterate over all html files, read them run the regexps on them and write them back"? I would much appreciate this. I think such an example may be generally helpful for recipes that are based on ePub downloads.

Thanks in advance, to whoever finds the time to provide the lines.

Cheers,

Tobias
Old 02-13-2011, 01:54 PM   #6
kovidgoyal
Code:
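import os

from calibre import walk

# Meant to run inside build_index, where self is the recipe.
# '.' is a placeholder; point this at the directory the epub was extracted to.
for path in walk('.'):
    # splitext() returns (root, ext); strip the leading dot from ext before comparing
    if os.path.splitext(path)[1][1:].lower() in ('html', 'htm'):
        with open(path, 'r+b') as f:
            raw = f.read().decode('utf-8')
            # Apply each (pattern, function) pair in order
            for pat, func in self.preprocess_regexps:
                raw = pat.sub(func, raw)
            # Overwrite the file in place with the processed text
            f.seek(0)
            f.truncate()
            f.write(raw.encode('utf-8'))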
This will need some adjustments, of course.
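
For readers who want to try the idea outside calibre first, the following is a minimal, self-contained sketch in Python 3 that uses only the standard library (os.walk stands in for calibre's walk). The function name apply_regexps_to_tree, the inclusion of .xhtml, and the UTF-8 default are assumptions made for this sketch; they do not come from the thread or from calibre.

```python
import os
import re
import tempfile

# .html and .htm come from the snippet above; .xhtml is an added assumption,
# since many epubs store their content under that extension.
HTML_EXTENSIONS = ('.html', '.htm', '.xhtml')


def apply_regexps_to_tree(root, regexps, encoding='utf-8'):
    """Rewrite every HTML file under root in place; return the paths that changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in HTML_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                raw = f.read().decode(encoding)
            new = raw
            # Apply each (pattern, function) pair in order
            for pattern, func in regexps:
                new = pattern.sub(func, new)
            # Only touch files whose content actually changed
            if new != raw:
                with open(path, 'wb') as f:
                    f.write(new.encode(encoding))
                changed.append(path)
    return changed


# Small demonstration on a throwaway directory
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'page.html'), 'w', encoding='utf-8') as f:
    f.write('<p>foo</p>')
with open(os.path.join(demo_dir, 'style.css'), 'w', encoding='utf-8') as f:
    f.write('foo {}')

changed = apply_regexps_to_tree(demo_dir, [(re.compile(r'foo'), lambda m: 'bar')])
```

In a recipe like the one quoted earlier, a call along the lines of apply_regexps_to_tree(self.output_dir, self.preprocess_regexps), placed after zfile.extractall(self.output_dir), would presumably be the place to hook this in, but that is untested here and may need the same kind of adjustments.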
Old 02-13-2011, 04:59 PM   #7
tobias2
Awesome, thanks so much, I will give it a try.

Cheers,

Tobias
All times are GMT -4.