09-16-2012, 10:43 AM   #1
Agama
Most efficient way to process file contents of exploded ePub

I have a plugin with several functions that support my post-conversion workflow. One function parses contents.opf for the xhtml files, then applies a set of regexes to their contents. Each xhtml file is read in turn into a list using readlines(), and then processed line by line against all the regexes with a simple for line in item: loop.
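The line-by-line approach described above might look something like this minimal sketch. The regexes here are purely hypothetical placeholders (the actual plugin's rules will differ), and the core loop is separated from the file I/O for clarity:

```python
import re

# Hypothetical cleanup rules; the real plugin's regexes will differ.
RULES = [
    (re.compile(r'\s+</p>'), '</p>'),    # drop whitespace before a closing </p>
    (re.compile(r'<b>'), '<strong>'),    # hypothetical tag substitution
    (re.compile(r'</b>'), '</strong>'),
]

def process_lines(lines):
    """Apply every regex to each line in turn, as in the current workflow."""
    out = []
    for line in lines:
        for pattern, repl in RULES:
            line = pattern.sub(repl, line)
        out.append(line)
    return out

# File-based usage, mirroring readlines():
#   with open('chapter1.xhtml', encoding='utf-8') as f:
#       lines = f.readlines()
#   new_lines = process_lines(lines)
```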

I am now wondering whether it would be cleaner code, and run faster, if I read each xhtml file into a single string and applied the regexes to the whole contents at once (using re.DOTALL so patterns can span lines)?
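The whole-file alternative could be sketched as below. The italic-to-em rule is again just a hypothetical example; the point is that with re.DOTALL, '.' matches newlines, so a single pattern can match across the line breaks that readlines() would have split:

```python
import re

# Hypothetical multi-line rule. With re.DOTALL, '.' matches newlines, so a
# single non-greedy pattern can span a tag pair broken across lines.
ITALIC = re.compile(r'<i>(.*?)</i>', re.DOTALL)

def process_text(text):
    """Apply the regexes to the whole file contents in one pass."""
    return ITALIC.sub(r'<em>\1</em>', text)

# File-based usage:
#   with open('chapter1.xhtml', encoding='utf-8') as f:
#       text = f.read()
#   new_text = process_text(text)
```

One caution worth noting: with re.DOTALL, greedy quantifiers and whitespace classes can match far more than intended across line boundaries, so patterns written for line-at-a-time use may need rechecking.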

Is there an accepted 'best practice' for this sort of file processing in Python, or is it just down to programmer preference?
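Rather than guessing, the two approaches can be timed directly with the standard-library timeit module. This is a rough sketch with made-up sample data (a single-line pattern here, so both variants produce identical output); real numbers will depend on the actual regexes and file sizes:

```python
import re
import timeit

# Made-up sample data: 2000 identical paragraph lines.
text = '<p>some text with   trailing space   </p>\n' * 2000
lines = text.splitlines(keepends=True)
pat = re.compile(r'\s+</p>')

def per_line():
    """Apply the pattern to each line separately."""
    return [pat.sub('</p>', line) for line in lines]

def whole_string():
    """Apply the pattern to the whole contents in one call."""
    return pat.sub('</p>', text)

t_line = timeit.timeit(per_line, number=100)
t_whole = timeit.timeit(whole_string, number=100)
print(f'per-line: {t_line:.3f}s  whole-string: {t_whole:.3f}s')
```

For a pattern like this, both variants give the same result, so the choice can be made on measured speed and readability alone.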