Most efficient way to process file contents of exploded ePub
I have a plugin with several functions that support my post-conversion workflow. One function parses contents.opf for xhtml files and then applies a set of regexes to their contents. Each xhtml file is read in turn into a list using readlines(), then processed line by line against all the regexes with a simple for line in item: loop.
I am now wondering whether it would be cleaner code, and run faster, if I read each xhtml file in as a single string and applied the regexes to the whole contents at once (using DOTALL so patterns can span lines)?
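To make the comparison concrete, here is a minimal sketch of the two approaches. The patterns (LINE_SAFE, SPANNING) and function names are my own placeholders, not the plugin's actual regexes; the point is that per-line processing can only ever see one line at a time, while the whole-string version with re.DOTALL can match constructs that cross line boundaries:

```python
import re

# Hypothetical patterns standing in for the plugin's real regexes.
LINE_SAFE = [
    (re.compile(r"<p>\s*</p>"), ""),             # drop empty paragraphs
    (re.compile(r"[ \t]+$", re.MULTILINE), ""),  # strip trailing whitespace
]

SPANNING = [
    # re.DOTALL lets .*? match across newlines, e.g. an <i>...</i>
    # pair that is split over two lines of the xhtml file.
    (re.compile(r"<i>(.*?)</i>", re.DOTALL), r"<em>\1</em>"),
]

def process_line_by_line(text: str) -> str:
    """Apply the per-line regexes to each line in turn."""
    out = []
    for line in text.splitlines(keepends=True):
        for pat, repl in LINE_SAFE:
            line = pat.sub(repl, line)
        out.append(line)
    return "".join(out)

def process_whole_text(text: str) -> str:
    """Apply all regexes to the file contents as one string."""
    for pat, repl in LINE_SAFE + SPANNING:
        text = pat.sub(repl, text)
    return text
```

In my experience, the xhtml files inside an ePub are small enough that reading each one with read() and substituting against the whole string is both simpler and usually faster, since each regex is applied once per file rather than once per line; line-by-line processing mainly pays off for very large files or when the logic genuinely depends on per-line state.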
Is there an accepted 'best practice' for this sort of file processing in Python, or is it just down to programmer preference?