MobileRead Forums - View Single Post - profile txt -> mobi convert

forceps · 06-08-2012, 07:59 AM

Converting text to MOBI appears rather slow. Below is a profile of the conversion of a 5000-line Chinese text file in Unicode (UTF-8); the file size is 960902 bytes.

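ebook-convert seems to spend a lot of time detecting the character encoding: in the cumtime listing below, chardet's detect() accounts for 14.575 s of the 36.998 s total. There must be something that can be done to speed up the conversion.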

$ file sample.txt
sample.txt: UTF-8 Unicode text

Code:

Sort by tottime
         ncalls    tottime    percall    cumtime    percall filename:lineno(function)  
             15    10.6280     0.7090    12.2590     0.8170 sbcharsetprober.py:63(feed)
  574218/574217     3.5850     0.0000     4.0200     0.0000 {built-in method sub}
            268     2.7980     0.0100     2.7980     0.0100 {cPalmdoc.compress}
            751     2.5250     0.0030     2.5250     0.0030 stylizer.py:126(__call__)
         960902     1.0610     0.0000     1.1240     0.0000 codingstatemachine.py:40(next_state)
       16097860     0.9990     0.0000     0.9990     0.0000 {ord}
              1     0.6140     0.6140     1.7810     1.7810 utf8prober.py:50(feed)
         145029     0.5670     0.0000     0.8200     0.0000 __init__.py:194(unit_convert)
  245208/145027     0.4080     0.0000     0.5700     0.0000 stylizer.py:564(_get)
  145029/145027     0.3840     0.0000     1.4220     0.0000 stylizer.py:577(_unit_convert)
           5033     0.3510     0.0000     0.3510     0.0000 {built-in method findall}
         5001/1     0.3430     0.0000     3.6550     3.6550 mobiml.py:292(mobimlize_elem)
  347846/327027     0.3390     0.0000     1.0280     0.0000 {hasattr}
              1     0.3180     0.3180     0.3800     0.3800 hebrewprober.py:188(feed)
         361570     0.3180     0.0000     1.3790     0.0000 re.py:229(_compile)
          20565     0.2880     0.0000     0.4750     0.0000 cssstyledeclaration.py:397(getProperty)
          41234     0.2840     0.0000     0.6710     0.0000 serialize.py:1001(do_css_Value)
         642375     0.2640     0.0000     0.2640     0.0000 {isinstance}
         150028     0.2630     0.0000     3.3070     0.0000 stylizer.py:558(__getitem__)
         215733     0.2470     0.0000     0.2470     0.0000 {built-in method match}
             40     0.2460     0.0060     0.2460     0.0060 {method 'xpath' of 'lxml.etree._Element' objects}
              1     0.2410     0.2410     0.2490     0.2490 page_margin.py:127(find_levels)

Sort by cumtime
         ncalls    tottime    percall    cumtime    percall filename:lineno(function)  
              1     0.0140     0.0140    36.9980    36.9980 plumber.py:934(run_me)
              1     0.0000     0.0000    19.8110    19.8110 conversion.py:193(__call__)
              1     0.0100     0.0100    19.8100    19.8100 txt_input.py:54(convert)
              1     0.0010     0.0010    14.5750    14.5750 __init__.py:20(detect)
              1     0.0000     0.0000    14.5750    14.5750 chardet.py:36(detect)
              2     0.0000     0.0000    14.4200     7.2100 charsetgroupprober.py:55(feed)
              1     0.0000     0.0000    14.4200    14.4200 universaldetector.py:61(feed)
             15    10.6280     0.7090    12.2590     0.8170 sbcharsetprober.py:63(feed)
              1     0.0010     0.0010     9.7800     9.7800 mobi_output.py:167(convert)
              1     0.0110     0.0110     9.7680     9.7680 mobi_output.py:204(write_mobi)
              4     0.0590     0.0150     6.5140     1.6290 stylizer.py:176(__init__)
              1     0.0030     0.0030     4.7810     4.7810 html_input.py:57(convert)
              1     0.0170     0.0170     4.6750     4.6750 html_input.py:94(create_oebbook)
              1     0.0000     0.0000     4.4910     4.4910 flatcss.py:122(__call__)
              1     0.0000     0.0000     4.4740     4.4740 mobiml.py:104(__call__)
              1     0.0000     0.0000     4.4740     4.4740 mobiml.py:114(mobimlize_spine)
             52     0.0000     0.0000     4.3990     0.0850 base.py:903(fget)
              1     0.0000     0.0000     4.3960     4.3960 base.py:830(_parse_xhtml)
              1     0.0030     0.0030     4.3960     4.3960 parse_utils.py:201(parse_html)
              1     0.0010     0.0010     4.3420     4.3420 preprocess.py:495(__call__)
  574218/574217     3.5850     0.0000     4.0200     0.0000 {built-in method sub}
              1     0.0000     0.0000     3.9730     3.9730 flatcss.py:150(stylize_spine)
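
One possible way to cut the detection cost, sketched here under assumptions: since file reports the sample as UTF-8, a strict UTF-8 decode could be tried first, with chardet run only as a fallback and only on a leading sample of the bytes. The function name decode_text, the sample_size parameter and the injectable detect argument are hypothetical names for illustration, not part of calibre; this is not how txt_input.py is written.

```python
def decode_text(raw, detect=None, sample_size=64 * 1024):
    """Decode raw bytes, trying the cheap path before statistical detection.

    Hypothetical helper, not calibre code. Returns (text, encoding_used).
    """
    # Fast path: a strict UTF-8 decode is a single pass over the bytes and
    # fails on input that is not valid UTF-8.
    try:
        return raw.decode('utf-8'), 'utf-8'
    except UnicodeDecodeError:
        pass

    # Slow path: run the detector on a bounded sample instead of the whole file.
    if detect is None:
        import chardet  # third-party; imported only when actually needed
        detect = chardet.detect
    guess = detect(raw[:sample_size])
    encoding = guess.get('encoding') or 'latin-1'  # assumed last-resort default
    return raw.decode(encoding, errors='replace'), encoding
```

With this ordering a UTF-8 file like the sample never reaches the single-byte probers that dominate the tottime listing. Passing --input-encoding utf-8 to ebook-convert may also avoid the detection step, though that is not verified here.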

The profiler is inserted into plumber.py, as shown below.

diff --git a/src/calibre/ebooks/conversion/plumber.py b/src/calibre/ebooks/conversion/plumber.py
index 78821fa..9d8b4a6 100644
@@ -926,8 +926,13 @@ OptionRecommendation(name='search_replace',

         self.log.info('Input debug saved to:', out_dir)

-
     def run(self):
+        '''debug profile '''
+        import cProfile
+        cProfile.runctx('self.run_me()', globals(), locals())
+
+    def run_me(self):
+    #def run(self):
         '''
         Run the conversion pipeline
         '''
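
Called without a sort argument, cProfile.runctx does not order its report by time, so the two listings above need an explicit sort key. A minimal sketch of one way to get both orderings from a single run, using the standard cProfile and pstats modules; the helper name profile_call and the 25-row limit are assumptions for illustration, not part of the patch above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, reports).

    reports maps 'tottime' and 'cumtime' to the text of the report
    sorted by that key. Hypothetical helper for illustration.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    reports = {}
    for key in ('tottime', 'cumtime'):
        buf = io.StringIO()
        stats = pstats.Stats(profiler, stream=buf)
        # Sort by the chosen column and keep only the top 25 rows.
        stats.sort_stats(key).print_stats(25)
        reports[key] = buf.getvalue()
    return result, reports
```

Inside plumber.py this could be used as profile_call(self.run_me) in place of the runctx line, printing or logging each entry of reports afterwards.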