TXT-to-MOBI conversion appears rather slow. Here is a profile of converting a 5000-line Chinese text file (UTF-8 encoded, 960902 bytes).
ebook-convert seems to spend a lot of time detecting the character encoding: chardet's detect() accounts for about 14.6 s of the 37.0 s total run (roughly 39%), most of it in the single-byte charset probers (sbcharsetprober.py, 12.3 s). Surely something can be done to speed up the conversion.
$ file sample.txt
sample.txt: UTF-8 Unicode text
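Since `file` already reports the input as UTF-8, one possible speed-up is to try a strict UTF-8 decode first and only fall back to statistical detection when that fails. The fallback can also feed chardet in chunks and stop once `detector.done` is set, rather than handing every prober the whole file in one call. This is a minimal sketch assuming chardet's `UniversalDetector` interface; the helper names, the 256 KiB cap and the 4096-byte chunk size are hypothetical choices, not calibre's actual code.

```python
import codecs


def _chardet_incremental(data, chunk_size=4096):
    """Hypothetical fallback: feed chardet in chunks, stopping early when it is sure."""
    # Imported lazily so the fast paths below need no third-party package.
    from chardet.universaldetector import UniversalDetector

    detector = UniversalDetector()
    for start in range(0, len(data), chunk_size):
        detector.feed(data[start:start + chunk_size])
        if detector.done:  # a prober already reached a confident answer
            break
    detector.close()
    return detector.result['encoding']


def guess_encoding(raw, max_bytes=256 * 1024, fallback=None):
    """Hypothetical helper: cheap checks first, statistical detection last."""
    # 1. A UTF-8 byte-order mark settles the question immediately.
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    # 2. A strict UTF-8 decode runs in C; valid UTF-8 (including plain
    #    ASCII) is accepted without running any charset prober.
    try:
        raw.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        pass
    # 3. Only now fall back to detection, on a bounded prefix of the file.
    if fallback is None:
        fallback = _chardet_incremental
    return fallback(raw[:max_bytes]) or 'utf-8'
```

The strict decode does scan the whole file when it is valid UTF-8, but it stops at the first invalid byte otherwise, so non-UTF-8 input reaches the fallback quickly.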
Sorted by tottime:
ncalls tottime percall cumtime percall filename:lineno(function)
15 10.6280 0.7090 12.2590 0.8170 sbcharsetprober.py:63(feed)
574218/574217 3.5850 0.0000 4.0200 0.0000 {built-in method sub}
268 2.7980 0.0100 2.7980 0.0100 {cPalmdoc.compress}
751 2.5250 0.0030 2.5250 0.0030 stylizer.py:126(__call__)
960902 1.0610 0.0000 1.1240 0.0000 codingstatemachine.py:40(next_state)
16097860 0.9990 0.0000 0.9990 0.0000 {ord}
1 0.6140 0.6140 1.7810 1.7810 utf8prober.py:50(feed)
145029 0.5670 0.0000 0.8200 0.0000 __init__.py:194(unit_convert)
245208/145027 0.4080 0.0000 0.5700 0.0000 stylizer.py:564(_get)
145029/145027 0.3840 0.0000 1.4220 0.0000 stylizer.py:577(_unit_convert)
5033 0.3510 0.0000 0.3510 0.0000 {built-in method findall}
5001/1 0.3430 0.0000 3.6550 3.6550 mobiml.py:292(mobimlize_elem)
347846/327027 0.3390 0.0000 1.0280 0.0000 {hasattr}
1 0.3180 0.3180 0.3800 0.3800 hebrewprober.py:188(feed)
361570 0.3180 0.0000 1.3790 0.0000 re.py:229(_compile)
20565 0.2880 0.0000 0.4750 0.0000 cssstyledeclaration.py:397(getProperty)
41234 0.2840 0.0000 0.6710 0.0000 serialize.py:1001(do_css_Value)
642375 0.2640 0.0000 0.2640 0.0000 {isinstance}
150028 0.2630 0.0000 3.3070 0.0000 stylizer.py:558(__getitem__)
215733 0.2470 0.0000 0.2470 0.0000 {built-in method match}
40 0.2460 0.0060 0.2460 0.0060 {method 'xpath' of 'lxml.etree._Element' objects}
1 0.2410 0.2410 0.2490 0.2490 page_margin.py:127(find_levels)
Sorted by cumtime:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.0140 0.0140 36.9980 36.9980 plumber.py:934(run_me)
1 0.0000 0.0000 19.8110 19.8110 conversion.py:193(__call__)
1 0.0100 0.0100 19.8100 19.8100 txt_input.py:54(convert)
1 0.0010 0.0010 14.5750 14.5750 __init__.py:20(detect)
1 0.0000 0.0000 14.5750 14.5750 chardet.py:36(detect)
2 0.0000 0.0000 14.4200 7.2100 charsetgroupprober.py:55(feed)
1 0.0000 0.0000 14.4200 14.4200 universaldetector.py:61(feed)
15 10.6280 0.7090 12.2590 0.8170 sbcharsetprober.py:63(feed)
1 0.0010 0.0010 9.7800 9.7800 mobi_output.py:167(convert)
1 0.0110 0.0110 9.7680 9.7680 mobi_output.py:204(write_mobi)
4 0.0590 0.0150 6.5140 1.6290 stylizer.py:176(__init__)
1 0.0030 0.0030 4.7810 4.7810 html_input.py:57(convert)
1 0.0170 0.0170 4.6750 4.6750 html_input.py:94(create_oebbook)
1 0.0000 0.0000 4.4910 4.4910 flatcss.py:122(__call__)
1 0.0000 0.0000 4.4740 4.4740 mobiml.py:104(__call__)
1 0.0000 0.0000 4.4740 4.4740 mobiml.py:114(mobimlize_spine)
52 0.0000 0.0000 4.3990 0.0850 base.py:903(fget)
1 0.0000 0.0000 4.3960 4.3960 base.py:830(_parse_xhtml)
1 0.0030 0.0030 4.3960 4.3960 parse_utils.py:201(parse_html)
1 0.0010 0.0010 4.3420 4.3420 preprocess.py:495(__call__)
574218/574217 3.5850 0.0000 4.0200 0.0000 {built-in method sub}
1 0.0000 0.0000 3.9730 3.9730 flatcss.py:150(stylize_spine)
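Both listings are ordinary cProfile/pstats output in two sort orders. A minimal sketch of producing them from a single profiled run, so the slow conversion does not have to be repeated for each ordering; `profile_call` and `format_top` are hypothetical helper names, and `strip_dirs()` is used to get the short `file.py:line(function)` names seen above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func once under cProfile; return (result, pstats.Stats)."""
    prof = cProfile.Profile()
    # runcall enables the profiler, calls func, and disables it afterwards.
    result = prof.runcall(func, *args, **kwargs)
    return result, pstats.Stats(prof)


def format_top(stats, sort_key, limit=20):
    """Render the top `limit` rows of already-collected stats as text."""
    out = io.StringIO()
    stats.stream = out  # pstats prints to this stream instead of stdout
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return out.getvalue()
```

Calling `format_top(stats, 'tottime')` and then `format_top(stats, 'cumtime')` on the same `stats` object gives the two tables without profiling twice.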
The profiler was inserted into plumber.py as shown below:
diff --git a/src/calibre/ebooks/conversion/plumber.py b/src/calibre/ebooks/conversion/plumber.py
index 78821fa..9d8b4a6 100644
@@ -926,8 +926,13 @@ OptionRecommendation(name='search_replace',
         self.log.info('Input debug saved to:', out_dir)
-
     def run(self):
+        '''debug profile '''
+        import cProfile
+        cProfile.runctx('self.run_me()', globals(), locals())
+
+    def run_me(self):
+    #def run(self):
         '''
         Run the conversion pipeline
         '''
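A variant of the same instrumentation, sketched as a hypothetical decorator: instead of renaming `run()` to `run_me()` and passing a code string to `runctx`, the method is wrapped and the raw stats are dumped to a file for later sorting. The decorator name `profile_to_file` and the output path are assumptions for illustration.

```python
import cProfile
import functools


def profile_to_file(path):
    """Hypothetical decorator: profile each call and dump raw stats to `path`."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            prof = cProfile.Profile()
            try:
                return prof.runcall(func, *args, **kwargs)
            finally:
                # Written even if the wrapped call raises, so a failed
                # conversion can still be inspected.
                prof.dump_stats(path)
        return wrapper
    return decorate
```

Placed above `def run(self):` (e.g. `@profile_to_file('convert.prof')`), the dump can later be sorted either way with `pstats.Stats('convert.prof').sort_stats('cumtime').print_stats(30)`.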