MobileRead Forums > E-Book Software > Calibre > Development
06-08-2012, 07:59 AM   #1
forceps
Enthusiast
Posts: 26
Karma: 168
Join Date: May 2005
Location: Wuhan, China
Device: Kindle DXG
Profiling txt -> mobi conversion

Text-to-MOBI conversion appears rather slow. Here I'd like to present profiling results from converting a 5000-line UTF-8 text file in Chinese; the file size is 960,902 bytes.

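ebook-convert seems to spend a lot of time detecting the character encoding: in the profile below, chardet's detect() accounts for about 14.6 s of the 37 s total. There must be something that can be done to speed up the conversion.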
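One way to avoid paying for full statistical detection on a file that is already valid UTF-8 (the `file` output in this post reports sample.txt as UTF-8) is to try a strict UTF-8 decode first and only fall back to a detector, on a bounded sample, when that fails. This is a minimal sketch, not calibre's code: `guess_encoding`, its `sample_size` and `fallback` defaults, and the pluggable `detector` argument (for example `chardet.detect`, which returns a dict with an `"encoding"` key) are assumptions for illustration.

```python
import codecs
from typing import Callable, Optional


def guess_encoding(
    raw: bytes,
    sample_size: int = 64 * 1024,
    detector: Optional[Callable[[bytes], dict]] = None,
    fallback: str = "latin-1",
) -> str:
    """Hypothetical helper: cheap checks first, statistical detection last."""
    # A UTF-8 byte-order mark settles the question immediately.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # A strict UTF-8 decode runs in C, so it is cheap even for ~1 MB of text.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Only now pay for a statistical detector, and only on a bounded prefix.
    if detector is not None:
        encoding = detector(raw[:sample_size]).get("encoding")
        if encoding:
            return encoding
    # Assumed last resort: latin-1 can decode any byte sequence.
    return fallback
```

With chardet installed, `guess_encoding(data, detector=chardet.detect)` would only run chardet for files that fail the UTF-8 check, and then only on the first 64 KiB.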



$ file sample.txt
sample.txt: UTF-8 Unicode text

Code:
Sort by tottime
         ncalls    tottime    percall    cumtime    percall filename:lineno(function)  
             15    10.6280     0.7090    12.2590     0.8170 sbcharsetprober.py:63(feed)
  574218/574217     3.5850     0.0000     4.0200     0.0000 {built-in method sub}
            268     2.7980     0.0100     2.7980     0.0100 {cPalmdoc.compress}
            751     2.5250     0.0030     2.5250     0.0030 stylizer.py:126(__call__)
         960902     1.0610     0.0000     1.1240     0.0000 codingstatemachine.py:40(next_state)
       16097860     0.9990     0.0000     0.9990     0.0000 {ord}
              1     0.6140     0.6140     1.7810     1.7810 utf8prober.py:50(feed)
         145029     0.5670     0.0000     0.8200     0.0000 __init__.py:194(unit_convert)
  245208/145027     0.4080     0.0000     0.5700     0.0000 stylizer.py:564(_get)
  145029/145027     0.3840     0.0000     1.4220     0.0000 stylizer.py:577(_unit_convert)
           5033     0.3510     0.0000     0.3510     0.0000 {built-in method findall}
         5001/1     0.3430     0.0000     3.6550     3.6550 mobiml.py:292(mobimlize_elem)
  347846/327027     0.3390     0.0000     1.0280     0.0000 {hasattr}
              1     0.3180     0.3180     0.3800     0.3800 hebrewprober.py:188(feed)
         361570     0.3180     0.0000     1.3790     0.0000 re.py:229(_compile)
          20565     0.2880     0.0000     0.4750     0.0000 cssstyledeclaration.py:397(getProperty)
          41234     0.2840     0.0000     0.6710     0.0000 serialize.py:1001(do_css_Value)
         642375     0.2640     0.0000     0.2640     0.0000 {isinstance}
         150028     0.2630     0.0000     3.3070     0.0000 stylizer.py:558(__getitem__)
         215733     0.2470     0.0000     0.2470     0.0000 {built-in method match}
             40     0.2460     0.0060     0.2460     0.0060 {method 'xpath' of 'lxml.etree._Element' objects}
              1     0.2410     0.2410     0.2490     0.2490 page_margin.py:127(find_levels)

Sort by cumtime
         ncalls    tottime    percall    cumtime    percall filename:lineno(function)  
              1     0.0140     0.0140    36.9980    36.9980 plumber.py:934(run_me)
              1     0.0000     0.0000    19.8110    19.8110 conversion.py:193(__call__)
              1     0.0100     0.0100    19.8100    19.8100 txt_input.py:54(convert)
              1     0.0010     0.0010    14.5750    14.5750 __init__.py:20(detect)
              1     0.0000     0.0000    14.5750    14.5750 chardet.py:36(detect)
              2     0.0000     0.0000    14.4200     7.2100 charsetgroupprober.py:55(feed)
              1     0.0000     0.0000    14.4200    14.4200 universaldetector.py:61(feed)
             15    10.6280     0.7090    12.2590     0.8170 sbcharsetprober.py:63(feed)
              1     0.0010     0.0010     9.7800     9.7800 mobi_output.py:167(convert)
              1     0.0110     0.0110     9.7680     9.7680 mobi_output.py:204(write_mobi)
              4     0.0590     0.0150     6.5140     1.6290 stylizer.py:176(__init__)
              1     0.0030     0.0030     4.7810     4.7810 html_input.py:57(convert)
              1     0.0170     0.0170     4.6750     4.6750 html_input.py:94(create_oebbook)
              1     0.0000     0.0000     4.4910     4.4910 flatcss.py:122(__call__)
              1     0.0000     0.0000     4.4740     4.4740 mobiml.py:104(__call__)
              1     0.0000     0.0000     4.4740     4.4740 mobiml.py:114(mobimlize_spine)
             52     0.0000     0.0000     4.3990     0.0850 base.py:903(fget)
              1     0.0000     0.0000     4.3960     4.3960 base.py:830(_parse_xhtml)
              1     0.0030     0.0030     4.3960     4.3960 parse_utils.py:201(parse_html)
              1     0.0010     0.0010     4.3420     4.3420 preprocess.py:495(__call__)
  574218/574217     3.5850     0.0000     4.0200     0.0000 {built-in method sub}
              1     0.0000     0.0000     3.9730     3.9730 flatcss.py:150(stylize_spine)
The profiler was inserted into plumber.py as shown below:

diff --git a/src/calibre/ebooks/conversion/plumber.py b/src/calibre/ebooks/conversion/plumber.py
index 78821fa..9d8b4a6 100644
--- a/src/calibre/ebooks/conversion/plumber.py
+++ b/src/calibre/ebooks/conversion/plumber.py
@@ -926,8 +926,13 @@ OptionRecommendation(name='search_replace',
 
         self.log.info('Input debug saved to:', out_dir)
 
-
     def run(self):
+        '''debug profile '''
+        import cProfile
+        cProfile.runctx('self.run_me()', globals(), locals())
+
+    def run_me(self):
+        #def run(self):
         '''
         Run the conversion pipeline
         '''
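`cProfile.runctx` with its default `sort=-1` prints the statistics sorted by standard name; the two views above (by tottime and by cumtime, 20 rows each) can instead be produced with the standard `pstats` module. A minimal sketch, assuming the pipeline entry point (for example the patched `run_me`) can be passed in as the callable `target`; `profile_call` is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def profile_call(target, *args, sort_keys=("tottime", "cumulative"), limit=20, **kwargs):
    """Run target(*args, **kwargs) under cProfile and return (result, reports).

    reports maps each sort key to a printed stats table, one per ordering.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(target, *args, **kwargs)
    reports = {}
    for key in sort_keys:
        buf = io.StringIO()
        # Stats accepts a Profile object directly; stream= captures the table text.
        pstats.Stats(profiler, stream=buf).sort_stats(key).print_stats(limit)
        reports[key] = buf.getvalue()
    return result, reports


if __name__ == "__main__":
    # Toy target standing in for the conversion pipeline.
    _, tables = profile_call(sum, range(1000))
    print(tables["tottime"])
```

Keeping the `Profile` object around, rather than letting `runctx` print once, means the same run can be viewed under several orderings without re-running a 37-second conversion.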
06-08-2012, 08:23 AM   #2
kovidgoyal
creator of calibre
Posts: 25,427
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use --input-encoding.
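A minimal Python sketch of running the conversion with the encoding given explicitly, so the detection step can be skipped. It assumes `ebook-convert` is on PATH; `build_convert_command` and `convert` are hypothetical helper names, and only the `--input-encoding` option comes from this thread.

```python
import subprocess
from typing import List


def build_convert_command(src: str, dst: str, encoding: str) -> List[str]:
    """Hypothetical helper: ebook-convert argv with the input encoding pinned."""
    # ebook-convert takes input and output paths positionally; the output
    # format is taken from the extension of dst (.mobi here).
    return ["ebook-convert", src, dst, "--input-encoding=" + encoding]


def convert(src: str, dst: str, encoding: str = "utf-8") -> None:
    # check=True raises CalledProcessError if ebook-convert exits non-zero.
    subprocess.run(build_convert_command(src, dst, encoding), check=True)
```

For example, `convert("sample.txt", "sample.mobi")` is equivalent to running `ebook-convert sample.txt sample.mobi --input-encoding=utf-8` in a shell.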
06-08-2012, 10:59 AM   #3
forceps
Enthusiast
Posts: 26
Karma: 168
Join Date: May 2005
Location: Wuhan, China
Device: Kindle DXG
Thanks, Kovid, for the great work! Turning on --input-encoding certainly helps:

Code:
                        encoding specified    encoding not specified
--------------------------------------------------------------------
time                    22 s                  36 s
total function calls    14M                   30M
most-called function    ord (0.7M calls)      ord (16M calls)
It is still not fast enough for a 5000-line text file, and encoding detection alone takes 14 seconds. Something is not quite right.
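A rough count is consistent with detection walking the file byte by byte in Python. Assuming each of the 15 `sbcharsetprober.py:63(feed)` calls in the first profile scans the whole 960,902-byte buffer with one `ord()` per byte, and each `codingstatemachine.py:40(next_state)` step adds one more, the estimate lands within about 5% of the 16,097,860 `ord` calls reported; adding the roughly 0.7M calls seen with the encoding specified closes almost all of the gap. The per-byte model is an assumption about the bundled chardet, not something the profile shows directly.

```python
# Figures copied from the profiles in this thread.
FILE_BYTES = 960_902               # size of sample.txt
SB_PROBER_FEEDS = 15               # ncalls of sbcharsetprober.py:63(feed)
STATE_MACHINE_STEPS = 960_902      # ncalls of codingstatemachine.py:40(next_state)
ORD_WITHOUT_ENCODING = 16_097_860  # ncalls of {ord}, no --input-encoding
ORD_WITH_ENCODING = 700_000        # "ord (0.7M)" with --input-encoding (rounded)

# Assumption: one ord() per byte per single-byte prober,
# plus one per coding-state-machine step.
detection_ord = SB_PROBER_FEEDS * FILE_BYTES + STATE_MACHINE_STEPS
estimate = detection_ord + ORD_WITH_ENCODING

print(detection_ord)  # 15374432
print(f"{estimate / ORD_WITHOUT_ENCODING:.2%} of the measured ord calls")
```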


Similar Threads
Thread Thread Starter Forum Replies Last Post
Convert from epub/mobi back to TXT or any format? KDA1 Calibre 1 01-26-2012 04:19 PM
txt to mobi how to codrutoctavian Conversion 7 01-24-2012 10:42 PM
How to config calibre when convert Chinese txt to mobi? fifth Calibre 6 10-04-2010 08:56 AM
Unable Convert Gutenberg TXT to Mobi ascherjim Calibre 4 06-23-2009 08:55 AM
Convert Mobi to txt jflatto Kindle Formats 1 10-19-2008 04:14 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.