11-18-2012, 03:00 PM   #1
forceps
Enthusiast
Posts: 26
Karma: 168
Join Date: May 2005
Location: Wuhan, China
Device: Kindle DXG
Slow txt to mobi conversion: O(n^2) performance as the number of lines grows?

When converting plain text to mobi, the conversion time grows much faster than the size of the text. It is roughly:

T = k * n ^ 2

where T is the time used, n is the total number of lines in the text file, and k is a constant between 2 and 3 on my system.

In my case, each line of the text file is converted to a <p> … </p> paragraph. If, for each <p> … </p>, Calibre tries to find its parent during conversion, presumably by searching every line before that <p>, then n * (n + 1) / 2 searches need to be done. That might be an explanation.
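
To make that guess concrete, here is a minimal Python sketch. It does not reproduce Calibre's code; it only counts the lookups under the assumption that paragraph i scans lines 1 through i. The function name is made up for illustration.

```python
def count_backward_scans(n):
    """Count lookups if paragraph i has to scan lines 1..i (assumed model)."""
    total = 0
    for i in range(1, n + 1):
        total += i  # paragraph i looks at i lines, including itself
    return total


for n in (1000, 2000, 4000):
    # The loop total matches the closed form n * (n + 1) / 2.
    print(n, count_backward_scans(n), n * (n + 1) // 2)
```

Under this model, doubling the number of lines roughly quadruples the work, which is what T = k * n ^ 2 predicts.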

May I suggest adding performance tuning to future development?
11-18-2012, 03:25 PM   #2
theducks
Well trained by Cats
Posts: 29,782
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
You did not offer to write better code.

Moderator Notice
Please read the sticky before posting in Development. https://www.mobileread.com/forums/sho...d.php?t=122042
11-19-2012, 06:24 AM   #3
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
You might consider using some other program, such as Open Office or even Sigil, to establish the paragraphs yourself. Then Calibre would be much faster.

After all, how is Calibre supposed to know where the paragraphs are supposed to be? HTML input is one format Calibre is happy with.
11-19-2012, 03:39 PM   #4
forceps
Enthusiast
Posts: 26
Karma: 168
Join Date: May 2005
Location: Wuhan, China
Device: Kindle DXG
How about an option to convert plain txt to a "plain" mobi: no sections, no fonts, just like reading a plain txt file, but in mobi format?

I tried creating the HTML from a 10M txt file with a simple Python script. Inside the <body> tag there are only <p>..</p> elements, or only <br>. I then fed that HTML to ebook-convert, but it still does not run fast enough.
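
For reference, a script of that kind might look like the sketch below. It is not the script actually used; the function name `txt_to_html` and the sample input are illustrative assumptions. Each non-empty line becomes one <p> element, with special characters escaped.

```python
import html


def txt_to_html(lines, title="Untitled"):
    """Wrap each non-empty line of text in its own <p> element."""
    parts = [
        '<html><head><meta charset="utf-8"><title>%s</title></head><body>'
        % html.escape(title)
    ]
    for line in lines:
        text = line.strip()
        if text:  # skip blank lines instead of emitting empty paragraphs
            parts.append("<p>%s</p>" % html.escape(text))
    parts.append("</body></html>")
    return "\n".join(parts)


sample = ["First line", "", "a < b & c"]
print(txt_to_html(sample, title="demo"))
```

For a real file, the lines could come from `open(path, encoding="utf-8")` and the result written to an .html file before passing it to ebook-convert.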

I wish I could contribute, but the task looks daunting for my skill level...
11-20-2012, 08:36 AM   #5
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Try importing the text into MS Word, save it as filtered HTML, then have calibre convert the HTML to mobi. That may go faster even though it takes two steps. If you don't like Word, use some other program that can save as HTML / EPUB, e.g. Sigil.

A 10M text file sounds awfully big, though, unless I am misunderstanding your unit of measurement.

A long novel, formatted as a text file, would be less than 1 MB.
11-22-2012, 10:40 PM   #6
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Set the paragraph and formatting style manually. Do not use heuristic formatting. This will reduce the amount of processing calibre does to A) determine the paragraph style and B) format the text.

Heuristic formatting uses, you guessed it, heuristics, and the larger the text, the more it needs to process. Most heuristics are a series of regular expressions, and increasing the amount of text will drastically slow down the process. Every regex needs to run over the entire document, so for every line you add, every regex needs to run over that much more text.

Quote:
Originally Posted by forceps
How about an option of convert plain txt to "plain" mobi, no sections, no fonts, just like read a plain txt file, but in mobi format.
Again, actually look at the options that are available. Formatting style: plain. This will do what you're asking for.
11-26-2012, 11:47 AM   #7
forceps
Enthusiast
Posts: 26
Karma: 168
Join Date: May 2005
Location: Wuhan, China
Device: Kindle DXG
After setting:
--formatting-type plain --input-encoding utf8 --markdown-disable-toc --paragraph-type off

the conversion time for a 10M txt to mobi is reduced to less than 2 minutes; a 20M txt takes 14 minutes. That is much better.
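
As a rough check on those two timings, the sketch below estimates the exponent p in T ~ size^p from a pair of measurements. It treats "less than 2 minutes" as exactly 2 minutes, which is an assumption, so the real exponent would be at least this large. The helper name is made up, and two data points give only a crude estimate.

```python
import math


def scaling_exponent(size_a, time_a, size_b, time_b):
    """Estimate p in T ~ size**p from two (size, time) measurements."""
    return math.log(time_b / time_a) / math.log(size_b / size_a)


# Sizes in "M" and times in minutes, as quoted above (2 is an upper bound).
p = scaling_exponent(10, 2, 20, 14)
print(round(p, 2))
```

Doubling the size multiplied the time by about 7, so the growth is still faster than linear even with these options.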

Thanks.
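
For anyone who wants to script this, here is a minimal Python sketch that assembles the command line with the options quoted above. The option names are copied verbatim from this post and may not exist in every calibre version; the file names and the helper name are placeholders.

```python
import shlex


def build_convert_command(src, dst):
    """Return the ebook-convert argument list with the options quoted above."""
    return [
        "ebook-convert", src, dst,
        "--formatting-type", "plain",
        "--input-encoding", "utf8",
        "--markdown-disable-toc",
        "--paragraph-type", "off",
    ]


cmd = build_convert_command("book.txt", "book.mobi")  # placeholder file names
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it (requires calibre to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```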