#1 |
Enthusiast
Posts: 26
Karma: 168
Join Date: May 2005
Location: Wuhan, China
Device: Kindle DXG
|
Slow txt to mobi conversion: O(n^2) performance as the number of lines grows?
When converting plain text to mobi, the conversion time grows much faster than the text itself. It is roughly:
T = k * n ^ 2
where T is the time used, n is the total number of lines in the text file, and k is a constant between 2 and 3 on my system. In my case, each line of the text file is converted to a <p> … </p> paragraph. If, for each <p> … </p>, Calibre tries to find its parent during conversion, presumably by searching every line before that <p>, then n * (n + 1) / 2 searches need to be done. That might be an explanation. May I suggest adding performance tuning to future development?
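To make the arithmetic behind that guess concrete, here is a minimal sketch. It does not reproduce Calibre's code; `lookback_steps` is a hypothetical helper that only counts the steps the guessed "search every earlier line" behaviour would take, to show that doubling the line count roughly quadruples the work.

```python
def lookback_steps(n_lines):
    """Count steps if paragraph i scans the i lines up to and including itself.

    This models the hypothesis from the post, not Calibre's actual behaviour.
    """
    return sum(range(1, n_lines + 1))  # same value as n * (n + 1) // 2


for n in (1000, 2000, 4000):
    steps = lookback_steps(n)
    # The ratio to the previous size approaches 4 as n doubles.
    print(n, steps, round(steps / lookback_steps(n // 2), 3))
```

If measured conversion times follow the same pattern (about 4x for each doubling of the input), that would be consistent with quadratic growth, although it would not show where the time is actually spent.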
#2 |
Well trained by Cats
Posts: 30,873
Karma: 59840450
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
You did not offer to write better code.
Moderator Notice
Please read the sticky before posting in Development. https://www.mobileread.com/forums/sho...d.php?t=122042
#3 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
You might consider using some other program, such as OpenOffice or even Sigil, to establish the paragraphs yourself. Then Calibre would be much faster.
After all, how is Calibre supposed to know where the paragraphs are supposed to be? HTML input is one format Calibre is happy with.
#4 |
Enthusiast
Posts: 26
Karma: 168
Join Date: May 2005
Location: Wuhan, China
Device: Kindle DXG
|
How about an option to convert plain txt to "plain" mobi: no sections, no fonts, just like reading a plain txt file, but in mobi format?
I tried creating the HTML from a 10M txt file with a simple Python script. Inside the <body> tag there are only <p>..</p> elements, or only <br>. I then fed that HTML to ebook-convert, and it still does not run fast enough. I wish I could contribute, but the task looks daunting for my skill level...
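The script itself was not posted, so the following is only a guess at what such a "simple Python script" might look like. The function name `text_to_html` and the default title are illustrative assumptions; the sketch wraps every non-empty line in its own `<p>` element and escapes HTML special characters.

```python
import html


def text_to_html(text, title="Untitled"):
    """Wrap each non-empty line of `text` in <p>...</p> inside a bare HTML page."""
    paragraphs = [
        "<p>{}</p>".format(html.escape(line.strip()))
        for line in text.splitlines()
        if line.strip()  # skip blank lines instead of emitting empty paragraphs
    ]
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"utf-8\">\n"
        "<title>{}</title>\n</head>\n<body>\n{}\n</body>\n</html>\n"
    ).format(html.escape(title), "\n".join(paragraphs))


# Example use (file names are placeholders):
# with open("book.txt", encoding="utf-8") as src:
#     page = text_to_html(src.read(), title="book")
# with open("book.html", "w", encoding="utf-8") as dst:
#     dst.write(page)
```

The resulting file can then be passed to ebook-convert as HTML input, as described in the post.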
#5 |
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
|
Try importing the text into MS Word and saving it as filtered HTML, then have Calibre convert the HTML to mobi. That may go faster even though it is two steps. If you don't like Word, use some other program that can save as HTML or EPUB, e.g. Sigil.
A 10M text file sounds awfully big, though, unless I am misunderstanding your unit of measurement. A long novel, formatted as a text file, would be less than 1 MB.
#6 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Set the paragraph and formatting style manually. Do not use heuristic for the formatting. This will reduce the amount of processing calibre does to A) determine the paragraph style and B) format the text.
Heuristic formatting uses, you guessed it, heuristics, and the larger the text, the more it needs to process. Most heuristics are a series of regular expressions, and increasing the amount of text will drastically slow down the process. Every regex needs to run over the entire document, so for every line you add, every regex needs to run over that much more text.
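As a rough illustration of that cost, here is a small sketch. The patterns below are made up for the example and are not calibre's actual heuristics; the point is only that each pattern makes its own pass over the whole text, so the characters scanned grow with both the number of patterns and the document size.

```python
import re

# Hypothetical clean-up patterns, chosen only for illustration.
PATTERNS = [
    (re.compile(r"[ \t]+\n"), "\n"),               # strip trailing whitespace
    (re.compile(r"\n{3,}"), "\n\n"),               # collapse runs of blank lines
    (re.compile(r"(?<=[a-z,])\n(?=[a-z])"), " "),  # join lines broken mid-sentence
]


def apply_heuristics(text):
    """Apply every pattern to the full text; return the result and characters scanned."""
    scanned = 0
    for pattern, replacement in PATTERNS:
        scanned += len(text)  # each pattern walks the whole document once
        text = pattern.sub(replacement, text)
    return text, scanned


cleaned, scanned = apply_heuristics("a line  \nbroken\n\n\n\nnext")
print(repr(cleaned), scanned)
```

Skipping the heuristic passes, as suggested above, removes this per-pattern scan altogether.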
#7 |
Enthusiast
Posts: 26
Karma: 168
Join Date: May 2005
Location: Wuhan, China
Device: Kindle DXG
|
After setting:
--formatting-type plain --input-encoding utf8 --markdown-disable-toc --paragraph-type off
the conversion time of a 10M txt to mobi is reduced to less than 2 minutes, and a 20M txt takes 14 minutes. It is much better. Thanks.
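For anyone scripting this, a minimal sketch of calling ebook-convert from Python with the options quoted above. The helper names `build_convert_command` and `convert` and the file names are assumptions for illustration. The flags are copied from this post and may not all exist in other calibre versions, so check `ebook-convert --help` for your install.

```python
import subprocess


def build_convert_command(src, dst):
    """Build the ebook-convert argument list using the options quoted in the post."""
    return [
        "ebook-convert", src, dst,
        "--formatting-type", "plain",
        "--input-encoding", "utf8",
        "--markdown-disable-toc",   # as quoted in the post; may depend on calibre version
        "--paragraph-type", "off",
    ]


def convert(src, dst):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(src, dst), check=True)


# Example use (placeholder file names):
# convert("book.txt", "book.mobi")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.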