Conversion process VERY slow on certain books

louwin · 06-04-2012, 07:59 AM

I couldn't get "Send to device (iPad)" to work, so I selected 36 books to convert from TXT to EPUB. These are the results at the point where it appeared to hang.

Half the books took between 3 and 40 seconds to convert, but one has taken an hour and is still converting.

One book is .4Mb and took 48 minutes to convert. The biggest book is 1.2Mb and took 54 seconds to convert.

The book that took 3 seconds to convert is .1Mb.

I know size isn't important, but why would a 1.2Mb text book take less than a minute to convert and another .4Mb text book take 48 minutes?

Two books are still converting: 1) .4Mb, at 65 minutes, and 2) .8Mb, at 61 minutes so far. Both are sitting at 1%.

Is this kind of disparity of timing normal?

DoctorOhh · 06-04-2012, 08:31 AM

Quote:

Originally Posted by louwin

I know size isn't important, but why would a 1.2Mb text book take less than a minute to convert and another .4Mb text book take 48 minutes?

You jest, but the size of the book has little to do with conversion times.

Quote:

Originally Posted by louwin

Two books are still converting: 1) .4Mb, at 65 minutes, and 2) .8Mb, at 61 minutes so far. Both are sitting at 1%.

Is this kind of disparity of timing normal?

Yes this kind of disparity is normal.

Text to ePub should progress quickly, although if I recall correctly certain heuristic processing can take a long time.

Most of the disparity comes from converting certain HTML or LIT files to ePub. Books initially created in MS Word often have a crap load of extra CSS and font-face items that need to be processed. I have had more than one book take close to 6 hours.

Quote:

Originally Posted by louwin

Both are set at 1%

Don't let the 1% get you down; it is often not accurate and will jump from 1% to 40% to 70% to complete.

Good Luck.

jackie_w · 06-04-2012, 10:26 AM

Quote:

Originally Posted by dwanthny

Some books initially created in MS Word often have a crap load of extra CSS and font-face items that need to be processed.

You'll be pleased to hear there should be some improvement in this area from v0.8.51:

Quote:

Conversion pipeline: Filter out the useless font-face rules inserted by Microsoft Word for every font on the system

...well... every little helps
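
As a rough illustration of the kind of filtering that changelog entry describes, here is a minimal Python sketch. It is not calibre's actual code: the function name `drop_srcless_font_faces` and the rule of thumb that an @font-face block without a `src` descriptor counts as "useless" are assumptions made for this example only.

```python
import re

# Matches one @font-face block (no nested braces) plus trailing whitespace.
FONT_FACE_RE = re.compile(r"@font-face\s*\{[^{}]*\}\s*", re.IGNORECASE)
SRC_RE = re.compile(r"\bsrc\s*:", re.IGNORECASE)


def drop_srcless_font_faces(css):
    """Remove @font-face blocks that have no src descriptor.

    Assumption for this sketch: a block without src cannot load a font,
    so dropping it should not change how the book renders.
    """
    def keep_or_drop(match):
        block = match.group(0)
        return block if SRC_RE.search(block) else ""

    return FONT_FACE_RE.sub(keep_or_drop, css)


# Hypothetical sample: one Word-style rule without src, one real embedded font.
sample_css = (
    '@font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;}\n'
    '@font-face {font-family:"Body"; src:url(body.ttf);}\n'
    'p {margin:0;}\n'
)
cleaned = drop_srcless_font_faces(sample_css)
print(cleaned)
```

A regular expression is enough for a sketch like this, but a real pipeline would more likely work on parsed CSS so that comments and unusual formatting are handled correctly.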

louwin · 06-04-2012, 11:02 AM

Oops, I may have aborted the process too early?

It was, I thought "hung", so I pulled the plug at 230 minutes.

And Sorry, it wasn't text to epub

It was mobi to epub.

Of the 35 books being converted it was stuck(?) on three mobi books and 3 waiting, see attachment.

Do I start the conversion again and let it run for more than 6 hours????

OBTW I am using the current portable version on an i7 Windows 7 64bit

theducks · 06-04-2012, 03:31 PM

Quote:

Originally Posted by louwin

Oops, I may have aborted the process too early?

It was, I thought "hung", so I pulled the plug at 230 minutes.

And Sorry, it wasn't text to epub

It was mobi to epub.

Of the 35 books being converted it was stuck(?) on three mobi books and 3 waiting, see attachment.

Do I start the conversion again and let it run for more than 6 hours????

OBTW I am using the current portable version on an i7 Windows 7 64bit

Let it run till it is done or crashes

You can't tell which will need this from the outside

I have had a few take 24 hours

louwin · 06-06-2012, 07:49 AM

WOW! How could it run for that long and NOT be "hung"?

What could it be doing for that long?

And this is normal? Not a bug?

I can't understand why mine took 230 minutes

forceps · 06-06-2012, 09:59 AM

Calibre is great, but there is still room for improvement, and conversion speed is one such area.

I did several TXT to MOBI conversions recently. Since the output is MOBI, the MOBI output plugin is used, and so is SVGRasterizer, which takes a rather long time to run. While running, SVGRasterizer calls Stylizer, which apparently goes through every line of the temporary XHTML file. That time is completely wasted, because no SVG image is involved in a text file.

I also noticed that conversion time grows faster than the size of the text file. A similar speed problem came up before when BeautifulSoup was used to parse large HTML files.
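
The idea of skipping a pass that has nothing to do could look roughly like the Python sketch below. This is not how calibre is implemented; `has_svg` and `run_if_svg_present` are hypothetical names, and the check only looks for inline SVG elements in well-formed XHTML (it would miss SVG files referenced from `img` tags).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def has_svg(xhtml_text):
    """Return True if any element in the document is in the SVG namespace."""
    root = ET.fromstring(xhtml_text)
    prefix = "{%s}" % SVG_NS
    return any(el.tag.startswith(prefix) for el in root.iter())


def run_if_svg_present(xhtml_text, expensive_pass):
    """Call expensive_pass only when there is inline SVG to process."""
    if not has_svg(xhtml_text):
        return xhtml_text  # cheap early exit: nothing to rasterize
    return expensive_pass(xhtml_text)


# Hypothetical sample documents.
plain_doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Just text.</p></body></html>"
)
svg_doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"/>'
    "</body></html>"
)

calls = []
run_if_svg_present(plain_doc, lambda text: calls.append("plain") or text)
run_if_svg_present(svg_doc, lambda text: calls.append("svg") or text)
print(calls)
```

The pre-check is a single linear walk over the tree, so it costs little compared with a pass that computes styles for every element.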

GreenMonkey · 06-06-2012, 01:21 PM

Quote:

Originally Posted by theducks

Let it run till it is done or crashes

You can't tell which will need this from the outside

I have had a few take 24 hours

I had one that ran for like 12 hours on my laptop (AMD Turion dual core CPU...not the speediest CPU in raw crunching ability) before I gave up.

I then moved it over to my desktop (Phenom II x4), where it had a lot more CPU/memory bandwidth, and it finished in a few hours instead.

It was HTML -> ePub with a LOT of CSS to process. I haven't had one take longer than 3-4 hours, but I've heard it's possible if the CSS is complicated.

naisren · 06-11-2012, 11:36 AM

MOBI is not supported by many readers.
ePub is popular, so MOBI to ePub conversion is very important.
But so far there is no good way to convert between them with ease. The main problems are conversion speed and keeping the original structure and links.
I wish there were a quick converter.

naisren · 06-11-2012, 11:43 AM

For a MOBI ebook of about 5 MB, I took a snapshot of the log partway through (about 14 minutes in). I guess calibre would take 30 minutes to finish the MOBI to ePub conversion. It is too SLOW!

Code:

Resolved conversion options
calibre version: 0.8.54
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 0.0,
 'book_producer': None,
 'change_justification': u'original',
 'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': u'C:\\DOCUME~1\\user\\LOCALS~1\\Temp\\calibre_0.8.54_tmp_ovaq7o\\6oauzj.jpeg',
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_split_on_page_breaks': False,
 'duplicate_links_in_toc': False,
 'enable_heuristics': False,
 'epub_flatten': False,
 'extra_css': None,
 'extract_to': None,
 'filter_css': u'',
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': None,
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x034212B0>,
 'insert_blank_line': False,
 'insert_blank_line_size': 0.5,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': None,
 'level2_toc': None,
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 5.0,
 'markup_chapter_headings': True,
 'max_toc_links': 50,
 'minimum_line_height': 120.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.iPadOutput object at 0x03421550>,
 'page_breaks_before': u"//*[name()='h1' or name()='h2']",
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'C:\\DOCUME~1\\user\\LOCALS~1\\Temp\\calibre_0.8.54_tmp_ovaq7o\\h9qx3r.opf',
 'remove_fake_margins': True,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'search_replace': '[]',
 'series': None,
 'series_index': None,
 'smarten_punctuation': False,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'unsmarten_punctuation': False,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: MOBI Input running
on C:\DOCUME~1\user\LOCALS~1\Temp\calibre_0.8.54_tmp_ovaq7o\sq4coe.mobi
Extracting text...
Adding anchors...
Extracting images...

Cleaning up HTML...

Parsing HTML...

Converting style information to CSS...

Creating OPF...

Parsing all content...
Parsing styles.css ...
Parsing dummy.html ...

Forcing dummy.html into XHTML namespace

Reading TOC from NCX...

Merging user specified metadata...
Detecting structure...

	Detected chapter: BOOK ITHE BYZANTINE ZENITH325–565
	Detected chapter: BOOK IIISLAMIC CIVILIZATION569–1258
	Detected chapter: BOOK IIIJUDAIC CIVILIZATION135–1300
	Detected chapter: BOOK IVTHE DARK AGES566–1095
	Detected chapter: BOOK VTHE CLIMAX OF CHRISTIANITY1095–1300
	Detected chapter: Epilogue

Flattening CSS and remapping font sizes...

Source base font size is 12.00000pt

Removing fake margins...

Found 682 items of level: div_1
Found 659 items of level: p_4
Found 655 items of level: p_5
Found 12402 items of level: p_2
Found 3119 items of level: p_1
div_1  left margin stats: Counter()
div_1  right margin stats: Counter()
p_4  left margin stats: Counter({u'0': 659})
p_4  right margin stats: Counter({u'0': 659})
Negative text indent detected at level  p_5, ignoring this level
Negative text indent detected at level  p_2, ignoring this level

p_1  left margin stats: Counter({u'0': 3119})
p_1  right margin stats: Counter({u'0': 3119})
Cleaning up manifest...
Trimming unused files from manifest...

Trimming u'images/00056.jpg' from manifest
Creating EPUB Output...

Rescaling image from 1000x637 to 749x477 images/00054.jpg
Rescaling image from 900x1350 to 670x1005 images/00001.jpg
Rescaling image from 825x976 to 749x886 images/00002.jpg

Rescaling image from 900x1350 to 670x1005 cover.jpeg

Rescaling image from 825x1147 to 723x1005 images/00055.jpg
Splitting markup on page breaks and flow limits, if any...

		Splitting on page-break

		Splitting on page-break

		Splitting on page-break

		Splitting on page-break

		Splitting on page-break

		Splitting on page-break

		Splitting on page-break

		Splitting on page-break

KevinH · 06-11-2012, 01:52 PM

Hi,

Resizing and converting images can be quite time consuming and memory intensive.

Long execution times in Python are also often related to memory allocation/deallocation thrashing. All strings in Python are immutable. So any time you append to, delete from, or modify a string in any way, Python allocates new memory and deallocates the old. Appending or changing lots of little substrings in a long (large) string can therefore result in massive thrashing of memory (the CPU is pegged just deallocating and reallocating memory, along with the paging and virtual memory activity that can be associated with that).

Walking and modifying long CSS and XHTML files can be quite costly if lots of little changes are needed on a large XHTML file.

The only "workaround" for the Python approach of having strings be immutable is to build a list of string pieces and, when finally ready, allocate the required space once and then copy all of the strings in the list (in order) into that newly allocated string.

This approach prevents the memory thrashing and can significantly speed up operations in python on memory limited platforms.
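
The list-then-join approach described above can be sketched in Python as follows. The function names and the sample data are made up for illustration. Note also that CPython can sometimes optimize repeated `+=` on strings in place, so how much faster `join` is depends on the interpreter and the workload.

```python
def build_by_concat(pieces):
    """Repeated concatenation: each += may allocate a new string and copy the old one."""
    out = ""
    for piece in pieces:
        out += piece
    return out


def build_by_join(pieces):
    """Collect the pieces in a list, then build the final string once with str.join."""
    parts = []
    for piece in pieces:
        parts.append(piece)  # appending to a list does not copy the text so far
    return "".join(parts)    # one pass to size and fill the result


# Hypothetical sample: many small fragments, like rewritten lines of an XHTML file.
fragments = ["<p>line %d</p>\n" % i for i in range(1000)]

joined = build_by_join(fragments)
concatenated = build_by_concat(fragments)
print(len(joined), joined == concatenated)
```

Both functions produce the same string; the difference is only in how much intermediate allocation and copying may happen along the way.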

Perhaps these may be contributing to what you are seeing. Does your "benchmark" conversion properly resize and convert images and properly process/convert CSS entities, processing large xhtml files, etc?

KevinH

underdunne · 06-22-2012, 08:33 AM

Quote:

Originally Posted by theducks

Let it run till it is done or crashes

You can't tell which will need this from the outside

I have had a few take 24 hours

What size file was that?

I've been running the conversion process on a 50Mb PDF, converting it to ePub.
It's got images in it but they're mainly B&W. It's currently just gone over 4 hrs.

theducks · 06-22-2012, 09:54 AM

Quote:

Originally Posted by underdunne

What size file was that?

I've been running the conversion process on a 50Mb PDF, converting it to ePub.
It's got images in it but they're mainly B&W. It's currently just gone over 4 hrs.

ANY SIZE (or 'Size Doesn't Matter').
That is why my line above: 'You can't tell from the outside'.
Proceed to rule 1 (and be prepared for 24 hours or more with your other activities. Cancelling only means you start over).

underdunne · 06-22-2012, 02:56 PM

Bummer... 637 mins and still only 1%. Wish there was some better indication.

JSWolf · 06-22-2012, 03:04 PM

I've never had any Mobi > ePub conversion or ePub > Mobi conversion take longer than 10 minutes. I've not performed enough conversions to the combination KF8/Mobi to be able to say how well that goes.
