MobileRead Forums > E-Book Software > Calibre > Conversion
05-12-2011, 06:14 AM   #16
ldolse
Wizard
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by Jonnster
Would it be worth me taking the HTML from the input folder of the Calibre debug, converting it to CHM and then converting to mobi? If so how do I create the CHM?
This is what I was trying to tell you before, but you don't need to convert it to CHM. Take the input.html file from the input directory, open it in a text/HTML editor, and massage the HTML as you see fit. Then import the HTML file back into Calibre and convert it from HTML to Mobi.

Note that fixing up the HTML input document will take a lot of effort, though. You'll need to unwrap lines yourself, and it would be a good idea to manually wrap the code blocks in <pre> tags as itimpi just described.
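The two manual fixes described above (unwrapping lines and wrapping code in <pre> tags) can be partly scripted. Below is a minimal Python sketch, assuming the extracted text is available as a list of physical lines with blank lines between paragraphs. The `looks_like_code` rule (a four-space or tab indent, or a trailing `;`, `{` or `}`) is a hypothetical heuristic for illustration only; it is not part of calibre and will misfire on some documents.

```python
import html
import re

# Hypothetical heuristic: a line counts as code if it is indented by four or
# more spaces (or a tab), or if it ends with ';', '{' or '}'.
CODE_LINE = re.compile(r"^( {4,}|\t)|[;{}]\s*$")


def looks_like_code(line: str) -> bool:
    return bool(CODE_LINE.search(line))


def to_html(lines: list[str]) -> str:
    """Join prose lines into <p> paragraphs and wrap runs of code in <pre>.

    A blank line ends the current paragraph or code block.
    """
    out: list[str] = []
    para: list[str] = []
    code: list[str] = []

    def flush_para() -> None:
        if para:
            # Unwrap: join the hard-wrapped prose lines with single spaces.
            joined = " ".join(s.strip() for s in para)
            out.append("<p>" + html.escape(joined) + "</p>")
            para.clear()

    def flush_code() -> None:
        if code:
            # Keep the line breaks and indentation of code exactly as extracted.
            out.append("<pre>" + html.escape("\n".join(code)) + "</pre>")
            code.clear()

    for line in lines:
        if not line.strip():
            flush_para()
            flush_code()
        elif looks_like_code(line):
            flush_para()
            code.append(line)
        else:
            flush_code()
            para.append(line)
    flush_para()
    flush_code()
    return "\n".join(out)


if __name__ == "__main__":
    sample = [
        "The program below prints a greeting",
        "to standard output.",
        "",
        "int main(void) {",
        '    puts("hi");',
        "}",
    ]
    print(to_html(sample))
```

Running it prints one `<p>` element with the two prose lines joined and one `<pre>` element with the three code lines kept intact. Note that an indented prose line would be misclassified as code, which is exactly why this is hard to do reliably without hand-checking.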
05-12-2011, 06:15 AM   #17
Jonnster
Member
 
Posts: 16
Karma: 10
Join Date: May 2011
Device: Kindle 3
The document is 1700 pages long. "Massaging" it by hand is just not an option.
05-12-2011, 11:31 AM   #18
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
There really isn't any other option. The CHM author went through and marked up all code sections with <pre> tags. That's what you need to do with the PDF output to make it look right on your Kindle.

Due to how PDFs are made, there is no good or easy way to detect code and add <pre> tags when converting. PDF files don't even distinguish paragraphs; the text is laid out as fixed lines. Many hours of work have gone into calibre's PDF conversion to determine whether each line belongs to the previous paragraph or starts a new one.
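To illustrate the kind of decision involved, here is one simple line-joining heuristic: treat a line that is noticeably shorter than a typical full line as the last line of its paragraph. This is a sketch for illustration, not calibre's actual code; the function name, the use of the median line length and the default factor of 0.6 are all assumptions.

```python
from statistics import median


def unwrap_by_length(lines: list[str], factor: float = 0.6) -> list[str]:
    """Rebuild paragraphs from fixed PDF text lines using line length only.

    A line shorter than `factor` times the median line length is assumed to
    end a paragraph; every other line is joined to the line after it.
    """
    lengths = [len(ln.strip()) for ln in lines if ln.strip()]
    if not lengths:
        return []
    cutoff = factor * median(lengths)

    paragraphs: list[str] = []
    current: list[str] = []
    for ln in lines:
        text = ln.strip()
        if not text:
            continue
        current.append(text)
        if len(text) < cutoff:  # short line: probably the end of a paragraph
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

The weakness is easy to see: a paragraph whose last line happens to run almost full width gets merged with the next one, and every short line in a code listing is treated as a paragraph end, so the code's layout is lost.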
05-12-2011, 12:06 PM   #19
kiwidude
Calibre Plugins Developer
 
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@user_none, if I may ask a question here: I've never looked at how a PDF is structured internally, so I don't fully appreciate the difficulties it causes. However, one thing I have noticed that the conversion *always* gets wrong is when a sentence in an indented paragraph starts at the leftmost column.

Code:
    Some first line.
My second line.

This will always become two paragraphs when converted.

Out of technical curiosity (and ignorance), what is the issue with detecting this? And does the new PDF engine (which I know is on hold) address it?
05-12-2011, 01:00 PM   #20
user_none
Sigil & calibre developer
The issue is that paragraphs in a novel typically start with an indent. The massaging is done by around 100 heuristics that are applied to the text.

I don't know much about the new engine. Kovid started it and is pretty much the only one working on it. I gave up on PDF a long time ago.
05-12-2011, 02:07 PM   #21
ldolse
Wizard
I've looked at the new engine, and it has a lot of potential. Vertical and horizontal positional information is retained, so paragraphs can be detected through indents and other tests (though none of those tests are implemented yet). Header and footer removal will also become trivial, as it can be done based on position on the page. The last time I looked at it, though, I couldn't quite follow the logic, as a single reflow function handles both single-column and two-column unwrapping.
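With positions retained, both tests mentioned above reduce to simple comparisons. A minimal sketch, assuming each extracted line is available as a hypothetical `Line` record holding its left x-coordinate, its y-coordinate measured from the top of the page, and its text. The record, the function names, the 2-point indent tolerance and the 50-point header/footer margin are illustrative assumptions, not the new engine's API.

```python
from dataclasses import dataclass


@dataclass
class Line:
    # Hypothetical record for one line of text on a PDF page.
    x: float   # left edge, in points
    y: float   # distance from the top of the page, in points
    text: str


def drop_headers_footers(lines: list[Line], page_height: float,
                         margin: float = 50.0) -> list[Line]:
    """Remove lines lying within `margin` points of the top or bottom edge."""
    return [ln for ln in lines if margin <= ln.y <= page_height - margin]


def paragraphs_by_indent(lines: list[Line], tolerance: float = 2.0) -> list[str]:
    """Start a new paragraph whenever a line is indented past the left margin."""
    if not lines:
        return []
    left_margin = min(ln.x for ln in lines)
    paragraphs: list[list[str]] = []
    for ln in lines:
        indented = ln.x > left_margin + tolerance
        if indented or not paragraphs:
            paragraphs.append([ln.text])
        else:
            paragraphs[-1].append(ln.text)
    return [" ".join(p) for p in paragraphs]
```

With this approach kiwidude's example stays one paragraph, because "My second line." starts at the left margin and is therefore joined to the indented line above it regardless of the full stop. Documents that separate paragraphs with vertical space instead of indents would need a y-gap test as well.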

@kiwidude, the specific problem in your example is that the first line ends with a full stop. Since the current engine loses all positional information, including indents, punctuation is all we've got. If a line in the middle of a paragraph ends with a full stop, the paragraph will be split there.
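A toy version of this punctuation-only rule shows why kiwidude's example splits: once the indent is stripped, the only clue left is that the first line ends with a full stop. This is a simplified sketch for illustration, not the current engine's actual code; the function name and the set of sentence-ending strings are assumptions.

```python
# Assumed set of line endings that mark the end of a sentence.
SENTENCE_END = (".", "!", "?", '."', '!"', '?"')


def unwrap_by_punctuation(lines: list[str]) -> list[str]:
    """Join lines into paragraphs, breaking after sentence-ending punctuation.

    Positional information (indents) is assumed to be lost already, so each
    line is stripped before it is examined.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        current.append(text)
        if text.endswith(SENTENCE_END):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# kiwidude's example: one indented paragraph whose second sentence
# begins at the left margin.
print(unwrap_by_punctuation(["    Some first line.", "My second line."]))
# ['Some first line.', 'My second line.']
```

A line that ends mid-sentence, by contrast, is joined to the next one, which is why the rule works for most prose and fails only when a sentence boundary happens to fall exactly at a line break.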
05-12-2011, 02:12 PM   #22
kiwidude
Calibre Plugins Developer
@ldolse, ahhh, thanks for the info; now I understand. It is the full stop at the end of the previous line that is "significant" in this case.

Having spent many hours resurrecting PDF conversions page by page in Sigil, I am particularly looking forward to the new engine solving this limitation one day...
Similar Threads
Thread Thread Starter Forum Replies Last Post
Structure Detection - Remove Header (or Footer) Regex DarkKipper Conversion 69 11-09-2013 12:21 PM
structure detection - documentation ? cybmole Calibre 27 01-12-2011 02:14 AM
Trouble w structure detection jeff47 Calibre 1 10-13-2010 12:51 AM
Structure Detection Ceased To Exist? radiofred Calibre 3 10-01-2010 12:33 AM
Structure detection v5.5 and v6.2 AlexBell Calibre 2 07-29-2009 10:11 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.