Old 05-10-2011, 07:32 AM   #1
Jonnster
Member
Jonnster began at the beginning.
 
Posts: 16
Karma: 10
Join Date: May 2011
Device: Kindle 3
Structure Detection Problems

Before I bought my Kindle 3, I bought a very large programming book on C# and .NET. This also came with a PDF version of the book. I could sell the book and buy the Kindle specific version but I'd lose money and the Kindle version is not that much cheaper.

I would like to convert the PDF version to mobi and have been trying with Calibre (along with other options) unsuccessfully. I have just discovered the debug options in Calibre and have been looking through the debug output to try and see what is going on.

There are a couple of problems. Any tables or table-based diagrams in the PDF are lost completely. I can live with this if I can solve the next problem.

The book contains a lot of short example code. Looking in the input directory, this is formatted correctly like this:

Code:
public class Garage 
{ 
  private Car[] carArray = new Car[4]; 
  ... 
  // Iterator method. 
  public IEnumerator GetEnumerator() 
  { 
    foreach (Car c in carArray) 
    { 
      yield return c; 
    } 
  } 
}
However, the parsed directory in the debug output has messed up this formatting and looks like this:

Code:
public class Garage

{

private Car[] carArray = new Car[4];

...

// Iterator method.

public IEnumerator GetEnumerator()

{

foreach (Car c in carArray)

{

yield return c;

}

}

}
From what I have read, this is caused by the Structure Detection settings in Calibre. However, this page is a foreign language to me and I wouldn't know where to start to solve this. Can anyone help?
Old 05-10-2011, 07:40 AM   #2
Jonnster
Member
Jonnster began at the beginning.
 
Posts: 16
Karma: 10
Join Date: May 2011
Device: Kindle 3
To add to this, here is the actual HTML

Input directory:
Code:
public class Garage&nbsp;<br>
{&nbsp;<br>
&nbsp; private Car[] carArray = new Car[4];&nbsp;<br>
&nbsp; ...&nbsp;<br>
<b>&nbsp; // Iterator method.&nbsp;</b><br>
&nbsp; public IEnumerator GetEnumerator()&nbsp;<br>
&nbsp; {&nbsp;<br>
&nbsp; &nbsp; foreach (Car c in carArray)&nbsp;<br>
&nbsp; &nbsp; {&nbsp;<br>
&nbsp; &nbsp; &nbsp; yield return c;&nbsp;<br>
&nbsp; &nbsp; }&nbsp;<br>
&nbsp; }&nbsp;<br>
}&nbsp;<br>

Parsed directory:
Code:
public class Garage </p>
<p>{ </p>
<p>private Car[] carArray = new Car[4]; </p>
<p>... </p>
<p><b>  // Iterator method. </b></p>
<p>public IEnumerator GetEnumerator() </p>
<p>{ </p>
<p>foreach (Car c in carArray) </p>
<p>{ </p>
<p>yield return c; </p>
<p>} </p>
<p>} </p>
<p>} </p>

So the problem is that the parsed code has had paragraph tags added to every line and all the &nbsp; entities have been removed.

How can I solve this?

Update: Sorry, I repasted the parsed code as I had accidentally pasted the input code twice.

Last edited by Jonnster; 05-10-2011 at 08:49 AM.
Old 05-10-2011, 11:09 AM   #3
user_none
Sigil & calibre developer
user_none ought to be getting tired of karma fortunes by now.
 
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Please read mobileread.mobi/forums/showthread.php?t=118605, especially the section about text formatting.
Old 05-10-2011, 12:24 PM   #4
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
I think user_none meant this thread: https://www.mobileread.com/forums/sho...d.php?t=118605
Old 05-10-2011, 12:34 PM   #5
user_none
Sigil & calibre developer
user_none ought to be getting tired of karma fortunes by now.
 
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
@ldolse, it's the same thread. My link is to the mobile version. I tend to read and respond on my phone, so I don't even think about the fact that I'm using the mobile version of MobileRead there...
Old 05-10-2011, 03:54 PM   #6
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Ah, I didn't realize .mobi was now a valid TLD. I didn't test the link and had assumed something got messed up in your clipboard.
Old 05-11-2011, 04:03 AM   #7
Jonnster
Member
Jonnster began at the beginning.
 
Posts: 16
Karma: 10
Join Date: May 2011
Device: Kindle 3
So are you saying I might as well forget sorting this out? Are there no other applications that can retain this type of formatting during conversion?

For example, is there no way I can open the HTML file from the input directory and do a bulk search and replace to change &nbsp; to something else that calibre will be happy with?

Last edited by Jonnster; 05-11-2011 at 05:05 AM.
Old 05-11-2011, 06:27 AM   #8
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
You can open the HTML in the input directory and leave all the &nbsp; entities there, but then you're on your own to fix all the other problems in the document. Once you've done that you can import it back into Calibre as HTML.

The problem is that the poppler libraries stick &nbsp; everywhere, and in order to do all the things needed to fix a typical novel they get removed first. There are probably other approaches that can be taken in the code, but investing significant effort in the existing PDF conversion engine isn't worth it since it's going to go away.

Blocks of code examples are always going to be a sticky problem regardless of what happens in the future.
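
For anyone attempting the manual route described above, here is a minimal Python sketch of the first step: turning one line of the input-directory HTML into plain text while keeping its indentation. It is not part of calibre. The names `decode_code_line` and `TAG_RE` are made up for illustration, and the tag stripping is deliberately crude, assuming the lines only carry simple tags such as <br> and <b> as in the sample posted earlier.

```python
import html
import re

# Hypothetical helper: crude tag stripper, adequate for <br>, <b> and </b>.
TAG_RE = re.compile(r"<[^>]+>")

def decode_code_line(raw: str) -> str:
    """Turn one '&nbsp;...<br>' line into plain text, keeping its indentation."""
    text = TAG_RE.sub("", raw)          # drop the markup first
    text = html.unescape(text)          # '&nbsp;' becomes U+00A0, '&lt;' becomes '<'
    text = text.replace("\u00a0", " ")  # non-breaking space -> ordinary space
    return text.rstrip()                # trailing padding carries no meaning

sample = [
    "public class Garage&nbsp;<br>",
    "{&nbsp;<br>",
    "&nbsp; private Car[] carArray = new Car[4];&nbsp;<br>",
    "&nbsp; &nbsp; foreach (Car c in carArray)&nbsp;<br>",
]
decoded = [decode_code_line(line) for line in sample]
```

Each "&nbsp; " pair in the input becomes two ordinary spaces, so the nesting depth survives as leading whitespace. Whether that whitespace then survives a later conversion depends on how the lines are marked up when they are written back out.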
Old 05-11-2011, 06:41 AM   #9
Jonnster
Member
Jonnster began at the beginning.
 
Posts: 16
Karma: 10
Join Date: May 2011
Device: Kindle 3
Sorry, I don't understand what you mean by "You can open the html in the input directory and leave all the &nbsp there". Whatever conversion tool I use is going to remove the &nbsp; entities. How do I leave them there and read it on my Kindle?
Old 05-11-2011, 07:25 AM   #10
user_none
Sigil & calibre developer
user_none ought to be getting tired of karma fortunes by now.
 
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Why not just read the PDF file on your Kindle instead of converting it? The Kindle 3 can read PDF files natively.
Old 05-11-2011, 07:47 AM   #11
Jonnster
Member
Jonnster began at the beginning.
 
Posts: 16
Karma: 10
Join Date: May 2011
Device: Kindle 3
The screen is way too small for reading technical PDFs. Even in landscape in my opinion. I know the obvious answer is I should have bought a larger screen reader (and I was warned about this here at MR before purchase) but they are too expensive and too big and heavy. A 6 inch is the only practical reader for me from a cost and size perspective but unfortunately PDF reading has proved to be rather problematic.

Strangely enough, I considered purchasing the Kindle version of this book on Amazon. They had a sample download of the book, and the structure and formatting of that is only slightly better. The Kindle version converts all the sample source code to images, which are really blurry. The images are also not all the same size, so it just looks weird. It's better than the PDF conversion but it isn't worth the £26 they are asking for it. It feels like a bit of a rip off.
Old 05-11-2011, 07:59 AM   #12
kiwidude
Calibre Plugins Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
IMO for reading technical manuals (programming books in particular) you can forget about trying to read them on a Kindle. That is why I bought a tablet to supplement my Kindle. All my fiction reading I do on my Kindle which I love and goes with me everywhere, and my tablet is used for relaxing on the couch at home.

PDF is a crap format to try to convert from, as is repeated everywhere. I think you are pushing the proverbial uphill trying to convert it in a way you will find satisfactory on your Kindle. Some things need a large screen to be practical to read, and technical books like this fall into this category imho.
Old 05-11-2011, 08:49 AM   #13
Jonnster
Member
Jonnster began at the beginning.
 
Posts: 16
Karma: 10
Join Date: May 2011
Device: Kindle 3
Yes I agree but a tablet isn't an option for me. Too heavy, too pricey and poor battery life. I love the Kindle for its long battery life as I hate any gadget that requires a lot of charging. I just need to accept that PDF viewing is out of the question. I also haven't got enough time for all this conversion hassle anyway. I'll stick to formats that the Kindle can just handle immediately.
Old 05-12-2011, 04:20 AM   #14
Jonnster
Member
Jonnster began at the beginning.
 
Posts: 16
Karma: 10
Join Date: May 2011
Device: Kindle 3
I have another programming ebook, which came on the CD with the book, in CHM format. If I convert this to mobi using Mobipocket Creator then all the source code formatting is retained. This tells me that it is possible (CHM is just compiled HTML). So why is the formatting lost from HTML but not from CHM?

Would it be worth me taking the HTML from the input folder of the Calibre debug, converting it to CHM and then converting to mobi? If so how do I create the CHM?
Old 05-12-2011, 04:33 AM   #15
itimpi
Wizard
itimpi ought to be getting tired of karma fortunes by now.
 
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
The code formatting will be maintained if it is specified as fixed in the HTML (typically by surrounding it with <PRE> style tags). However, one then needs to be aware that this can make the text go off-screen if it is too wide, as the <PRE> tag would inhibit word-wrap.
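
To make the <PRE> suggestion concrete, here is a small Python sketch that wraps already-decoded code lines in a <pre> block and also reports which lines are wider than a chosen limit, since those are the ones that could run off-screen. The function name `to_pre_block` and the `max_width` default of 50 characters are assumptions for illustration only; the width that actually fits depends on the device and font size.

```python
import html

def to_pre_block(code_lines, max_width=50):
    """Wrap plain-text code lines in <pre>; also return lines wider than max_width.

    max_width is a hypothetical limit, not a measured Kindle value.
    """
    too_wide = [line for line in code_lines if len(line) > max_width]
    # Escape <, > and & so the code is not mistaken for markup.
    body = "\n".join(html.escape(line, quote=False) for line in code_lines)
    return "<pre>" + body + "</pre>", too_wide

lines = [
    "public class Garage",
    "{",
    "  private Car[] carArray = new Car[4];",
    "}",
]
block, too_wide = to_pre_block(lines, max_width=30)
```

With the limit set to 30, the `carArray` declaration (38 characters) is returned in `too_wide`, which flags it as a line to break by hand before conversion.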