Possible MOBI to EPUB conversion error: lost chapter numbers

cybmole · 02-02-2012, 04:25 AM

I have a retail MOBI copy of Phil Rickman's The Lamp of the Wicked.

When I view the MOBI in Kindle for PC or in the calibre viewer, the chapter numbers are all OK,
but when I convert MOBI to EPUB and view the output, I lose chapter numbers 1 through 9.
In Sigil I see code like:

Code:

<body class="calibre">
<div class="sgc-1" id="filepos22360"></div>

  <h1 class="calibre1" id="calibre_pb_16">Foul Water</h1>

What I should be seeing is:

Code:

<div class="sgc-1" id="filepos22360">1</div>

For all the two-digit chapter numbers, I do see the two-digit number in that div line.

UPDATE EDIT: bug report now filed, with the MOBI attached (no DRM).

It is possible that the free sample from Amazon will exhibit the same behaviour, though.
Does anyone have a hypothesis that could explain the lost numbers?
NB: converting with heuristics OFF, no strange settings.

UPDATE: looking harder, I also see this on each of the problem chapter pages:

Code:

div.sgc-1 {height:0pt}

What puts that there?

Also, from chapter 10 onwards the EPUB conversion puts each chapter number into its own XHTML file, followed by an XHTML file that contains the chapter heading text and the chapter text. See the example below.
I don't know how to check whether that is also the case in the original MOBI.

Code:

</head>

<body class="calibre">
  <h1 class="calibre1" id="filepos212373">10</h1>
</body>
</html>
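A rough way to see which pages in the converted EPUB hold nothing but a chapter number is to scan its content files directly. This is a minimal Python sketch, not a calibre or Sigil feature: it assumes the EPUB is an ordinary zip archive with `.xhtml`/`.html` content files, the helper name `number_only_pages` is hypothetical, and it strips tags with a crude regex rather than a real HTML parser. Note that it inspects the EPUB output, not the original MOBI.

```python
import re
import sys
import zipfile

# Crude patterns: good enough for a quick check, not for general HTML processing.
TAG_RE = re.compile(r"<[^>]+>")
BODY_RE = re.compile(r"<body[^>]*>(.*)</body>", re.DOTALL | re.IGNORECASE)


def number_only_pages(epub_path):
    """Return the (sorted) names of content files whose body text is only digits.

    Hypothetical helper: it assumes the EPUB is a plain zip archive.
    """
    hits = []
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue  # skip CSS, images, OPF, mimetype, etc.
            markup = zf.read(name).decode("utf-8", errors="replace")
            match = BODY_RE.search(markup)
            if match is None:
                continue
            # Remove all tags and surrounding whitespace, keeping only visible text.
            text = TAG_RE.sub("", match.group(1)).strip()
            if text.isdigit():
                hits.append(name)
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    for page in number_only_pages(sys.argv[1]):
        print(page)
```

Saved as a script and run with the EPUB path as its argument, it prints one file name per number-only page, such as the file containing `<h1 class="calibre1" id="filepos212373">10</h1>` above.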

cybmole · 02-02-2012, 06:09 AM

I got Kovid's fast bug response:
"That MOBI file uses two separate <h1> tags for the chapter number and the chapter. calibre inserts page breaks on <h1> tags and removes pages with too little content (the pages with numbers 1-9). Turn off the insert page break before setting.
status invalid"
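Kovid's explanation can be modelled as two steps: start a new page before every <h1>, then throw away pages with too little text. The Python sketch below is a toy model of that description, not calibre's code; the `Block` type, the function names and especially the `min_chars=2` threshold are assumptions, with the threshold picked only because it reproduces the symptom in this thread (one-digit chapter numbers lost, two-digit ones kept). "Chapter ten title" is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: str   # element name, e.g. "h1" or "p"
    text: str  # the element's text content


def split_on_h1(blocks):
    """Start a new page before every <h1>, like an 'insert page break before h1' rule."""
    pages, current = [], []
    for block in blocks:
        if block.tag == "h1" and current:
            pages.append(current)
            current = []
        current.append(block)
    if current:
        pages.append(current)
    return pages


def text_length(page):
    """Count the non-whitespace characters on a page."""
    return sum(len("".join(block.text.split())) for block in page)


def drop_short_pages(pages, min_chars=2):
    """Discard pages with fewer than min_chars characters of text.

    ASSUMPTION: min_chars=2 is chosen only because it reproduces the symptom in
    this thread (1-9 lost, 10 onwards kept); it is not calibre's documented rule.
    """
    return [page for page in pages if text_length(page) >= min_chars]


if __name__ == "__main__":
    book = [
        Block("h1", "1"), Block("h1", "Foul Water"), Block("p", "..."),
        Block("h1", "10"), Block("h1", "Chapter ten title"), Block("p", "..."),
    ]
    kept = drop_short_pages(split_on_h1(book))
    print([page[0].text for page in kept])  # ['Foul Water', '10', 'Chapter ten title']
```

Under that assumed threshold, the page holding just "1" disappears while the page holding "10" survives as its own page, which matches both symptoms described in the first post.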

I get that, but I don't understand why chapters 10 to the end are not treated the same way.

Is the definition of "too little content" one character, but not two or more characters?

And is it sensible to delete even one character of text during a conversion? The conversion program cannot know the possible significance of that one character.

I'm not comfortable with a conversion program making hidden decisions to remove ANY text. Surely it would be better to merge any short page with the next page rather than delete it entirely?
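The merge-instead-of-delete idea can be sketched in a few lines of Python. This is a hypothetical illustration, not a calibre option: each page is modelled as a plain list of block texts, and the function name `merge_short_pages` and the two-character threshold are assumptions. A page that is too short is carried forward and prepended to the next page, so no text is lost.

```python
def text_length(page):
    """Count the non-whitespace characters across all blocks of a page."""
    return sum(len("".join(text.split())) for text in page)


def merge_short_pages(pages, min_chars=2):
    """Fold each too-short page into the following page instead of deleting it.

    Hypothetical sketch: min_chars=2 is an assumed threshold, not a calibre setting.
    """
    merged, carry = [], []
    for page in pages:
        candidate = carry + page
        if text_length(candidate) < min_chars:
            carry = candidate  # still too short: keep carrying it forward
        else:
            merged.append(candidate)
            carry = []
    if carry:
        # A short page at the very end has no successor, so attach it to the previous page.
        if merged:
            merged[-1] = merged[-1] + carry
        else:
            merged.append(carry)
    return merged


if __name__ == "__main__":
    pages = [["1"], ["Foul Water", "..."], ["10"], ["Chapter ten title", "..."]]
    print(merge_short_pages(pages))
    # [['1', 'Foul Water', '...'], ['10'], ['Chapter ten title', '...']]
```

With this approach, chapter number 1 ends up on the same page as "Foul Water" instead of being deleted; "Chapter ten title" is only a placeholder.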

PS: I usually lose the "is this a bug" debates and expect to lose this one too, but to me,
throwing away a valid character of a book's text during a conversion with default options IS a bug, especially when:
1. it is removed from within a <div>...</div>, yet the surrounding empty div and its styling are left in;
2. the MOBI-to-EPUB conversion used by the calibre viewer behaves differently and does NOT do that.

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
images lost converting epub to mobi	fwiginton	Calibre	0	01-21-2012 12:21 PM
Converting EPUB to MOBI - missing chapter markers	peartree	Amazon Kindle	10	04-01-2011 06:02 PM
Epub to Mobi Chapter Detection	ice2097	Calibre	4	12-29-2010 02:14 AM
Epub to Mobi Conversion - Designating Chapter Starts?	CAJensen01	ePub	18	09-29-2010 12:46 PM
To MOBI, Chapter detection fails? Works for EPUB	Fmstrat	Calibre	7	08-29-2010 05:37 PM
