Old 07-31-2011, 05:00 PM   #46
kovidgoyal
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Never mind, I see my algorithm for interpreting the extra data flags was incorrect; it just happened to work when the value was all 1 bits (which is true for Kindlegen 1.2 and Amazon periodical files, but not Kindlegen 1.1). I'll fix it in a bit.
Old 07-31-2011, 05:14 PM   #47
nickredding
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
Quote:
Originally Posted by kovidgoyal View Post
That's because the extra data flags in your mobi are incorrect. They should be:

0b11 (assuming the only trailing data is multibyte overlap and indexing)

Instead, they are

0b1011

This causes the reading of the trailing data to be incorrect.
The trailing data flags (0b1011 = 0xB) were generated by Kindlegen 1.1 and the trailing data flags (0b11 = 0x3) were generated by Kindlegen 1.2, so I don't understand how either can be "incorrect"--they are what they are.

The original mobiunpack code interpreted trailing data flags with more than one bit set above the lowest bit as indicating more than one TBS, but this didn't seem to be the correct interpretation in the case of Kindlegen and Amazon-generated documents (the HTML content goes right up to the one and only TBS), so I fixed the number of TBS at one regardless of the trailing data flags.

By the way, the last byte in a TBS is of the form 0x8n where n is the number of bytes in the TBS. If you look at the raw MOBI files you'll see that each HTML record is followed by exactly one TBS.
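
For illustration, here is a minimal Python sketch of splitting one text record into its HTML content and its single TBS, based on the 0x8n observation above. The function name `split_single_tbs` is hypothetical, and the sketch assumes that n counts the size byte itself and that nothing else follows the TBS in the record.

```python
from typing import Tuple


def split_single_tbs(record: bytes) -> Tuple[bytes, bytes]:
    """Split a record into (content, tbs), assuming exactly one TBS at the end.

    Assumption: the last byte is 0x8n and n is the TBS length in bytes,
    including that last byte.
    """
    if not record:
        raise ValueError("empty record")
    last = record[-1]
    if (last & 0xF0) != 0x80:
        raise ValueError(f"last byte {last:#04x} is not of the form 0x8n")
    n = last & 0x0F
    if n == 0 or n > len(record):
        raise ValueError(f"implausible TBS length {n}")
    split = len(record) - n
    return record[:split], record[split:]


# Hypothetical record: some HTML followed by a 3-byte TBS ending in 0x83.
content, tbs = split_single_tbs(b"<p>hi</p>\x01\x02\x83")
```

Records whose last byte does not match the 0x8n pattern raise `ValueError` rather than being guessed at.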

EDIT: You posted the last while I was typing
Old 07-31-2011, 05:17 PM   #48
kovidgoyal
creator of calibre
Fix committed. Note that I have no idea how to interpret the trailing data bytes corresponding to the 0b1000 flag bit.
Old 07-31-2011, 06:33 PM   #49
kovidgoyal
creator of calibre
FYI. From what I've seen, this is how the trailing data flags are to be interpreted:

lowest bit (0b1) = multibyte overlap characters of the form

<number of characters><char1><char2><char3>

where number of characters is encoded in the bottom two bits of a single byte (there can never be more than three overlap multibyte characters, since in UTF-8 the maximum character size is 4 bytes)

0b10 - Indexing trailing bytes
0b100 - Uncrossable breaks
Higher bits unknown
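
As a rough sketch of the bit assignments listed above, the following Python snippet names the set bits in an extra data flags value. The table `KNOWN_FLAGS` and the function `describe_extra_data_flags` are hypothetical names for illustration; only the three meanings listed in this post are treated as known.

```python
from typing import List

# Meanings taken from the list above; anything higher is reported as unknown.
KNOWN_FLAGS = {
    0b1: "multibyte overlap",
    0b10: "indexing trailing bytes",
    0b100: "uncrossable breaks",
}


def describe_extra_data_flags(flags: int) -> List[str]:
    """Return a description for every set bit, lowest bit first."""
    names = []
    bit = 1
    while bit <= flags:
        if flags & bit:
            names.append(KNOWN_FLAGS.get(bit, f"unknown flag {bit:#b}"))
        bit <<= 1
    return names


kindlegen_1_1 = describe_extra_data_flags(0b1011)
kindlegen_1_2 = describe_extra_data_flags(0b11)
```

With the two values discussed in this thread, 0b11 yields the multibyte and indexing entries, while 0b1011 adds one more entry whose meaning is not known.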

The trailing data for each bit >= 0b10 is encoded in the form

<data><size>

where size is a backward-encoded vwi (variable-width integer) and gives the full size of that entry, including the bytes used to encode the size itself.

The trailing data at the end of the record is of the form:

<multibyte><entry1><entry2>...

Each entryN corresponds to a set bit in the extra data flags. The lowest order bit is the outermost entry and so on.

If you are in doubt about whether you got the trailing data correct, you should check that the size of the text after decompressing == the declared text size in the header (which is always 4096 bytes in all the MOBI files I've seen).

My inspect MOBI tool puts the bytes from each text record in the text/ sub directory.
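
The layout described above can be sketched in Python roughly as follows. This is not the calibre implementation; `decode_backward_vwi` and `strip_trailing_data` are hypothetical names, and the sketch rests on several assumptions: "outermost" means closest to the end of the record, a backward vwi carries 7 bits per byte and its first byte has the high bit set, a vwi is at most 4 bytes long, and the multibyte entry is located by reading the count from the last byte that remains once the other entries are stripped (entry size = count + 1).

```python
from typing import Dict, Tuple


def decode_backward_vwi(data: bytes) -> Tuple[int, int]:
    """Read a vwi from the end of data; return (value, bytes_consumed)."""
    value = 0
    shift = 0
    consumed = 0
    for byte in reversed(data[-4:]):      # assumption: at most 4 bytes
        value |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        shift += 7
        consumed += 1
        if byte & 0x80:                   # high bit marks the first byte
            return value, consumed
    raise ValueError("no vwi terminator found")


def strip_trailing_data(record: bytes, flags: int) -> Tuple[bytes, Dict[int, bytes]]:
    """Return (text, entries), where entries maps bit number to entry data."""
    entries: Dict[int, bytes] = {}
    bit_number = 1
    remaining = flags >> 1
    while remaining:                      # bits >= 0b10, lowest bit first
        if remaining & 1:
            size, consumed = decode_backward_vwi(record)
            if size < consumed or size > len(record):
                raise ValueError(f"bad entry size {size} for bit {bit_number}")
            end = len(record)
            entries[bit_number] = record[end - size:end - consumed]
            record = record[:end - size]
        remaining >>= 1
        bit_number += 1
    if flags & 1:                         # multibyte overlap is innermost
        if not record:
            raise ValueError("missing multibyte entry")
        size = (record[-1] & 0b11) + 1
        if size > len(record):
            raise ValueError("bad multibyte entry size")
        end = len(record)
        entries[0] = record[end - size:end - 1]
        record = record[:end - size]
    return record, entries


# Hypothetical record for flags 0b1011: text, multibyte, bit-3 entry, bit-1 entry.
sample = b"hello" + b"\x00" + b"\xcc\x82" + b"\xaa\xbb\x83"
text, entries = strip_trailing_data(sample, 0b1011)
```

After stripping, `text` is what would be handed to the decompressor, so its decompressed length can be compared against the declared text record size as suggested above.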
Old 07-31-2011, 07:49 PM   #50
nickredding
onlinenewsreader.net
Kovid--Thanks for the explanation, and the debug code seems to be working now so I'll have another look at it tomorrow.
Old 08-01-2011, 10:16 PM   #51
nickredding
onlinenewsreader.net
Quote:
Originally Posted by nickredding View Post
Kovid--Thanks for the explanation, and the debug code seems to be working now so I'll have another look at it tomorrow.
I have abandoned this now. In spite of my modified MOBI writer.py generating a file that is virtually identical to that generated by Kindlegen v1.1 in terms of indexing and TBS, it still doesn't work. I'm quite happy with my external solution using ebook-convert-->OEB-->Kindlegen and I'll stick with that unless someone comes up with a miraculous cure.
Old 08-02-2011, 08:02 AM   #52
tylau0
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle
I was wondering if you could post your modified MOBI writer.py here. This would definitely help others who'd like to follow up this project.

Thanks.

Quote:
Originally Posted by nickredding View Post
I have abandoned this now. In spite of my modified MOBI writer.py generating a file that is virtually identical to that generated by Kindlegen v1.1 in terms of indexing and TBS, it still doesn't work. I'm quite happy with my external solution using ebook-convert-->OEB-->Kindlegen and I'll stick with that unless someone comes up with a miraculous cure.
Old 08-02-2011, 11:20 AM   #53
nickredding
onlinenewsreader.net
Quote:
Originally Posted by tylau0 View Post
I was wondering if you could post your modified MOBI writer.py here. This would definitely help others who'd like to follow up this project.

Thanks.
Attached
Attached Files
File Type: zip writer.zip (24.6 KB, 406 views)
Old 08-04-2011, 01:14 PM   #54
kovidgoyal
creator of calibre
Just to note, I've updated the new MOBI writer code (in calibre.ebooks.mobi.writer2) to generate Amazon-style periodicals (with support for masthead, author and description metadata). They still exhibit the same problem as before, but if you are working on this further, I suggest you use the code in writer2; it's much better than the code in writer.py.
Old 08-14-2011, 08:10 PM   #55
nickredding
onlinenewsreader.net
Quote:
Originally Posted by nickredding View Post
I have abandoned this now ...
Like a bug attracted to the light I went back to this yet again and I have found a solution. I had already discovered that Amazon uses two slightly different formats--one generated by Kindlegen and one generated by the Amazon periodical platform that is used for subscriptions. They are different (but they both work properly on the K3 Sections and Articles view). I have found the key aspects of the Kindlegen format and I can reproduce it using the Calibre MOBI output code. It turns out that (for Kindlegen-generated documents) the Kindle expects some TOC markup within the HTML as well as the binary table of contents information in the header. I'm going to have to look at Kovid's new MOBI writer code to see how the Kindle requirements can be accommodated without breaking any other MOBI readers.
Old 08-14-2011, 09:10 PM   #56
kovidgoyal
creator of calibre
There is already code in the new MOBI writer that outputs <a /><a /> sequences at the start of each section/article. See the find_blocks and serialize_item methods in serializer.py
Old 08-14-2011, 09:19 PM   #57
nickredding
onlinenewsreader.net
Those empty anchors are redundant. I'm talking about actual full periodical and section TOCs in the markup.
Old 08-14-2011, 09:23 PM   #58
kovidgoyal
creator of calibre
Ah, then you want to look at the HTMLTOCAdder class. You could probably write a periodical-specific replacement for it.
Old 08-14-2011, 09:54 PM   #59
nickredding
onlinenewsreader.net
OK thanks I'll look at that. One of the issues I have to deal with is I've made a lot of changes to the original MOBI writer.py file and I'm not sure which ones are actually relevant. I know for sure that the TOC markup HTML is critical (it was the last change I made and it provided the desired result) but I'll have to back out the other changes and see which ones have an effect. That, combined with the significant changes you've made for the new MOBI code, means there is still a lot of work to do, but at least I know there's a solution at the end of that.
Old 08-14-2011, 10:36 PM   #60
kovidgoyal
creator of calibre
I suggest starting by adding the TOC markup to the new writer. If that doesn't do it, then you can start working backwards on the rest of the changes.
All times are GMT -4.