#46
creator of calibre
Posts: 45,144
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Never mind, I see my algorithm for interpreting the extra data flags was incorrect; it just happened to work when the value was all 1s (which is true for kindlegen 1.2 and Amazon periodical files, but not kindlegen 1.1). I'll fix it in a bit.

#47
onlinenewsreader.net
Posts: 324
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire

The original mobiunpack code interpreted trailing data flags with more than one bit set above the lowest bit as indicating more than one TBS, but that doesn't seem to be the correct interpretation for Kindlegen and Amazon-generated documents (the HTML content goes right up to the one and only TBS), so I fixed the number of TBS at one regardless of the trailing data flags. By the way, the last byte in a TBS is of the form 0x8n, where n is the number of bytes in the TBS. If you look at the raw MOBI files you'll see that each HTML record is followed by exactly one TBS.

EDIT: You posted your last message while I was typing.
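A minimal Python sketch of that single-TBS reading, for anyone who wants to check it against a raw record. It assumes exactly one TBS at the end of the record, that its final byte is 0x80 | n, and that n counts the length byte itself. The function name `split_single_tbs` is made up for illustration and is not taken from mobiunpack or calibre.

```python
from typing import Tuple


def split_single_tbs(record: bytes) -> Tuple[bytes, bytes]:
    """Split a raw text record into (content, tbs).

    Assumptions (not verified against every MOBI variant):
    - the record ends with exactly one TBS;
    - the last byte is 0x80 | n, with n the TBS length in bytes,
      including the length byte itself.
    """
    if not record:
        return record, b""
    last = record[-1]
    if not last & 0x80:
        raise ValueError("last byte is not of the form 0x8n")
    n = last & 0x7F
    if n == 0 or n > len(record):
        raise ValueError("implausible TBS length: %d" % n)
    return record[:-n], record[-n:]


# Hypothetical record: some HTML followed by a 3-byte TBS ending in 0x83.
content, tbs = split_single_tbs(b"<p>hi</p>\x01\x02\x83")
```

This only handles the single-byte length case described here; it does not account for multibyte overlap bytes or for more than one trailing entry.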

#48
creator of calibre
Fix committed. Note that I have no idea how to interpret the trailing data bytes corresponding to the 0b1000 flag bit.

#49
creator of calibre
FYI. From what I've seen, this is how the trailing data flags are to be interpreted:

0b1 (lowest bit) - Multibyte overlap characters, of the form <number of characters><char1><char2><char3>, where the number of characters is encoded in the bottom two bits of a single byte (there can never be more than three overlap multibyte characters, since in UTF-8 the maximum character size is 4 bytes)
0b10 - Indexing trailing bytes
0b100 - Uncrossable breaks
Higher bits - Unknown

The trailing data for each bit >= 0b10 is encoded in the form <data><size>, where size is a backward-encoded vwi (variable-width integer) that gives the full size of that entry, including the bytes used to encode the size itself.

The trailing data at the end of the record is of the form: <multibyte><entry1><entry2>...

Each entryN corresponds to a set bit in the extra data flags. The lowest order bit is the outermost entry, and so on.

If you are in doubt about whether you got the trailing data correct, check that the size of the text after decompressing equals the declared text size in the header (which is always 4096 bytes in all the MOBI files I've seen). My inspect MOBI tool puts the bytes from each text record in the text/ subdirectory.
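The layout described in this post can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not calibre's actual code: the function names are hypothetical, the 28-bit cap on the vwi is a defensive guess, and the multibyte count is read from the last byte of the multibyte block (the only position that can be located without already knowing the count), even though the description above lists the count first.

```python
def decode_backward_vwi(data: bytes) -> int:
    """Decode a backward-encoded variable-width integer from the END of data.

    Bytes are consumed from the last byte towards the front, 7 bits each,
    until a byte with the high bit set is reached.
    """
    value, shift = 0, 0
    for byte in reversed(data):
        value |= (byte & 0x7F) << shift
        shift += 7
        # High bit marks the final byte read; the 28-bit cap is a
        # defensive assumption to stop on malformed input.
        if byte & 0x80 or shift >= 28:
            break
    return value


def trailing_data_size(record: bytes, flags: int) -> int:
    """Return how many bytes at the end of record are trailing data."""
    total = 0
    higher_bits = flags >> 1
    # Entries for bits >= 0b10: the lowest set bit is the outermost entry,
    # so peel entries off the end in ascending bit order.
    while higher_bits:
        if higher_bits & 1:
            end = max(0, len(record) - total)
            total += decode_backward_vwi(record[:end])
        higher_bits >>= 1
    # Bit 0b1: multibyte overlap bytes, innermost, next to the text.
    # ASSUMPTION: the count sits in the bottom two bits of the block's
    # last byte, and the block is count + 1 bytes long.
    if flags & 1 and total < len(record):
        count_byte = record[len(record) - total - 1]
        total += (count_byte & 0b11) + 1
    return total


# Hypothetical record: text, an empty multibyte block (count byte 0x00),
# then a 3-byte indexing entry whose size byte is 0x83.
record = b"hello" + b"\x00" + b"\xaa\xbb\x83"
size = trailing_data_size(record, 0b11)
text = record[:len(record) - size]
```

A single-byte size such as 0x83 decodes to 3, which matches the 0x8n pattern mentioned earlier in the thread.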

#50
onlinenewsreader.net
Kovid--Thanks for the explanation, and the debug code seems to be working now so I'll have another look at it tomorrow.

#51
onlinenewsreader.net
I have abandoned this now. In spite of my modified MOBI writer.py generating a file that is virtually identical to that generated by Kindlegen v1.1 in terms of indexing and TBS, it still doesn't work. I'm quite happy with my external solution using ebook-convert-->OEB-->Kindlegen and I'll stick with that unless someone comes up with a miraculous cure.

#52
Connoisseur
Posts: 82
Karma: 10
Join Date: Oct 2010
Device: Kindle

I was wondering if you could post your modified MOBI writer.py here. This would definitely help others who'd like to follow up on this project.
Thanks.

#53
onlinenewsreader.net
Attached

#54
creator of calibre
Just to note, I've updated the new MOBI writer code (in calibre.ebooks.mobi.writer2) to generate Amazon-style periodicals (with support for masthead, author and description metadata). They still exhibit the same problem as before, but if you are working on this further, I suggest you use the code in writer2; it's much better than the code in writer.py.

#55
onlinenewsreader.net
Like a bug attracted to the light, I went back to this yet again, and I have found a solution. I had already discovered that Amazon uses two slightly different formats: one generated by Kindlegen and one generated by the Amazon periodical platform that is used for subscriptions. They are different, but both work properly in the K3's Sections and Articles view. I have found the key aspects of the Kindlegen format and I can reproduce it using the Calibre MOBI output code. It turns out that (for Kindlegen-generated documents) the Kindle expects some TOC markup within the HTML as well as the binary table of contents information in the header. I'm going to have to look at Kovid's new MOBI writer code to see how the Kindle requirements can be accommodated without breaking any other MOBI readers.

#56
creator of calibre
There is already code in the new MOBI writer that outputs <a /><a /> sequences at the start of each section/article. See the find_blocks and serialize_item methods in serializer.py.

#57
onlinenewsreader.net
Those empty anchors are redundant. I'm talking about actual full periodical and section TOCs in the markup.

#58
creator of calibre
Ah, then you want to look at the HTMLTOCAdder class. You could probably write a periodical-specific replacement for it.

#59
onlinenewsreader.net
OK, thanks, I'll look at that. One of the issues I have to deal with is that I've made a lot of changes to the original MOBI writer.py file and I'm not sure which ones are actually relevant. I know for sure that the TOC markup HTML is critical (it was the last change I made and it provided the desired result), but I'll have to back out the other changes and see which ones have an effect. That, combined with the significant changes you've made for the new MOBI code, means there is still a lot of work to do, but at least I know there's a solution at the end of it.

#60
creator of calibre
I suggest starting by adding the TOC markup to the new writer. If that doesn't do it, then you can start working backwards on the rest of the changes.