MobileRead Forums - View Single Post

willemml · 08-26-2023, 05:48 PM

Quote:

Originally Posted by jhowell

The notebook folder name is based on the metadata of the KFX book being annotated. It is composed of the content_id, the cde_content_type, and the string "notebook", all separated by "!!". For example: "EBEA035E6DB444159EF42DA7E5EEF8F6!!PDOC!!notebook".

This much I had mostly figured out myself, but thank you for confirming and giving me the correct names for each part.
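
For anyone scripting against this, the naming rule in the quote could be sketched roughly as follows. The function and class names here are illustrative assumptions, not part of any plugin, and the check assumes neither field ever contains "!!".

```python
from typing import NamedTuple

SEPARATOR = "!!"
SUFFIX = "notebook"


class NotebookFolder(NamedTuple):
    """The two metadata fields encoded in a notebook folder name."""
    content_id: str
    cde_content_type: str


def build_notebook_folder(content_id: str, cde_content_type: str) -> str:
    """Join content_id, cde_content_type and 'notebook' with '!!'."""
    return SEPARATOR.join([content_id, cde_content_type, SUFFIX])


def parse_notebook_folder(name: str) -> NotebookFolder:
    """Split a folder name back into its parts (hypothetical helper)."""
    parts = name.split(SEPARATOR)
    if len(parts) != 3 or parts[2] != SUFFIX:
        raise ValueError(f"not a notebook folder name: {name!r}")
    return NotebookFolder(parts[0], parts[1])


if __name__ == "__main__":
    print(parse_notebook_folder("EBEA035E6DB444159EF42DA7E5EEF8F6!!PDOC!!notebook"))
```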

Quote:

Originally Posted by jhowell

The EPUB produced by KFX Input from a Scribe annotation notebook contains one XHTML file per annotation, each linking to an SVG image. The connection between a book page (in KFX format produced from PDF) and the associated annotation notebook page is provided by a file with the extension .yjr found in the .sdr folder associated with the KFX book. That file can be converted to JSON using KRDS - A parser for Kindle reader data store files.

Each annotated page will have an entry such as:

Code:

    "annotation.cache.object": {
        "annotation.personal.handwritten_note": [
            {
                "startPosition": "201.0:13974",
                "endPosition": "201.0:13974",
                "creationTime": "2023-08-26T09:09:38.130000",
                "lastModificationTime": "2023-08-26T09:09:38.130000",
                "template": "0\ufffc0",
                "handwritten_note_nbk_ref": "crEq-GhRTSa63nk5j3KC6Qw0"
            }
        ]
    },

The startPosition is a KFX position number that corresponds to the book page being annotated. The page number can be found by looking up the part of the position number following the colon in a content JSON file that can be optionally produced by the CLI of the KFX Input plugin. (The number will match a type 2 entry. Count type 2 entries in the file to find the page number.)

Ah, wonderful, this is exactly what I was looking for. Thank you. Good to know that pages are type 2 entries; I had vaguely worked this out from the JSON output of the plugin, but was not sure. Do you know what all the content types are? So far I have only come across type 2, but I guess that is because all the files I am examining are print replica books created from PDF files.
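
The lookup described in the quote (take the number after the colon in startPosition, find the matching type 2 entry, and count type 2 entries up to it) could be sketched like this. The layout of the content JSON is an assumption here: the sketch pretends it is a flat list of objects with hypothetical "type" and "position" keys, which may well not match what KFX Input actually writes.

```python
def position_id(position: str) -> int:
    """Return the integer after the colon in a position like '201.0:13974'."""
    _, sep, tail = position.partition(":")
    if not sep:
        raise ValueError(f"no colon in position: {position!r}")
    return int(tail)


def page_number(entries: list, start_position: str) -> int:
    """Count type 2 entries up to and including the one matching the position.

    ASSUMPTION: 'entries' is a flat list of dicts with 'type' and 'position'
    keys. These key names are placeholders, not a documented format.
    """
    target = position_id(start_position)
    page = 0
    for entry in entries:
        if entry.get("type") != 2:
            continue
        page += 1  # every type 2 entry is treated as one page
        if entry.get("position") == target:
            return page
    raise LookupError(f"no type 2 entry for position {target}")


if __name__ == "__main__":
    # Made-up entries purely for illustration.
    sample = [
        {"type": 2, "position": 13000},
        {"type": 2, "position": 13974},
    ]
    print(page_number(sample, "201.0:13974"))
```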

Quote:

Originally Posted by jhowell

The handwritten_note_nbk_ref is the KFX section ID of the associated annotation page in the notebook. Currently those IDs are not reflected in the EPUB generated by the KFX Input plugin for an annotation notebook. I will update the plugin to include this data in the EPUB so that these can be matched.

Thank you, that will help a lot with my scripting.
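
Until then, the pairing of notebook section IDs with book positions can at least be pulled out of the KRDS JSON. Below is a minimal sketch, assuming "annotation.cache.object" sits at the top level of the converted JSON (the quoted fragment does not show its parent) and using a hypothetical helper name.

```python
import json

# Trimmed copy of the entry quoted above; 'template' is omitted for brevity.
SAMPLE = """
{
    "annotation.cache.object": {
        "annotation.personal.handwritten_note": [
            {
                "startPosition": "201.0:13974",
                "endPosition": "201.0:13974",
                "handwritten_note_nbk_ref": "crEq-GhRTSa63nk5j3KC6Qw0"
            }
        ]
    }
}
"""


def handwritten_notes(krds_data: dict) -> dict:
    """Map each handwritten_note_nbk_ref to the startPosition it annotates.

    ASSUMPTION: 'annotation.cache.object' is a top-level key of the JSON.
    """
    cache = krds_data.get("annotation.cache.object", {})
    notes = cache.get("annotation.personal.handwritten_note", [])
    return {n["handwritten_note_nbk_ref"]: n["startPosition"] for n in notes}


if __name__ == "__main__":
    for ref, pos in handwritten_notes(json.loads(SAMPLE)).items():
        print(ref, pos)
```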

Quote:

Originally Posted by jhowell

The margins of the PDF page may have been trimmed during conversion to KFX format for delivery to the Scribe. Also, the SVG produced will have the aspect ratio of the Scribe screen, which might not match the PDF page. Because of this, some image manipulation may be needed to properly overlay the SVG image onto the original PDF page.

If I extract the PDF from the KFX files I am generating, I assume that would save me from having to trim the PDFs for correct alignment? And is there a point on the PDF page (say the top left or top right corner) that I can align the SVG to, with an offset to account for the change in aspect ratio?
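
To make the alignment question concrete, here is the kind of calculation I have in mind. It is only a sketch under assumptions: it supposes the page is scaled uniformly to fit inside the screen area, and it leaves the anchor point as a parameter because I do not know which corner (if any) the Scribe actually uses. All names are illustrative.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Uniform scale plus the offset of the SVG's top-left corner on the page."""
    scale: float
    offset_x: float
    offset_y: float


def fit_overlay(svg_w: float, svg_h: float, page_w: float, page_h: float,
                align_x: float = 0.0, align_y: float = 0.0) -> Placement:
    """Scale the SVG uniformly so it fits inside the page, then place it.

    align_x / align_y distribute the leftover space:
    0.0 = left/top, 0.5 = centered, 1.0 = right/bottom.
    ASSUMPTION: a 'fit inside' policy; the real device may behave differently.
    """
    scale = min(page_w / svg_w, page_h / svg_h)
    free_x = page_w - svg_w * scale  # unused horizontal space after scaling
    free_y = page_h - svg_h * scale  # unused vertical space after scaling
    return Placement(scale, free_x * align_x, free_y * align_y)


if __name__ == "__main__":
    # Made-up sizes: a 300x400 SVG over a 600x1000 page, centered vertically.
    print(fit_overlay(300, 400, 600, 1000, align_y=0.5))
```

With the top-left anchor both offsets are zero, so the only unknown left would be the scale.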

In the meantime, I am trying to write a program that converts PDFs to write-on-able KFX files without going through the Kindle Create software. For now that means creating my own KPF files from scratch, containing the metadata needed to correctly map PDF pages, so that I can put them through the KFX Output plugin. (I realize that still relies on Kindle Previewer for conversion, but I eventually want to write my own KPF to KFX converter as well.) I will publish all the code on GitHub as soon as I have something that works a little bit. Is there any documentation that could help me, beyond this thread and the code of your KFX plugins? That code is the closest thing I have found to documentation of the KFX and KPF formats, other than a rough overview.