Quote:
Originally Posted by j.p.s
I think it was informative, and I am getting tired of people effectively disparaging the dissemination of information.
I do have some questions. Over time you have frequently mentioned PDF bookmarks.
Is that some internal Adobe term?
Are they somehow different from TOC entries? (you invoke functions get_toc and set_toc)
(I know that epub, amazon, and pdf have provisions for a "table of contents" that can be used by a reading device that is not displayed while paging through a document. Some documents have that or a TOC that is an inline part of the document or both)
If the above paragraph is not clear, please ask for clarification, or just try to suss it out, rather than attacking some nomenclatural deficiency. (Supplying corrections is of course welcome.)
|
Acrobat calls them Bookmarks (as do most PDF viewers/editors).
PyMuPDF's '_toc()' functions (get_toc and set_toc) manipulate what Adobe calls Bookmarks. set_toc only works with PDF.
PyMuPDF's '_bookmark()' functions are only for reflowable documents (ePub, mobi, FB2, CBZ, SVG, XPS).
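To make that concrete, here is a rough sketch of how I would add Bookmarks with get_toc and set_toc. As far as I know, get_toc() returns a list of [level, title, page] entries with 1-based page numbers, and set_toc() replaces the whole outline and rejects a list whose first entry is not level 1 or whose levels jump by more than one. The names check_toc and add_bookmarks and the file paths are my own placeholders, not part of PyMuPDF.

```python
def check_toc(toc):
    """Return a list of hierarchy problems that set_toc() would likely reject."""
    problems = []
    prev_level = 0
    for i, (level, title, page, *rest) in enumerate(toc):
        if i == 0 and level != 1:
            problems.append("first entry must have level 1")
        elif level < 1 or level > prev_level + 1:
            problems.append(
                f"entry {i} ({title!r}): level {level} is invalid after level {prev_level}"
            )
        prev_level = level
    return problems


def add_bookmarks(src_path, dst_path, new_entries):
    """Append entries to a PDF's existing Bookmarks and save a copy (hypothetical helper)."""
    import fitz  # PyMuPDF; imported here so check_toc() can be used without it

    doc = fitz.open(src_path)
    toc = doc.get_toc()        # [[level, title, page], ...]; page is 1-based
    toc.extend(new_entries)
    problems = check_toc(toc)
    if problems:
        raise ValueError("; ".join(problems))
    doc.set_toc(toc)           # replaces the entire outline with this list
    doc.save(dst_path)
    doc.close()
```

Usage would be something like add_bookmarks("in.pdf", "out.pdf", [[1, "Chapter 1", 5], [2, "Section 1.1", 7]]).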
I haven't used PyMuPDF for anything but PDF so far. It looks like it has some interesting capabilities for the other document types, and has an OCR API to integrate with Tesseract.
https://pymupdf.readthedocs.io/en/latest/about.html
I encounter many PDFs which lack any Bookmarks whatsoever, and/or have an inline Table of Contents that doesn't have page links to the respective chapters (to say nothing of linking footnotes or index entries).
I want to send these to my Scribe (and as of recently, the Kindle apps) and at least want to have a functional ToC in the converted Print Replica document (the default ToC is useless).
So next on my task list is to scan a PDF for chapter/section headings, and create Bookmarks that will get converted to the Print Replica ToC.
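I haven't written this yet, so the following is only a sketch of the approach I have in mind, under some loud assumptions: headings either start with "Chapter", "Part", or "Appendix", or are set in a noticeably larger font than the body text. The 1.2 size ratio, the regex, and every function name here are hypothetical and will need tuning per document. The text spans come from page.get_text("dict"), which reports a font size for each span.

```python
import re
from collections import Counter

# Assumed heading pattern; adjust for the document at hand.
CHAPTER_RE = re.compile(r"^(chapter|part|appendix)\s+\w+", re.IGNORECASE)


def collect_spans(pdf_path):
    """Return (page_number, font_size, text) for every text span, pages 1-based."""
    import fitz  # PyMuPDF; imported here so the pure helpers work without it

    spans = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                for line in block.get("lines", []):  # image blocks have no "lines"
                    for span in line["spans"]:
                        spans.append((page.number + 1, span["size"], span["text"]))
    return spans


def guess_body_size(spans):
    """Assume the most common font size among non-empty spans is the body text."""
    sizes = Counter(round(size, 1) for _, size, text in spans if text.strip())
    return sizes.most_common(1)[0][0]


def spans_to_toc(spans, body_size, min_ratio=1.2):
    """Build a set_toc()-style list: chapter-like text is level 1, other large text level 2."""
    toc = []
    for page, size, text in spans:
        text = text.strip()
        if not text:
            continue
        if CHAPTER_RE.match(text):
            level = 1
        elif size >= body_size * min_ratio:
            level = 2 if toc else 1  # the first entry must be level 1
        else:
            continue
        toc.append([level, text, page])
    return toc
```

The resulting list could then be passed to set_toc() and the PDF saved. Headings split across several spans or lines would come out as separate entries, so a real version needs to merge those first.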