05-02-2020, 07:39 PM #6
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by willus
If all you need to do is add a TOC/bookmarks, Coherent PDF (cpdf) is a better tool than k2pdfopt.


It's been at least 7 years since I had to add bookmarks to existing PDFs, so my info was probably a little out of date.

Quote:
Originally Posted by Hitch
Yeah, but AFAIK, this poster is downloading websites and converting them to PDFs for readability on his Kindle. I'm not sure how he can use k2pdfopt to create a TOC like that. But, hey, it'll be interesting to know how it goes!
Hmmm... I wonder if you could create a complete EPUB, generate a proper TOC, then pull out the toc.ncx and get rid of all the XML cruft.
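Since an EPUB is just a ZIP archive, pulling the toc.ncx out can be scripted. A minimal Python sketch, assuming the NCX is the first file in the archive whose name ends in .ncx (Sigil usually stores it as OEBPS/toc.ncx); `read_toc_ncx` is a hypothetical helper name, not part of Sigil or any other tool:

```python
import zipfile


def read_toc_ncx(epub) -> str:
    """Return the text of the first .ncx file inside an EPUB.

    `epub` may be a path or a binary file object. We search by file
    extension instead of assuming Sigil's OEBPS/toc.ncx location.
    """
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".ncx"):
                return zf.read(name).decode("utf-8")
    raise FileNotFoundError(f"no .ncx file found in {epub}")
```

Usage would be something like `ncx_text = read_toc_ncx("book.epub")`, where book.epub is a placeholder file name.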

For example, Sigil formats the toc.ncx like:

Code:
  <navMap>
    <navPoint id="navPoint-1" playOrder="1">
      <navLabel>
        <text>Part 1</text>
      </navLabel>
      <content src="Text/Section0001.xhtml#sigil_toc_id_1"/>
      <navPoint id="navPoint-2" playOrder="2">
        <navLabel>
          <text>Chapter 1</text>
        </navLabel>
        <content src="Text/Section0001.xhtml#sigil_toc_id_2"/>
      </navPoint>
    </navPoint>
    <navPoint id="navPoint-3" playOrder="3">
      <navLabel>
        <text>Part 2</text>
      </navLabel>
      <content src="Text/Section0001.xhtml#sigil_toc_id_3"/>
    </navPoint>
  </navMap>
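If you'd rather not depend on Sigil's exact indentation at all, a different technique (XML parsing instead of the regex route described in this post) can walk the navPoint nesting directly. A rough Python sketch using the standard-library ElementTree; it assumes the standard NCX namespace, also accepts a bare <navMap> fragment like the one above, and, like the regex route, leaves the playOrder number as a placeholder where the page number should go. `ncx_to_bookmarks` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Standard NCX namespace used by EPUB 2 toc.ncx files.
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def ncx_to_bookmarks(ncx_text: str) -> list[str]:
    """Turn NCX navPoints into 'level "title" playOrder' lines."""
    root = ET.fromstring(ncx_text.strip())
    # Accept a full namespaced toc.ncx or a bare, un-namespaced <navMap> fragment.
    ns = NCX_NS if root.tag.startswith(NCX_NS) else ""
    nav_map = root if root.tag == ns + "navMap" else root.find(ns + "navMap")
    if nav_map is None:
        raise ValueError("no <navMap> element found")

    lines: list[str] = []

    def walk(parent: ET.Element, level: int) -> None:
        # Direct child navPoints only; nesting depth becomes the bookmark level.
        for point in parent.findall(ns + "navPoint"):
            title = point.findtext(f"{ns}navLabel/{ns}text", default="").strip()
            order = point.get("playOrder", "0")
            lines.append(f'{level} "{title}" {order}')
            walk(point, level + 1)

    walk(nav_map, 0)
    return lines
```

Because it follows the element nesting, this version doesn't care how many spaces Sigil (or any other editor) used.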


Everything is already nested, and the indentation shows how deep each heading is.

So you strip everything except the <text> lines and the playOrder attributes:

Code:
    playOrder="1"
        <text>Part 1</text>
      playOrder="2"
          <text>Chapter 1</text>
    playOrder="3"
        <text>Part 2</text>
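The stripping step could also be scripted. A sketch under the assumption that every <navPoint ...> opening tag and every <text> element sits on its own line, as in Sigil's output above; `strip_ncx` is a hypothetical name:

```python
import re

# Opening <navPoint ...> tag: capture its indentation and the playOrder value.
NAVPOINT = re.compile(r'^(\s*)<navPoint\b[^>]*\bplayOrder="(\d+)"')
# A line holding a single <text>...</text> element.
TEXT = re.compile(r'^\s*<text>.*</text>\s*$')


def strip_ncx(ncx_text: str) -> str:
    """Keep only playOrder values and <text> lines, preserving indentation."""
    kept = []
    for line in ncx_text.splitlines():
        m = NAVPOINT.match(line)
        if m:
            # Same indentation as the navPoint tag, but only playOrder="N" kept.
            kept.append(f'{m.group(1)}playOrder="{m.group(2)}"')
        elif TEXT.match(line):
            kept.append(line.rstrip())
        # Every other line (navLabel, content, closing tags) is dropped.
    return "\n".join(kept)
```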
Then search and replace the Parts (the first-level headings):

Search: ^\s+playOrder="(\d+)"\r\n[ ]{8}<text>(.+)</text>
Replace: 0 "\2" \1
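The same first-level search and replace, sketched with Python's re module. One assumption-driven change from the pattern above: \r\n is relaxed to \r?\n so it matches files with either Windows or Unix line endings; the rest of the pattern and the replacement are unchanged. `replace_level0` is a hypothetical name:

```python
import re

# First-level pattern from the post, with \r\n relaxed to \r?\n.
# [ ]{8} = the indentation Sigil uses for first-level <text> lines.
LEVEL0 = re.compile(
    r'^\s+playOrder="(\d+)"\r?\n[ ]{8}<text>(.+)</text>', re.MULTILINE
)


def replace_level0(stripped: str) -> str:
    """Rewrite each first-level pair as: 0 "title" playOrder."""
    return LEVEL0.sub(r'0 "\2" \1', stripped)
```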

Code:
0 "Part 1" 1
      playOrder="2"
          <text>Chapter 1</text>
0 "Part 2" 3
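Key points being:
  • Blue, the (\d+) after playOrder=, grabs the sequential numbering of all the headings. It ends up at the end of each line via \1.
  • Red, the [ ]{8}, grabs how deep the levels are.
    • Note that Sigil's toc.ncx indents first-level <text> lines with 8 spaces (in the cpdf bookmark format, the first level is "0").
  • Green, the (.+) inside <text>...</text>, grabs the actual chapter titles. They go inside the quotes via \2.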
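To make the three coloured pieces explicit in plain text, here is the same first-level pattern written out with Python's re.VERBOSE flag, so each piece carries its own comment. The group names order, indent, and title are hypothetical labels, not anything Sigil or cpdf defines, and \r\n is again relaxed to \r?\n:

```python
import re

LEVEL0_VERBOSE = re.compile(
    r"""
    ^\s+playOrder="(?P<order>\d+)"   # blue: the heading's playOrder number
    \r?\n
    (?P<indent>[ ]{8})               # red: indentation depth, 8 spaces = level 0
    <text>(?P<title>.+)</text>       # green: the heading title
    """,
    re.MULTILINE | re.VERBOSE,
)


def describe(stripped: str) -> list[tuple[str, int, str]]:
    """List (playOrder, indent width, title) for each first-level match."""
    return [
        (m["order"], len(m["indent"]), m["title"])
        for m in LEVEL0_VERBOSE.finditer(stripped)
    ]
```

In verbose mode, whitespace in the pattern is ignored except inside a character class, which is why [ ] still matches a literal space.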

Then you change the red part to 10 spaces, which is the next level down, and set the level number in the replacement to 1:

Search: ^\s+playOrder="(\d+)"\r\n[ ]{10}<text>(.+)</text>
Replace: 1 "\2" \1

Code:
0 "Part 1" 1
1 "Chapter 1" 2
0 "Part 2" 3
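Rather than running one search per level, the indentation can be captured and turned into the level number in a single pass. A Python sketch, assuming Sigil keeps stepping 2 spaces per level: 8 spaces = level 0 and 10 = level 1 are shown above, while 12 = level 2 and deeper is extrapolated, not checked. `to_bookmarks` is a hypothetical name:

```python
import re

# Any level: capture the indentation in front of <text> instead of fixing it.
ANY_LEVEL = re.compile(
    r'^\s+playOrder="(\d+)"\r?\n([ ]+)<text>(.+)</text>', re.MULTILINE
)


def to_bookmarks(stripped: str) -> str:
    """Rewrite every playOrder/<text> pair as: level "title" playOrder."""

    def repl(m: re.Match) -> str:
        # 8 spaces -> 0, 10 -> 1 (as in Sigil's output); 12 -> 2 etc. is assumed.
        level = (len(m.group(2)) - 8) // 2
        return f'{level} "{m.group(3)}" {m.group(1)}'

    return ANY_LEVEL.sub(repl, stripped)
```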
Then you would just have to go through and manually change the (blue) playOrder numbers at the end of each line to match the PDF's page numbers... but at least the bulk of the formatting would be done.
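The manual page-number step could be semi-automated once you've looked up each heading's page. A sketch where `page_for` is a hypothetical dict you fill in by hand, mapping each playOrder number to its PDF page; entries you leave out keep their placeholder number, and lines that don't look like `level "title" number` are passed through untouched. `remap_pages` is also a hypothetical name:

```python
import re

# One bookmark line: level, quoted title, trailing number.
LINE = re.compile(r'^(\d+) "(.*)" (\d+)$')


def remap_pages(bookmarks: str, page_for: dict[int, int]) -> str:
    """Replace each trailing playOrder number with its PDF page number."""
    out = []
    for line in bookmarks.splitlines():
        m = LINE.match(line)
        if not m:
            out.append(line)  # leave anything unexpected as-is
            continue
        level, title, order = m.group(1), m.group(2), int(m.group(3))
        page = page_for.get(order, order)  # unmapped: keep the placeholder
        out.append(f'{level} "{title}" {page}')
    return "\n".join(out) + "\n"
```

The result could then be saved to a text file and handed to cpdf, with something like `cpdf -add-bookmarks bookmarks.txt in.pdf -o out.pdf` (file names are placeholders; check the cpdf manual for the exact bookmark-file syntax your version expects).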

Last edited by Tex2002ans; 05-02-2020 at 08:22 PM.